The number of these things is the numerator of a ratio that the Chinchilla scaling laws find compute-optimal at about 20. The K and V vectors computed from these things are stored in a KV cache; reusing that cache across requests improves a metric called the “time to the first” one of these things. The process of converting an input into these things is commonly done using byte-pair encoding. During the prefill stage, IDs corresponding to these things are embedded into vectors that are then passed through multiple transformer layers. The maximum number of these things that can be processed at once is the context window. The cost of AI compute is often reported as the price per one million of these things. For 10 points, name these units of text that large language models split queries into and try to predict the next one of. ■END■
ANSWER: tokens [accept tokenization or tokenizers; accept time to first token; accept next token prediction or answers referring to trying to predict the next token; prompt on token-to-parameter ratio or tokens-per-parameter ratio; prompt on words or subwords or symbols or characters by asking “those are converted into what things?”]
<GC, Other Science>
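To illustrate the byte-pair encoding clue in the question above, here is a minimal sketch of BPE training and tokenization. The toy corpus, the merge count, and the `train_bpe` and `tokenize` helpers are illustrative assumptions for this sketch, not the behavior of any production tokenizer.

```python
# A minimal sketch of byte-pair encoding (BPE): repeatedly fuse the most
# frequent adjacent pair of symbols, then tokenize new text by replaying
# the learned merges in order. Toy example only; real tokenizers add
# byte-level fallbacks, special tokens, and pre-tokenization rules.
from collections import Counter

def train_bpe(corpus: str, num_merges: int) -> list[tuple[str, str]]:
    """Learn merge rules from a whitespace-split corpus."""
    # Start with each word as a sequence of single characters.
    words = [list(w) for w in corpus.split()]
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair across the corpus.
        pairs = Counter()
        for w in words:
            for a, b in zip(w, w[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the new merge rule everywhere it occurs.
        merged = best[0] + best[1]
        for w in words:
            i = 0
            while i < len(w) - 1:
                if (w[i], w[i + 1]) == best:
                    w[i:i + 2] = [merged]
                else:
                    i += 1
    return merges

def tokenize(text: str, merges: list[tuple[str, str]]) -> list[str]:
    """Split text into tokens by replaying the learned merges in order."""
    tokens = list(text)
    for a, b in merges:
        i = 0
        while i < len(tokens) - 1:
            if tokens[i] == a and tokens[i + 1] == b:
                tokens[i:i + 2] = [a + b]
            else:
                i += 1
    return tokens

if __name__ == "__main__":
    merges = train_bpe("low lower lowest low low", num_merges=4)
    print(tokenize("lowest", merges))  # ['lowe', 's', 't']
```

The learned merges are exactly the "IDs" the question's prefill clue refers to: a real system maps each merged symbol to an integer, embeds those integers into vectors, and passes them through the transformer layers.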