Question
Many “large” models of this faculty predict the most likely token given some context window. For 10 points each:
[10e] Name this faculty modeled by LLaMA (“llama”) and GPT-4. Computational tasks associated with this faculty include part-of-speech tagging and machine translation.
ANSWER: natural language [accept natural language processing or NLP]
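To illustrate the leadin, here is a minimal sketch of next-token prediction over a context window. The toy corpus and the bigram count model are assumptions for illustration only; LLaMA and GPT-4 use transformer networks rather than counts.

from collections import Counter, defaultdict

# Hypothetical toy corpus; a real model trains on far more text.
corpus = "the llama ate the grass and the llama slept".split()

# Count how often each token follows each context token (a bigram model).
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def most_likely_next(context_token):
    """Return the most probable next token given a one-token context window."""
    counts = bigram_counts[context_token]
    total = sum(counts.values())
    probs = {tok: c / total for tok, c in counts.items()}
    return max(probs, key=probs.get)

print(most_likely_next("the"))  # -> 'llama' (follows "the" in 2 of 3 cases)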
[10h] Scaling up transformer context lengths is limited by the quadratic memory cost of this operation. Its usefulness for modeling sequential data led Vaswani et al. to declare that this operation “is all you need.”
ANSWER: attention [accept self-attention; accept multi-head attention; accept “attention is all you need”]
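To see why this part calls the memory cost quadratic, here is a minimal sketch of scaled dot-product attention with assumed toy shapes; the (n, n) score matrix is the term that grows with the square of the context length n.

import numpy as np

n, d = 8, 4                       # assumed toy sequence length and head dimension
rng = np.random.default_rng(0)
Q = rng.standard_normal((n, d))   # queries
K = rng.standard_normal((n, d))   # keys
V = rng.standard_normal((n, d))   # values

scores = Q @ K.T / np.sqrt(d)     # shape (n, n): the quadratic-memory term
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
output = weights @ V              # shape (n, d): attended values

print(scores.shape)               # (8, 8); doubling n quadruples this matrix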
[10m] Words passed to a transformer are embedded into these objects, whose usage is exemplified by “king minus man plus woman equals queen.” The spaces of these mathematical objects are closed under addition and scalar multiplication.
ANSWER: word vectors [or word vector embeddings; accept vector spaces]
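A minimal sketch of this part's analogy, assuming hand-picked 3-D toy embeddings rather than learned word vectors; because vector spaces are closed under addition and scalar multiplication, the combined expression is itself a vector that can be compared against the vocabulary.

import numpy as np

# Hand-picked toy embeddings (assumed for illustration, not learned).
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "llama": np.array([0.2, 0.5, 0.2]),
    "grass": np.array([0.7, 0.7, 0.7]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "king minus man plus woman" is itself a vector in the same space.
target = emb["king"] - emb["man"] + emb["woman"]

# Nearest remaining vocabulary word by cosine similarity.
best = max((w for w in emb if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(emb[w], target))
print(best)  # -> 'queen'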
<Other Science>
Summary
Tournament | Date | Exact Match? | Heard | PPB | Easy % | Medium % | Hard %
2023 ACF Winter @ Columbia | 11/11/2023 | Y | 9 | 15.56 | 78% | 78% | 0%
Data
Team | Opponent | Easy | Hard | Medium | Total
Columbia A | Penn B | 10 | 0 | 10 | 20
Columbia B | Columbia C | 10 | 0 | 0 | 10 |
Haverford | Cornell C | 10 | 0 | 10 | 20 |
Princeton A | NYU B | 0 | 0 | 10 | 10 |
Yale A | Penn A | 10 | 0 | 0 | 10 |
Vassar | Princeton B | 10 | 0 | 10 | 20 |
Rutgers A | Rowan A | 10 | 0 | 10 | 20 |
NYU A | Yale B | 10 | 0 | 10 | 20 |
Rutgers B | Yale C | 0 | 0 | 10 | 10 |