Question

Many “large” models of this faculty predict the most likely token given some context window. For 10 points each:
[10e] Name this faculty modeled by LLaMA (“llama”) and GPT-4. Computational tasks associated with this faculty include part-of-speech tagging and machine translation.
ANSWER: natural language [accept natural language processing or NLP]
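The lead-in describes models that "predict the most likely token given some context window." A minimal sketch of that idea follows; the toy vocabulary, context, and logits are invented for illustration and do not come from any real model.

```python
import numpy as np

# Toy next-token prediction: a model scores every vocabulary item given the
# context window, softmaxes the scores into probabilities, and the "most
# likely token" is the argmax. Vocabulary, context, and logits are made up.
vocab = ["the", "llama", "ate", "grass", "<eos>"]
context = ["the", "llama", "ate"]

logits = np.array([0.1, 0.2, 0.3, 2.5, 0.4])  # pretend model outputs

probs = np.exp(logits - logits.max())  # softmax, shifted for stability
probs /= probs.sum()

for tok, p in zip(vocab, probs):
    print(f"P({tok!r} | {' '.join(context)}) = {p:.3f}")
print("most likely next token:", vocab[int(np.argmax(probs))])  # -> 'grass'
```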
[10h] Scaling up transformer context lengths is limited by the quadratic memory cost of this operation. Its usefulness for modeling sequential data led Vaswani et al. to declare that this operation “is all you need.”
ANSWER: attention [accept self-attention; accept multi-head attention; accept “attention is all you need”]
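The quadratic memory cost mentioned here comes from the n-by-n matrix of pairwise scores that scaled dot-product self-attention builds for a length-n sequence. A minimal sketch, assuming a single head with no learned Q/K/V projections and arbitrary dimensions:

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over an (n, d) sequence.
    Single head, no learned projections -- illustration only."""
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)                 # (n, n) pairwise scores
    scores -= scores.max(axis=-1, keepdims=True)  # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x                            # (n, d) output

for n in (512, 1024, 2048):
    x = np.random.randn(n, 64)
    self_attention(x)
    # Doubling the context length quadruples the (n, n) score matrix,
    # which is the quadratic memory cost that limits context scaling.
    print(f"n={n}: score matrix has {n * n:,} entries")
```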
[10m] Words passed to a transformer are embedded into these objects, whose usage is exemplified by “king minus man plus woman equals queen.” The spaces of these mathematical objects are closed under addition and scalar multiplication.
ANSWER: word vectors [or word vector embeddings; accept vector spaces]
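The “king minus man plus woman equals queen” clue is literal arithmetic on word vectors. A minimal sketch with hand-made 2-D stand-ins for real learned embeddings (which have hundreds of dimensions and are trained from text, e.g. by word2vec or GloVe):

```python
import numpy as np

# Hand-made 2-D "embeddings"; values are invented for illustration.
emb = {
    "king":  np.array([0.9, 0.9]),
    "queen": np.array([0.1, 0.9]),
    "man":   np.array([0.9, 0.1]),
    "woman": np.array([0.1, 0.1]),
    "apple": np.array([0.5, 0.2]),   # distractor word
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# The space is closed under addition and scalar multiplication, so this
# combination is itself a vector in the same space.
target = emb["king"] - emb["man"] + emb["woman"]

# Nearest neighbor of the result, excluding the three input words.
best = max((w for w in emb if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(emb[w], target))
print(best)  # -> 'queen'
```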
<Other Science>

Summary

Tournament                  Date        Exact Match?  Heard  PPB    Easy %  Medium %  Hard %
2023 ACF Winter @ Columbia  11/11/2023  Y             9      15.56  78%     78%       0%

Data

Team         Opponent     Part 1 (10e)  Part 2 (10h)  Part 3 (10m)  Total
Columbia A   Penn B       10            0             10            20
Columbia B   Columbia C   10            0             0             10
Haverford    Cornell C    10            0             10            20
Princeton A  NYU B        0             0             10            10
Yale A       Penn A       10            0             0             10
Vassar       Princeton B  10            0             10            20
Rutgers A    Rowan A      10            0             10            20
NYU A        Yale B       10            0             10            20
Rutgers B    Yale C       0             0             10            10
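
As a sanity check, the summary figures above (9 heard, 15.56 PPB, 78% easy, 78% medium, 0% hard conversion) follow directly from these rows; the short script below, with the rows transcribed by hand, just redoes that arithmetic.

```python
# Per-game bonus results from the table above
# (part 1 = easy, part 2 = hard, part 3 = medium, as in the question).
rows = [
    ("Columbia A",  "Penn B",      10, 0, 10),
    ("Columbia B",  "Columbia C",  10, 0,  0),
    ("Haverford",   "Cornell C",   10, 0, 10),
    ("Princeton A", "NYU B",        0, 0, 10),
    ("Yale A",      "Penn A",      10, 0,  0),
    ("Vassar",      "Princeton B", 10, 0, 10),
    ("Rutgers A",   "Rowan A",     10, 0, 10),
    ("NYU A",       "Yale B",      10, 0, 10),
    ("Rutgers B",   "Yale C",       0, 0, 10),
]

heard = len(rows)                                     # 9
ppb = sum(e + h + m for *_, e, h, m in rows) / heard  # 140 / 9 = 15.56

# Conversion rate per part: fraction of teams that scored on it.
easy, hard, medium = (sum(part > 0 for part in parts) / heard
                      for parts in zip(*[(e, h, m) for *_, e, h, m in rows]))

print(f"heard={heard}, PPB={ppb:.2f}")
print(f"easy={easy:.0%}, medium={medium:.0%}, hard={hard:.0%}")
```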