Question
Q, K, and V vectors are multiplied together, then the results are concatenated with each other and finally multiplied by a matrix of these quantities in the (*) "multi-head" form of a certain mechanism. These quantities are uniformly sampled from the range negative to positive inverse square root of the number of inputs in a technique unusually named for its developer's first name, Xavier initialization. These quantities are computed at runtime in the "attention" mechanism that is central to transformer models. These quantities are the coefficients in a sum that is fed into a function like softmax or ReLU ("rel-you"). The biases or, more commonly, these quantities are updated by performing gradient descent on the loss function through backpropagation. For 10 points, name these quantities in a neural network that represent the connection strength between neurons. ■END■
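The clues above describe two standard techniques. A minimal NumPy sketch of both, assuming the fuller Glorot form of the uniform bound, sqrt(6 / (fan_in + fan_out)), rather than the simplified 1/sqrt(n) range the clue paraphrases:

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng=None):
    # Glorot/Xavier uniform initialization: weights are sampled from
    # U(-limit, +limit) with limit = sqrt(6 / (fan_in + fan_out)).
    if rng is None:
        rng = np.random.default_rng(0)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
    # The softmax output is the "attention weights" computed at runtime.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores) @ V
```

In multi-head attention, several such attention outputs are concatenated and then multiplied by a learned output-projection weight matrix, matching the question's lead-in clue.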
Buzzes
| Player | Team | Opponent | Buzz Position | Value |
| --- | --- | --- | --- | --- |
| Richard Niu | The Aum-Wein Drinchard by Amogh Tutuola | 1.g4 Test Mixture | 97 | -5 |
| Kai Smith | 1.g4 Test Mixture | The Aum-Wein Drinchard by Amogh Tutuola | 133 | 10 |