Answer the following about DeepMind researchers training humanoid robots to play one-on-one soccer, for 10 points each.
[10e] The team modeled the soccer environment using extensions of stochastic processes named for this mathematician, in which event probabilities depend only on the current state.
ANSWER: Andrey Markov [or Andrey Andreyevich Markov; accept Markov processes or (discounted partially observable) Markov decision processes]
[10h] The researchers used an actor–critic algorithm to train the robot agents to learn this function. In reinforcement learning, this function maps a state to an action.
ANSWER: policy [or policies]
[10m] The agents controlled their movement on the field using 20 servomotors, which generally belong to either the rotary or linear type of this class of devices. These devices convert an input signal into mechanical motion or force.
ANSWER: actuators [accept rotary actuators or linear actuators] (The paper is “Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning.”)
<Other Science>