Question

Answer the following about the methods used by Google’s DeepMind to train AlphaStar, an agent developed to play StarCraft II that reached the highest rank of Grandmaster in 2019. For 10 points each:
[10h] Two answers required. The reinforcement learning procedure used by AlphaStar is based on a policy gradient algorithm in a framework named for these two entities. A popular RL algorithm is named for “Asynchronous Advantage” and these two entities, in which the policy and value functions are learned and updated simultaneously.
ANSWER: actors and critics [accept (Asynchronous) Advantage Actor-Critic; prompt on A2C or A3C]
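
For context on the clue: in an advantage actor-critic method, the actor (the policy) is updated along the policy gradient weighted by an advantage estimate, while the critic (the value function) is fit by temporal-difference learning. Below is a minimal tabular sketch on a made-up 5-state chain MDP; the environment, hyperparameters, and tabular setup are illustrative assumptions, not AlphaStar's actual architecture, which uses deep networks, off-policy corrections, and league-based training.

    import numpy as np

    rng = np.random.default_rng(0)

    N_STATES, N_ACTIONS = 5, 2          # chain MDP: action 0 = left, 1 = right
    GAMMA, ALPHA_PI, ALPHA_V = 0.95, 0.1, 0.2

    theta = np.zeros((N_STATES, N_ACTIONS))    # actor: softmax policy logits
    V = np.zeros(N_STATES)                     # critic: state-value estimates

    def policy(s):
        p = np.exp(theta[s] - theta[s].max())  # numerically stable softmax
        return p / p.sum()

    for episode in range(2000):
        s = 2                                  # start in the middle of the chain
        while True:
            probs = policy(s)
            a = rng.choice(N_ACTIONS, p=probs)
            s2 = s - 1 if a == 0 else s + 1
            done = s2 in (0, N_STATES - 1)
            r = 1.0 if s2 == N_STATES - 1 else 0.0

            # one-step TD error doubles as the advantage estimate
            target = r + (0.0 if done else GAMMA * V[s2])
            adv = target - V[s]

            V[s] += ALPHA_V * adv              # critic update
            grad_log_pi = -probs               # d/dtheta log softmax ...
            grad_log_pi[a] += 1.0              # ... = onehot(a) - probs
            theta[s] += ALPHA_PI * adv * grad_log_pi   # actor update

            if done:
                break
            s = s2

    # learned probability of moving right in each state
    print(np.round([policy(s)[1] for s in range(N_STATES)], 2))

Running this prints the probability of moving right per state, which approaches 1 for the interior states as the actor and critic are updated together.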
[10m] The supervised and reinforcement learning stages of AlphaStar minimized their combined losses using this optimizer. Momentum and RMSProp are precursors to this often-default ML optimization method, whose name is a four-letter acronym.
ANSWER: Adam algorithm [or Adaptive Moment Estimation]
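
The clue's lineage is visible in the update rule itself: Adam keeps a momentum-style running mean of gradients (the first moment) and an RMSProp-style running mean of squared gradients (the second moment), with bias correction for their zero initialization. Here is a self-contained sketch of the standard published rule applied to a hypothetical quadratic loss; the loss, learning rate, and step count are illustrative assumptions, not AlphaStar's training configuration.

    import numpy as np

    def adam_minimize(grad_fn, w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=200):
        """Adam: momentum-style 1st moment plus RMSProp-style 2nd moment."""
        m = np.zeros_like(w)   # 1st moment: running mean of gradients (momentum)
        v = np.zeros_like(w)   # 2nd moment: running mean of squared gradients (RMSProp)
        for t in range(1, steps + 1):
            g = grad_fn(w)
            m = beta1 * m + (1 - beta1) * g
            v = beta2 * v + (1 - beta2) * g * g
            m_hat = m / (1 - beta1 ** t)       # bias correction for zero init
            v_hat = v / (1 - beta2 ** t)
            w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        return w

    # hypothetical combined loss L(w) = (w0 - 3)^2 + 10 * (w1 + 1)^2
    grad = lambda w: np.array([2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)])
    print(adam_minimize(grad, np.zeros(2)))    # converges near [3, -1]

Dividing by the root of the second moment rescales each coordinate's step, which is why Adam handles the differently curved directions of this loss without per-parameter tuning.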
[10e] The multi-agent stage of AlphaStar avoids relying solely on naive self-play because of its tendency to chase these constructs, which leads to an infinite loop. In graphs, these constructs are paths whose first and last vertices coincide.
ANSWER: cycles [or circuits]
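
The "chasing cycles" clue refers to non-transitive strategy spaces: if strategy A beats B, B beats C, and C beats A, naive self-play can loop through them forever, which is why AlphaStar's league training instead plays against a whole population of past agents and exploiters. Below is a standard depth-first-search cycle check, run on a hypothetical rock-paper-scissors "beats" graph; the graph and names are illustrative assumptions.

    def has_cycle(adj):
        """DFS three-coloring: a back edge to a GRAY vertex closes a cycle."""
        WHITE, GRAY, BLACK = 0, 1, 2
        color = {v: WHITE for v in adj}

        def dfs(v):
            color[v] = GRAY
            for w in adj[v]:
                if color[w] == GRAY:           # path returns to its first vertex
                    return True
                if color[w] == WHITE and dfs(w):
                    return True
            color[v] = BLACK
            return False

        return any(color[v] == WHITE and dfs(v) for v in adj)

    # hypothetical non-transitive "beats" graph: rock > scissors > paper > rock
    beats = {"rock": ["scissors"], "scissors": ["paper"], "paper": ["rock"]}
    print(has_cycle(beats))                    # True: self-play can loop forever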
<Science - Other Science - Computer Science>


Summary

Tournament | Date | Exact match? | Heard | PPB | Easy % | Medium % | Hard %
2024 ARGOS @ McMaster | 11/17/2024 | Y | 5 | 14.00 | 100% | 40% | 0%

Data

Team | Opponent | [10h] | [10m] | [10e] | Total
I'd prefer to have the team name be Christensen et al. than anything that Erik cooks up | Simpson Agonistes: The Crisis of Donut | 0 | 0 | 10 | 10
You cannot go to Aarhus to see his peat-brown head / With eyes like ripening fruit | Moderator Can't Neg me While in Alpha | 0 | 10 | 10 | 20
The Only Existing Manuscript from A Clockwork Orange | Ryan Wesley Routh's 10 000 NATO-trained Afghan Quizbowlers | 0 | 0 | 10 | 10
Communism is Soviet power plus the yassification of the whole country | She Dicer On My Argonaute Till I RNA Interfere | 0 | 10 | 10 | 20
as rational as the square root of two power bottoms | Tensei Shitara Flashcard Data Ken | 0 | 0 | 10 | 10