A3C: Asynchronous Advantage Actor-Critic
A3C is a deep reinforcement learning method built around three main elements.
Element 1: Asynchronous
Instead of having only one agent trying to reach the desired goal, the paper runs multiple worker agents that explore the environment in parallel, each one asynchronously sharing what it learns with the others, hence the term asynchronous. A rough sketch of this idea follows.
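A minimal sketch of the asynchronous part, assuming plain Python threads and an invented toy objective instead of a real environment and network: several workers form their own gradient estimates and apply them to one shared parameter vector without waiting for each other (Hogwild-style updates). All names and numbers here are illustrative.

```python
import threading
import numpy as np

shared_params = np.zeros(4)                 # stands in for the shared global network's parameters
target = np.array([1.0, -2.0, 0.5, 3.0])    # hypothetical optimum every worker is estimating

def worker(worker_id, steps=1000, lr=0.01):
    rng = np.random.default_rng(worker_id)  # each worker has its own noise / "experience"
    for _ in range(steps):
        # Each worker computes its own noisy gradient estimate...
        grad = (shared_params - target) + rng.normal(scale=0.1, size=4)
        # ...and applies it to the shared parameters immediately, without any synchronization.
        shared_params[:] -= lr * grad

# Python threads interleave rather than run truly in parallel, but the point is the same:
# updates from different workers arrive asynchronously at the shared parameters.
threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("shared parameters after asynchronous updates:", shared_params)
```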
Element 2: Actor-Critic
The architecture is mostly the same as a deep convolutional Q-learning model (convolution, pooling, and flattening feed into fully connected layers, and the network outputs an action).
The difference is that instead of outputting only the next action (the ACTOR), the final layer has a second head that outputs the value of the agent's current state (the CRITIC). The multiple agents share their experience through this shared network: each worker's updates to the ACTOR and the CRITIC become available to all the others.
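A minimal sketch of that two-headed layout, assuming PyTorch: a shared convolutional trunk feeds a policy head (the ACTOR) and a value head (the CRITIC). The input shape and layer sizes below are illustrative, loosely following the small convolutional network commonly used for Atari.

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, n_actions, in_channels=4):
        super().__init__()
        # Shared convolutional trunk: convolution and flattening, as in a deep conv Q-network.
        self.trunk = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256), nn.ReLU(),
        )
        self.policy_head = nn.Linear(256, n_actions)  # ACTOR: logits over the next action
        self.value_head = nn.Linear(256, 1)           # CRITIC: value of the current state

    def forward(self, obs):
        features = self.trunk(obs)
        return self.policy_head(features), self.value_head(features)

# Example: an 84x84 stack of 4 frames, as commonly used for Atari observations.
net = ActorCritic(n_actions=6)
logits, value = net(torch.zeros(1, 4, 84, 84))
print(logits.shape, value.shape)  # torch.Size([1, 6]) torch.Size([1, 1])
```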
Element 3: Advantage
When calculating the loss for the ACTOR (called the POLICY LOSS in the paper), it uses a quantity called the ADVANTAGE, defined as the value of the action actually taken minus the value of the current state, A(s, a) = Q(s, a) − V(s); in practice it is estimated as the observed discounted return minus the CRITIC's value estimate. By looking at this, the agent can tell how much BETTER the chosen action turned out to be compared to what was expected from the current state.
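A tiny numerical sketch of the ADVANTAGE and the resulting POLICY LOSS, assuming PyTorch; the logits, the CRITIC's value estimate, and the observed return are made-up numbers for a single state.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[1.0, 0.5, -0.2]])   # ACTOR output for one state (3 possible actions)
value = torch.tensor([2.0])                 # CRITIC's estimate V(s) of the current state
action = torch.tensor([0])                  # action that was actually taken
discounted_return = torch.tensor([3.5])     # return observed after taking that action

# ADVANTAGE: how much better the outcome was than the CRITIC expected from this state.
advantage = discounted_return - value
log_prob = F.log_softmax(logits, dim=-1)[0, action]

# POLICY LOSS pushes up the log-probability of actions with positive advantage;
# the advantage is treated as a constant here (detach), since the CRITIC has its own loss.
policy_loss = -(log_prob * advantage.detach()).mean()
value_loss = F.mse_loss(value, discounted_return)
print(float(advantage), float(policy_loss), float(value_loss))
```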
Reference: Mnih et al., "Asynchronous Methods for Deep Reinforcement Learning" (ICML 2016, arXiv:1602.01783)