Actor-critic with experience replay
Actor-critic with experience replay (ACER) [1]. Uses batch off-policy updates to improve stability. Trust region updates can be enabled with
--trust-region. Currently uses a full trust region update instead of the "efficient" trust region update described in [1] (see issue #1).
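For reference, the "efficient" trust region step in [1] adjusts a gradient g using the gradient k of a KL divergence term, so that the adjusted gradient z satisfies k·z <= delta. The sketch below shows only that closed-form adjustment on plain Python lists. The function name trust_region_adjust and its arguments are hypothetical, chosen for illustration; this is not the code in this repository, which currently uses the full trust region.

```python
def trust_region_adjust(g, k, delta):
    """Return z = g - max(0, (k.g - delta) / ||k||^2) * k.

    g, k: equal-length lists of floats (hypothetical stand-ins for the
          policy gradient and the KL gradient).
    delta: non-negative trust region threshold.
    """
    k_dot_g = sum(ki * gi for ki, gi in zip(k, g))
    k_norm_sq = sum(ki * ki for ki in k)
    if k_norm_sq == 0.0:
        # No KL gradient means there is no constraint direction to remove.
        return list(g)
    # The scale is zero when the constraint k.g <= delta already holds.
    scale = max(0.0, (k_dot_g - delta) / k_norm_sq)
    return [gi - scale * ki for gi, ki in zip(g, k)]
```

When g already satisfies the constraint it is returned unchanged; otherwise the component of g along k is shrunk just enough that k·z equals delta.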
Run with
python main.py. To run asynchronous advantage actor-critic (A3C) [2] (but with a Q-value head), use the
--on-policy option.
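The difference between the default off-policy mode and the --on-policy mode can be pictured roughly as follows. This is a simplified sketch under stated assumptions, not the code in main.py: the ReplayMemory class, the select_training_batch function and the idea of storing whole trajectories are all hypothetical names and choices made for illustration.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity store of past trajectories (hypothetical helper)."""

    def __init__(self, capacity):
        # A deque with maxlen silently discards the oldest trajectory when full.
        self.trajectories = deque(maxlen=capacity)

    def append(self, trajectory):
        self.trajectories.append(trajectory)

    def __len__(self):
        return len(self.trajectories)

    def sample(self, batch_size):
        # Sample without replacement, capped at the number stored.
        n = min(batch_size, len(self.trajectories))
        return random.sample(list(self.trajectories), n)


def select_training_batch(memory, latest, on_policy, batch_size):
    """Pick the trajectories used for one update.

    On-policy (A3C-like): train only on the trajectory just collected.
    Off-policy (ACER-like): train on a batch drawn from the replay memory.
    """
    if on_policy:
        return [latest]
    return memory.sample(batch_size)
```

In the off-policy case the sampled trajectories were generated by older versions of the policy, which is why ACER [1] needs importance-weighted corrections that this sketch leaves out.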
To install all dependencies with Anaconda run
conda env create -f environment.yml and use
source activate acer to activate the environment.
[1] Sample Efficient Actor-Critic with Experience Replay
[2] Asynchronous Methods for Deep Reinforcement Learning