Policy Gradient is all you need! A step-by-step tutorial for well-known PG methods.
This is a step-by-step tutorial for Policy Gradient algorithms from A2C to SAC, including learning-acceleration methods that use demonstrations to handle real applications with sparse rewards. Every chapter contains both a theoretical background and an object-oriented implementation. Just pick any topic you are interested in and learn! You can execute the notebooks right away with Colab, even on your smartphone.
Please feel free to open an issue or a pull request if you have any ideas to make it better. :)
If you want a tutorial for the DQN series, please see Rainbow is All You Need.
Reference: OpenAI gym Pendulum-v0
| Num | Observation | Min  | Max |
| --- | ----------- | ---- | --- |
| 0   | cos(theta)  | -1.0 | 1.0 |
| 1   | sin(theta)  | -1.0 | 1.0 |
| 2   | theta dot   | -8.0 | 8.0 |
| Num | Action       | Min  | Max |
| --- | ------------ | ---- | --- |
| 0   | Joint effort | -2.0 | 2.0 |
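The two tables above correspond directly to the environment's observation and action spaces. As a quick sanity check, you can inspect them yourself (a minimal sketch, assuming `gym` with `Pendulum-v0` is installed as in the notebooks):

```python
import gym

# Create the environment used throughout the tutorial.
env = gym.make("Pendulum-v0")

# Observation: [cos(theta), sin(theta), theta dot]
print(env.observation_space)        # Box(3,), low=[-1. -1. -8.], high=[1. 1. 8.]

# Action: a single joint effort in [-2, 2]
print(env.action_space)             # Box(1,)
print(env.action_space.low, env.action_space.high)  # [-2.] [2.]
```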
The precise equation for reward:
```
-(theta^2 + 0.1*theta_dt^2 + 0.001*action^2)
```
Theta is normalized between -pi and pi, so the lowest possible reward is `-(pi^2 + 0.1*8^2 + 0.001*2^2) = -16.2736044` and the highest is `0`. In essence, the goal is to remain at zero angle (vertical), with the least rotational velocity and the least effort. The maximum number of steps per episode is 200.
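To make these bounds concrete, here is a small sketch that evaluates the reward formula at both extremes (the helper name `pendulum_reward` is just for illustration):

```python
import numpy as np

def pendulum_reward(theta: float, theta_dt: float, action: float) -> float:
    """Reward as documented above: -(theta^2 + 0.1*theta_dt^2 + 0.001*action^2)."""
    return -(theta ** 2 + 0.1 * theta_dt ** 2 + 0.001 * action ** 2)

# Worst case: hanging straight down (theta = pi) at max speed with max effort.
print(pendulum_reward(np.pi, 8.0, 2.0))  # ~ -16.2736044

# Best case: upright and motionless with no effort.
print(pendulum_reward(0.0, 0.0, 0.0))    # 0.0
```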
This repository is tested in an Anaconda virtual environment with Python 3.6.1+.
```bash
$ conda create -n pg-is-all-you-need python=3.6.9
$ conda activate pg-is-all-you-need
```
First, clone the repository.
```bash
git clone https://github.com/MrSyee/pg-is-all-you-need.git
cd pg-is-all-you-need
```
Secondly, install the packages required to execute the code. Just type:

```bash
make dep
```

To install the packages required to develop the code, type:

```bash
make dev
```

If you want to check the diff of Jupyter notebooks that you modified, use nbdime.
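For example, to view a rich diff of two notebook versions in the browser (a minimal sketch; the filenames are placeholders):

```bash
$ nbdiff-web before.ipynb after.ipynb
```

`nbdiff-web` is one of nbdime's CLI entry points; `nbdiff` prints the same diff in the terminal instead.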
Thanks goes to these wonderful people (emoji key):
Jinwoo Park (Curt)