by kvfrans

kvfrans /openai-cartpole

random search, hill climbing, policy gradient



Random search, hill climbing, policy gradient for CartPole

Simple reinforcement learning algorithms implemented for CartPole on OpenAI Gym.

This code goes along with my post about learning CartPole, which is inspired by an OpenAI request for research.

Algorithms implemented

Random search: Keep sampling random weights in [-1, 1] and greedily keep the best-performing set.
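A minimal sketch of the idea, using only the standard library. Several things here are assumptions and not taken from the repository: the weights parametrize a linear policy over the four state variables, episodes are capped at 200 steps, and a small hand-written cart-pole simulation (`run_episode`, a hypothetical helper) stands in for the Gym environment so the snippet runs without extra dependencies.

```python
import math
import random


def run_episode(weights, rng, max_steps=200):
    """Simulate one cart-pole episode with a linear policy; return steps survived.

    Assumption: this is a simplified stand-in for the Gym environment.
    """
    gravity, mass_cart, mass_pole, half_len = 9.8, 1.0, 0.1, 0.5
    force_mag, dt = 10.0, 0.02
    total_mass = mass_cart + mass_pole
    # State: cart position, cart velocity, pole angle, pole angular velocity.
    x, x_dot, theta, theta_dot = (rng.uniform(-0.05, 0.05) for _ in range(4))
    steps = 0
    for _ in range(max_steps):
        state = (x, x_dot, theta, theta_dot)
        # Linear policy: push right if the weighted sum is non-negative.
        push_right = sum(w * s for w, s in zip(weights, state)) >= 0
        force = force_mag if push_right else -force_mag
        sin_t, cos_t = math.sin(theta), math.cos(theta)
        temp = (force + mass_pole * half_len * theta_dot ** 2 * sin_t) / total_mass
        theta_acc = (gravity * sin_t - cos_t * temp) / (
            half_len * (4.0 / 3.0 - mass_pole * cos_t ** 2 / total_mass)
        )
        x_acc = temp - mass_pole * half_len * theta_acc * cos_t / total_mass
        # Explicit Euler integration step.
        x, x_dot = x + dt * x_dot, x_dot + dt * x_acc
        theta, theta_dot = theta + dt * theta_dot, theta_dot + dt * theta_acc
        steps += 1
        # Episode ends when the cart leaves the track or the pole tips too far.
        if abs(x) > 2.4 or abs(theta) > math.radians(12):
            break
    return steps


def random_search(n_trials=100, seed=0):
    """Sample weights in [-1, 1] repeatedly and keep the best-scoring set."""
    rng = random.Random(seed)
    best_weights, best_reward = None, -1
    for _ in range(n_trials):
        weights = [rng.uniform(-1.0, 1.0) for _ in range(4)]
        reward = run_episode(weights, rng)
        if reward > best_reward:
            best_weights, best_reward = weights, reward
        if best_reward >= 200:  # cap reached, nothing left to improve
            break
    return best_weights, best_reward


if __name__ == "__main__":
    weights, reward = random_search()
    print("best weights:", weights, "reward:", reward)
```

Each trial is independent of the previous ones, so the only state carried between iterations is the best pair seen so far.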

Hill climbing: Start from a random initialization, add a little noise every iteration, and keep the new weights if they improve the result.
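The loop can be sketched as below. To keep the example short and deterministic, the score comes from a user-supplied `evaluate` callable; in the CartPole setting that would be the episode reward, but here a toy objective (`toy_score`, purely hypothetical) is used instead. The noise scale, iteration count, and uniform noise distribution are likewise illustrative assumptions.

```python
import random


def hill_climb(evaluate, n_params=4, n_iters=500, noise_scale=0.1, seed=0):
    """Perturb the current weights each iteration; keep the change only if it scores higher."""
    rng = random.Random(seed)
    # Random initialization in [-1, 1].
    weights = [rng.uniform(-1.0, 1.0) for _ in range(n_params)]
    best_score = evaluate(weights)
    history = [best_score]  # best score after each iteration, for inspection
    for _ in range(n_iters):
        # Add a little noise to every weight.
        candidate = [w + rng.uniform(-1.0, 1.0) * noise_scale for w in weights]
        score = evaluate(candidate)
        if score > best_score:
            weights, best_score = candidate, score
        history.append(best_score)
    return weights, best_score, history


def toy_score(weights):
    """Hypothetical stand-in for an episode reward: highest when all weights equal 0.5."""
    return -sum((w - 0.5) ** 2 for w in weights)


if __name__ == "__main__":
    weights, score, history = hill_climb(toy_score)
    print("start:", history[0], "end:", score)
```

Because a candidate is accepted only when it scores strictly higher, the recorded best score never decreases. With a noisy score such as a single episode reward, that guarantee applies only to the measured values, not to the true quality of the weights.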

Policy gradient: Use a softmax policy and compute a value function using discounted Monte Carlo returns. Update the policy to favor state-action pairs that return a higher total reward than the average total reward from that state. Read my post about learning CartPole for a better explanation of this.
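The core quantities can be sketched in plain Python. This is not the repository's implementation: it assumes a linear softmax policy with one weight row per action, takes the per-state baseline values as given instead of learning a value function, and the discount factor and learning rate are placeholder values.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo return G_t = r_t + gamma * r_{t+1} + ... for every step t."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_step(weights, states, actions, advantages, lr=0.01):
    """One ascent step on sum_t advantage_t * log pi(a_t | s_t).

    weights[k][i] is the weight of state feature i in the logit of action k.
    """
    new_weights = [row[:] for row in weights]
    for s, a, adv in zip(states, actions, advantages):
        probs = softmax([sum(w * x for w, x in zip(row, s)) for row in weights])
        for k, row in enumerate(new_weights):
            # d log pi(a|s) / d logit_k = 1[k == a] - pi(k|s)
            coeff = (1.0 if k == a else 0.0) - probs[k]
            for i, x in enumerate(s):
                row[i] += lr * adv * coeff * x
    return new_weights


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    returns = discounted_returns(rewards, gamma=0.5)
    baseline = [1.0, 1.0, 1.0]  # hypothetical value estimates for the visited states
    advantages = [g - v for g, v in zip(returns, baseline)]
    print(returns, advantages)
```

The advantage is the return minus the baseline for that state, so an action whose return beats the state's average gets a positive weight and becomes more likely after the update, while one that falls short becomes less likely.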
