31 projects built on Deep Reinforcement Learning algorithms: Q-learning, DQN, PPO, DDPG, TD3, SAC, A2C and others. Each project comes with a detailed training log.
This repository collects several projects dedicated to Deep Reinforcement Learning methods.
The projects are arranged as a matrix, [env x model], where env is the environment
to be solved and model is the model/algorithm that solves it. In some cases,
the same environment is solved by several algorithms. Each project is presented as
a Jupyter notebook containing its training log.
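
As an illustration of the [env x model] layout described above, here is a minimal sketch that builds such a matrix from a handful of (environment, algorithm) pairs taken from the project list in this README. The names `PROJECTS`, `build_matrix` and `models_for` are hypothetical helpers introduced only for this example; they are not part of the repository.

```python
# A few (environment, algorithm) pairs copied from the project list in this README.
# This is only a subset, chosen for illustration.
PROJECTS = [
    ("BipedalWalker", "TD3"),
    ("BipedalWalker", "PPO"),
    ("BipedalWalker", "SAC"),
    ("BipedalWalker", "A2C"),
    ("HopperBulletEnv", "TD3"),
    ("HopperBulletEnv", "SAC"),
    ("LunarLanderContinuous", "DDPG"),
]


def build_matrix(projects):
    """Return (envs, models, grid); grid[i][j] is True if envs[i] is solved by models[j]."""
    envs = sorted({env for env, _ in projects})
    models = sorted({model for _, model in projects})
    solved = set(projects)
    grid = [[(env, model) in solved for model in models] for env in envs]
    return envs, models, grid


def models_for(projects, env):
    """List the algorithms that solve a given environment, in alphabetical order."""
    return sorted(model for e, model in projects if e == env)


if __name__ == "__main__":
    envs, models, grid = build_matrix(PROJECTS)
    # Print a simple text table: one row per environment, one column per algorithm.
    print("".ljust(24) + " ".join(m.ljust(5) for m in models))
    for env, row in zip(envs, grid):
        print(env.ljust(24) + " ".join(("x" if cell else ".").ljust(5) for cell in row))
```

Each row of the printed table is an environment and each column an algorithm, so a row with several marks corresponds to an environment solved by several algorithms.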
The following environments are supported:
AntBulletEnv, BipedalWalker, CarRacing, CartPole, Crawler, HalfCheetahBulletEnv, HopperBulletEnv,
LunarLander, LunarLanderContinuous, Markov Decision 6x6, Minitaur, Minitaur with Duck, MountainCar,
MountainCarContinuous, Pong, Navigation, Reacher, Snake, Tennis, Walker2DBulletEnv.
Four environments (Navigation, Crawler, Reacher, Tennis) were solved as part of the
Udacity Deep Reinforcement Learning Nanodegree Program.
* AntBulletEnv, Soft Actor-Critic (SAC)
* BipedalWalker, Twin Delayed DDPG (TD3)
* BipedalWalker, PPO, Vectorized Environment
* BipedalWalker, Soft Actor-Critic (SAC)
* BipedalWalker, A2C, Vectorized Environment
* CarRacing with PPO, Learning from Raw Pixels
* CartPole, Policy Based Methods, Hill Climbing
* CartPole, Policy Gradient Methods, REINFORCE
* HalfCheetahBulletEnv, Twin Delayed DDPG (TD3)
* HopperBulletEnv, Twin Delayed DDPG (TD3)
* HopperBulletEnv, Soft Actor-Critic (SAC)
* LunarLanderContinuous-v2, DDPG
* Markov Decision Process, Monte-Carlo, Gridworld 6x6
* MinitaurBulletEnv, Soft Actor-Critic (SAC)
* MinitaurBulletDuckEnv, Soft Actor-Critic (SAC)
* MountainCarContinuous, Twin Delayed DDPG (TD3)
* MountainCarContinuous, PPO, Vectorized Environment
* Pong, Policy Gradient Methods, PPO
* Pong, Policy Gradient Methods, REINFORCE
* Udacity Project 1: Navigation, DQN, ReplayBuffer
* Udacity Project 2: Continuous Control-Reacher, DDPG, environment Reacher (Double-Jointed-Arm)
* Udacity Project 2: Continuous Control-Crawler, PPO, environment Crawler
* Udacity Project 3: Collaboration_Competition-Tennis, Multi-agent DDPG, environment Tennis
* Walker2DBulletEnv, Twin Delayed DDPG (TD3)
* Walker2DBulletEnv, Soft Actor-Critic (SAC)
### Projects with Soft Actor-Critic (SAC)
* AntBulletEnv
* BipedalWalker
* HopperBulletEnv
* MinitaurBulletEnv
* MinitaurBulletDuckEnv
* Walker2DBulletEnv
### BipedalWalker, different models
* How does the Bellman equation work in Deep Reinforcement Learning?
* A pair of interrelated neural networks in Deep Q-Network
* Three aspects of Deep Reinforcement Learning: noise, overestimation and exploration
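
For readers who want a quick reminder of the Bellman equation mentioned in the titles above, the following is a minimal sketch of the standard tabular Q-learning update, Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)). It is a generic textbook form, not necessarily the formulation used in the articles or notebooks; the function name `q_update` and the default values of `alpha` and `gamma` are assumptions made for this example.

```python
def q_update(Q, state, action, reward, next_state, done, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning step in place and return the new Q[state][action].

    Q is a list of lists indexed as Q[state][action].
    alpha (learning rate) and gamma (discount factor) are illustrative defaults.
    """
    # Bellman target: immediate reward plus the discounted best value of the next state.
    # On a terminal transition there is no next state to bootstrap from.
    best_next = 0.0 if done else max(Q[next_state])
    target = reward + gamma * best_next

    # Move the current estimate a fraction alpha toward the target.
    Q[state][action] += alpha * (target - Q[state][action])
    return Q[state][action]


if __name__ == "__main__":
    # Two states, two actions, all values initialised to zero.
    Q = [[0.0, 0.0], [0.0, 0.0]]
    print(q_update(Q, state=0, action=1, reward=1.0, next_state=1, done=False))
```

Deep Q-Network replaces the table `Q` with a neural network, but the target keeps the same "reward plus discounted best next value" shape.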