Vanilla DQN, Double DQN, and Dueling DQN in PyTorch


This repo is a PyTorch implementation of Vanilla DQN, Double DQN, and Dueling DQN, based on the papers discussed below.

Starter code comes from Berkeley CS 294 Assignment 3 and was modified for PyTorch with some guidance from here. TensorBoard logging has also been added (thanks here) for visualization during training, in addition to what the Gym Monitor already does.


Deep Q-networks use neural networks as function approximators for the action-value function, Q. The architecture used here takes frames from the Atari simulator as input (i.e., the state) and passes them through two convolutional layers and two fully connected layers before outputting a Q value for each action.
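As a rough illustration of the architecture described above, here is a minimal PyTorch sketch with two convolutional layers followed by two fully connected layers. The class name `SmallDQN`, the 4 x 84 x 84 input shape, and all layer sizes are assumptions chosen for the example and are not taken from this repo's code.

```python
import torch
import torch.nn as nn


class SmallDQN(nn.Module):
    """Two conv layers + two fully connected layers, one Q value per action."""

    def __init__(self, in_channels: int = 4, num_actions: int = 6):
        super().__init__()
        # All sizes below are illustrative assumptions, not the repo's values.
        self.conv1 = nn.Conv2d(in_channels, 16, kernel_size=8, stride=4)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=4, stride=2)
        # For 84x84 inputs: 84 -> 20 after conv1, 20 -> 9 after conv2.
        self.fc1 = nn.Linear(32 * 9 * 9, 256)
        self.fc2 = nn.Linear(256, num_actions)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        x = torch.relu(self.conv1(frames))
        x = torch.relu(self.conv2(x))
        x = torch.relu(self.fc1(x.flatten(start_dim=1)))
        return self.fc2(x)  # shape: (batch, num_actions)


if __name__ == "__main__":
    net = SmallDQN(in_channels=4, num_actions=6)
    q_values = net(torch.zeros(2, 4, 84, 84))  # batch of 2 stacked-frame states
    print(q_values.shape)
```

The greedy action for a state would then be `q_values.argmax(dim=1)`.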

The paper Human-level control through deep reinforcement learning introduced an experience replay buffer that stores past observations and samples from them as training input, which reduces correlations between data samples. The authors also used a separate target network whose weights are a snapshot of the main Q network from a past time step, and used it to calculate the target Q value. These weights are periodically updated to match the latest weights of the main Q network, which reduces the correlation between the target and current Q values. With $\theta_t^-$ denoting the target network weights, the Q target is calculated as:

$$Y_t^{\text{DQN}} = R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a; \theta_t^-)$$
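The replay buffer and the one-step target described above can be sketched in plain Python. This is a framework-free illustration, not the repo's implementation: `Transition`, `ReplayBuffer`, and `td_target` are hypothetical names, and uniform sampling is assumed.

```python
import random
from collections import deque, namedtuple

# Hypothetical record type for one environment step.
Transition = namedtuple("Transition", "obs action reward next_obs done")


class ReplayBuffer:
    """Fixed-size store of past transitions with uniform random sampling."""

    def __init__(self, capacity: int):
        # A bounded deque silently discards the oldest transition when full.
        self.storage = deque(maxlen=capacity)

    def add(self, obs, action, reward, next_obs, done):
        self.storage.append(Transition(obs, action, reward, next_obs, done))

    def sample(self, batch_size: int):
        # Sampling uniformly at random (without replacement) mixes old and
        # new experience, so a batch is not a run of consecutive frames.
        return random.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step target: reward plus the discounted best next Q value.

    `next_q_values` is assumed to come from the frozen target network.
    Terminal transitions do not bootstrap from the next state.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


if __name__ == "__main__":
    buf = ReplayBuffer(capacity=3)
    for step in range(5):
        buf.add(step, 0, 1.0, step + 1, False)
    print(len(buf), td_target(1.0, [0.5, 2.0], False, gamma=0.5))
```

In PyTorch, the periodic target update is commonly done with `target_net.load_state_dict(q_net.state_dict())` every fixed number of steps, where `target_net` and `q_net` stand for the two networks.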

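Noting that vanilla DQN can overestimate action values, Deep Reinforcement Learning with Double Q-learning proposes an alternative Q target. The current Q network is evaluated on the next observations and its argmax selects the actions. These actions, together with the next observations, are then passed into the frozen target network to yield the Q values used at each update. With $\theta_t$ denoting the current network weights and $\theta_t^-$ the target network weights, the new Q target is:

$$Y_t^{\text{DoubleDQN}} = R_{t+1} + \gamma \, Q\big(S_{t+1}, \operatorname*{argmax}_{a} Q(S_{t+1}, a; \theta_t); \theta_t^-\big)$$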
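To make the difference between the two targets concrete, here is a small PyTorch sketch that computes both from precomputed Q values. The function names and tensor layout (one row per transition, one column per action, `done` as 0/1 floats) are assumptions made for this example.

```python
import torch


def dqn_target(reward, next_q_target, done, gamma=0.99):
    """Vanilla target: the target network both selects and evaluates."""
    next_value = next_q_target.max(dim=1).values
    return reward + gamma * (1.0 - done) * next_value


def double_dqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Double DQN target: online network selects, target network evaluates."""
    best_action = next_q_online.argmax(dim=1, keepdim=True)        # (batch, 1)
    next_value = next_q_target.gather(1, best_action).squeeze(1)   # (batch,)
    return reward + gamma * (1.0 - done) * next_value


if __name__ == "__main__":
    reward = torch.tensor([1.0, 0.0])
    done = torch.tensor([0.0, 1.0])            # second transition is terminal
    next_q_online = torch.tensor([[1.0, 3.0], [2.0, 0.0]])
    next_q_target = torch.tensor([[5.0, 2.0], [4.0, 7.0]])
    print(dqn_target(reward, next_q_target, done, gamma=0.5))
    print(double_dqn_target(reward, next_q_online, next_q_target, done, gamma=0.5))
```

In the first row, the vanilla target bootstraps from the target network's largest value (5.0), while the Double DQN target uses the target network's value for the action the online network prefers (2.0), which is where the reduction in overestimation comes from.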

Finally, Dueling Network Architectures for Deep Reinforcement Learning proposes a different architecture for approximating Q functions. After the last convolutional layer, the output is split into two streams that separately estimate the state value and the advantage of each action within the state. The architecture is also shown here in contrast to traditional Deep Q-Learning networks. The two estimates are then combined to generate a Q value. The aggregation the paper uses in its experiments subtracts the mean advantage, where $V$ is the state-value stream, $A$ is the advantage stream, and $|\mathcal{A}|$ is the number of actions:

$$Q(s, a) = V(s) + \Big(A(s, a) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s, a')\Big)$$
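The two-stream idea can be sketched as a PyTorch module that sits on top of the flattened convolutional features. This is an illustration only: the class name `DuelingHead`, the hidden size, and the choice of mean subtraction (one of the aggregations discussed in the dueling paper) are assumptions and may differ from this repo's code.

```python
import torch
import torch.nn as nn


class DuelingHead(nn.Module):
    """Splits features into value and advantage streams, then recombines."""

    def __init__(self, feature_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        # State-value stream: one scalar V(s) per state.
        self.value = nn.Sequential(
            nn.Linear(feature_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )
        # Advantage stream: one A(s, a) per action.
        self.advantage = nn.Sequential(
            nn.Linear(feature_dim, hidden), nn.ReLU(), nn.Linear(hidden, num_actions)
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        v = self.value(features)        # (batch, 1)
        a = self.advantage(features)    # (batch, num_actions)
        # Subtracting the mean advantage makes the V/A split identifiable.
        return v + a - a.mean(dim=1, keepdim=True)


if __name__ == "__main__":
    head = DuelingHead(feature_dim=32, num_actions=4)
    q_values = head(torch.randn(3, 32))
    print(q_values.shape)
```

Because the advantages are centered, the mean of the Q values over actions equals the value stream's output for each state.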



  • Execute the following command to train a model on vanilla DQN:
$ python train --task-id $TASK_ID

From the Atari40M spec, here are the different environments you can use:

  • BeamRider
  • Breakout
  • Enduro
  • Pong
  • Qbert
  • Seaquest
  • SpaceInvaders

Here are some options that you can use:

  • id of the GPU you want to use (if not specified, will train on CPU)
  • 1 to train with double DQN, 0 for vanilla DQN
  • 1 to train with dueling DQN, 0 for vanilla DQN



Sample gameplay


