2.4K Stars 796 Forks MIT License 44 Commits 64 Opened issues


An implementation of the AlphaZero algorithm for Gomoku (also called Gobang or Five in a Row)


This is an implementation of the AlphaZero algorithm for playing the simple board game Gomoku (also called Gobang or Five in a Row) from pure self-play training. The game Gomoku is much simpler than Go or chess, so that we can focus on the training scheme of AlphaZero and obtain a pretty good AI model on a single PC in a few hours.

References:

  1. AlphaZero: Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
  2. AlphaGo Zero: Mastering the game of Go without human knowledge

Update 2018.2.24: supports training with TensorFlow!

Update 2018.1.17: supports training with PyTorch!

Example Games Between Trained Models

  • Each move with 400 MCTS playouts:


To play with the trained AI models, you only need:

  • Python >= 2.7
  • Numpy >= 1.11

To train the AI model from scratch, you additionally need one of:

  • Theano >= 0.7 and Lasagne >= 0.1
  • PyTorch >= 0.2.0
  • TensorFlow

PS: if your Theano version is newer than 0.7, either follow this issue to install Lasagne or force pip to downgrade Theano to 0.7:

pip install --upgrade theano==0.7.0
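Before training, it can help to check which of the optional frameworks are importable in the current environment. The sketch below is not part of the project; the function names are hypothetical, it assumes Python 3 (for `importlib.util`), and it only tests whether the packages can be found, not whether their versions meet the minimums listed above.

```python
import importlib.util


def can_play():
    """Playing with the provided models only needs NumPy."""
    return importlib.util.find_spec("numpy") is not None


def available_backends():
    """Return the training backends whose packages can be imported.

    The backend labels are illustrative names, not project identifiers.
    """
    candidates = {
        "theano+lasagne": ("theano", "lasagne"),
        "pytorch": ("torch",),
        "tensorflow": ("tensorflow",),
    }
    found = []
    for name, modules in candidates.items():
        # A backend counts as available only if all its packages are found.
        if all(importlib.util.find_spec(m) is not None for m in modules):
            found.append(name)
    return found


if __name__ == "__main__":
    print("Can play with trained models:", can_play())
    print("Training backends found:", available_backends() or "none")
```

If the second line prints "none", playing against the provided models may still work, but training from scratch needs one of the frameworks installed first.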

If you would like to train the model using other DL frameworks, you only need to rewrite the policy-value network module (the PolicyValueNet class).
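To give a feel for what a replacement backend has to provide, here is a framework-free stand-in. It is only a sketch: the class name `PolicyValueNet` and the method name `train_step` appear in this README, but `UniformPolicyValueNet`, `policy_value`, and all argument names are assumptions made for illustration, and a real backend would match whatever the training and MCTS code actually call.

```python
class UniformPolicyValueNet:
    """Stand-in for a policy-value network with no learnable parameters.

    Hypothetical interface: the method names and arguments below are
    assumptions, not the project's actual API.
    """

    def __init__(self, board_width, board_height):
        self.board_width = board_width
        self.board_height = board_height

    def policy_value(self, legal_moves):
        """Return ([(move, prior), ...], value) for the given legal moves.

        A trained network would output learned priors and a value estimate;
        this stand-in spreads the prior uniformly and returns a neutral value.
        """
        if not legal_moves:
            return [], 0.0
        prior = 1.0 / len(legal_moves)
        return [(move, prior) for move in legal_moves], 0.0

    def train_step(self, batch, learning_rate):
        """A real backend would run one gradient update on the batch and
        return (loss, entropy). Here both are placeholders."""
        return 0.0, 0.0


net = UniformPolicyValueNet(6, 6)
priors, value = net.policy_value([0, 1, 2, 3])
print(priors, value)
```

Swapping frameworks then amounts to keeping this outward shape and replacing the bodies with real network code.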

Getting Started

To play with provided models, run the following script from the directory:

You may modify that script to try different provided models or the pure MCTS.

To train the AI model from scratch, with Theano and Lasagne, directly run:

With PyTorch or TensorFlow, first modify the training script, i.e., comment out the line

from policy_value_net import PolicyValueNet  # Theano and Lasagne

and uncomment one of the lines

from policy_value_net_pytorch import PolicyValueNet  # Pytorch

or

from policy_value_net_tensorflow import PolicyValueNet  # Tensorflow

and then run the training script. (To use the GPU in PyTorch, set ``use_gpu=True``; if your PyTorch version is greater than 0.5, also use ``return loss.item(), entropy.item()`` in the function train_step.)
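The version-dependent return statement exists because the way a scalar is read out of a loss tensor changed between PyTorch releases. As a hedged illustration, the helper below tries `.item()` first and falls back to indexing `.data`; the helper name is hypothetical, the fallback form is an assumption about older releases, and the two small classes only imitate tensors so the example runs without PyTorch installed.

```python
def scalar_to_float(value):
    """Return a Python float from a scalar tensor-like object.

    Tries the newer `.item()` accessor first, then an older-style
    `.data[0]` lookup (an assumption), then plain float conversion.
    """
    item = getattr(value, "item", None)
    if callable(item):
        return float(item())
    data = getattr(value, "data", None)
    if data is not None:
        return float(data[0])
    return float(value)


class NewStyleScalar:
    """Imitates a tensor that supports .item()."""

    def item(self):
        return 1.5


class OldStyleScalar:
    """Imitates a tensor whose value is read through .data[0]."""

    data = [2.5]


print(scalar_to_float(NewStyleScalar()))
print(scalar_to_float(OldStyleScalar()))
```

Inside train_step, wrapping the loss and entropy in such a helper would avoid editing the return line by hand, at the cost of hiding which PyTorch version is in use.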

The models (best_policy.model and current_policy.model) are saved every few updates (default: every 50).
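The periodic saving can be pictured with the following sketch. Only the two file names and the default interval of 50 come from the text above; the function names, the pickle format, the `evaluate` callback, and the rule that the best model is overwritten only when the evaluation score improves are all assumptions for illustration.

```python
import os
import pickle
import tempfile


def save_params(params, path):
    """Write the (stand-in) model parameters to disk."""
    with open(path, "wb") as f:
        pickle.dump(params, f)


def train_with_checkpoints(n_updates, out_dir, check_freq=50, evaluate=None):
    """Run a dummy training loop and checkpoint every `check_freq` updates.

    Returns a list of ("current" | "best", update_index) save events.
    """
    best_score = float("-inf")
    events = []
    params = {"updates": 0}
    for i in range(1, n_updates + 1):
        params["updates"] = i  # stand-in for one real gradient update
        if i % check_freq != 0:
            continue
        save_params(params, os.path.join(out_dir, "current_policy.model"))
        events.append(("current", i))
        # Assumed rule: keep a separate copy of the best-scoring model so far.
        score = evaluate(params) if evaluate is not None else float(i)
        if score > best_score:
            best_score = score
            save_params(params, os.path.join(out_dir, "best_policy.model"))
            events.append(("best", i))
    return events


with tempfile.TemporaryDirectory() as tmp_dir:
    print(train_with_checkpoints(120, tmp_dir))
```

With 120 updates and the default interval, saves happen at updates 50 and 100; the remaining 20 updates are not checkpointed.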

Note: the 4 provided models were trained using Theano/Lasagne; to use them with PyTorch, please refer to issue 5.

Tips for training:

  1. It is good to start with a 6 * 6 board and 4 in a row. For this case, we may obtain a reasonably good model within 500~1000 self-play games in about 2 hours.
  2. For the case of 8 * 8 board and 5 in a row, it may need 2000~3000 self-play games to get a good model, and it may take about 2 days on a single PC.
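The two configurations in these tips differ only in board size and in how many stones in a row win. A minimal sketch of that rule, independent of the project's own board code (the function name and the set-of-coordinates representation are assumptions), could look like this:

```python
def has_n_in_row(stones, width, height, n):
    """Return True if `stones` contains n stones in an unbroken line.

    `stones` is a set of (row, col) positions belonging to one player.
    Lines may be horizontal, vertical, or along either diagonal.
    """
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]
    for (row, col) in stones:
        for d_row, d_col in directions:
            cells = [(row + k * d_row, col + k * d_col) for k in range(n)]
            on_board = all(0 <= r < height and 0 <= c < width for r, c in cells)
            if on_board and all(cell in stones for cell in cells):
                return True
    return False


# 6 * 6 board, 4 in a row: a horizontal line of four wins.
row_of_four = {(2, 1), (2, 2), (2, 3), (2, 4)}
print(has_n_in_row(row_of_four, 6, 6, 4))

# 8 * 8 board, 5 in a row: the same four stones are not enough.
print(has_n_in_row(row_of_four, 8, 8, 5))
```

The first call prints True and the second False, which shows how the smaller game ends sooner and so needs fewer self-play games to learn.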

Further reading

My article describing some details about the implementation in Chinese:
