A Hearthstone AI based on Monte Carlo tree search and neural nets written in modern C++.
This is an AI for the card game Hearthstone, originally motivated by AlphaGo! This work combines Monte Carlo tree search (extended for games with imperfect information), deep neural networks, and a high-performance game engine.
TensorFlow is used for training and prediction. The trained neural network model is then integrated into the MCTS agent.
A simple example shows that the neural network can greatly boost MCTS play strength:

* A mid-level player knows that Arcane Missiles should generally not be played on the first turn.
* With a random default policy, it takes more than 300k iterations (8 GB+ RAM) to realize this.
* With the neural network as the default policy, it takes fewer than 15k iterations (less than 5 seconds) to realize this.
Similar to AlphaZero proposed by DeepMind, a reinforcement learning pipeline is also implemented. The pipeline works, but it requires intensive computational resources to do a great job. Some results are also outlined.
The goal of the neural network is to predict who is going to win the game by looking only at the current board. Several improvements could be made:
Hopefully we can achieve better accuracy than the current result (~79%), which is also in line with the result of the AAIA'17 Data Mining Challenge: Helping AI to Play Hearthstone (https://knowledgepit.fedcsis.org/mod/page/view.php?id=1022).
I have tried embedding the card id to encode the battlecry and deathrattle features for each distinct card. Maybe we need to find a better way to generate game data automatically, so the neural network can learn the embeddings separately and, hopefully, more accurately.
This is now handled by a separate repository on GitHub.
Why wide? Due to randomness, the branching factor is quite large (~4000 when drawing a card), so the game tree contains a huge number of nodes.
Why deep? The neural network by itself is not strong enough. We need to think more steps ahead to compensate for weaknesses in the simulation.
In a naive implementation of MCTS, all children of a node must be expanded before the UCB formula is used to choose one and continue the selection stage. A few ideas here:

1. Continue the selection stage with a fixed probability, even if not all children are expanded.
2. Use a dynamic probability based on the remaining thinking time and the current expansion progress.
Even if only one card differs, we still need two separate tree nodes; otherwise we would fuse distinct strategic decisions in the Monte Carlo tree search. However, this does not mean we cannot share information between nodes. On the contrary, AMAF (all-moves-as-first) and RAVE (rapid action value estimation) are built on exactly this idea.
Right now, just refer to the move the AI suggests and perform it manually in the game client.
Just me. Any idea or help is always welcome.
The latest GPL license applies to this project.
Some third-party libraries are used; please also comply with their licenses.