Fast AlphaZero General

An implementation of AlphaZero for any game, inspired by alpha-zero-general. It uses multiprocessing and Cython to greatly increase performance. Due to certain optimizations, the project requires a GPU. Also note that .pyx files are Cython files, which are very similar to Python (identical in some cases).

This project only supports PyTorch models because of PyTorch's support for multiprocessing.

To use a game of your choice, subclass the class in Game.py and implement its functions. An example implementation for Othello can be found in othello/OthelloGame.pyx.
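As a rough illustration of what such a subclass might look like, here is a minimal sketch for a toy subtraction game (take 1 or 2 stones; whoever takes the last stone wins). The base class below is a hypothetical stand-in, and its method names are assumed from the alpha-zero-general interface that this project is inspired by; check Game.py for the actual methods and signatures before relying on any of this.

```python
from typing import List, Tuple


class Game:
    """Hypothetical stand-in for the base class in Game.py (method names assumed)."""

    def getInitBoard(self):
        raise NotImplementedError

    def getActionSize(self) -> int:
        raise NotImplementedError

    def getNextState(self, board, player: int, action: int):
        raise NotImplementedError

    def getValidMoves(self, board, player: int) -> List[int]:
        raise NotImplementedError

    def getGameEnded(self, board, player: int) -> int:
        raise NotImplementedError


class SubtractionGame(Game):
    """Toy game: the board is just the number of stones left in one pile."""

    def __init__(self, stones: int = 5):
        self.stones = stones

    def getInitBoard(self) -> int:
        return self.stones

    def getActionSize(self) -> int:
        # Action 0 takes one stone, action 1 takes two stones.
        return 2

    def getNextState(self, board: int, player: int, action: int) -> Tuple[int, int]:
        # Remove the stones and hand the turn to the other player (+1 / -1).
        return board - (action + 1), -player

    def getValidMoves(self, board: int, player: int) -> List[int]:
        # Binary mask over actions: 1 means the action is legal.
        return [1 if board >= 1 else 0, 1 if board >= 2 else 0]

    def getGameEnded(self, board: int, player: int) -> int:
        # 0 while the game is running. If the pile is empty on `player`'s
        # turn, the opponent took the last stone, so `player` has lost.
        return 0 if board > 0 else -1
```

A real game would typically use a NumPy array for the board and would likely need further methods (for example canonical form and symmetries), which this sketch leaves out.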

You may want to edit NNetArchitecture.py or NNetWrapper.py to implement different neural networks for various games.

Coach.py contains the core training loop and MCTS.pyx performs the Monte Carlo Tree Search. The self-play parameters can be specified in main.py. Additional neural network parameters (learning rate, number of filters, depth of the ResNet) are in NNetWrapper.py.
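For readers unfamiliar with the search step, the sketch below shows the PUCT selection rule commonly used in AlphaZero-style MCTS: each child is scored by its mean value plus an exploration bonus weighted by the network's prior. It is an assumption that MCTS.pyx follows this exact form; the function names, the `c_puct` default, and the data layout here are illustrative only.

```python
import math
from typing import Dict, Tuple


def puct_score(q: float, prior: float, parent_visits: int,
               child_visits: int, c_puct: float = 1.0) -> float:
    """Mean value plus a prior-weighted exploration bonus."""
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)


def select_action(stats: Dict[int, Tuple[float, float, int]],
                  c_puct: float = 1.0) -> int:
    """Pick the action with the highest PUCT score.

    `stats` maps action -> (mean value Q, prior P, visit count N).
    """
    parent_visits = sum(n for _, _, n in stats.values())
    return max(
        stats,
        key=lambda a: puct_score(stats[a][0], stats[a][1],
                                 parent_visits, stats[a][2], c_puct),
    )


# Action 0 has a good value but many visits; action 1 is unvisited
# with a high prior, so the exploration bonus favours it.
stats = {0: (0.5, 0.2, 10), 1: (0.0, 0.8, 0)}
print(select_action(stats))
```

Setting `c_puct` to 0 removes the exploration term, so the search would simply pick the action with the highest mean value.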

Installation

Install PyTorch with GPU support following their guide. Then run:

pip install -r requirements.txt

Execution

To start training a model for Connect4:

python main.py

Choose your game in main.py.

Performance

After training on Connect4 for 200 iterations (about 1 day) on my laptop (Intel i7-8750H, GTX 1060 6GB), the AI has improved substantially:

1. iteration-0200 with 2.70 rating
2. iteration-0190 with 2.53 rating
3. iteration-0180 with 2.49 rating
4. iteration-0170 with 2.35 rating
5. iteration-0160 with 2.14 rating
6. iteration-0150 with 1.75 rating
7. iteration-0140 with 1.63 rating
8. iteration-0130 with 1.43 rating
9. iteration-0120 with 1.14 rating
10. iteration-0110 with 0.95 rating
11. iteration-0100 with 0.50 rating
12. iteration-0090 with 0.21 rating
13. iteration-0080 with -0.13 rating
14. iteration-0070 with -0.66 rating
15. iteration-0060 with -0.90 rating
16. iteration-0050 with -1.55 rating
17. iteration-0040 with -1.97 rating
18. iteration-0030 with -2.48 rating
19. iteration-0020 with -3.10 rating
20. iteration-0010 with -3.67 rating
21. iteration-0000 with -5.36 rating

(Rating Diff, Winrate) -> (0.5, 62%), (1, 73%), (2, 88%), (3, 95%), (5, 99%)
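The listed pairs are consistent with a plain logistic curve over the rating difference, 1 / (1 + e^(-diff)). That reading is inferred from the numbers above rather than taken from the rating code, so treat the following as a sketch; `expected_winrate` is a name made up for this example.

```python
import math


def expected_winrate(rating_diff: float) -> float:
    """Win probability of the higher-rated player under a logistic model."""
    return 1.0 / (1.0 + math.exp(-rating_diff))


# Print the win rate for each rating difference listed above.
for diff in (0.5, 1, 2, 3, 5):
    print(f"diff {diff}: {expected_winrate(diff):.0%}")
```

Under this model, the 8.06-point gap between iteration-0000 and iteration-0200 corresponds to a win rate above 99.9% for the later iteration.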

These ratings aren't exactly Elo, but they give a good sense of how much the AI has improved: it goes from playing essentially at random with a Monte Carlo tree search in iteration 0 to winning nearly 100% of the time against that random-with-MCTS baseline by the end. On top of that, when iterations 190 and 200 play each other, they tie in over half of the games.

When I tell iteration 200 to play itself optimally, it wins every single game in which it is the first player. This suggests that it plays perfectly, because under perfect play the first player wins every game of Connect4.

starovp/fast-alphazero-general
