This is my personal project to practice reinforcement learning. You can download the application and play with it.
Renju is a professional variant of Gomoku (five in a row) that adds the following restrictions on black's moves to weaken the first player's advantage:
- Double three – Black cannot place a stone that simultaneously builds two separate unbroken rows of three black stones (i.e. rows not blocked by white stones).
- Double four – Black cannot place a stone that simultaneously builds two separate rows of four black stones.
- Overline – Black cannot place a stone that forms a row of six or more black stones.
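To make the overline rule concrete, below is a minimal sketch of a forbidden-move check in Rust. The `Cell` type, board representation, and function name are illustrative assumptions, not the project's actual API:

```rust
const SIZE: usize = 15;

#[derive(Clone, Copy, PartialEq)]
enum Cell {
    Empty,
    Black,
    White,
}

/// Returns true if placing a black stone at (row, col) would create an
/// overline: six or more consecutive black stones in any direction.
fn creates_overline(board: &[[Cell; SIZE]; SIZE], row: usize, col: usize) -> bool {
    // Four axes: horizontal, vertical, and the two diagonals.
    const DIRS: [(isize, isize); 4] = [(0, 1), (1, 0), (1, 1), (1, -1)];
    for (dr, dc) in DIRS {
        let mut count = 1; // the stone being placed
        // Walk both ways along the axis, counting consecutive black stones.
        for sign in [1isize, -1] {
            let (mut r, mut c) = (row as isize + dr * sign, col as isize + dc * sign);
            while r >= 0
                && r < SIZE as isize
                && c >= 0
                && c < SIZE as isize
                && board[r as usize][c as usize] == Cell::Black
            {
                count += 1;
                r += dr * sign;
                c += dc * sign;
            }
        }
        if count >= 6 {
            return true;
        }
    }
    false
}
```

The double-three and double-four checks follow the same scanning pattern but must also classify each row as open or blocked.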
- The AI is implemented with a residual convolutional neural network (CNN) and Monte Carlo Tree Search (MCTS).
- The AI is not told how to play the game; it learns by playing against itself (a.k.a. self-play).
- The algorithm design is adapted from AlphaGo Zero.
- The application is developed in Rust to avoid the performance bottleneck that Python would impose on MCTS.
- Self-play is much slower than training, so self-play inference is optimized with quantization; on a Mac M1, the quantized CPU path outperforms the GPU.
- A novel lock-free tree implementation for MCTS.
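The README does not describe the lock-free tree in detail. As a rough sketch of the general technique (not the project's actual implementation), per-edge statistics can live in atomics, with a virtual loss applied on the way down so concurrent simulation threads avoid piling onto the same path; the types, names, and fixed-point encoding below are assumptions:

```rust
use std::sync::atomic::{AtomicI64, AtomicU32, Ordering};

/// One MCTS edge whose statistics are updated with atomics instead of a
/// mutex, so multiple simulation threads can descend the tree concurrently.
struct Edge {
    visits: AtomicU32,      // N(s, a)
    total_value: AtomicI64, // W(s, a), fixed-point with 1e6 scaling
}

const SCALE: f64 = 1e6;
const VIRTUAL_LOSS: i64 = 1_000_000; // one full loss in fixed-point units

impl Edge {
    /// On the way down, add a virtual loss so concurrent threads
    /// are discouraged from exploring the same path.
    fn select(&self) {
        self.visits.fetch_add(1, Ordering::Relaxed);
        self.total_value.fetch_sub(VIRTUAL_LOSS, Ordering::Relaxed);
    }

    /// On backup, revert the virtual loss and record the real outcome.
    fn backup(&self, value: f64) {
        let delta = VIRTUAL_LOSS + (value * SCALE) as i64;
        self.total_value.fetch_add(delta, Ordering::Relaxed);
    }

    /// Mean action value Q(s, a), read without any lock.
    fn q(&self) -> f64 {
        let n = self.visits.load(Ordering::Relaxed);
        if n == 0 {
            return 0.0;
        }
        self.total_value.load(Ordering::Relaxed) as f64 / SCALE / n as f64
    }
}
```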
The following diagram shows the policy-value network from AlphaGo Zero. It has been simplified here:
- Number of residual blocks is reduced from 39 to 19.
- Residual block width is narrowed from 256 filters down to 64 filters.
- Because the width is reduced to a quarter and the dying-ReLU problem was encountered in the first attempt, the ReLU activation of the last dense layer is replaced with ELU, which keeps a nonzero gradient for negative inputs.
- Input is simplified to a `(1, 4, 15, 15)` NCHW tensor with four 15x15 planes (see the encoding sketch after this list):
  - The first plane represents the stones of the current player.
  - The second plane represents the stones of the opponent.
  - The third plane marks the position of the last move.
  - The fourth plane is filled with ones if the current player is black, or zeros if white.
- Loss function: see below.
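AlphaGo Zero trains its policy-value network by minimizing a combined loss; assuming this project keeps the same objective, it reads

$$\ell = (z - v)^2 \;-\; \boldsymbol{\pi}^{\top} \log \mathbf{p} \;+\; c\,\lVert \theta \rVert^2$$

where $z$ is the self-play game outcome, $v$ the predicted value, $\boldsymbol{\pi}$ the MCTS visit-count distribution, $\mathbf{p}$ the predicted move probabilities, $\theta$ the network weights, and $c$ the L2 regularization coefficient.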
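To make the input encoding concrete, here is a minimal Rust sketch that builds the four planes as a flat `f32` buffer in NCHW order. The `Cell` type, board layout, and function name are illustrative assumptions; only the plane layout follows the list above:

```rust
const SIZE: usize = 15;

#[derive(Clone, Copy, PartialEq)]
enum Cell {
    Empty,
    Black,
    White,
}

/// Builds the (1, 4, 15, 15) input tensor as a flat f32 buffer.
fn encode_input(
    board: &[[Cell; SIZE]; SIZE],
    last_move: Option<(usize, usize)>,
    black_to_move: bool,
) -> Vec<f32> {
    let plane = SIZE * SIZE;
    let mut t = vec![0.0f32; 4 * plane];
    let me = if black_to_move { Cell::Black } else { Cell::White };
    for r in 0..SIZE {
        for c in 0..SIZE {
            let idx = r * SIZE + c;
            if board[r][c] == me {
                t[idx] = 1.0; // plane 0: current player's stones
            } else if board[r][c] != Cell::Empty {
                t[plane + idx] = 1.0; // plane 1: opponent's stones
            }
        }
    }
    if let Some((r, c)) = last_move {
        t[2 * plane + r * SIZE + c] = 1.0; // plane 2: last move
    }
    if black_to_move {
        for v in &mut t[3 * plane..] {
            *v = 1.0; // plane 3: all ones when black is to move
        }
    }
    t
}
```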