This is my personal project to practice reinforcement learning. You can download the application and play with it.
Renju is a professional variant of Gomoku (five in a row) that adds the following restrictions on black's moves to weaken the first player's advantage:
- Double three – Black cannot place a stone that simultaneously builds two separate unbroken rows of three black stones (i.e. rows not blocked by white stones).
- Double four – Black cannot place a stone that simultaneously builds two separate rows of four black stones.
- Overline – Black cannot place a stone that forms a row of six or more black stones.
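To make the overline rule concrete, below is a minimal sketch of a forbidden-move check in Rust. The `Cell` type, board representation, and function name are illustrative assumptions, not the project's actual API:

```rust
const SIZE: usize = 15;

#[derive(Clone, Copy, PartialEq)]
enum Cell {
    Empty,
    Black,
    White,
}

/// Returns true if placing a black stone at (row, col) would create an
/// overline: six or more consecutive black stones in any direction.
fn creates_overline(board: &[[Cell; SIZE]; SIZE], row: usize, col: usize) -> bool {
    // Four axes: horizontal, vertical, and the two diagonals.
    const DIRS: [(isize, isize); 4] = [(0, 1), (1, 0), (1, 1), (1, -1)];
    for (dr, dc) in DIRS {
        let mut count = 1; // the stone being placed
        // Walk both ways along the axis, counting consecutive black stones.
        for sign in [1isize, -1] {
            let (mut r, mut c) = (row as isize + dr * sign, col as isize + dc * sign);
            while r >= 0
                && r < SIZE as isize
                && c >= 0
                && c < SIZE as isize
                && board[r as usize][c as usize] == Cell::Black
            {
                count += 1;
                r += dr * sign;
                c += dc * sign;
            }
        }
        if count >= 6 {
            return true;
        }
    }
    false
}
```

The double-three and double-four checks follow the same scanning pattern but must also classify each row as open or blocked.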
- The AI is implemented with a residual convolutional neural network (CNN) and Monte Carlo Tree Search (MCTS).
- The AI is not told how to play the game; it learns by playing against itself (a.k.a. self-play).
- The algorithm design is adapted from AlphaGo Zero.
- The application is developed in Rust to avoid the performance bottleneck that Python would impose on MCTS.
- Self-play is much slower than training, so self-play inference is optimized with quantization; on a Mac M1, the quantized CPU path outperforms the GPU.
- A novel lock-free tree implementation for MCTS.
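The README does not describe the lock-free tree in detail. As a rough sketch of the general technique (not the project's actual implementation), per-edge statistics can live in atomics, with a virtual loss applied on the way down so concurrent simulation threads avoid piling onto the same path; the types, names, and fixed-point encoding below are assumptions:

```rust
use std::sync::atomic::{AtomicI64, AtomicU32, Ordering};

/// One MCTS edge whose statistics are updated with atomics instead of a
/// mutex, so multiple simulation threads can descend the tree concurrently.
struct Edge {
    visits: AtomicU32,      // N(s, a)
    total_value: AtomicI64, // W(s, a), fixed-point with 1e6 scaling
}

const SCALE: f64 = 1e6;
const VIRTUAL_LOSS: i64 = 1_000_000; // one full loss in fixed-point units

impl Edge {
    /// On the way down, add a virtual loss so concurrent threads
    /// are discouraged from exploring the same path.
    fn select(&self) {
        self.visits.fetch_add(1, Ordering::Relaxed);
        self.total_value.fetch_sub(VIRTUAL_LOSS, Ordering::Relaxed);
    }

    /// On backup, revert the virtual loss and record the real outcome.
    fn backup(&self, value: f64) {
        let delta = VIRTUAL_LOSS + (value * SCALE) as i64;
        self.total_value.fetch_add(delta, Ordering::Relaxed);
    }

    /// Mean action value Q(s, a), read without any lock.
    fn q(&self) -> f64 {
        let n = self.visits.load(Ordering::Relaxed);
        if n == 0 {
            return 0.0;
        }
        self.total_value.load(Ordering::Relaxed) as f64 / SCALE / n as f64
    }
}
```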
The following diagram shows the policy-value network from AlphaGo Zero. It has been simplified here:
- Number of residual blocks is reduced from 39 to 19.
- Residual block width is narrowed from 256 filters down to 64 filters.
- Because the width is reduced to a quarter and the dying-ReLU problem was encountered in the first attempt, the ReLU activation of the last dense layer is replaced with ELU, which keeps a nonzero gradient for negative inputs.
- Input is simplified to a `(1, 4, 15, 15)` NCHW tensor with four 15x15 planes (see the encoding sketch after this list):
  - The first plane represents the stones of the current player.
  - The second plane represents the stones of the opponent.
  - The third plane marks the position of the last move.
  - The fourth plane is filled with ones if the current player is black, or zeros if white.
- Loss function: see below.
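AlphaGo Zero trains its policy-value network by minimizing a combined loss; assuming this project keeps the same objective, it reads

$$\ell = (z - v)^2 \;-\; \boldsymbol{\pi}^{\top} \log \mathbf{p} \;+\; c\,\lVert \theta \rVert^2$$

where $z$ is the self-play game outcome, $v$ the predicted value, $\boldsymbol{\pi}$ the MCTS visit-count distribution, $\mathbf{p}$ the predicted move probabilities, $\theta$ the network weights, and $c$ the L2 regularization coefficient.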
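To make the input encoding concrete, here is a minimal Rust sketch that builds the four planes as a flat `f32` buffer in NCHW order. The `Cell` type, board layout, and function name are illustrative assumptions; only the plane layout follows the list above:

```rust
const SIZE: usize = 15;

#[derive(Clone, Copy, PartialEq)]
enum Cell {
    Empty,
    Black,
    White,
}

/// Builds the (1, 4, 15, 15) input tensor as a flat f32 buffer.
fn encode_input(
    board: &[[Cell; SIZE]; SIZE],
    last_move: Option<(usize, usize)>,
    black_to_move: bool,
) -> Vec<f32> {
    let plane = SIZE * SIZE;
    let mut t = vec![0.0f32; 4 * plane];
    let me = if black_to_move { Cell::Black } else { Cell::White };
    for r in 0..SIZE {
        for c in 0..SIZE {
            let idx = r * SIZE + c;
            if board[r][c] == me {
                t[idx] = 1.0; // plane 0: current player's stones
            } else if board[r][c] != Cell::Empty {
                t[plane + idx] = 1.0; // plane 1: opponent's stones
            }
        }
    }
    if let Some((r, c)) = last_move {
        t[2 * plane + r * SIZE + c] = 1.0; // plane 2: last move
    }
    if black_to_move {
        for v in &mut t[3 * plane..] {
            *v = 1.0; // plane 3: all ones when black is to move
        }
    }
    t
}
```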