wangjia184/renju

A Reinforcement-Learning based Renju game

This is my personal project to practice reinforcement learning. You can download the application and play with it.

Renju is a professional variant of gomoku (five in a row). It adds the following restrictions on Black's moves to weaken the first player's advantage:

  • Double three – Black cannot place a stone that builds two separate lines with three black stones in unbroken rows (i.e. rows not blocked by white stones).
  • Double four – Black cannot place a stone that builds two separate lines with four black stones in a row.
  • Overline – Black cannot place a stone that builds a line of six or more black stones in a row.
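
As an illustration of how such a restriction can be checked, here is a minimal sketch of an overline test. It is not taken from this project's code: the board representation (a 15x15 list of lists) and the names `EMPTY`, `BLACK`, `WHITE`, `longest_run`, and `is_overline` are assumptions made for the example. It covers only the overline rule; double three and double four need more involved pattern analysis that is not shown here.

```python
BOARD_SIZE = 15
EMPTY, BLACK, WHITE = 0, 1, 2  # hypothetical cell encoding

# The four line directions through a point:
# horizontal, vertical, and the two diagonals.
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def longest_run(board, row, col, color):
    """Longest unbroken line of `color` through (row, col).

    Assumes a stone of `color` already sits at (row, col).
    """
    best = 0
    for dr, dc in DIRECTIONS:
        count = 1  # the stone at (row, col) itself
        for sign in (1, -1):  # walk both ways along the line
            r, c = row + sign * dr, col + sign * dc
            while (0 <= r < BOARD_SIZE and 0 <= c < BOARD_SIZE
                   and board[r][c] == color):
                count += 1
                r += sign * dr
                c += sign * dc
        best = max(best, count)
    return best


def is_overline(board, row, col):
    """True if a black stone at (row, col) would make six or more in a row."""
    if board[row][col] != EMPTY:
        raise ValueError("point is already occupied")
    board[row][col] = BLACK  # place the stone temporarily
    try:
        return longest_run(board, row, col, BLACK) >= 6
    finally:
        board[row][col] = EMPTY  # always restore the board


board = [[EMPTY] * BOARD_SIZE for _ in range(BOARD_SIZE)]
for c in (2, 3, 4, 6, 7):
    board[7][c] = BLACK
print(is_overline(board, 7, 5))  # True: columns 2..7 would form six in a row
print(is_overline(board, 7, 8))  # False: columns 6..8 would form only three
```

Placing the stone temporarily and restoring it in `finally` keeps the check free of side effects, so the same function could be used to mask forbidden points before a move is chosen.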

User Interface

[Figure: overview of the user interface]

Neural Network

The following graph shows the policy-value network from AlphaGo Zero.

[Figure: AlphaGo Zero policy-value network]

This project simplifies it as follows:

  1. The number of residual blocks is reduced from 39 to 19.
  2. The width of each residual block is narrowed from 256 filters to 64 filters.
  3. Because the width is cut to a quarter and the dying ReLU problem appeared in the first attempt, the ReLU activation of the last dense layer is replaced with ELU.
  4. Input is simplified to a (1, 4, 15, 15) NCHW tensor with four 15x15 planes.
    • The first plane represents the stones of the current player.
    • The second plane represents the stones of the opponent.
    • The third plane represents the position of the last move.
    • The fourth plane is filled with ones if the current player is black, or with zeros if white.
  5. Loss function

$$ l = (z - v)^{2} - \pi^{T} \ln p + c \left\| \theta \right\|^{2} $$
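
To make the loss in item 5 concrete, the sketch below evaluates it for a single training sample in plain Python. The symbol meanings follow the AlphaGo Zero paper: `z` is the game outcome from the current player's point of view, `v` is the network's value prediction, `pi` is the search probability vector, `p` is the network's policy output, `theta` is the list of network weights, and `c` is the L2 regularization coefficient. The function name `alphazero_loss`, the default `c=1e-4`, and the small `eps` added inside the logarithm are assumptions for this example, not values confirmed by this project.

```python
import math


def alphazero_loss(z, v, pi, p, theta, c=1e-4, eps=1e-12):
    """l = (z - v)^2 - pi^T ln(p) + c * ||theta||^2 for one sample.

    `eps` is an assumption of this sketch; it only guards against log(0).
    """
    value_loss = (z - v) ** 2
    policy_loss = -sum(pi_a * math.log(p_a + eps) for pi_a, p_a in zip(pi, p))
    l2_penalty = c * sum(w * w for w in theta)
    return value_loss + policy_loss + l2_penalty


# Toy sample with two legal moves and three weights.
loss = alphazero_loss(z=1.0, v=0.5, pi=[0.5, 0.5], p=[0.5, 0.5],
                      theta=[0.1, -0.2, 0.3])
print(loss)
```

In a real training loop the three terms would be computed by the deep-learning framework over a mini-batch, and the L2 term is often delegated to the optimizer as weight decay.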
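
The four input planes described in item 4 can be assembled as in the following sketch. It uses nested Python lists so that it runs without third-party packages; a real pipeline would convert the result to a framework tensor. The board encoding, the function name `encode_position`, marking the last move with a single 1, and leaving the third plane all zeros when no move has been played yet are assumptions for this example.

```python
BOARD_SIZE = 15
EMPTY, BLACK, WHITE = 0, 1, 2  # hypothetical cell encoding


def encode_position(board, to_move, last_move=None):
    """Return a nested list shaped (1, 4, 15, 15) in NCHW order."""
    opponent = WHITE if to_move == BLACK else BLACK

    def plane(fill=0.0):
        return [[fill] * BOARD_SIZE for _ in range(BOARD_SIZE)]

    own, other, last = plane(), plane(), plane()
    for r in range(BOARD_SIZE):
        for c in range(BOARD_SIZE):
            if board[r][c] == to_move:
                own[r][c] = 1.0       # plane 1: current player's stones
            elif board[r][c] == opponent:
                other[r][c] = 1.0     # plane 2: opponent's stones
    if last_move is not None:
        lr, lc = last_move
        last[lr][lc] = 1.0            # plane 3: position of the last move
    # Plane 4: all ones if the current player is black, all zeros if white.
    colour = plane(1.0 if to_move == BLACK else 0.0)
    return [[own, other, last, colour]]  # batch dimension N = 1


board = [[EMPTY] * BOARD_SIZE for _ in range(BOARD_SIZE)]
board[7][7] = BLACK  # Black opened in the centre, White is to move
x = encode_position(board, to_move=WHITE, last_move=(7, 7))
print(len(x), len(x[0]), len(x[0][0]), len(x[0][0][0]))  # 1 4 15 15
```

Encoding the stones relative to the player to move, rather than as fixed black and white planes, lets one network serve both sides; the fourth plane then tells it which side is subject to the forbidden-move rules.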