[WIP] Implement the RENTS algorithm for tree exploration #1418

kiudee · 2020-09-10T10:56:49Z

Description

This is a WIP branch where I will implement the RENTS algorithm as proposed by Dam et al. (2020) [1]. It is not working yet; I will remove this note as soon as it can be run.

Policies are continually recomputed using the following weighted softmax:

To ensure enough exploration, the uniform distribution is mixed in:

with the following exploration term:
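
In the notation of [1], these three expressions can be sketched as follows. This is reconstructed from the paper rather than from the code in this branch, so the symbols (temperature \tau, exploration constant \epsilon, visit count N(s), previous policy \pi_{t-1}, number of legal moves |\mathcal{A}|) are assumptions and the implementation may differ in detail.

```latex
% Weighted softmax: the previous policy acts as the weight
\pi_t^{\mathrm{soft}}(a \mid s) =
  \frac{\pi_{t-1}(a \mid s)\, \exp\!\big(Q(s,a)/\tau\big)}
       {\sum_{b} \pi_{t-1}(b \mid s)\, \exp\!\big(Q(s,b)/\tau\big)}

% Mixing in the uniform distribution over the |A| legal moves
\pi_t(a \mid s) = (1 - \lambda_s)\, \pi_t^{\mathrm{soft}}(a \mid s)
  + \frac{\lambda_s}{|\mathcal{A}|}

% Exploration term, decaying with the visit count N(s)
\lambda_s = \frac{\epsilon\, |\mathcal{A}|}{\log\big(N(s) + 1\big)}
```

Since \log(N(s) + 1) is zero for an unvisited node, \lambda_s needs a special case there, which matches the later commit that sets lambda_s to 1.0 when the node has no visits yet.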

The initialization of the Q values is crucial. It is possible to use the existing policy head to come up with a semi-decent initialization, but preliminary experiments in a0lite have shown that initializing using the value head is much better. For that we would need a value head which outputs Q values for every move.

Pros and Cons

Pros:

Faster convergence than PUCT
Potentially better scaling behaviour
Potentially easier collection of batches

Cons:

More computational overhead during backpropagation (may need to come up with a clever updating scheme, or update only rarely)
It is not yet clear whether it is better for chess (the TENTS variant could work better)
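
To make the backpropagation overhead concrete, here is a minimal Python sketch of a RENTS-style backup along one search path. It assumes the node value is the soft maximum tau * log(sum_a prior(a) * exp(Q(a) / tau)) described in [1]. The `Node` class, its fields, and the function names are hypothetical and do not mirror Lc0's data structures, and the sketch ignores the sign flip between the two players.

```python
import math


class Node:
    """Hypothetical search node; not Lc0's Node/Edge classes."""

    def __init__(self, prior, q_init, parent=None, parent_move=None):
        self.prior = list(prior)        # reference policy over the moves
        self.q = list(q_init)           # current Q estimate per move
        self.visits = 0
        self.parent = parent
        self.parent_move = parent_move  # index of the move leading here


def soft_value(q_values, prior, tau):
    """tau * log(sum_a prior[a] * exp(q[a] / tau)), via a stable logsumexp."""
    # Moves with zero prior contribute nothing to the sum, so skip them.
    logits = [math.log(p) + q / tau for q, p in zip(q_values, prior) if p > 0.0]
    shift = max(logits)
    return tau * (shift + math.log(sum(math.exp(l - shift) for l in logits)))


def backup(leaf, leaf_value, tau=1.0):
    """Propagate a leaf evaluation to the root, one logsumexp per node."""
    node, value = leaf, leaf_value
    while node.parent is not None:
        parent = node.parent
        parent.q[node.parent_move] = value
        parent.visits += 1
        # O(number of moves) work at every node on the path; a running
        # average as used with PUCT would only need O(1) here.
        value = soft_value(parent.q, parent.prior, tau)
        node = parent
    return value
```

Every node on the path recomputes a sum over all of its moves, which is where the extra cost relative to an averaging backup comes from. Updating only every few visits would trade accuracy of the stored values for speed.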

Ablations

Here I will record SPRT tests of different design decisions.

To do

Implementation:

Implement weighted softmax function
Initialize Q values
Update Q values and policies during backpropagation
Select moves based on policies
Add parameters to control exploration, softmax temperature, etc.
Ensure correct handling of Q values
Check how it interacts with cache
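
The policy-related items above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code in this branch: the function names, the default values of `tau` and `epsilon`, and the clamping of lambda to 1.0 are all hypothetical choices; only the special case for unvisited nodes is taken from the commit history.

```python
import math
import random


def weighted_softmax(q_values, prior, tau):
    """Normalize prior[a] * exp(q[a] / tau), computed in log space."""
    # Log-weights avoid overflow in exp() when q / tau is large.
    logits = [
        (math.log(p) + q / tau) if p > 0.0 else -math.inf
        for q, p in zip(q_values, prior)
    ]
    shift = max(logits)  # assumes at least one move has a positive prior
    weights = [math.exp(l - shift) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def exploration_lambda(visits, num_moves, epsilon):
    """Weight of the uniform distribution for a node with `visits` visits."""
    if visits == 0:
        return 1.0  # log(1) == 0, so fall back to pure exploration
    # Clamping to 1.0 is an assumption that keeps the mixture a valid
    # probability distribution at low visit counts.
    return min(1.0, epsilon * num_moves / math.log(visits + 1))


def rents_policy(q_values, prior, visits, tau=1.0, epsilon=0.1):
    """Weighted softmax mixed with the uniform distribution."""
    lam = exploration_lambda(visits, len(q_values), epsilon)
    soft = weighted_softmax(q_values, prior, tau)
    uniform = 1.0 / len(q_values)
    return [(1.0 - lam) * p + lam * uniform for p in soft]


def select_move(policy, rng=random):
    """Sample a move index in proportion to the policy."""
    return rng.choices(range(len(policy)), weights=policy, k=1)[0]
```

A call such as `select_move(rents_policy(q, prior, visits))` would then pick the child to descend into. Because every legal move keeps at least lambda / num_moves probability, no move is ever completely starved of visits.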

Test:

Performance using fixed nodes
...

References

[1] Dam, T., D'Eramo, C., Peters, J., and Pajarinen, J., “Convex Regularization in Monte-Carlo Tree Search”, arXiv e-prints, 2020.


kiudee added 3 commits September 10, 2020 12:17

Add function which computes the relative entropy softmax and policy at the same time

6d8bf53

Add initial_q_ field to Edge and add it to GetQ

103f090

Compute RENTS value and policy during backpropagation

1c0cca2

kiudee added the enhancement, wip, and not for merge labels Sep 10, 2020

kiudee added 14 commits September 11, 2020 12:38

Make softmax and logsumexp numerically stable

0faa765

Add exploration factor to UCI parameters

4bb37fb

Add randomized move selection based on policy

ce4fb0d

Correctly apply policy softmax temperature

f548a17

Set lambda_s to 1.0 if the node has no visits yet

3eb08d6

Fix crash by always updating the best child edge

715fc12

Correctly backpropagate the value on the parent node

be57f04

Sort VerboseMoveStats by policy

934d62a

Select best move based on policy

eca76ea

Introduce policy attribute and use it correctly in RENTS

dd76b0b

Fix softmax computation

6f3a44c

Correctly track Q values

267954b

Widen parameter range of RENTS temperature

be79ba5

Fix backpropagation

1e3f593

Naphthalin mentioned this pull request Oct 30, 2020

Implementing an analysis mode for Lc0 to allow forward/backward analysis without losing the tree. #1455

Closed

Naphthalin mentioned this pull request Apr 21, 2022

Extracting parts of Lc0's search into classes would help future development #1734

Open

Naphthalin added the demo label (code/concept demonstration; implies not for merge, won't be closed without consulting author) and removed the wip and not for merge labels Nov 2, 2022
