Transform Q into logit space when determining Q+U best child #925

AlexisOlson · 2019-08-17T01:56:17Z

Edit:
The idea here is that when Q is near +1 or -1, the U term dominates the search since a small change in Q corresponds to a large change in the chance of winning/losing.

This PR transforms Q into logit space (logit is the function that converts a probability into log-odds) before adding the U term.

In summary, instead of Q + U, I use logit(Q) + U.

See also @Naphthalin's explanation on Discord:

The problem it tries to help with: near Q=1 or Q=-1 (so winning or losing), the search goes super wide because the Q differences are very small, between 0.97 and 0.98 for example. This wide search is a problem for accurate evaluation of sharper lines. Assume there is a move that at depth 4 increases Q from 0.9 to 0.99, while the highest-policy move there loses momentum, reducing Q from 0.9 to 0.6.

To get an accurate eval of >0.95 for this line, the better move has to get at least 5x the visits of the other move. As we approach 1.0 this problem grows: if a move would increase Q from 0.98 to 0.99 while an alternative drops it to 0.9, it would need 10x the nodes. If cpuct decreased as Q approaches 1.0, it would help with this problem, because the PUCT search would spend more visits on the top choices and therefore give a more accurate eval.
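The visit-ratio argument can be checked with a small sketch. It assumes a deliberately simplified model in which the parent's Q is just the visit-weighted average of two child values; the function names are hypothetical and are not part of lc0.

```python
def averaged_q(q_good, q_bad, ratio):
    """Parent Q when the good move gets `ratio` visits per visit of the bad move."""
    return (ratio * q_good + q_bad) / (ratio + 1.0)


def required_visit_ratio(q_good, q_bad, target):
    """Visit ratio (good : bad) at which the averaged Q reaches `target`."""
    if not (q_bad < target < q_good):
        raise ValueError("target must lie strictly between q_bad and q_good")
    # Solve (r * q_good + q_bad) / (r + 1) = target for r.
    return (target - q_bad) / (q_good - target)


# First example from the text: good move at 0.99, alternative at 0.6, target 0.95.
print(required_visit_ratio(0.99, 0.6, 0.95))
print(averaged_q(0.99, 0.6, 5.0))
```

Under this two-child model the first example needs a ratio of about 8.75 to reach 0.95, which is consistent with the "at least 5x" lower bound quoted above.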

So the originally proposed idea was to effectively reduce cpuct at higher Q. However, especially once FPU is included, and for symmetry reasons, it is better to replace Q+U with something that behaves as an effective Q, meaning it always lies in [-1,1], where -1 and 1 mean definite results. This is done by adding Q and U in logit space, which in our case is the proper way of thinking about odds of winning/losing. To do this, Q is transformed back into a logit (which is what the NN calculates somewhere), the exploration term and FPU are added, and the result is transformed back by a tanh to get a winrate. The calculation can be simplified a bit so that tanh needs to be called only once.
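A minimal Python sketch of the selection rule described above, not the code in src/mcts/search.cc. The assumptions are: the logit of Q is taken as atanh(Q) = 0.5 * ln((1 + Q) / (1 - Q)), so that tanh is its inverse; Q is clamped just below 1 to keep the logit finite; the U term uses a plain PUCT form; FPU is left out. All names (`logit_q`, `u_term`, `best_child`, `Q_CLAMP`) are hypothetical.

```python
import math

Q_CLAMP = 0.99999994  # assumed clamp just below 1.0 so atanh stays finite


def logit_q(q):
    """Map Q in [-1, 1] into logit space (assumed here to be atanh)."""
    q = max(-Q_CLAMP, min(Q_CLAMP, q))
    return math.atanh(q)


def u_term(cpuct, prior, parent_visits, child_visits):
    """Plain PUCT exploration term (an assumed form, without FPU)."""
    return cpuct * prior * math.sqrt(parent_visits) / (1 + child_visits)


def best_child(children, cpuct=1.5, use_logit=True):
    """Return the name of the child maximising Q + U or logit(Q) + U."""
    parent_visits = sum(c["visits"] for c in children)

    def score(c):
        q = logit_q(c["q"]) if use_logit else c["q"]
        return q + u_term(cpuct, c["prior"], parent_visits, c["visits"])

    return max(children, key=score)["name"]


near_win = [
    {"name": "A", "q": 0.98, "prior": 0.3, "visits": 50},
    {"name": "B", "q": 0.90, "prior": 0.5, "visits": 10},
]
print(best_child(near_win, use_logit=False))
print(best_child(near_win, use_logit=True))
```

With these made-up numbers, plain Q + U sends the next visit to the lower-Q, higher-policy child B, while logit(Q) + U keeps it on A. Because tanh is monotonic, applying it to the scores would not change which child is selected.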

The expected outcome is unchanged behavior near Q=0 and a more selective search near Q=1 and Q=-1, which (hopefully) favors tactical lines where lc0 can already see progress over staying in a comfortable position, because the good line gets a higher weight than it does now. (One important detail: one might think about also averaging the Q values in logit space. This is most likely to be avoided, however, since the Q values are about statistics, and otherwise a single 1.0 eval somewhere in the tree would destroy the Q.)
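The caveat about averaging can be illustrated with a short sketch. It assumes the same atanh/tanh pair as the logit transform and treats an eval of exactly 1.0 as an infinite logit; the helper names are hypothetical.

```python
import math


def to_logit(q):
    """atanh with the endpoints mapped to +/- infinity instead of raising."""
    if q >= 1.0:
        return math.inf
    if q <= -1.0:
        return -math.inf
    return math.atanh(q)


def mean_plain(qs):
    """Ordinary average of Q values."""
    return sum(qs) / len(qs)


def mean_in_logit_space(qs):
    """Average the logits, then map the result back to [-1, 1]."""
    return math.tanh(sum(to_logit(q) for q in qs) / len(qs))


evals = [0.1, 0.2, 0.0, 1.0]  # one certain win among otherwise ordinary evals
print(mean_plain(evals))
print(mean_in_logit_space(evals))
```

The plain average stays at a moderate value (0.325 here), whereas the logit-space average is pulled all the way to 1.0 by the single certain eval.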

An example at extreme Q (PR925 on bottom):
4Rbk1/5p2/p2Q2p1/7p/6N1/2p4P/PP3PPK/8 w - - 0 1

Original:

Scaling factor for U term: 2 * Q / ln( (1 + Q) / (1 - Q) )

[Graph of the scaling factor, from Desmos]
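For reference, the scaling factor above can be evaluated numerically. This sketch uses a hypothetical helper name, `u_scale`; it handles Q = 0 by returning the limit value 1 and returns 0 at |Q| = 1, where the denominator diverges.

```python
import math


def u_scale(q):
    """Scaling factor 2 * Q / ln((1 + Q) / (1 - Q)) for the U term."""
    if abs(q) >= 1.0:
        return 0.0  # limit as |Q| -> 1
    if abs(q) < 1e-6:
        return 1.0  # limit as Q -> 0
    return 2.0 * q / math.log((1.0 + q) / (1.0 - q))


for q in (0.0, 0.5, 0.9, 0.99):
    print(q, round(u_scale(q), 4))
```

The factor is symmetric in Q, equals 1 at Q = 0 and shrinks toward 0 as |Q| approaches 1, so the U term is left alone in balanced positions and damped in decided ones.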

Cpuct probably needs to be re-tuned along with this change.

It may or may not make more sense to use the root/parent Q value instead.

efficient propagation of certainty, two-fold draw scoring, mate display and more. =1 suitable for training, =2 for play. Currently negabound search depth is one. Improves play in positions with many certain positions (near endgame TBs, mates). Sees repetitions faster and scores positions more accurately.



mooskagh · 2019-09-13T20:21:24Z

Could you check nps with the random backend:

1. before this change,
2. after this change with LogitQEnabled=false, and
3. after this change with LogitQEnabled=true

e.g. with $ ./lc0 benchmark --backend=random

UPD: did that myself:

1. 235306 nps
2. 245927 nps
3. 235812 nps

which is a bit suspicious that it became faster. :)

UPD2:
First run had a bad day, reruns show ~246knps


Videodr0me added 30 commits August 12, 2018 20:24

Efficient and informative depth computation.

54a199a

Update comments and replace tabs with spaces

b6c88dc

Merge remote-tracking branch 'upstream/master'

db3b419

Merge remote-tracking branch 'upstream/master'

fbebb38

Fixes compiler warnings/errors on -pedantic.

5f8aff0

Resolve merge conflicts

c664fb5

Resolve Merge Conflicts 2

560e413

Merge: unexpected behaviour when go infinite fixed

4078af0

Speed fix. Reading non-cached parameters was slow. Now using cached v…

4fb5522

…ersion. Increasing threads (e.g. 4 or 6) will get to masters speed now. Further speed fixes (move generator) possible....

Speed fix. Used reserve in pseudo legal move generation. If compiled …

84113e5

…with lto, this yields a speed up by 30-50% in backend=random. In order to fully use CP please use 4 threads+. Changed default temporarily to 4 threads with this commit, to collect more scaling data.

Fix for CP=2, CP=2 (default for play) is now more conservative and ad…

aa266eb

…ds instant play of certain winning moves and avoidance of loosing moves regardless of visits. CP=3 now adds advanced pruning.

Bugfixes, codecleanup minor changes:

f087312

- exposed depth parameter (0 is no-look-ahead) - only two modes CP=1 for training and CP=2 for play Todo: - change option from int to choiceoption - use info.mate to communicate mate scores

Rename ClearEdge

fdbe61a

change build-cuda to latest

ffee98f

use optional info.mate to display mate scores

4920e74

display 0.00 for tablebase draws when syzygy filtered

5e726b4

Finalize this WIP PR:

aba68a7

- Certainty Propagation is a bool option now, just on or off (default = off). - Cleanup code and comments - Threads default = 2, but if certainty propagation is turned on please use 4 threads.

merge with master

fec72f0

Merge remote-tracking branch 'upstream/master'

2a47958

Merge remote-tracking branch 'upstream/master'

2c7e465

Merge branch 'certainty-propagation-negabound'

b7379f4

merge with master

1b7ac55

Prefer terminal wins over certain wins, to avoid delaying mate. Prefe…

9f21d9d

…r certain losses over terminal losses.

update to master

27bdf3a

basic certainty propagation - part 2 of PR 487

5343415

fix off by one mate count

f503c15

Make Two Fold Draw Scoring optional to make everybody happy

01bb096

fixed typos, comments and formatting

2f63b78

fixed more typos and renamed GetCertaintyStatus to GetCertaintyState

0a7cfa5

AlexisOlson added 2 commits September 11, 2019 18:59

0.99999994 float

cd14689

Logit Q overridden to be false

ba94de4

Tilps reviewed Sep 12, 2019

Review comments on src/mcts/search.cc (two threads, both outdated and resolved)

AlexisOlson added 9 commits September 12, 2019 09:18

Simplify nested ternary clauses

13aa779

Stick logit logic into GetQ

0db1448

Make 2nd GetQ argument optional defaulted to false

c37e37e

Use new GetQ with logit boolean argument

beb5fbe

Missing semicolon

5e0e983

Removed extra parenthesis

6842fb5

Fix GetQ in sort section

51339d6

Another missing semicolon

4000d23

Use already defined logit_q variable

fb9ea47

mooskagh reviewed Sep 13, 2019

Review comment on src/mcts/params.cc (outdated, resolved)

mooskagh reviewed Sep 13, 2019

Review comments on src/mcts/node.h (outdated, resolved), src/mcts/params.cc (outdated, resolved), src/mcts/node.h (resolved) and src/mcts/params.cc (outdated, resolved)

mooskagh removed the wip Work in progress label Sep 13, 2019

AlexisOlson mentioned this pull request Sep 13, 2019

Switch centipawn to logit and refactor score calculations. #881

Closed

AlexisOlson added 6 commits September 13, 2019 16:21

Shorten parameter name and update description

aab571a

Shorten LogitQ parameter name

70de151

Shorten LogitQ parameter name

d4c3d96

Shorten LogitQ parameter name

4159862

Shorten LogitQ parameter name

ce8d794

Edit and comment on scaling constant

7af32e5

mooskagh approved these changes Sep 14, 2019

Review comment on src/mcts/node.h (resolved)

mooskagh merged commit 226cc1b into LeelaChessZero:master Sep 14, 2019

Naphthalin mentioned this pull request Sep 17, 2019

Logit Q keeps exploration focused when winrate near 0 or 1 leela-zero/leela-zero#2496

Open

dtracers mentioned this pull request Sep 19, 2019

Implement equilibrium based U formula #918

Closed

3 tasks

Naphthalin mentioned this pull request Sep 23, 2019

PR918+925+956 #959

Closed

AlexisOlson mentioned this pull request Aug 3, 2020

ScaleQ - Updated implementation of LogitQ idea (WIP - not for merge) #1408

Closed

Naphthalin mentioned this pull request Apr 21, 2022

Extracting parts of Lc0's search into classes would help future development #1734

Open
