WDL Conversion for more realistic WDL and contempt #1791

Merged: 135 commits into LeelaChessZero:master on May 28, 2023

Conversation

Naphthalin
Contributor

The current contempt implementation is based on drawscore, which works to some extent but is ultimately flawed: the WDL distribution (especially the draw rate) coming from training games doesn't take the Elo difference into account, and the stronger side is avoiding not only short draws, but also positions Leela can likely hold against herself.

Similarly, in high-level matches D is usually too low (and the winning chances of the disadvantaged side absurdly high), which again makes the WDL inaccurate, as DivP opponents won't make the same frequent inaccuracies as training-game Leela.

Third, Leela would be much more interesting for opening preparation if she suggested different lines in the opening depending on the Elo difference between black and white and on the expected draw rate, which depends on time control and playing level. Current drawscore contempt absolutely fails at this, as it basically never influences the opening choice.

This PR adds a general WDL conversion which hopefully serves both as an accuracy rescaling and as an expected inaccuracy based contempt. I will explain the general mechanics and the derivation of the theoretical model in a separate post.

@Naphthalin
Contributor Author

Explaining the general mechanics

The formulas used for WDL conversion aren't as random as they look; they are the result of an underlying mathematical model. The current parametrization reflects this inner model, but it would probably be much more accessible if one could enter playing level and Elo difference, or simply white and black Elo.

The mathematical model

We model a chess game (or really any other game where there is a theoretically best move) as players taking turns, with their playing inaccuracies drawn from an inaccuracy distribution. The current position gets an "objective" value, from which a random walk is started; if at some point the value is above (or below) a certain bound, the game is scored as a win (loss), and if it ultimately ends up between these bounds, it's a draw. The WDL distribution at any point should predict the expected probabilities of the three results based on the skill level of both players.

Assumptions for simplification:
  • the separation values between win, draw and loss are irrelevant, as everything can be rescaled linearly to two arbitrary values. We therefore adopt +1 and -1 as the two bounds.
  • the random walk is assumed to take place in an arbitrary one-dimensional space, without any assumption about how it relates to cp evals
  • the expected outcome is represented as a continuous distribution over true values, and WDL refers to the areas above +1, in [-1,+1], and below -1
  • in principle, one would need to know the distribution of inaccuracies of both players in all phases of the game, and neither player can play better than the objective truth. However, when fixing the shape to any distribution described by its first and second moment alone (like normal or logistic), we only need the respective moments of both players' inaccuracy distributions, and can ignore the fact that they are supposed to be cut off at 0. Since the logistic distribution has a nice explicit cdf and is also used in the Elo calculation, it is the natural choice.
  • the respective inaccuracy distribution is applied either on every move, or at least equally often for white and black.
  • Lc0's self-play RL training also applies such a reasonable distribution, assuming deblunder is active and working as intended; rescaling the WDL thus ultimately means relating the target precision to the training precision.
Derivation of the formulas:

The areas above +1 and below -1 of a logistic curve with mean µ and scale s are simple expressions, and can be inverted to get simple equations for 2/s and 2µ/s from W and L, which then give the mean and variance of the "true" logistic distribution.
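Spelled out (not stated explicitly in the PR, but following directly from the logistic cdf): with W the area above +1 and L the area below -1,
ln(1/W - 1) = (1 - µ)/s and ln(1/L - 1) = (1 + µ)/s,
so 2/s = ln(1/W - 1) + ln(1/L - 1) and 2µ/s = ln(1/L - 1) - ln(1/W - 1), with variance var = s² * π²/3.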

Applying a random walk with n steps for both sides results in the mean and variance
µ = µ_0 - n * E(sidetomove) + n * E(opponent)
var = n * var(sidetomove) + n * var(opponent)

Since we know the inaccuracy mean and variance of all 4 involved players (white/black at reference/target level), we can approximate n from the WDL distribution given by the NN (as it reflects the statistical outcomes of training games post deblunder), and calculate the new µ and var for the target precisions. These are then used to calculate the new scale s, and the cdf of the transformed logistic distribution is evaluated at +1 and -1 to again get W and L values.

However, not all 8 parameters are needed in this transformation. In fact, as training is between equal opponents anyway, scaling E and var by a constant factor only changes the internal value of n, and only the difference of the mean inaccuracies and the sum of the variances matter; this can therefore be reduced to just 2 parameters: the ratio of the (sum of variances) between match and training, and the inaccuracy difference, i.e. the two parameters ratio and diff.
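A minimal numerical sketch of this conversion chain (an illustration of the model above, not the actual code of this PR; the accumulated mean shift n * diff is taken directly as a single input mu_shift, and edge cases like W or L being 0 are not handled):

#include <cmath>
#include <tuple>
#include <utility>

// Recover mean mu and scale s of the logistic "true value" distribution
// from the win and loss probabilities (the areas above +1 and below -1).
std::pair<double, double> LogisticFromWL(double w, double l) {
  const double a = std::log(1.0 / w - 1.0);  // equals (1 - mu) / s
  const double b = std::log(1.0 / l - 1.0);  // equals (1 + mu) / s
  const double s = 2.0 / (a + b);
  const double mu = 0.5 * s * (b - a);
  return {mu, s};
}

// Rescale the variance by `ratio` and shift the mean by `mu_shift`, then
// evaluate the new logistic cdf at +1 and -1 to read off the converted WDL.
std::tuple<double, double, double> ConvertWDL(double w, double l,
                                              double ratio, double mu_shift) {
  const auto [mu, s] = LogisticFromWL(w, l);
  const double s_new = s * std::sqrt(ratio);  // variance scales with s^2
  const double mu_new = mu + mu_shift;
  const double w_new = 1.0 / (1.0 + std::exp((1.0 - mu_new) / s_new));
  const double l_new = 1.0 / (1.0 + std::exp((1.0 + mu_new) / s_new));
  return {w_new, 1.0 - w_new - l_new, l_new};
}

With ratio < 1 the distribution gets narrower, which for roughly balanced positions raises the draw rate; a positive mu_shift models the opponent making the larger inaccuracies, pushing W up and L down.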

@Naphthalin added the "wip" (Work in progress) label on Nov 15, 2022
@Naphthalin added the "testing required" (Feature/bug fix needs more testing. Implies not for merge.) and "unoptimized" (Functional, but not optimized for performance or code style. Implies not for merge.) labels on Nov 15, 2022
src/mcts/params.cc (outdated review thread, resolved)
} else {
node_to_process->v = v;
node_to_process->d = d;
}
Contributor

I think that this new code should be a function, instead of an inlined block.

Contributor Author

I definitely agree! However, I am not yet worried about optimization and code style, only about functionality. Also, I was wondering whether it is a good idea to do the conversion here, where it is also executed on NN cache hits, instead of doing it in a way where the converted values are cached.

Contributor

I would like to see this conversion pushed "deeper", so that search gets already-adjusted NNEvals. This should also make the merge with DAG easier.

Contributor Author

What do you consider "deeper"? Right now, it is as deep as it can get without touching the NN output itself; whenever a new node is created, it gets the (unaltered) WDL from the NN (either through an actual NN GPU eval or through a cache hit) and applies the conversion. Do you suggest altering the NN WDL output, so we don't have to calculate the conversion multiple times on cache hits?

Contributor

Yes, I suggest doing the WDL adjustment before or on cache insertion, so that we don't have to do it (multiple times) while fetching NN evals in search.

Contributor Author

I have thought about it a bit more, and I don't overly like storing it in the NN cache, as I can easily imagine restarting search with different parameters without clearing the cache. However, the conversion formula now really is lightweight, equivalent to calculating the PUCT score for maybe 5 nodes, and only performed once per addition of a new node. If there is any measurable CPU delay at all and (with DAG) the ratio of (new nodes requesting NN evals) / (cache insertions) is much bigger than 1, it is worth reconsidering, though that would likely mean touching multiple backends and somehow transferring search parameters there.

@Naphthalin removed the "rfc" (Request for comments) label on May 8, 2023
throw Exception("Invalid default contempt: " + entry);
}
} else if (parts.size() == 2) {
if (std::search(name.begin(), name.end(), parts[0].begin(),
Member

This is to check that parts[x] is a substring of name, right? (case insensitive).
(Are you sure that you need a substring rather than a full match?)

Contributor Author

Yes; common practice in TCEC and CCC seems to be to add some versioning etc to the name, which we won't know in advance.
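For illustration, a case-insensitive substring check of this kind could look roughly as follows (a sketch, not necessarily the exact code in this PR):

#include <algorithm>
#include <cctype>
#include <string>

// Returns true if `part` occurs anywhere in `name`, ignoring case.
bool ContainsCaseInsensitive(const std::string& name, const std::string& part) {
  const auto it = std::search(
      name.begin(), name.end(), part.begin(), part.end(),
      [](unsigned char a, unsigned char b) {
        return std::tolower(a) == std::tolower(b);
      });
  return it != name.end();
}

With this, a configured part like "dragon" would still match an opponent reported as e.g. "Komodo Dragon 3.1".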

Member

IIRC the original reasoning was to catch something like "Dragon" and "Komodo Dragon".

@Naphthalin merged commit 53b31ae into LeelaChessZero:master on May 28, 2023
PikaCat-OuO pushed a commit to official-pikafish/px0 that referenced this pull request May 28, 2023
Adds an Elo-based WDL transformation of the NN value head output. Helps with more accurate play at high level (WDL sharpening), more aggressive play against weaker opponents and draw-avoiding openings (contempt), and piece odds play.

Also adds a new ScoreType `WDL_mu` which follows the new eval convention, where +1.00 means 50% white win chance.

(cherry picked from commit 53b31ae)
} else if (score_type == "WDL_mu") {
// Reports the WDL mu value whenever it is reasonable, and defaults to
// centipawn otherwise.
const float centipawn_fallback_threshold = 0.99f;
Contributor Author

The threshold currently triggers the fallback if W or L drops below 0.5%, which seems to be a bit soon. It should probably be changed to 0.996 or so, which would make it act only if W or L drops below 0.2%.
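(Reading these numbers together with the snippet above, the fallback presumably triggers when W or L < (1 - threshold)/2: (1 - 0.99)/2 = 0.5% and (1 - 0.996)/2 = 0.2%.)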

@yuzisee

yuzisee commented Jun 4, 2023

If using Lc0 in analysis mode on an already-completed armageddon game between two human players, is there a combination of values for WDLEvalObjectivity, Contempt and ContemptPerspective that could be used to determine "who is winning, and by how much" at any given point in the game?

@Naphthalin
Contributor Author

Naphthalin commented Jun 4, 2023

Examples of how to use the PR, pt 3

Armageddon games

Armageddon games are usually played under the following conditions:

  • between human GMs (so, around 2700)
  • in rapid/blitz time control (slightly faster TC than the WDLCalibrationElo assumes)
  • with white having a significant time advantage (i.e. black is modeled as maybe 50-100 Elo weaker)

This could, for example, be achieved with the following settings:

"WDLDrawRateReference": 0.58, --> for T80
"WDLCalibrationElo": 2700,
"ContemptPerspective": "white",
"Contempt": "100", --> actual white advantage + TC dependent white bonus
"WDLContemptAttenuation": 1.0,
"WDLEvalObjectivity": 0.0, --> highly relevant for contempt: we want to see the internally used WDL
"DrawScoreWhite": -100,
"DrawScoreBlack": 100,
"ScoreType": "centipawn_with_drawscore" or  "WDL_mu"

Note: a position with equal chances for white and black will show as approx. 0.00 when using centipawn_with_drawscore, or +1.00 when using WDL_mu. Alternatively, there is also the score type Q, which directly shows the expected score, from +100 (white win) to -100 (draw or black win).
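For completeness, these settings could be entered over standard UCI roughly like this (option names as quoted above; the exact spelling in a given lc0 build may differ):

setoption name WDLDrawRateReference value 0.58
setoption name WDLCalibrationElo value 2700
setoption name ContemptPerspective value white
setoption name Contempt value 100
setoption name WDLContemptAttenuation value 1.0
setoption name WDLEvalObjectivity value 0.0
setoption name DrawScoreWhite value -100
setoption name DrawScoreBlack value 100
setoption name ScoreType value WDL_mu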

I used these settings to kibitz the KGA Armageddon game between Hikaru and Magnus, and from a "practical chances" perspective the KGA didn't seem like too bad a choice for an Armageddon game :) Just curious, is this also the game you were interested in @yuzisee?

PS: In the near future, the drawscore implementation will be simplified and only require 1 setting instead of 2 for Armageddon.

@yuzisee

yuzisee commented Jun 4, 2023

Just curious, is this also the game you were interested in

How did you know 🙂
