WDL Conversion for more realistic WDL and contempt #1791

Merged: 135 commits into LeelaChessZero:master on May 28, 2023

Conversation

Naphthalin
Contributor

The current contempt implementation is based on drawscore, which works to some extent but is ultimately flawed: the WDL distribution (especially the draw rate) coming from training games doesn't take the Elo difference into account, and the stronger side is avoiding not only short draws, but also positions Leela can likely hold against herself.

Similarly, in high-level matches D is usually too low (and the winning chances of the disadvantaged side absurdly high), which again makes the WDL inaccurate, as DivP opponents won't make the same frequent inaccuracies as training-game Leela.

Third, Leela would be much more interesting for opening preparation if she suggested different lines in the opening depending on the Elo difference between black and white and on the expected draw rate, which depends on time control and playing level. Current drawscore contempt absolutely fails at this, as it basically never influences the opening choice.

This PR adds a general WDL conversion which hopefully serves both as an accuracy rescaling and as an expected inaccuracy based contempt. I will explain the general mechanics and the derivation of the theoretical model in a separate post.

@Naphthalin
Contributor Author

Explaining the general mechanics

The formulas used for WDL conversion aren't as random as they look; they are the result of an underlying mathematical model. The current parametrization reflects this inner model, but it would probably be much more accessible if one could enter playing level and Elo difference, or simply white and black Elo.

The mathematical model

We model a chess game (or really any other game where there is a theoretically best move) as players taking turns, with their playing inaccuracies drawn from an inaccuracy distribution. The current position gets an "objective" value, from which a random walk is started; if at some point the value is above (or below) a certain bound, the game is scored as a win (loss), and if it ultimately ends up between these bounds, it's a draw. The WDL distribution at any point should predict the expected probabilities of the three results based on the skill level of both players.

Assumptions for simplification:
  • the separation values between win, draw and loss are irrelevant, as everything can be rescaled linearly to two arbitrary values. We therefore adopt +1 and -1 as the two bounds.
  • the random walk is assumed to take place in an arbitrary one-dimensional space, without any assumption about how it relates to cp evals
  • the expected outcome is represented as a continuous distribution over true values, and WDL refers to the areas above +1, in [-1,+1], and below -1
  • in principle, one would need to know the distribution of inaccuracies of both players in all phases of the game, and neither player can play better than the objective truth. However, when fixing the shape to any distribution described by its first and second moment alone (like normal or logistic), we only need the respective moments of both players' inaccuracy distributions, and can ignore the fact that they are supposed to be cut off at 0. Since the logistic distribution has a nice explicit cdf and is also used in the Elo calculation, it is the natural choice.
  • the respective inaccuracy distribution is applied either on every move, or at least equally often for white and black.
  • Lc0's self-play RL training also applies such a reasonable distribution, assuming deblunder is active and working as intended; rescaling the WDL thus ultimately means relating the target precision to the training precision.
Derivation of the formulas:

The areas above +1 and below -1 of a logistic curve with mean µ and scale s are simple expressions, and can be inverted to get simple equations for 2/s and 2µ/s from W and L, which then give the mean and variance of the "true" logistic distribution.
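Spelled out (not stated explicitly in the PR, but following directly from the logistic cdf): with W the area above +1 and L the area below -1,
ln(1/W - 1) = (1 - µ)/s and ln(1/L - 1) = (1 + µ)/s,
so 2/s = ln(1/W - 1) + ln(1/L - 1) and 2µ/s = ln(1/L - 1) - ln(1/W - 1), with variance var = s² * π²/3.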

Applying a random walk with n steps for both sides results in the mean and variance
µ = µ_0 - n * E(sidetomove) + n * E(opponent)
var = n * var(sidetomove) + n * var(opponent)

Since we know the inaccuracy mean and variance of all 4 involved players (white/black at reference/target level), we can approximate n from the WDL distribution given by the NN (as it reflects the statistical outcomes of training games post deblunder), and calculate the new µ and var for the target precisions. These are then used to calculate the new scale s, and the cdf of the transformed logistic distribution is evaluated at +1 and -1 to again get W and L values.

However, not all 8 parameters are needed in this transformation. In fact, as training is between equal opponents anyway, scaling E and var by a constant factor only changes the internal value of n, and only the difference of the mean inaccuracies and the sum of the variances matter; this can therefore be reduced to just 2 parameters: the ratio of the (sum of variances) between match and training, and the inaccuracy difference, i.e. the two parameters ratio and diff.
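A minimal numerical sketch of this conversion chain (an illustration of the model above, not the actual code of this PR; the accumulated mean shift n * diff is taken directly as a single input mu_shift, and edge cases like W or L being 0 are not handled):

#include <cmath>
#include <tuple>
#include <utility>

// Recover mean mu and scale s of the logistic "true value" distribution
// from the win and loss probabilities (the areas above +1 and below -1).
std::pair<double, double> LogisticFromWL(double w, double l) {
  const double a = std::log(1.0 / w - 1.0);  // equals (1 - mu) / s
  const double b = std::log(1.0 / l - 1.0);  // equals (1 + mu) / s
  const double s = 2.0 / (a + b);
  const double mu = 0.5 * s * (b - a);
  return {mu, s};
}

// Rescale the variance by `ratio` and shift the mean by `mu_shift`, then
// evaluate the new logistic cdf at +1 and -1 to read off the converted WDL.
std::tuple<double, double, double> ConvertWDL(double w, double l,
                                              double ratio, double mu_shift) {
  const auto [mu, s] = LogisticFromWL(w, l);
  const double s_new = s * std::sqrt(ratio);  // variance scales with s^2
  const double mu_new = mu + mu_shift;
  const double w_new = 1.0 / (1.0 + std::exp((1.0 - mu_new) / s_new));
  const double l_new = 1.0 / (1.0 + std::exp((1.0 + mu_new) / s_new));
  return {w_new, 1.0 - w_new - l_new, l_new};
}

With ratio < 1 the distribution gets narrower, which for roughly balanced positions raises the draw rate; a positive mu_shift models the opponent making the larger inaccuracies, pushing W up and L down.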

@Naphthalin added the "wip" (Work in progress) label on Nov 15, 2022
@Naphthalin added the "testing required" (Feature/bug fix needs more testing. Implies not for merge.) and "unoptimized" (Functional, but not optimized for performance or code style. Implies not for merge.) labels on Nov 15, 2022
src/mcts/params.cc (outdated review thread, resolved)
} else {
node_to_process->v = v;
node_to_process->d = d;
}
Contributor

I think that this new code should be a function, instead of an inlined block.

Contributor Author

I definitely agree! However, I am not yet worried about optimization and code style, only about functionality. Also, I was wondering whether it is a good idea to do the conversion here, where it is also executed on NN cache hits, instead of doing it in a way where the converted values are cached.

Contributor

I would like to see this conversion pushed "deeper", so that search gets already-adjusted NNEvals. This should also make the merge with DAG easier.

Contributor Author

What do you consider "deeper"? Right now, it is as deep as it can get without touching the NN output itself; whenever a new node is created, it gets the (unaltered) WDL from the NN (either through an actual NN GPU eval or through a cache hit) and applies the conversion. Do you suggest altering the NN WDL output, so we don't have to calculate the conversion multiple times on cache hits?

Contributor

Yes, I suggest doing the WDL adjustment before or on cache insertion, so that we don't have to do it (multiple times) while fetching NN evals in search.

Contributor Author

I have thought about it a bit more, and I don't overly like storing it in the NN cache, as I can easily imagine restarting search with different parameters without clearing the cache. However, the conversion formula now really is lightweight, equivalent to calculating the PUCT score for maybe 5 nodes, and only performed once per addition of a new node. If there is any measurable CPU delay at all and (with DAG) the ratio of (new nodes requesting NN evals) / (cache insertions) is much bigger than 1, it is worth reconsidering, though that would likely mean touching multiple backends and somehow transferring search parameters there.

@Naphthalin removed the "rfc" (Request for comments) label on May 8, 2023
throw Exception("Invalid default contempt: " + entry);
}
} else if (parts.size() == 2) {
if (std::search(name.begin(), name.end(), parts[0].begin(),
Member

This is to check that parts[x] is a substring of name, right? (case insensitive).
(Are you sure that you need a substring rather than a full match?)

Contributor Author

Yes; common practice in TCEC and CCC seems to be to add some versioning etc to the name, which we won't know in advance.
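For illustration, a case-insensitive substring check of this kind could look roughly as follows (a sketch, not necessarily the exact code in this PR):

#include <algorithm>
#include <cctype>
#include <string>

// Returns true if `part` occurs anywhere in `name`, ignoring case.
bool ContainsCaseInsensitive(const std::string& name, const std::string& part) {
  const auto it = std::search(
      name.begin(), name.end(), part.begin(), part.end(),
      [](unsigned char a, unsigned char b) {
        return std::tolower(a) == std::tolower(b);
      });
  return it != name.end();
}

With this, a configured part like "dragon" would still match an opponent reported as e.g. "Komodo Dragon 3.1".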

Member

IIRC the original reasoning was to catch something like "Dragon" and "Komodo Dragon".

@Naphthalin merged commit 53b31ae into LeelaChessZero:master on May 28, 2023
PikaCat-OuO pushed a commit to official-pikafish/px0 that referenced this pull request May 28, 2023
Adds an Elo-based WDL transformation of the NN value head output. Helps with more accurate play at high level (WDL sharpening), more aggressive play against weaker opponents and draw-avoiding openings (contempt), and piece odds play.

Also adds a new ScoreType `WDL_mu` which follows the new eval convention, where +1.00 means 50% white win chance.

(cherry picked from commit 53b31ae)
} else if (score_type == "WDL_mu") {
// Reports the WDL mu value whenever it is reasonable, and defaults to
// centipawn otherwise.
const float centipawn_fallback_threshold = 0.99f;
Contributor Author

The threshold currently triggers the fallback if W or L drops below 0.5%, which seems to be a bit soon. It should probably be changed to 0.996 or so, which would make it act only if W or L drops below 0.2%.
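(Reading these numbers together with the snippet above, the fallback presumably triggers when W or L < (1 - threshold)/2: (1 - 0.99)/2 = 0.5% and (1 - 0.996)/2 = 0.2%.)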

@yuzisee

yuzisee commented Jun 4, 2023

If using Lc0 in analysis mode on an already-completed armageddon game between two human players, is there a combination of values for WDLEvalObjectivity, Contempt and ContemptPerspective that could be used to determine "who is winning, and by how much" at any given point in the game?

@Naphthalin
Contributor Author

Naphthalin commented Jun 4, 2023

Examples of how to use the PR, pt 3

Armageddon games

Armageddon games are usually played under the following conditions:

  • between human GMs (so, around 2700)
  • in rapid/blitz time control (slightly faster TC than the WDLCalibrationElo assumes)
  • with white having a significant time advantage (i.e. black is modeled as maybe 50-100 Elo weaker)

This could, for example, be achieved with the following settings:

"WDLDrawRateReference": 0.58, --> for T80
"WDLCalibrationElo": 2700,
"ContemptPerspective": "white",
"Contempt": "100", --> actual white advantage + TC dependent white bonus
"WDLContemptAttenuation": 1.0,
"WDLEvalObjectivity": 0.0, --> highly relevant for contempt: we want to see the internally used WDL
"DrawScoreWhite": -100,
"DrawScoreBlack": 100,
"ScoreType": "centipawn_with_drawscore" or  "WDL_mu"

Note: a position with equal chances for white and black will show as approx. 0.00 when using centipawn_with_drawscore, or +1.00 when using WDL_mu. Alternatively, there is also the score type Q, which directly shows the expected score, from +100 (white win) to -100 (draw or black win).
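For completeness, these settings could be entered over standard UCI roughly like this (option names as quoted above; the exact spelling in a given lc0 build may differ):

setoption name WDLDrawRateReference value 0.58
setoption name WDLCalibrationElo value 2700
setoption name ContemptPerspective value white
setoption name Contempt value 100
setoption name WDLContemptAttenuation value 1.0
setoption name WDLEvalObjectivity value 0.0
setoption name DrawScoreWhite value -100
setoption name DrawScoreBlack value 100
setoption name ScoreType value WDL_mu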

I used these settings to kibitz the KGA Armageddon game between Hikaru and Magnus, and from a "practical chances" perspective the KGA didn't seem like too bad a choice for an Armageddon game :) Just curious, is this also the game you were interested in @yuzisee?

PS: In the near future, the drawscore implementation will be simplified and only require 1 setting instead of 2 for Armageddon.

@yuzisee

yuzisee commented Jun 4, 2023

Just curious, is this also the game you were interested in

How did you know 🙂
