More accurate winning chance weight based on lichess data #11148
Conversation
Desmos link for the new equation. n(x) (blue curve) is the new weight; o(x) (red curve) is the current lichess weight (-0.004). |
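For context, the curve being tuned is the logistic winning-chance map that appears in the training script later in this thread. Below is a minimal sketch comparing the current weight with the proposed one (the constant names are mine, not from the PR):

```python
import math

def winning_chance(cp: float, k: float) -> float:
    # Lichess-style winning chance in [-1, 1] for a centipawn evaluation.
    return 2 / (1 + math.exp(k * cp)) - 1

OLD_K = -0.004        # current lichess weight (the red o(x) curve)
NEW_K = -0.00368208   # weight proposed in this PR (the blue n(x) curve)

for cp in (100, 300, 500, 1000):
    print(cp, round(winning_chance(cp, OLD_K), 3), round(winning_chance(cp, NEW_K), 3))
```

The curve is antisymmetric around 0, so the same comparison holds for negative evals.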
* master: (73 commits)
  new winning chances multiplier for server and client sides
  use typed storage convenience functions
  better type ui/common/storedProp, no functional change intended
  add titles to coordinates time control buttons
  add more simul data to API output - closes lichess-org#11137
  compute API simul JSONs sequentially, as it can be expensive
  fix new puzzle hotkey bypasses streak end - closes lichess-org#11157
  remove unnecesary "." from the list
  fix insights header height
  color insight active filters
  fix insight CSS (:has not available)
  tweak insight CSS
  remove more title="undefined" from multiple select
  remove "undefined" titles from multiple select
  escape HTML attribute
  add multiple-select.ts to ui/insight
  tweak Paginator builder
  use better error code
  delete our copy of minified jquery
  rewrite multiple-select and replace jquery with cash
  ... |
Thanks, I integrated the new multiplier. Could you share more about how you produced it? Ideally we could re-run the experiment, maybe trying out different filters. |
Yes, I will post the code here within 24 hours, I'm a little busy at the moment :) |
Training:
# %%
import math
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from matplotlib import cm
from matplotlib.colors import ListedColormap, LinearSegmentedColormap
# %%
MARKER_SIZE = 0.01
xs = []
ys = []
wdls = ["b", "d", "w"]
with open('data.txt', 'r') as f:
    for line in f:
        eval_, wdl, elo, ply = line.split(",")
        if int(ply) > 10 and 1000 < int(elo) < 4000:
            xs.append(float(eval_))
            ys.append(wdls.index(wdl) - 1)
# Hardcoded values to ensure compatibility with lichess models
xs.append(0)
ys.append(0.0)
xs.append(1000)
ys.append(0.98)
xs.append(-1000)
ys.append(-0.98)
xs.append(500)
ys.append(0.88)
xs.append(-500)
ys.append(-0.88)
n = len(xs)
xs_np = np.array(xs)
ys_np = np.array(ys)
def model_func(x, k):
    return 2 / (np.exp(-k * x) + 1) - 1
# Smaller sigma gives a point more weight in the least-squares fit;
# these five entries weight the hardcoded anchor points appended above.
sigma = np.ones_like(xs_np)
sigma[n-5] = 0.05   # (0, 0.0)
sigma[n-4] = 0.15   # (1000, 0.98)
sigma[n-3] = 0.15   # (-1000, -0.98)
sigma[n-2] = 0.1    # (500, 0.88)
sigma[n-1] = 0.1    # (-500, -0.88)
opt, pcov = curve_fit(model_func, xs_np, ys_np, sigma=sigma)
k, = opt
print(f"exp(-{k} * x)") |
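The sigma weighting above is the subtle part: curve_fit treats a smaller sigma as higher confidence, so the hardcoded anchors pull the curve harder than ordinary data points. Here is a self-contained toy re-fit on synthetic data (not lichess data; true_k, the noise level, and the sample size are made up for the demo) showing that the same setup recovers a known weight:

```python
import numpy as np
from scipy.optimize import curve_fit

def model_func(x, k):
    return 2 / (np.exp(-k * x) + 1) - 1

# Synthetic data drawn from a known k, to show the fit recovers it.
rng = np.random.default_rng(0)
true_k = 0.0037
xs = rng.uniform(-1000, 1000, 2000)
ys = model_func(xs, true_k) + rng.normal(0, 0.05, xs.size)

# Smaller sigma = higher weight: the anchors below pin the curve at
# 0 and +-1000, mirroring the hardcoded points in the script above.
xs = np.concatenate([xs, [0, 1000, -1000]])
ys = np.concatenate([ys, [0.0, 0.98, -0.98]])
sigma = np.ones_like(xs)
sigma[-3:] = [0.05, 0.15, 0.15]

# Start near a plausible scale so the sigmoid is not saturated at p0.
(k,), _ = curve_fit(model_func, xs, ys, sigma=sigma, p0=[0.001])
print(f"fitted k = {k:.6f}")
```

The explicit p0 is my addition: curve_fit's default initial guess of 1 saturates the sigmoid over this eval range and can stall the optimizer.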
Prep:
use std::{env, fs::File, io};
use std::io::Write;
use pgn_reader::{BufferedReader, RawComment, RawHeader, SanPlus, Skip, Visitor};
use shakmaty::{Chess, Position};
use regex::Regex;
#[macro_use]
extern crate lazy_static;
lazy_static! {
pub static ref RE: Regex = Regex::new(r"\[%eval (-?\d+\.\d*)]").unwrap();
}
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
enum WDL {
White,
Draw,
Black,
}
impl WDL {
fn format(&self) -> &'static str {
match self {
WDL::White => "w",
WDL::Draw => "d",
WDL::Black => "b",
}
}
fn val(&self) -> i32 {
match self {
WDL::White => 1,
WDL::Draw => 0,
WDL::Black => -1,
}
}
}
#[derive(Debug)]
struct DataPoint {
eval: i32,
wdl: WDL,
avg_elo: u32,
ply: u32,
}
impl DataPoint {
fn format(&self) -> String {
format!("{},{},{},{}", self.eval, self.wdl.format(), self.avg_elo, self.ply)
}
}
struct Validator {
games: usize,
board: Chess,
is_valid: bool,
white_elo: u32,
black_elo: u32,
result: WDL,
data: Vec<DataPoint>,
}
impl Validator {
fn new() -> Validator {
Validator {
games: 0,
board: Chess::default(),
is_valid: true,
white_elo: 0,
black_elo: 0,
result: WDL::Draw,
data: Vec::with_capacity(1024),
}
}
}
impl Visitor for Validator {
type Result = ();
fn begin_game(&mut self) {
self.games += 1;
self.board = Chess::default();
self.is_valid = true;
self.white_elo = 0;
self.black_elo = 0;
self.result = WDL::Draw;
}
fn header(&mut self, key: &[u8], value: RawHeader<'_>) {
// Skip games that start from a non-standard position.
match key {
b"FEN" => {
self.is_valid = false;
}
b"WhiteElo" => {
self.white_elo = String::from_utf8_lossy(value.0).parse().unwrap();
}
b"BlackElo" => {
self.black_elo = String::from_utf8_lossy(value.0).parse().unwrap();
}
b"Result" => {
self.result = match value.0 {
b"1-0" => WDL::White,
b"0-1" => WDL::Black,
_ => WDL::Draw,
};
}
b"Termination" => {
let r = String::from_utf8_lossy(value.0).to_ascii_lowercase();
self.is_valid = r != "time forfeit" && r != "abandoned";
}
b"TimeControl" => {
let t = String::from_utf8_lossy(value.0).split('+').next().unwrap().parse::<i32>();
if let Ok(t) = t {
if t < 8 * 60 {
self.is_valid = false;
}
} else {
self.is_valid = false;
}
}
_ => {}
}
}
fn end_headers(&mut self) -> Skip {
Skip(!self.is_valid)
}
fn san(&mut self, san_plus: SanPlus) {
if self.is_valid {
self.board.play_unchecked(&san_plus.san.to_move(&self.board).unwrap());
self.data.push(DataPoint {
eval: 0,
wdl: self.result,
avg_elo: (self.white_elo + self.black_elo) / 2,
ply: u32::from(self.board.fullmoves()), // note: this is the fullmove number, not the ply count
})
}
}
fn comment(&mut self, comment: RawComment<'_>) {
if self.is_valid {
let comment_s = String::from_utf8_lossy(comment.0);
let comment_s = comment_s.trim();
let mt = RE.captures(comment_s);
if let Some(r) = mt {
let e: f32 = r[1].parse().unwrap();
if !(-15.0..=15.0).contains(&e) {
self.data.pop();
} else {
self.data.last_mut().unwrap().eval = (e * 100.0) as i32;
}
} else {
self.data.pop();
}
}
}
fn begin_variation(&mut self) -> Skip {
Skip(true)
}
fn end_game(&mut self) -> Self::Result {}
}
fn main() -> Result<(), io::Error> {
// Create the output file once; creating it inside the loop would truncate
// data.txt again for every additional input file.
let data_file = File::create("./data.txt").unwrap();
let mut data_writer = io::BufWriter::new(data_file);
for arg in env::args().skip(1) {
let file = File::open(&arg).expect("fopen");
let uncompressed: Box<dyn io::Read> = if arg.ends_with(".bz2") {
Box::new(bzip2::read::MultiBzDecoder::new(file))
} else {
Box::new(file)
};
let mut reader = BufferedReader::new(uncompressed);
let mut stats = Validator::new();
const GAMES: usize = 500000;
const WRITE_EVERY: usize = 8000;
for g in 1..=GAMES {
reader.read_game(&mut stats)?;
if g % WRITE_EVERY == 0 {
for d in stats.data.iter() {
data_writer.write_all(d.format().as_bytes()).unwrap();
data_writer.write_all(b"\n").unwrap();
}
stats.data.clear();
}
}
// Flush games left over from the last incomplete WRITE_EVERY batch,
// which would otherwise be silently dropped.
for d in stats.data.iter() {
data_writer.write_all(d.format().as_bytes()).unwrap();
data_writer.write_all(b"\n").unwrap();
}
stats.data.clear();
}
Ok(())
} |
I can't figure out where the data is, or the methodology used to produce the linked equations. I may not be in the right place to ask this. Statistical data analysis behind code design is not covered by the open-source spirit of code transparency, right? Between open data and open source code there is a hole: as long as the code can be implemented, its mathematical or statistical foundation can simply be referred to and its conclusion summarized, with no obligation to provide enough methodological information to independently reproduce the results, or even examine them for oneself. This is a general question about data analysis and the scientific process that I have about many things I have seen in chess engine development, so it may not really apply here, but I formalize it anyway. I may have jumped into the middle of the data-analysis chain of communications; can you redirect me, and confirm my assumptions about the possible hole between open data and open source code, from a scientific-reproducibility perspective, if you care? Not a real GitHub issue, I guess. The code does not help me; I am not a real coder, or it would be a lot of work just to get to the basic data analysis. I did not try to read the code; should I? I read "training" somewhere; was there some machine-learning setup? Should I induce it from the code? To help you: I am most curious about the relationships or definitions involving the winning odds of a position given a game and a pair of ratings for that game (I see you have fixed both the time interval of the data set and the average pair rating, right?). How is the question posed in terms of position SF score and game outcome? How are games sampled for their positions? Are all positions of a game part of the equation construction (and where does training come into the picture)? Anything of that nature that would not require me to decrypt the code, even if the code contains such information. I don't know where in the whole pipeline I am asking here.
Please forgive me if this is a hair in the soup... |
And lastly: do people really prefer equations with numbers in them instead of symbolic parameters whose meaning could be retraced through the data-analysis protocol? Again, it is not only here that I ask myself that, so don't take it too specifically. Computers need numbers in the end, but that seems to make recovering the mathematical picture really difficult, and it is often a forgotten concern when equations are shared. I think I am missing how things are usually done: a central place where one could expect a packaged, self-contained description of the math-level procedure. Devs must know where to go to gather that picture; I am trying to get to it here. |
I agree a fully reproducible scientific method would be nice, although not strictly necessary, and I'm already very grateful to @SnowballSH for their work. If you like, you can use the open lichess games database to come up with your own method and winning-chance equations, then publish it for everyone else to reproduce. |
I totally understand your feeling. The current mean square loss is about 0.6, which can definitely be reduced. |
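For anyone wanting to reproduce that loss figure, a sketch of how a mean square loss could be computed here, assuming the outcome encoding from the training script (-1 black win, 0 draw, 1 white win); the function names and the tiny sample are mine:

```python
import math

def winning_chance(cp: float, k: float = -0.00368208) -> float:
    # Logistic winning-chance map with the weight proposed in this PR.
    return 2 / (1 + math.exp(k * cp)) - 1

def mean_square_loss(points, k: float = -0.00368208) -> float:
    # points: iterable of (eval_in_centipawns, outcome) pairs, with the
    # outcome encoded as -1 / 0 / 1 as in the training script above.
    errors = [(winning_chance(cp, k) - wdl) ** 2 for cp, wdl in points]
    return sum(errors) / len(errors)

# Tiny illustrative sample, not real lichess data.
sample = [(150, 1), (150, 0), (-300, -1), (20, 1)]
print(mean_square_loss(sample))
```

Draws and upsets near 0.00 eval are what keep this loss well away from zero even for a good fit.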
Can you explain what that code does? I understand Python, but I have no idea what that second language even is lmao.
I have some ideas for how to do a regression, and I want to compare them. |
You don't have to worry about it -- it just extracts eval and WDL data from compressed PGN files downloaded from lichess database. |
If @SandroMartens or anyone else wants to try improving the regression, I've made a much shorter Python script that will keep games with computer analysis. It's fast enough to be bottlenecked by your internet speed. Simply run:
from sys import stdin
from time import perf_counter as t
nl, kept, count = 0, 0, 0
game = ""
total = 92629656
keep = True
start = t()
f = open("out", "w")
while line := stdin.readline():
    nl += line == "\n"
    game += line
    if line[:9] in ("[WhiteElo", "[BlackElo"):
        # example of filtering by rating
        keep = keep and abs(int(line.split('"')[1]) - 2000) < 100
    elif line[:12] == "[TimeControl":
        # example of filtering by time control
        keep = keep and line == '[TimeControl "180+0"]\n'
    if nl == 2:
        # Two blank lines seen (one after the headers, one after the
        # movetext), so this is the end of a complete game.
        if keep and "%eval" in game:
            f.write(game)
            kept += 1
        nl = 0
        game = ""
        keep = True
        count += 1
        if not count % 10000:
            print(
                f"\r{count/(t()-start):.2f} games/s {count*100/total:.2f}% {kept*100/count:.3f}% kept",
                end="",
            )
f.close() |
I think you should analyze games with increments, since time trouble is a big factor in playing strength/accuracy. |
Also, filter out games where a player lost on time (see my Rust code for the filters I used). An eval of +500cp in a game that was lost on time is probably not good data. |
I did a quick test and found that >80% of games are played in 3+0, so I don't think it makes much of a difference. At least I didn't include time control as a feature. Gonna post the rest tomorrow, I think. |
@SnowballSH good points. I did grab all 3+2 games too, because I think those have the best quantity-quality ratio on lichess. But I don't plan to analyze the data right now, so you're welcome to try getting more and better data and trying it yourself. I also recommend filtering out games where the rating diff after the game is >20; that's only 5% of games, and those will be players with unstable ratings. |
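The rating-diff filter suggested above could slot into the stdin script earlier in the thread as one more keep condition. A sketch, assuming the [WhiteRatingDiff "..."] / [BlackRatingDiff "..."] headers that lichess database exports carry (the function name is mine):

```python
def keeps_stable_rating(line: str, max_diff: int = 20) -> bool:
    # Returns False for a lichess RatingDiff header whose swing exceeds
    # max_diff points; every other line passes through unchanged.
    if line.startswith(('[WhiteRatingDiff', '[BlackRatingDiff')):
        return abs(int(line.split('"')[1])) <= max_diff
    return True

print(keeps_stable_rating('[WhiteRatingDiff "+7"]\n'))   # small swing: kept
print(keeps_stable_rating('[BlackRatingDiff "-35"]\n'))  # large swing: dropped
```

In the stdin script this would become `keep = keep and keeps_stable_rating(line)` inside the read loop.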
For anyone interested, I think you should use my Rust code if you have Rust on your machine, since the library was created by lila devs (forgot which) and parses directly from the compressed file without needing to extract it. And it's faster. |
However, I think there is a better way to compute winning chance than this formula. Maybe lila should use a neural network model that also takes in Elo, game phase, and maybe time, for anyone ambitious enough to write one :) |
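To make the suggestion concrete, here is a heavily simplified sketch of what such a model could look like: a one-hidden-layer network on (eval, Elo, ply), trained with plain numpy on synthetic stand-in data. Everything here is an assumption for the demo: the toy target, the feature ranges, and the architecture are mine, not a lichess design.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in data (NOT real lichess games): features are
# (eval in centipawns, average Elo, ply).
n = 4000
X = np.column_stack([
    rng.uniform(-1000, 1000, n),   # eval
    rng.uniform(1000, 2800, n),    # average rating
    rng.uniform(1, 120, n),        # ply (stands in for game phase)
])
k = 0.002 + 1e-6 * (X[:, 1] - 1000)   # toy target: eval matters more at higher Elo
y = 2 / (1 + np.exp(-k * X[:, 0])) - 1

# One hidden layer, tanh activations, trained by full-batch gradient descent.
Xn = (X - X.mean(0)) / X.std(0)       # normalize inputs
W1 = rng.normal(0, 0.5, (3, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, 8); b2 = 0.0
lr = 0.1
for _ in range(3000):
    h = np.tanh(Xn @ W1 + b1)                  # (n, 8)
    pred = np.tanh(h @ W2 + b2)                # (n,)
    g_out = (pred - y) * (1 - pred ** 2) / n   # MSE gradient at the output, up to a factor of 2
    W2 -= lr * h.T @ g_out; b2 -= lr * g_out.sum()
    g_h = np.outer(g_out, W2) * (1 - h ** 2)   # backprop into the hidden layer
    W1 -= lr * Xn.T @ g_h; b1 -= lr * g_h.sum(0)

mse = float(np.mean((pred - y) ** 2))
print(f"final MSE: {mse:.4f}")
```

A real version would train on (eval, WDL) pairs like the curve fit does, and the single-parameter formula stays much easier to explain, which is the trade-off discussed below.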
Here's my try:
I think it's a good idea to keep the model simple so its easy to explain. |
@SnowballSH excellent work! Would it be possible to update your analysis with a lower rating cutoff, for example 2000 instead of 2300? My experience (looking at server analysis of my games) is that the formula compresses scores too much for lower-rated players (>99% of the population), and I would like it to follow centipawns more closely. Sure, there should be compression: a score of +8 doesn't double the win probability compared to +4. But it makes a big difference in practice at amateur level. When I go from +8 to +4 by missing the free piece that was given, I want to see it flagged as an error (or blunder), not an inaccuracy. On the other hand, I get nitpicked by some obscure difference between -0.2 and +0.5 in openings flagged as inaccuracies, which is confusing at my level (1750 blitz), where these moves are just fine. |
It is surely possible. Unfortunately, it has been about 10 months since I last worked on this script, so I have probably forgotten the entire setup and everything I did. The method is probably very outdated as well, as it is very simple. |
I looked into this a bit. The issue is that the definition of an "inaccuracy" is always subjective to the player, so we would have to use an objective method to determine it. When you, as an amateur, see your win chance decrease after dropping a piece in a +8 position, another amateur might have seen a line where the sacrifice leads to an easier win for humans. There seem to be too many variables in this particular issue, so I don't think it is very practical to implement.
|
This new weight for the winning chance formula is trained with scipy.optimize.curve_fit on 75k positions that appeared in rapid games between players rated 2300+ Elo. I filtered out all abandoned games and time forfeits, as well as quick losses/draws (<= 4 moves). The data is taken from the June 2022 rated database.
-0.00368208 is used instead of the original -0.004. Although the winning chance formula is modified, I don't think we need to change the accuracy formula.
Meaning for players:
Less sensitivity to "engine-dislike" opening choices (e.g. +0.2 -> +0.8), and more sensitivity to slightly-winning (±100 to ±300) positions. This encourages players to find good moves even when they are in a +2 position.
Note:
The fitted value gets closer to -0.002 as Elo decreases and closer to -0.0038 as Elo increases.
This value is trained with scipy, but TensorFlow with an MSE loss gives a similar result.
Todo:
https://lichess.org/page/accuracy still uses the -0.004 weight. If this PR is merged, the page should be updated.