More accurate winning chance weight based on lichess data #11148

Merged: 3 commits, Jul 9, 2022

Conversation

@SnowballSH (Contributor) commented Jul 7, 2022

This new weight for the winning chance formula was trained with scipy.optimize.curve_fit on 75k positions that appeared in rapid games between players rated 2300+. I filtered out all abandoned games and time forfeits, as well as quick losses/draws (<= 4 moves).

Data was taken from the June 2022 rated database.

-0.00368208 is used instead of the original -0.004.

Although the winning chance formula is modified, I don't think we need to change the accuracy formula.

Meaning to the players:
Less sensitivity to "engine-disliked" opening choices (e.g. +0.2 -> +0.8), and more sensitivity to slightly-winning (±100 to ±300) positions. This encourages players to find good moves even when they are in a +2 position.

Note:
This value gets closer to -0.002 as the Elo decreases and closer to -0.0038 as the Elo increases.
This value was fitted with scipy, but TensorFlow with an MSE loss gives a similar result.

Todo:
https://lichess.org/page/accuracy still uses the -0.004 weight. If this PR is merged, it should be changed.
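
For reference, a minimal sketch of the winning-chance mapping with the current and the proposed multipliers. This is an illustration only, not lila's actual client code; the shape matches the model_func used in the training script posted later in this thread.

import math

def winning_chances(cp: float, multiplier: float) -> float:
    # logistic mapping from centipawns to winning chances in [-1, 1]
    return 2 / (1 + math.exp(multiplier * cp)) - 1

OLD, NEW = -0.004, -0.00368208
for cp in (20, 80, 100, 200, 300, 800):
    print(f"{cp:>4}cp  old={winning_chances(cp, OLD):+.3f}  new={winning_chances(cp, NEW):+.3f}")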

@SnowballSH (Contributor, Author) commented Jul 7, 2022

Desmos Link for the new equation

n(x) (blue curve) is the new weight. o(x) (red curve) is the current lichess weight (-0.004).
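
For anyone without Desmos handy, a rough matplotlib rendering of the same two curves (an illustration added for convenience, not part of the PR):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-1000, 1000, 500)  # evaluation in centipawns
for mult, color, label in ((-0.00368208, "b", "n(x) new"), (-0.004, "r", "o(x) current")):
    plt.plot(x, 2 / (1 + np.exp(mult * x)) - 1, color, label=label)
plt.xlabel("evaluation (centipawns)")
plt.ylabel("winning chances")
plt.legend()
plt.show()
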

* master: (73 commits)
  new winning chances multiplier for server and client sides
  use typed storage convenience functions
  better type ui/common/storedProp, no functional change intended
  add titles to coordinates time control buttons
  add more simul data to API output - closes lichess-org#11137
  compute API simul JSONs sequentially, as it can be expensive
  fix new puzzle hotkey bypasses streak end - closes lichess-org#11157
  remove unnecesary "." from the list
  fix insights header height
  color insight active filters
  fix insight CSS (:has not available)
  tweak insight CSS
  remove more title="undefined" from multiple select
  remove "undefined" titles from multiple select
  escape HTML attribute
  add multiple-select.ts to ui/insight
  tweak Paginator builder
  use better error code
  delete our copy of minified jquery
  rewrite multiple-select and replace jquery with cash
  ...
@ornicar merged commit 02242e9 into lichess-org:master on Jul 9, 2022
@ornicar (Collaborator) commented Jul 9, 2022

Thanks, I integrated the new multiplier.

Could you share more about how you produced it? Ideally we could re-run the experiment, maybe trying out different filters.

@SnowballSH (Contributor, Author)

Yes, I will post the code here within 24 hours; I'm a little busy at the moment :)

@SnowballSH (Contributor, Author) commented Jul 10, 2022

Training:

# %%
import math

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from matplotlib import cm
from matplotlib.colors import ListedColormap, LinearSegmentedColormap

# %%

MARKER_SIZE = 0.01

xs = []
ys = []

wdls = ["b", "d", "w"]

with open('data.txt', 'r') as f:
    for line in f:
        eval_, wdl, elo, ply = line.split(",")
        if int(ply) > 10 and 1000 < int(elo) < 4000:
            xs.append(float(eval_))
            ys.append(wdls.index(wdl) - 1)

# Hardcoded values to ensure compatibility with lichess models
xs.append(0)
ys.append(0.0)

xs.append(1000)
ys.append(0.98)

xs.append(-1000)
ys.append(-0.98)

xs.append(500)
ys.append(0.88)

xs.append(-500)
ys.append(-0.88)

n = len(xs)

xs_np = np.array(xs)
ys_np = np.array(ys)


def model_func(x, k):
    # logistic curve mapping centipawns to an expected outcome in [-1, 1]
    return 2 / (np.exp(-k * x) + 1) - 1


# Give the hardcoded anchor points extra weight (smaller sigma = stronger pull in curve_fit).
sigma = np.ones_like(xs_np)

sigma[n-5] = 0.05
sigma[n-4] = 0.15
sigma[n-3] = 0.15
sigma[n-2] = 0.1
sigma[n-1] = 0.1

opt, pcov = curve_fit(model_func, xs_np, ys_np, sigma=sigma)
k, = opt
print(f"exp(-{k} * x)")

@SnowballSH (Contributor, Author)

Prep:

use std::{env, fs::File, io};
use std::io::Write;

use pgn_reader::{BufferedReader, RawComment, RawHeader, SanPlus, Skip, Visitor};
use shakmaty::{Chess, Position};

use regex::Regex;

#[macro_use]
extern crate lazy_static;

lazy_static! {
    pub static ref RE: Regex = Regex::new(r"\[%eval (-?\d+\.\d*)]").unwrap();
}

#[derive(Copy, Clone, Debug, PartialEq, Eq)]
enum WDL {
    White,
    Draw,
    Black,
}

impl WDL {
    fn format(&self) -> &'static str {
        match self {
            WDL::White => "w",
            WDL::Draw => "d",
            WDL::Black => "b",
        }
    }

    fn val(&self) -> i32 {
        match self {
            WDL::White => 1,
            WDL::Draw => 0,
            WDL::Black => -1,
        }
    }
}

#[derive(Debug)]
struct DataPoint {
    eval: i32,
    wdl: WDL,
    avg_elo: u32,
    ply: u32,
}

impl DataPoint {
    fn format(&self) -> String {
        format!("{},{},{},{}", self.eval, self.wdl.format(), self.avg_elo, self.ply)
    }
}

struct Validator {
    games: usize,
    board: Chess,
    is_valid: bool,
    white_elo: u32,
    black_elo: u32,
    result: WDL,
    data: Vec<DataPoint>,
}

impl Validator {
    fn new() -> Validator {
        Validator {
            games: 0,
            board: Chess::default(),
            is_valid: true,
            white_elo: 0,
            black_elo: 0,
            result: WDL::Draw,
            data: Vec::with_capacity(1024),
        }
    }
}

impl Visitor for Validator {
    type Result = ();

    fn begin_game(&mut self) {
        self.games += 1;
        self.board = Chess::default();
        self.is_valid = true;
        self.white_elo = 0;
        self.black_elo = 0;
        self.result = WDL::Draw;
    }

    fn header(&mut self, key: &[u8], value: RawHeader<'_>) {
        // Reject games that start from a non-standard position (custom FEN).
        match key {
            b"FEN" => {
                self.is_valid = false;
            }
            b"WhiteElo" => {
                self.white_elo = String::from_utf8_lossy(value.0).parse().unwrap();
            }
            b"BlackElo" => {
                self.black_elo = String::from_utf8_lossy(value.0).parse().unwrap();
            }
            b"Result" => {
                self.result = match value.0 {
                    b"1-0" => WDL::White,
                    b"0-1" => WDL::Black,
                    _ => WDL::Draw,
                };
            }
            b"Termination" => {
                let r = String::from_utf8_lossy(value.0).to_ascii_lowercase();
                self.is_valid = r != "time forfeit" && r != "abandoned";
            }
            b"TimeControl" => {
                let t = String::from_utf8_lossy(value.0).split('+').next().unwrap().parse::<i32>();
                if let Ok(t) = t {
                    // keep only games with at least 8 minutes of base time (rapid and slower)
                    if t < 8 * 60 {
                        self.is_valid = false;
                    }
                } else {
                    self.is_valid = false;
                }
            }
            _ => {}
        }
    }

    fn end_headers(&mut self) -> Skip {
        Skip(!self.is_valid)
    }

    fn san(&mut self, san_plus: SanPlus) {
        if self.is_valid {
            self.board.play_unchecked(&san_plus.san.to_move(&self.board).unwrap());
            self.data.push(DataPoint {
                eval: 0,
                wdl: self.result,
                avg_elo: (self.white_elo + self.black_elo) / 2,
                ply: u32::from(self.board.fullmoves()),
            })
        }
    }

    fn comment(&mut self, comment: RawComment<'_>) {
        if self.is_valid {
            let comment_s = String::from_utf8_lossy(comment.0);
            let comment_s = comment_s.trim();

            let mt = RE.captures(comment_s);
            if let Some(r) = mt {
                let e: f32 = r[1].parse().unwrap();
                if !(-15.0..=15.0).contains(&e) {
                    self.data.pop();
                } else {
                    self.data.last_mut().unwrap().eval = (e * 100.0) as i32;
                }
            } else {
                self.data.pop();
            }
        }
    }

    fn begin_variation(&mut self) -> Skip {
        Skip(true)
    }

    fn end_game(&mut self) -> Self::Result {}
}

fn main() -> Result<(), io::Error> {
    for arg in env::args().skip(1) {
        let file = File::open(&arg).expect("fopen");

        let uncompressed: Box<dyn io::Read> = if arg.ends_with(".bz2") {
            Box::new(bzip2::read::MultiBzDecoder::new(file))
        } else {
            Box::new(file)
        };

        let data_file = File::create("./data.txt").unwrap();
        let mut data_writer = io::BufWriter::new(data_file);

        let mut reader = BufferedReader::new(uncompressed);

        let mut stats = Validator::new();
        const GAMES: usize = 500000;
        const WRITE_EVERY: usize = 8000;
        for g in 1..=GAMES {
            reader.read_game(&mut stats)?;
            if g % WRITE_EVERY == 0 {
                for d in stats.data.iter() {
                    data_writer.write_all(d.format().as_bytes()).unwrap();
                    data_writer.write_all(b"\n").unwrap();
                }
                stats.data.clear();
            }
        }
    }

    Ok(())
}

@Dboingue commented Jul 15, 2022

I can't figure out where the data is, or the methodology used to produce the linked equations. I may not be in the right place to ask this. The statistical analysis behind code design is not really covered by the open-source transparency spirit, right? Between open data and open source code there is a hole: as long as the code can be implemented, its mathematical or statistical foundation can simply be referred to and the conclusion summarized, without any obligation to provide enough methodological information to independently reproduce the results or examine them for oneself.

This is a general question about data analysis and the scientific process that I have about many things I have seen in chess engine development. It may not really apply here, but I'll formalize it anyway. I may have jumped into the middle of the data-analysis chain of communications.

Can you redirect me, and confirm my assumptions about the possible hole between open data and open source code from a scientific-reproducibility perspective? If you care; not a real GitHub issue, I guess.

The code does not help me; I am not a real coder, so it would be a lot of work just to get to the basic data analysis. I did not try to read the code. Should I? I read "training" somewhere. Was there some machine learning setup? Should I infer it from the code?

To help you: I am most curious about the relationship or definitions involving the winning odds of a position given a game and a pair of ratings for that game (I see you have fixed both the time window of the data set and the average pair rating, right?). How is the question posed in terms of position SF score and game outcome? How are games sampled for their positions? Are all positions of a game part of the equation's construction (and where does training come into the picture)? Anything of that nature that would not require me to decrypt the code, even if the code contains such information. I don't know where in the whole pipeline I am asking; please forgive me if this is hair in the soup.

@Dboingue commented Jul 15, 2022

And lastly: do people really prefer equations with numbers in them instead of symbolic parameters whose meaning could be retraced through the data-analysis protocol? Again, it is not only here that I ask myself that, so don't take it too specifically. Computers need numbers in the end, and that seems to make getting the mathematical picture really difficult, and it is often a forgotten concern when sharing equations. But I think I am missing how things are usually done, and missing a central place to expect a packaged, self-contained description of the math-level procedure. Devs must know where to go to gather that picture; I am trying to get to it here.

@ornicar (Collaborator) commented Jul 15, 2022

I agree a fully reproducible scientific method would be nice, although not strictly necessary, and I'm already very grateful to @SnowballSH for their work.

If you like, you can use the open lichess games database to come up with your own method and winning-chances equation, then publish it for everyone else to reproduce.

@SnowballSH (Contributor, Author) commented Jul 15, 2022

I totally understand your feeling. The current mean squared loss is about 0.6, which can definitely be reduced.
A deeper neural network that takes in game ply, Elo, and time would definitely produce a more accurate model; however, I think such a model would be overkill.
I'm not a professional at data analysis, so I don't know the formal scientific steps. Echoing ornicar's comment, feel free to discuss an alternative here; I'd love to hear it.

@SandroMartens

> Prep: [the Rust extraction code quoted in full above]

Can you explain what that code does? I understand Python, but I have no idea what this second language even is, lmao.

I have some ideas how to do a regression and I want to compare that.

@SnowballSH (Contributor, Author)

> Prep: [the Rust extraction code quoted in full above]
>
> Can you explain what that code does? I understand Python, but I have no idea what this second language even is, lmao.
>
> I have some ideas how to do a regression and I want to compare that.

You don't have to worry about it -- it just extracts eval and WDL data from compressed PGN files downloaded from the lichess database.
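
For readers who prefer Python, a rough equivalent sketch using python-chess (an illustration only, not the script used for the PR). It reads an already-decompressed PGN stream on stdin, applies a simplified subset of the Rust filters, and writes the same "eval,wdl,avg_elo,ply" columns.

import re
import sys
import chess.pgn

EVAL_RE = re.compile(r"\[%eval (-?\d+\.\d*)\]")
RESULT_TO_WDL = {"1-0": "w", "0-1": "b", "1/2-1/2": "d"}

with open("data.txt", "w") as out:
    while (game := chess.pgn.read_game(sys.stdin)) is not None:
        headers = game.headers
        # skip abandoned games and time forfeits, like the Rust version
        if headers.get("Termination") in ("Abandoned", "Time forfeit"):
            continue
        wdl = RESULT_TO_WDL.get(headers.get("Result"), "d")
        try:
            avg_elo = (int(headers["WhiteElo"]) + int(headers["BlackElo"])) // 2
        except (KeyError, ValueError):
            continue  # skip games without numeric ratings
        for ply, node in enumerate(game.mainline(), start=1):
            m = EVAL_RE.search(node.comment)
            if not m:
                continue  # position has no [%eval ...] annotation
            eval_cp = int(float(m.group(1)) * 100)
            if abs(eval_cp) <= 1500:  # same +/-15.00 cap as the Rust code
                # note: ply here is the half-move count; the Rust version
                # stored the fullmove number in this column
                out.write(f"{eval_cp},{wdl},{avg_elo},{ply}\n")

Usage would be something like: pzstd -d < lichess_db_standard_rated_2022-06.pgn.zst | python extract.py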

@jazzzooo commented Nov 15, 2022

If @SandroMartens or anyone else wants to try improving the regression, I've made a much shorter Python script that keeps only games with computer analysis. It's fast enough to be bottlenecked by your internet speed. Simply run:
curl -s https://database.lichess.org/standard/lichess_db_standard_rated_2022-10.pgn.zst --output - | pzstd -d | python filter.py

from sys import stdin
from time import perf_counter as t

nl, kept, count = 0, 0, 0
game = ""
total = 92629656  # total game count, used only for the progress percentage
keep = True
start = t()
f = open("out", "w")

while line := stdin.readline():
    nl += line == "\n"
    game += line
    if line[:9] in ("[WhiteElo", "[BlackElo"):
        # example of filtering by rating
        keep = keep and abs(int(line.split('"')[1]) - 2000) < 100
    elif line[:12] == "[TimeControl":
        # example of filtering by time control
        keep = keep and line == '[TimeControl "180+0"]\n'

    if nl == 2:
        if keep and "%eval" in game:
            f.write(game)
            kept += 1
        nl = 0
        game = ""
        keep = True
        count += 1
        if not count % 10000:
            print(
                f"\r{count/(t()-start):.2f} games/s {count*100/total:.2f}% {kept*100/count:.3f}% kept",
                end="",
            )
f.close()

@jazzzooo

[attached chart]
This is not based on a lot of data; also, I used average score instead of win%, so this comparison is not apples to apples. Someone might want to run a more in-depth analysis.
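
For anyone putting the two on the same scale: an expected score s in [0, 1] (win = 1, draw = 0.5, loss = 0) and the [-1, 1] winning-chances values used elsewhere in this thread are related by chances = 2s - 1. A trivial helper, added here for reference:

def score_to_chances(score: float) -> float:
    # expected score in [0, 1] -> winning chances in [-1, 1]
    return 2 * score - 1

def chances_to_score(chances: float) -> float:
    return (chances + 1) / 2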

@SnowballSH (Contributor, Author)

I think you should analyze games with increments, since time trouble is a big factor in playing strength/accuracy.

@SnowballSH (Contributor, Author)

Also, filter out games where players lost on time (see my Rust code for the filters I used). An eval of +500cp in a game lost on time is probably not good data.

@SandroMartens

> I think you should analyze games with increments, since time trouble is a big factor in playing strength/accuracy.

I did a quick test and found that >80% of games are played at 3+0, so I don't think it makes much of a difference. At least I didn't include time control as a feature. I'll probably post the rest tomorrow.

@jazzzooo

@SnowballSH good points. I did grab all 3+2 games too because I think those have the best quantity-quality ratio on lichess. But I don't plan to analyze the data right now, so you're welcome to try getting more and better data and running it yourself. Also, I recommend filtering out games where the rating diff after the game is >20; that's only 5% of the games, and those will be players with unstable ratings.
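
A possible way to add that rating-diff filter to the filter.py script above (a sketch; the WhiteRatingDiff/BlackRatingDiff header names are the ones found in lichess PGN exports):

def rating_stable(header_line: str, max_diff: int = 20) -> bool:
    # header_line looks like '[WhiteRatingDiff "+8"]\n'
    return abs(int(header_line.split('"')[1])) <= max_diff

# inside filter.py's while loop, after the TimeControl branch, one could add:
#     elif line[:16] in ("[WhiteRatingDiff", "[BlackRatingDiff"):
#         keep = keep and rating_stable(line)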

@SnowballSH (Contributor, Author)

For anyone interested, I think you should use my Rust code if you have Rust on your machine, since the library was created by lila devs (I forgot which) and parses directly from the compressed file without needing to extract it first. And it's faster.
Otherwise, use the Python script presented above.

@SnowballSH (Contributor, Author)

However, I think there is a better way to compute winning chances than this formula. Maybe lila should use a neural network model that also takes in Elo, game phase, and maybe time. For anyone ambitious enough to write one :)
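
For the record, a very rough sketch of what such a model could look like (an illustration only, using scikit-learn and the data.txt columns produced earlier; ply is used as a crude stand-in for game phase here):

import numpy as np
from sklearn.neural_network import MLPClassifier

# data.txt rows: eval,wdl,avg_elo,ply (as produced by the extraction scripts above)
rows = [line.strip().split(",") for line in open("data.txt")]
X = np.array([[float(ev), float(elo), float(ply)] for ev, _, elo, ply in rows])
y = np.array([wdl for _, wdl, _, _ in rows])  # "w", "d" or "b"

clf = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=300).fit(X, y)

def winning_chances(eval_cp: float, elo: float, ply: float) -> float:
    # P(white wins) - P(black wins), on the same [-1, 1] scale as the formula
    proba = dict(zip(clf.classes_, clf.predict_proba([[eval_cp, elo, ply]])[0]))
    return proba.get("w", 0.0) - proba.get("b", 0.0)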

@SandroMartens commented Nov 16, 2022

Here's my try:
https://github.com/SandroMartens/Lichess-Win-Predictions/blob/master/readme.md
(I think I broke the GitHub Markdown renderer lmao)

> However, I think there is a better way to compute winning chances than this formula. Maybe lila should use a neural network model that also takes in Elo, game phase, and maybe time. For anyone ambitious enough to write one :)

I think it's a good idea to keep the model simple so it's easy to explain.

@lucasart commented Aug 26, 2023

@SnowballSH excellent work! Would it be possible to update your analysis with a lower rating cutoff? For example 2000 instead of 2300.

My experience (looking at server analysis of my games) is that the formula compresses scores too much for lower-rated players (>99% of the population), and I would like it to follow centipawns more closely. Sure, there should be compression: a score of +8 doesn't double the win probability compared to +4. But it makes a big difference in practice at amateur level. When I go from +8 to +4 by missing the free piece that was given, I want to see it flagged as an error (or blunder), not an inaccuracy. On the other hand, I get nitpicked about some obscure difference between -0.2 and +0.5 in openings flagged as inaccuracies, which is confusing at my level (1750 blitz), where these moves are just fine.

@SnowballSH (Contributor, Author)

It is surely possible; unfortunately it has been about 10 months since I last worked on this script, so I have probably forgotten the entire setup and everything I did. The method is probably very outdated as well, as it is very simple.
I think you should open a separate issue for lila to advocate for an "accuracy based on rating" feature.
Thanks!

@SnowballSH (Contributor, Author)

I looked into this a bit. The problem is that the definition of "inaccuracy" is always subjective to the player, so we have to use an objective method to determine it. When you, as an amateur, realize your win chance has decreased after you drop a piece in a +8 position, another amateur might have seen a line where the sacrifice leads to an easier win for humans. There seem to be too many variables for this particular issue, so I don't think it is very practical to implement.

