
Error when running split_by_player.py #1

Open
DavidAAbbott opened this issue Apr 7, 2023 · 4 comments

Comments

@DavidAAbbott

No matter which .pgn file I try, I get various errors after running "9-pgn_to_training_data.sh", which in turn runs "split_by_player.py".

Here is the error output when I try to use my own Lichess games:

2023-04-07 06:38:39 split_by_player.py finndave.pgn finndave output/split/games
2023-04-07 06:38:39 Starting split_by_player
2023-04-07 06:38:39 Error encounteredlayers from finndave.pgn
Traceback (most recent call last):
  File "split_by_player.py", line 48, in <module>
    main()
  File "/home/david/anaconda3/envs/transfer_chess/lib/python3.7/site-packages/backend-1.0.0-py3.7.egg/backend/utils.py", line 112, in wrapped_main
    val = mainFunc(*args, **kwds)
  File "split_by_player.py", line 25, in main
    for i, (d, l) in enumerate(games):
  File "/home/david/anaconda3/envs/transfer_chess/lib/python3.7/site-packages/backend-1.0.0-py3.7.egg/backend/pgn_parsering.py", line 20, in __iter__
    yield next(self)
  File "/home/david/anaconda3/envs/transfer_chess/lib/python3.7/site-packages/backend-1.0.0-py3.7.egg/backend/pgn_parsering.py", line 41, in __next__
    raise RuntimeError(l)
RuntimeError:

@DavidAAbbott
Author

I've discovered that if I leave only one game in the PGN file, I get this output instead:

2023-04-15 20:14:02 split_by_player.py finndave.pgn finndave output/split/games
2023-04-15 20:14:02 Starting split_by_player
2023-04-15 20:14:02 0 found totals of 1:0 players from finndave.pgn
2023-04-15 20:14:02 Writing white
2023-04-15 20:14:02 Writing black
2023-04-15 20:14:02 done
2023-04-15 20:14:02 Run completed
2023-04-15 20:14:03 pgn_fractional_split.py output/split/games_white.pgn.bz2 output/split/train_white.pgn.bz2 output/split/validate_white.pgn.bz2 --ratios 90 10
2023-04-15 20:14:03 Starting pgn_fractional_split
2023-04-15 20:14:03 0 done total from output/split/games_white.pgn.bz2
2023-04-15 20:14:03 Writing 1 games to: output/split/train_white.pgn.bz2
2023-04-15 20:14:03 Writing 0 games to: output/split/validate_white.pgn.bz2
2023-04-15 20:14:03 done
2023-04-15 20:14:03 Run completed
bzcat: Can't open input file output/split/train_white.pgn.bz2: No such file or directory.
Processing stdin
0 games matched out of 0.
cat: '*.pgn': No such file or directory
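The final "cat: '*.pgn': No such file or directory" line is ordinary bash behavior: when a glob such as *.pgn matches no files, bash passes the pattern through unchanged, so cat looks for a file literally named "*.pgn". Since pgn-extract reported "0 games matched out of 0" here, presumably no numbered chunk files were written. A minimal bash sketch of a guard for that concatenation step, using a hypothetical concat_chunks helper that is not part of the project:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the project): concatenate the numbered chunk
# files (1.pgn, 2.pgn, ...) in the current directory into "$1", then delete
# the chunks. Returns 1 instead of failing inside cat when there are none.
# "$1" should not itself match *.pgn in the current directory.
concat_chunks() {
    local out=$1
    local restore
    local -a chunks
    # Remember the caller's nullglob setting (shopt -p exits non-zero when unset).
    restore=$(shopt -p nullglob || true)
    # With nullglob, an unmatched glob expands to nothing instead of itself.
    shopt -s nullglob
    chunks=(*.pgn)
    eval "$restore"
    if [ "${#chunks[@]}" -eq 0 ]; then
        echo "no .pgn chunk files found; not writing $out" >&2
        return 1
    fi
    cat -- "${chunks[@]}" > "$out"
    rm -- "${chunks[@]}"
}
```

In a script running under set -e, calling it as concat_chunks "pgns/${s}_${c}.pgn" || continue would skip an empty split instead of aborting the whole run; whether skipping is the right behavior for the later trainingdata-tool step is a separate decision.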

@paulphys

@DavidAAbbott Did you manage to resolve this? I'm running into the same issue here.

@paulphys

I just figured it out. The first issue is caused by double empty lines between games in your downloaded PGN file. The official Lichess archive used in this project has only a single empty line between games, so if you downloaded your own games via the Lichess API, you need to collapse the double empty lines into single ones.
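To check whether a given PGN file has this problem before running the pipeline, a small awk sketch can print the line number of every empty line that directly follows another empty line. The function name find_double_blanks is illustrative, not from the project, and it treats a line as empty only if it contains no characters at all, so a file with Windows (CRLF) line endings would need its carriage returns stripped first:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the line number of each empty line that
# immediately follows another empty line, i.e. every "double gap".
find_double_blanks() {
    awk 'NR > 1 && $0 == "" && prev == "" { print NR } { prev = $0 }' "$1"
}
```

For example, find_double_blanks lichess_tevatron_2023-12-19.pgn | head shows the first few offending lines; no output means the file only uses single empty lines between games.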

To convert to single empty lines, simply run:

cat lichess_tevatron_2023-12-19.pgn | sed '/^$/N;/^\n$/D' > tevatron_archive_fixed.pgn
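As an alternative, GNU and BSD cat both accept -s, which squeezes runs of adjacent empty lines into a single one and should give the same result as the sed command above. The sketch below compares the two on a small made-up sample (bash is assumed for the $'...' quoting):

```bash
#!/usr/bin/env bash
# Made-up sample: a double empty line between the two games.
sample=$'[Event "a"]\n\n1. e4 *\n\n\n[Event "b"]\n\n1. d4 *\n'

# The sed one-liner from the comment above, and cat -s for comparison.
via_sed=$(printf '%s' "$sample" | sed '/^$/N;/^\n$/D')
via_cat=$(printf '%s' "$sample" | cat -s)

# Both should leave exactly one empty line between games.
if [ "$via_sed" = "$via_cat" ]; then
    echo "sed and cat -s agree"
else
    echo "outputs differ" >&2
fi
```

So cat -s lichess_tevatron_2023-12-19.pgn > tevatron_archive_fixed.pgn should be an equivalent, slightly shorter way to produce the fixed file.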

The second issue (the "bzcat: Can't open input file" error) appears to come from an incorrect path in 1-data_generation/9-pgn_to_training_data.sh.
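A general way to make a script like this robust against that kind of path mistake is to resolve the output directory to an absolute path once, right after creating it, so that later cd calls cannot change what the path refers to. A minimal bash sketch; the helper name abs_dir is illustrative and not from the project:

```bash
#!/usr/bin/env bash
# Hypothetical helper: create a directory if needed and print its absolute path.
# CDPATH is cleared so cd never prints the directory or resolves it elsewhere.
abs_dir() {
    mkdir -p -- "$1" && (CDPATH= cd -- "$1" && pwd)
}
```

In the script this would mean writing p_dir=$(abs_dir "${2}") instead of p_dir=${2}; paths such as $p_dir/split/... then stay valid after cd $p_dir. This is only a suggestion on top of the fix posted in this thread.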

My fixed version:

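#!/bin/bash
set -e

#args input_path output_dir player

player_file=${1}
p_dir=${2}
p_name=${3}

train_frac=90
val_frac=10

split_dir=$p_dir/split

mkdir -p "${p_dir}"
mkdir -p "${split_dir}"

echo "${p_name} to ${p_dir}"

# Split the player's games by colour into $split_dir/games_white.pgn.bz2 and games_black.pgn.bz2
python split_by_player.py "$player_file" "$p_name" "$split_dir/games"

for c in "white" "black"; do
    # Train/validate split for this colour using the ratios above
    python pgn_fractional_split.py "$split_dir/games_$c.pgn.bz2" "$split_dir/train_$c.pgn.bz2" "$split_dir/validate_$c.pgn.bz2" --ratios $train_frac $val_frac

    # From here on, paths are relative to $p_dir (split/..., not $p_dir/split/...)
    cd "$p_dir"
    mkdir -p pgns
    for s in "train" "validate"; do
        mkdir -p "$s/$c"

        #using tool from:
        #https://www.cs.kent.ac.uk/people/staff/djb/pgn-extract/
        # -7: seven tag roster only, -C: no comments, -N: no NAGs,
        # -#1000: write 1000 games per file as 1.pgn, 2.pgn, ... in the current directory
        bzcat "split/${s}_${c}.pgn.bz2" | pgn-extract -7 -C -N -#1000

        # Merge the numbered chunk files into one PGN per split and colour
        cat *.pgn > "pgns/${s}_${c}.pgn"
        rm -v *.pgn

        #using tool from:
        #https://github.com/DanielUranga/trainingdata-tool
        # Run trainingdata-tool in a detached screen session named <player>-<colour>-<split>
        screen -S "${p_name}-${c}-${s}" -dm bash -c "cd ${s}/${c}; trainingdata-tool -v ../../pgns/${s}_${c}.pgn"
    done
    cd -
done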

After changing this file, simply run:
sudo ./9-pgn_to_training_data.sh tevatron_archive_fixed.pgn output tevatron
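If the run finishes but trainingdata-tool ends up with nothing to process, it can help to check how many games each split file actually contains. A rough bash sketch, assuming every game starts with an [Event "..."] tag (as in Lichess exports) and using hypothetical count_games and report_splits helpers that are not part of the project:

```bash
#!/usr/bin/env bash
# Hypothetical helper: count games in a .pgn or .pgn.bz2 file by counting
# lines that start with an [Event tag. Returns 1 if the file is missing.
count_games() {
    local f=$1
    if [ ! -f "$f" ]; then
        echo "missing: $f" >&2
        return 1
    fi
    case "$f" in
        *.bz2) bzcat "$f" ;;
        *)     cat -- "$f" ;;
    esac | awk '/^\[Event / { n++ } END { print n + 0 }'
}

# Hypothetical helper: report counts for the split files the script writes.
report_splits() {
    local split_dir=$1 c s
    for c in white black; do
        for s in train validate; do
            printf '%s_%s: ' "$s" "$c"
            count_games "$split_dir/${s}_${c}.pgn.bz2" || echo "n/a"
        done
    done
}
```

For example, report_splits output/split after a run; a count of 0 (or n/a) for a split explains why the later pgn-extract and cat steps have nothing to work with.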

@paulphys

For anyone who runs into the following error when compiling trainingdata-tool:

/home/paul/dev/chess/trainingdata-tool/lc0/src/neural/writer.h:39:3: error: ‘uint32_t’ does not name a type
   39 |   uint32_t version;
      |   ^~~~~~~~
/home/paul/dev/chess/trainingdata-tool/lc0/src/neural/writer.h:31:1: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
   30 | #include "utils/cppattributes.h"
  +++ |+#include <cstdint>

Simply add #include <cstdint> to the top of trainingdata-tool/lc0/src/neural/writer.h.
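If you prefer to script that edit, for example in a build setup you rerun, a sketch along these lines adds the include only when it is missing. It assumes GNU sed (BSD/macOS sed needs sed -i '' and a different 1i syntax) and the header path from the error message above, relative to wherever trainingdata-tool was cloned; the function name is illustrative:

```bash
#!/usr/bin/env bash
# Hypothetical helper: prepend "#include <cstdint>" to a header unless it
# already has that include. Assumes GNU sed for -i and the one-line 1i form.
add_cstdint_include() {
    local header=$1
    if grep -q '^#include <cstdint>' "$header"; then
        echo "already patched: $header"
    else
        sed -i '1i #include <cstdint>' "$header"
        echo "patched: $header"
    fi
}

# Usage (path relative to the directory containing the trainingdata-tool checkout):
# add_cstdint_include trainingdata-tool/lc0/src/neural/writer.h
```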
