Error when running split_by_player.py #1

DavidAAbbott · 2023-04-07T10:56:35Z

No matter which .pgn file I try, I seem to get various errors after running "9-pgn_to_training_data.sh", which in turn runs "split_by_player.py".

Here is the error output when trying to use my own Lichess games:

2023-04-07 06:38:39 split_by_player.py finndave.pgn finndave output/split/games
2023-04-07 06:38:39 Starting split_by_player
2023-04-07 06:38:39 Error encounteredlayers from finndave.pgn
Traceback (most recent call last):
  File "split_by_player.py", line 48, in <module>
    main()
  File "/home/david/anaconda3/envs/transfer_chess/lib/python3.7/site-packages/backend-1.0.0-py3.7.egg/backend/utils.py", line 112, in wrapped_main
    val = mainFunc(*args, **kwds)
  File "split_by_player.py", line 25, in main
    for i, (d, l) in enumerate(games):
  File "/home/david/anaconda3/envs/transfer_chess/lib/python3.7/site-packages/backend-1.0.0-py3.7.egg/backend/pgn_parsering.py", line 20, in __iter__
    yield next(self)
  File "/home/david/anaconda3/envs/transfer_chess/lib/python3.7/site-packages/backend-1.0.0-py3.7.egg/backend/pgn_parsering.py", line 41, in __next__
    raise RuntimeError(l)
RuntimeError:


DavidAAbbott · 2023-04-16T00:17:07Z

I have discovered that if I leave only one game in the PGN file, I get this output instead:

2023-04-15 20:14:02 split_by_player.py finndave.pgn finndave output/split/games
2023-04-15 20:14:02 Starting split_by_player
2023-04-15 20:14:02 0 found totals of 1:0 players from finndave.pgn
2023-04-15 20:14:02 Writing white
2023-04-15 20:14:02 Writing black
2023-04-15 20:14:02 done
2023-04-15 20:14:02 Run completed
2023-04-15 20:14:03 pgn_fractional_split.py output/split/games_white.pgn.bz2 output/split/train_white.pgn.bz2 output/split/validate_white.pgn.bz2 --ratios 90 10
2023-04-15 20:14:03 Starting pgn_fractional_split
2023-04-15 20:14:03 0 done total from output/split/games_white.pgn.bz2
2023-04-15 20:14:03 Writing 1 games to: output/split/train_white.pgn.bz2
2023-04-15 20:14:03 Writing 0 games to: output/split/validate_white.pgn.bz2
2023-04-15 20:14:03 done
2023-04-15 20:14:03 Run completed
bzcat: Can't open input file output/split/train_white.pgn.bz2: No such file or directory.
Processing stdin
0 games matched out of 0.
cat: '*.pgn': No such file or directory
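In this second log the 90/10 split sends the single game to train_white and leaves validate_white empty. A minimal sketch of ratio-based splitting reproduces those counts, assuming the chunk boundaries are computed by cumulative rounding. `fractional_split` is a hypothetical helper, not necessarily how pgn_fractional_split.py rounds:

```python
from itertools import accumulate


def fractional_split(games, ratios):
    """Split `games` into consecutive chunks proportional to `ratios`.

    Hypothetical helper: boundaries are rounded cumulatively, so the
    chunk sizes always add up to len(games).
    """
    total = sum(ratios)
    n = len(games)
    # e.g. ratios [90, 10] -> cumulative [90, 100] -> boundaries [0, round(0.9n), n]
    bounds = [0] + [round(n * c / total) for c in accumulate(ratios)]
    return [games[a:b] for a, b in zip(bounds, bounds[1:])]
```

With one game, `fractional_split(["g"], [90, 10])` yields one game for training and none for validation, so any later step that expects games in the validation split has nothing to work with.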

paulphys · 2023-12-19T16:58:48Z

@DavidAAbbott Did you manage to resolve this? I'm running into the same issue here.

paulphys · 2023-12-19T17:54:30Z

I just figured it out. The first issue arises from double empty lines between games in your downloaded PGN file. The official Lichess archive used in this project only contains single empty lines between games. If you have downloaded your own games via the Lichess API, you would have to get rid of the double lines.
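The traceback in the first comment ends in `raise RuntimeError(l)` with an empty message, which is consistent with the parser choking on an empty line. The sketch below is a hypothetical, simplified reader (not the actual code in backend/pgn_parsering.py) that assumes each game is a tag block, one empty line, a movetext block and one empty line. It illustrates how a second empty line between games lands where a tag line is expected:

```python
def read_games_strict(text):
    """Hypothetical strict PGN reader (NOT backend/pgn_parsering.py).

    Assumed layout per game: tag lines, one empty line, movetext lines,
    one empty line. Returns a list of (tags, movetext_lines) tuples.
    """
    games = []
    tags, moves = [], []
    state = "tags"
    for line in text.splitlines():
        if state == "tags":
            if line.startswith("["):
                tags.append(line)
            elif line == "" and tags:
                state = "moves"  # the single empty line after the tag block
            else:
                # Anything unexpected lands here. A second empty line between
                # games arrives before any tag, so the message is just "".
                raise RuntimeError(line)
        elif line == "":
            # The single empty line after the movetext closes the game.
            games.append((tags, moves))
            tags, moves, state = [], [], "tags"
        else:
            moves.append(line)
    if tags or moves:
        games.append((tags, moves))  # last game without a trailing empty line
    return games
```

With single empty lines between games this reads every game; inserting one extra empty line after a game's movetext makes it raise `RuntimeError('')`, which prints as a bare `RuntimeError:` like the one in the traceback.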

To convert to single empty lines, simply run:

sed '/^$/N;/^\n$/D' lichess_tevatron_2023-12-19.pgn > tevatron_archive_fixed.pgn
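For anyone who prefers to do this from Python, here is a minimal stdlib sketch with essentially the same effect as the sed command: it keeps the first empty line of every run and drops the rest (like `cat -s`). The function names are hypothetical:

```python
def squeeze_blank_lines(lines):
    """Yield lines, dropping any empty line that directly follows another empty line."""
    prev_blank = False
    for line in lines:
        blank = line.rstrip("\n") == ""
        if not (blank and prev_blank):
            yield line
        prev_blank = blank


def squeeze_file(src_path, dst_path):
    """Copy src_path to dst_path with runs of empty lines collapsed to one."""
    with open(src_path, encoding="utf-8") as src, open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(squeeze_blank_lines(src))
```

Usage: `squeeze_file("lichess_tevatron_2023-12-19.pgn", "tevatron_archive_fixed.pgn")`. The file is streamed line by line, so large exports are not loaded into memory at once.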

The second issue appears to come from a bad path within 1-data_generation/9-pgn_to_training_data.sh.

My fixed version:

#!/bin/bash
set -e

#args input_path output_dir player

player_file="${1}"
p_dir="${2}"
p_name="${3}"

train_frac=90
val_frac=10

split_dir="$p_dir/split"

mkdir -p "${p_dir}"
mkdir -p "${split_dir}"

echo "${p_name} to ${p_dir}"

# Write the player's games to games_white.pgn.bz2 and games_black.pgn.bz2
python split_by_player.py "$player_file" "$p_name" "$split_dir/games"

for c in "white" "black"; do
    python pgn_fractional_split.py "$split_dir/games_$c.pgn.bz2" "$split_dir/train_$c.pgn.bz2" "$split_dir/validate_$c.pgn.bz2" --ratios $train_frac $val_frac

    # From here on, paths are relative to $p_dir (split/..., not $split_dir/...)
    cd "$p_dir"
    mkdir -p pgns
    for s in "train" "validate"; do
        mkdir -p "$s/$c"

        #using tool from:
        #https://www.cs.kent.ac.uk/people/staff/djb/pgn-extract/
        bzcat "split/${s}_${c}.pgn.bz2" | pgn-extract -7 -C -N -#1000

        # Merge the chunk files written by pgn-extract, then remove them
        cat *.pgn > "pgns/${s}_${c}.pgn"
        rm -v *.pgn

        #using tool from:
        #https://github.com/DanielUranga/trainingdata-tool
        screen -S "${p_name}-${c}-${s}" -dm bash -c "cd ${s}/${c}; trainingdata-tool -v ../../pgns/${s}_${c}.pgn"
    done
    cd -
done
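The first step of the script sorts the player's games by colour into games_white.pgn.bz2 and games_black.pgn.bz2 (the second log in this thread prints "Writing white" and "Writing black"). A minimal stdlib sketch of that idea follows, assuming each game has already been parsed into a dict of PGN tags. The helper name and the case-insensitive name match are assumptions, not split_by_player.py's actual code:

```python
def split_by_player(games, player):
    """Sort games into (as_white, as_black) for `player` (hypothetical helper).

    Each game is assumed to be a dict of PGN tag pairs such as
    {"White": "...", "Black": "..."}. Names are compared case-insensitively,
    which is an assumption about how usernames appear in the headers.
    """
    target = player.casefold()
    as_white, as_black = [], []
    for game in games:
        if game.get("White", "").casefold() == target:
            as_white.append(game)
        elif game.get("Black", "").casefold() == target:
            as_black.append(game)
        # Games in which the player appears on neither side are skipped.
    return as_white, as_black
```

Each list would then be written to its own file before the 90/10 train/validate split runs on it.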

After changing this file, simply run:
sudo ./9-pgn_to_training_data.sh tevatron_archive_fixed.pgn output tevatron

paulphys · 2023-12-19T18:00:10Z

For anyone running into issues compiling trainingdata-tool with the error message:

/home/paul/dev/chess/trainingdata-tool/lc0/src/neural/writer.h:39:3: error: ‘uint32_t’ does not name a type
   39 |   uint32_t version;
      |   ^~~~~~~~
/home/paul/dev/chess/trainingdata-tool/lc0/src/neural/writer.h:31:1: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
   30 | #include "utils/cppattributes.h"
  +++ |+#include <cstdint>

Simply add #include <cstdint> to the top of trainingdata-tool/lc0/src/neural/writer.h.
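If you rebuild from a fresh checkout often, the edit can be scripted. Below is a minimal sketch that prepends the include only when it is missing, so running it twice is harmless. The function name is hypothetical, and the path is the one from the compiler message above:

```python
from pathlib import Path

INCLUDE_LINE = "#include <cstdint>\n"


def add_cstdint_include(header_path):
    """Prepend `#include <cstdint>` to a header unless it is already present.

    Returns True if the file was modified, False if it was already patched.
    """
    path = Path(header_path)
    text = path.read_text(encoding="utf-8")
    if "#include <cstdint>" in text:
        return False  # already patched; keeps the script idempotent
    path.write_text(INCLUDE_LINE + text, encoding="utf-8")
    return True
```

Usage: `add_cstdint_include("trainingdata-tool/lc0/src/neural/writer.h")`, then rebuild.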
