The aim of this project is to predict the outcome of horse racing using machine learning algorithms.
From RaceBets
The dataset comes from Kaggle and covers races in Hong Kong from 1997 to 2005.
The data consists of 6,349 races with 4,405 runners.
The 5,878 races run before January 2005 are used to develop the forecasting models, whereas the remaining 471 races run after January 2005 are held out for out-of-sample testing.
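The date-based split described above can be sketched as follows. This is a minimal illustration using only the standard library, not the code in create_dataset.py: the `date` and `race_id` keys, the `split_by_date` function, and the exact cutoff of 1 January 2005 are all assumptions made for the example.

```python
from datetime import date

# Assumed cutoff: the text only says "January 2005".
CUTOFF = date(2005, 1, 1)


def split_by_date(races, cutoff=CUTOFF):
    """Split race records into (train, test) by race date.

    `races` is a list of dicts with a hypothetical "date" key.
    Races strictly before `cutoff` go to train, the rest to test.
    """
    train = [r for r in races if r["date"] < cutoff]
    test = [r for r in races if r["date"] >= cutoff]
    return train, test


# Toy records, invented for illustration only.
races = [
    {"race_id": 1, "date": date(2004, 12, 27)},
    {"race_id": 2, "date": date(2005, 1, 1)},
    {"race_id": 3, "date": date(2005, 3, 12)},
]
train, test = split_by_date(races)
```

Splitting on the race date rather than at random keeps every test race later than every training race, which is what makes the evaluation out-of-sample.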
We have an article explaining our journey through this process. You can find a link below:
- requirements.txt: list of requirements needed to run this project
- baseline_models.ipynb: notebook containing information for part 1 on baseline models
- quick_eda_horse_racing.ipynb: notebook with a quick EDA on our dataset
- create_dataset.py and config.py: both used to split our initial data into train and test sets depending on the race date
- extract_features.py: used to perform feature engineering
- winner/: folder containing all notebooks and ML models to bet on the winner
- placed/: folder containing all notebooks and ML models to bet on placed horses (the Top 3)
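The two betting targets (winner and placed) can both be derived from a runner's finishing position. The sketch below is an assumption about how such labels could be built, not the project's actual code; the function name `make_targets` and its argument are hypothetical.

```python
def make_targets(finish_position):
    """Return (won, placed) binary labels for one runner.

    `finish_position` is the runner's finishing rank (1 = first).
    "Placed" follows the Top 3 definition used in this project.
    """
    won = int(finish_position == 1)
    placed = int(finish_position <= 3)
    return won, placed


# Finishing positions invented for illustration only.
labels = [make_targets(p) for p in (1, 3, 4)]
```

Every winner is also a placed horse, so the placed target always has at least as many positive examples as the winner target.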
Let's have a look at the winner files:
- winner_01_lgbm_optim: runs the hyperparameter optimization for LGBM
- winner_02_train: runs all training processes for both LGBM and deep learning, then saves results
- winner_03_show_result: helps us verify our information and dig deeper into our predictions for a specific month
- winner_04_all_results: consolidates all months with an ensemble model and shows final results
- winner_functions.py: contains the required functions to run the 4 previous notebooks
- model/: contains all saved models from winner_02_train
- result_hyperopt.csv: file with all our optimization steps
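The files above do not spell out how the ensemble combines the LGBM and deep learning models. One common option is a weighted average of their predicted win probabilities, followed by picking the top-scoring horse in each race. The sketch below assumes that approach; the function `ensemble_pick`, the row keys (`race_id`, `horse`, `p_lgbm`, `p_dl`), and the default weight are hypothetical.

```python
def ensemble_pick(rows, weight_lgbm=0.5):
    """Pick one horse per race from blended win probabilities.

    Each row is a dict with hypothetical keys: "race_id", "horse",
    "p_lgbm" and "p_dl" (win probabilities from the two models).
    Returns a dict mapping race_id to the selected horse.
    """
    best = {}
    for row in rows:
        # Weighted average of the two models' probabilities.
        blended = weight_lgbm * row["p_lgbm"] + (1 - weight_lgbm) * row["p_dl"]
        race = row["race_id"]
        # Keep the runner with the highest blended score in each race.
        if race not in best or blended > best[race][1]:
            best[race] = (row["horse"], blended)
    return {race: horse for race, (horse, _) in best.items()}


# Toy predictions, invented for illustration only.
rows = [
    {"race_id": 1, "horse": "A", "p_lgbm": 0.30, "p_dl": 0.20},
    {"race_id": 1, "horse": "B", "p_lgbm": 0.25, "p_dl": 0.35},
    {"race_id": 2, "horse": "C", "p_lgbm": 0.50, "p_dl": 0.10},
    {"race_id": 2, "horse": "D", "p_lgbm": 0.20, "p_dl": 0.30},
]
picks = ensemble_pick(rows)
```

In race 1 the LGBM model alone would favour horse A, but the blended score selects horse B, which shows how the ensemble can overrule a single model.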
Let's have a look at the placed files:
- placed_01_train: runs all training processes for deep learning, then saves results
- placed_02_show_result: helps us verify our information and dig deeper into our predictions for a specific month
- placed_03_consolidated: consolidates all months with an ensemble model and shows final results
- placed_functions.py: contains the required functions to run the previous notebooks
- model/: contains all saved models from placed_01_train and LGBM models from the winner/ folder