In this repo I am uploading the code and replication instructions for the paper. To have a look at how the paper's plots and analyses were created, see: Analyses
- python3: please install Python 3 and all the requirements:
pip install -r requirements.txt
Note: I re-ran the analyses while revising the manuscript using sklearn=1.1 to get some new visualisations, so you need to update to that version as well if you want all visualisations to work.
- The data comes from the UCI repository.
- Computation infrastructure: This repo uses an HTCondor server to submit jobs. You can run all scripts manually on any modern machine, but it could take some time without a compute cluster.
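To check whether the installed scikit-learn matches the version note above, a minimal sketch could look like the following. It assumes that "sklearn=1.1" means version 1.1 or newer is acceptable, which the note does not state explicitly; `version_tuple` and `sklearn_is_recent_enough` are hypothetical helper names and not part of this repo.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn a version string such as '1.1.3' into (major, minor)."""
    parts = []
    for piece in version.split(".")[:2]:
        match = re.match(r"\d+", piece)  # leading digits only, so '2rc1' -> 2
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 2:
        parts.append(0)
    return tuple(parts)


def sklearn_is_recent_enough(minimum=(1, 1)):
    """Return True if scikit-learn is installed and at least `minimum`.

    Assumption: 1.1 is treated as a lower bound, not an exact pin.
    """
    try:
        installed = metadata.version("scikit-learn")
    except metadata.PackageNotFoundError:
        return False
    return version_tuple(installed) >= minimum
```

Calling `sklearn_is_recent_enough()` after `pip install -r requirements.txt` gives a quick yes/no before running the visualisation code.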
- Set up the environment as described above
- Get Data
- Many datasets are downloaded directly from the UCI repository, but the following ones need to be downloaded manually and put into ./data/raw/:
- /bank-additional/bank-additional-full.csv
- /raw/student/student-mat.csv
student
So, please go to the UCI repository following the links above, then click on Data Folder. There you will find .zip files with the same names as the .csv files listed above. Download and unzip them. Lastly, create a ./data/raw/ folder (if it does not exist yet) and put the .csv files into it.
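The manual unzip step can also be scripted. The following is a minimal sketch, assuming the downloaded .zip archives already contain the .csv files in the sub-folders expected under ./data/raw/; the internal layout of the UCI archives is not confirmed here, so please check the resulting paths against the list above. `extract_csvs` is a hypothetical helper, not part of this repo.

```python
import zipfile
from pathlib import Path


def extract_csvs(zip_path, dest_dir="./data/raw"):
    """Extract every .csv member of `zip_path` into `dest_dir`.

    Sub-folders inside the archive are kept as they are.
    Returns the list of extracted file paths.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create ./data/raw/ if it does not exist yet
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if member.lower().endswith(".csv"):
                archive.extract(member, dest)
                extracted.append(dest / member)
    return extracted
```

Call it once per downloaded archive, passing the path to the .zip file, and move the files afterwards if the paths do not match the ones expected by the scripts.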
- Run data preparation script:
./00_prepare_data.sh
- Run experiments:
./01_condor_submission.sh
- For exact reproduction:
- This is easiest with an HTCondor cluster. If you are on one, just use
bash ./01_condor_submission.sh
- Else you will have to adjust the submission process to your ecosystem:
- if you run all the lines in
./01_condor_submission.sh
without the | condor_submit
you will see everything I am running, in the HTCondor submit file style. From here you can either run things manually or rewrite it for your environment.
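For the non-HTCondor route, the idea of dropping the `| condor_submit` part can be sketched as below. This assumes that each submitting line in ./01_condor_submission.sh ends with a pipe into `condor_submit`; the actual layout of the script may differ, and `strip_condor_pipe` and `preview_commands` are hypothetical helpers, not part of this repo.

```python
import re

# Assumption: the pipe into condor_submit (plus any arguments) ends the line.
_CONDOR_PIPE = re.compile(r"\s*\|\s*condor_submit\b.*$")


def strip_condor_pipe(line):
    """Remove a trailing '| condor_submit ...' from one shell line."""
    return _CONDOR_PIPE.sub("", line.rstrip("\n"))


def preview_commands(script_text):
    """Return the script's commands with the condor pipe removed.

    Blank lines and comment lines (including the shebang) are skipped.
    """
    commands = []
    for line in script_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        commands.append(strip_condor_pipe(stripped))
    return commands
```

Reading the script with `open("./01_condor_submission.sh").read()` and passing the text to `preview_commands` gives the list of commands; running each one in a shell should then print the submit-file text instead of submitting it.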
- A good alternative might be:
- run the experiments you are interested in by running the python file:
python3 ./src/run_analysis.py ...
where instead of ... you put in the needed arguments. You can find the arguments needed in the
if __name__ == '__main__':
block at the bottom of ./src/run_analysis.py
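To look up the arguments without scrolling through the file, the `if __name__ == '__main__':` block can be printed directly. This is a minimal sketch using Python's standard `ast` module (Python 3.8 or newer); `main_guard_source` is a hypothetical helper, and it only finds a main guard written at the top level in the usual `__name__ == '__main__'` form.

```python
import ast


def main_guard_source(source):
    """Return the text of the top-level main-guard block, or None if absent."""
    tree = ast.parse(source)
    for node in tree.body:
        if not isinstance(node, ast.If):
            continue
        test = node.test
        if (
            isinstance(test, ast.Compare)
            and isinstance(test.left, ast.Name)
            and test.left.id == "__name__"
            and len(test.comparators) == 1
            and isinstance(test.comparators[0], ast.Constant)
            and test.comparators[0].value == "__main__"
        ):
            # Slice the original text so comments inside the block are kept.
            return ast.get_source_segment(source, node)
    return None


# Usage, run from the repo root:
#   from pathlib import Path
#   print(main_guard_source(Path("./src/run_analysis.py").read_text()))
```

The printed block shows how the script reads its arguments, which tells you what to put in place of the ... in the command above.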
- Now you can create the Jupyter Book to get an overview of the analysis by using:
bash ./02_build_analyses.sh
- or run the python script in the hydro style inside of
./analyses/content/