Confound-Leakage: Confound Removal In Machine Learning Leads To Leakage

In this repo I am uploading the code and replication instructions for the paper. To see how the paper's plots and analyses were created, see: Analyses

What you need

python3: please install Python and all the requirements (pip install -r requirements.txt). Note: I reran the analyses while revising the manuscript using sklearn=1.1 to get some new visualisations, so you need to update to that version as well if you want all visualisations to work.
The data comes from the UCI repository.
Computation infrastructure: This repo uses an HTCondor server to submit jobs. You can run all scripts manually on any modern machine, but this could take some time without a compute cluster.
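
The version requirement above can be checked before running anything. The following is a minimal sketch, assuming that 1.1 is meant as a minimum scikit-learn version (the note above only says the analyses were rerun with sklearn=1.1); the helper names `version_tuple` and `meets_minimum` are hypothetical and not part of this repo.

```python
import re
from importlib import metadata


def version_tuple(version: str, width: int = 3) -> tuple:
    """Turn a version string such as '1.1.0rc1' into (1, 1, 0).

    Only the leading digits of each dot-separated part are used, and the
    result is padded with zeros so that '1.1' and '1.1.0' compare equal.
    """
    parts = []
    for piece in version.split(".")[:width]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts + [0] * (width - len(parts)))


def meets_minimum(installed: str, minimum: str = "1.1") -> bool:
    """Assumption: the visualisations need at least this scikit-learn version."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    try:
        installed = metadata.version("scikit-learn")
    except metadata.PackageNotFoundError:
        print("scikit-learn is not installed; run: pip install -r requirements.txt")
    else:
        status = "ok" if meets_minimum(installed) else "older than 1.1"
        print(f"scikit-learn {installed}: {status}")
```

If the check reports an older version, upgrading scikit-learn inside the same environment should be enough for the visualisations.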

How to execute:

Set up the environment as described above
Get Data
- Many datasets are downloaded directly from the UCI repository, but the following ones need to be downloaded manually and put into ./data/raw/:
  - /bank-additional/bank-additional-full.csv
  - /raw/student/student-mat.csv student
    So, please go to the UCI repository following the links above, then click on Data Folder. There you will find .zip files with the same names as the .csv files listed above. Download and unzip them. Lastly, create a ./data/raw/ folder (if it does not exist yet) and put the .csv files into it.
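
Before running the preparation script, it can help to confirm that the manual downloads are in place. This is a minimal sketch, assuming the two .csv files sit directly inside ./data/raw/ under their original file names; the function `missing_raw_files` and the `EXPECTED_FILES` tuple are hypothetical helpers, so adjust the names if 00_prepare_data.sh expects a different layout.

```python
from pathlib import Path

# Assumed file names, relative to ./data/raw/ (not confirmed by the repo).
EXPECTED_FILES = ("bank-additional-full.csv", "student-mat.csv")


def missing_raw_files(raw_dir="./data/raw", expected=EXPECTED_FILES):
    """Create raw_dir if it does not exist and list expected files not yet in it."""
    raw_dir = Path(raw_dir)
    raw_dir.mkdir(parents=True, exist_ok=True)
    return [name for name in expected if not (raw_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_raw_files()
    if missing:
        print("Still missing in ./data/raw/:", ", ".join(missing))
    else:
        print("All manually downloaded files are in place.")
```

Run from the repository root, this creates ./data/raw/ when needed and only reports on the files; it does not download anything.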
Run data preparation script: ./00_prepare_data.sh
Run experiments: ./01_condor_submission.sh:
- For exact reproduction:
  - This is easiest with HTCondor. If you are on an HTCondor system, just use bash ./01_condor_submission.sh
  - Otherwise you will have to adjust the submission process to your ecosystem:
    - if you run all the lines in ./01_condor_submission.sh without the | condor_submit you will see everything I am running, printed in the HTCondor submit file style
    - from here you can either run things manually or rewrite it for your environment
- A good alternative might be:
  - run the experiments you are interested in by running the python file: python3 ./src/run_analysis.py ... where instead of ... you put in the needed arguments
  - you can find the required arguments in the if __name__ == '__main__': block at the bottom of ./src/run_analysis.py
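
To look up those arguments without opening the file in an editor, the `__main__` block can be printed directly. This is a minimal sketch that assumes the guard is written at top level with no leading indentation; `main_block` is a hypothetical helper, not something provided by this repo, and it makes no assumption about what the arguments are.

```python
from pathlib import Path


def main_block(source: str) -> str:
    """Return the text from the first top-level __main__ guard to the end of the file."""
    lines = source.splitlines()
    for index, line in enumerate(lines):
        # Matches both quote styles; indented guards are deliberately ignored.
        if line.startswith("if __name__"):
            return "\n".join(lines[index:])
    return ""


if __name__ == "__main__":
    script = Path("./src/run_analysis.py")
    if script.is_file():
        print(main_block(script.read_text()))
    else:
        print(f"{script} not found; run this from the repository root.")
```

The printed block shows how the script reads its command-line arguments, which is what has to replace the ... in the command above.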
Now you can create the Jupyter Book to get an overview of the analysis by using:
- bash ./02_build_analyses.sh
- or run the python script in the hydro style inside of ./analyses/content/

juaml/ConfoundLeakage

Folders and files

Latest commit

History

Repository files navigation

Confound-Leakage: Confound Removal In Machine Learing Leads To Leakage

What you need

How to execute:

Overview used UCI datasets (with links):

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages