This GitHub repository contains the scripts used to produce the results of the manuscript "Designing DNA-based predictors of drug response using the signal joint with gene expression".
The scripts require Anaconda (or Miniconda).
Running create_environment.sh creates the percolate_manuscript conda environment with all required packages installed.
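A minimal sketch of this setup step, assuming `conda` is on the PATH and the commands are run from the repository root. Only create_environment.sh and the environment name percolate_manuscript come from this README; the helper names `env_listed` and `setup_env` are hypothetical, and skipping creation when the environment already exists is an illustrative choice, not something the provided script is known to do.

```shell
#!/bin/sh
# Hypothetical helper: reads the output of `conda env list` on stdin and
# succeeds if the first column of some line equals the given environment name.
env_listed() {
  awk -v name="$1" '$1 == name { found = 1 } END { exit (found ? 0 : 1) }'
}

# Hypothetical wrapper around the repository's create_environment.sh:
# creates percolate_manuscript only if conda does not already list it.
setup_env() {
  if conda env list | env_listed percolate_manuscript; then
    echo "percolate_manuscript already exists; skipping creation"
  else
    sh create_environment.sh
  fi
}
```

After `setup_env` finishes, activate the environment with `conda activate percolate_manuscript` before running the other scripts.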
To reproduce the results presented in the manuscript, you can follow these steps.
Running data_download/scripts/download_GDSC.sh automatically downloads and processes all the data needed to reproduce the figures. The downloaded and processed files are saved in the data folder.
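A minimal sketch of this step followed by a basic sanity check, assuming it is run from the repository root. The exact location of the data folder is an assumption: the sketch defaults to `data` under the repository root and accepts another path as its first argument. The helper names `dir_nonempty` and `download_data` are hypothetical, and the check only confirms that some files appeared, not that the download is complete.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the directory exists and contains at least one entry.
dir_nonempty() {
  [ -d "$1" ] && [ -n "$(ls -A "$1")" ]
}

# Hypothetical wrapper: runs the download script named in this README with sh,
# then checks that the data folder (assumed to be ./data by default) is populated.
download_data() {
  data_dir="${1:-data}"
  sh data_download/scripts/download_GDSC.sh || return 1
  if dir_nonempty "$data_dir"; then
    echo "data downloaded to $data_dir"
  else
    echo "no files found in $data_dir" >&2
    return 1
  fi
}
```

Calling `download_data` (or `download_data path/to/data` if the folder lives elsewhere) runs the real download, which may take a while.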
Running sh model_training/launch_GDSC_estimation_components_gridsearchAIC.sh launches model selection by grid search using the Akaike information criterion (AIC), trains the GLM-PCA models, and aligns them with Percolate. Results are saved in the output folder.
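The two reproduction steps can be chained so that training only starts once the download has succeeded. This is a minimal sketch, assuming it is run from the repository root with the percolate_manuscript environment active; the script paths come from this README, while the helper names `run_step` and `reproduce_all` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: runs one pipeline script with sh, as this README does,
# and returns non-zero with a hint if the script is missing or fails.
run_step() {
  if [ ! -f "$1" ]; then
    echo "missing $1: run this from the repository root" >&2
    return 1
  fi
  echo "==> $1"
  sh "$1" || { echo "step failed: $1" >&2; return 1; }
}

# Hypothetical driver: the download and training steps, in the order this
# README lists them; the second step runs only if the first one succeeds.
reproduce_all() {
  run_step data_download/scripts/download_GDSC.sh &&
    run_step model_training/launch_GDSC_estimation_components_gridsearchAIC.sh
}
```

Stopping at the first failure avoids launching the grid search on a partially downloaded dataset.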
If you use scripts from this repository, please cite: Mourragui et al., "Designing DNA-based predictors of drug response using the signal joint with gene expression", bioRxiv, 2022.