This repository contains the GENBAIT project for bait selection in BioID experiments. The project uses Snakemake so that its results are reproducible. The instructions below show how to reproduce the results of each workflow step using the provided configuration files.
Before running the workflow, ensure you have the following installed:
- Python 3.10+
- Snakemake
- Git LFS (for handling large files)
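As a quick sanity check of the prerequisites listed above, the following sketch reports which tools are on `PATH`. The helper name `check_tool` is made up for illustration, and the script assumes the interpreter is invoked as `python` (on some systems it is `python3`).

```shell
# Report whether a command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

check_tool python
check_tool snakemake
check_tool git-lfs   # Git LFS is installed as a `git-lfs` executable

# If python is present, report whether it meets the 3.10+ requirement.
if command -v python >/dev/null 2>&1; then
    python -c 'import sys; print("python >= 3.10:", sys.version_info >= (3, 10))'
fi
```

Any line starting with `missing:` points to a tool that still needs to be installed.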
It is recommended to create a virtual environment to manage dependencies:
python -m venv genbait_env
source genbait_env/bin/activate # On Windows use `genbait_env\Scripts\activate`
Clone the repository, change into its root directory, and install the package:
git clone https://github.com/camlab-bioml/genbait_reproducibility.git
cd genbait_reproducibility
pip install .
This will install the package along with all required dependencies.
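You can reproduce the results for each dataset by running the Snakemake workflow. The configuration files for dataset1 and dataset2 are provided in the `config` directory. Each command below asks Snakemake to build one target file for one dataset; Snakemake runs any upstream steps that the target depends on.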
snakemake --cores 1 data/dataset1/df_norm.csv --configfile config/config_dataset1.yaml
snakemake --cores 1 data/dataset2/df_norm.csv --configfile config/config_dataset2.yaml
snakemake --cores 1 results/dataset1/GA_results/popfile.pkl --configfile config/config_dataset1.yaml
snakemake --cores 1 results/dataset2/GA_results/popfile.pkl --configfile config/config_dataset2.yaml
snakemake --cores 1 results/dataset1/plots/GA_vs_Random_plot.png --configfile config/config_dataset1.yaml
snakemake --cores 1 results/dataset2/plots/GA_vs_Random_plot.png --configfile config/config_dataset2.yaml
In the next two commands, `{num_features}` and `{seed}` are placeholders; replace them with concrete values before running:
snakemake --cores 1 results/dataset1/GA_number_of_baits_seeds/popfile_features_{num_features}_seed_{seed}.pkl --configfile config/config_dataset1.yaml
snakemake --cores 1 results/dataset2/GA_number_of_baits_seeds/popfile_features_{num_features}_seed_{seed}.pkl --configfile config/config_dataset2.yaml
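To build the `GA_number_of_baits_seeds` target for several combinations, the two fields in the path can be filled in by a loop. This is a minimal sketch: the helper name `ga_seed_cmd` is made up for illustration, and the values `30`, `40`, `0`, and `1` are hypothetical examples, not values taken from the project's configuration. The loop only prints the commands.

```shell
# Print the Snakemake command for one (dataset, num_features, seed) combination.
# $1 = dataset name, $2 = num_features, $3 = seed
ga_seed_cmd() {
    target="results/${1}/GA_number_of_baits_seeds/popfile_features_${2}_seed_${3}.pkl"
    echo "snakemake --cores 1 ${target} --configfile config/config_${1}.yaml"
}

# Hypothetical example values; check the config file for the ones the workflow expects.
for num_features in 30 40; do
    for seed in 0 1; do
        ga_seed_cmd dataset1 "$num_features" "$seed"
    done
done
```

After reviewing the printed commands, they can be executed by piping the loop's output to `sh`.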
snakemake --cores 1 results/dataset1/plots/nbaits_vs_max_value_seeds_boxplot_GA.png --configfile config/config_dataset1.yaml
snakemake --cores 1 results/dataset2/plots/nbaits_vs_max_value_seeds_boxplot_GA.png --configfile config/config_dataset2.yaml
snakemake --cores 1 results/dataset1/plots/ml_correlation_plot_averaged.png --configfile config/config_dataset1.yaml
snakemake --cores 1 results/dataset2/plots/ml_correlation_plot_averaged.png --configfile config/config_dataset2.yaml
snakemake --cores 1 results/dataset1/plots/nmf_scores_vs_each_method.png --configfile config/config_dataset1.yaml
snakemake --cores 1 results/dataset2/plots/nmf_scores_vs_each_method.png --configfile config/config_dataset2.yaml
snakemake --cores 1 results/dataset1/plots/nmf_cos_scores_vs_each_method.png --configfile config/config_dataset1.yaml
snakemake --cores 1 results/dataset2/plots/nmf_cos_scores_vs_each_method.png --configfile config/config_dataset2.yaml
snakemake --cores 1 results/dataset1/plots/nmf_kl_scores_vs_each_method.png --configfile config/config_dataset1.yaml
snakemake --cores 1 results/dataset2/plots/nmf_kl_scores_vs_each_method.png --configfile config/config_dataset2.yaml
snakemake --cores 1 results/dataset1/plots/nmf_ari_values_vs_each_method.png --configfile config/config_dataset1.yaml
snakemake --cores 1 results/dataset2/plots/nmf_ari_values_vs_each_method.png --configfile config/config_dataset2.yaml
snakemake --cores 1 results/dataset1/plots/nmf_go_components_scores_values_vs_each_method.png --configfile config/config_dataset1.yaml
snakemake --cores 1 results/dataset2/plots/nmf_go_components_scores_values_vs_each_method.png --configfile config/config_dataset2.yaml
snakemake --cores 1 results/dataset1/plots/remaining_preys_vs_each_method_sorted.png --configfile config/config_dataset1.yaml
snakemake --cores 1 results/dataset2/plots/remaining_preys_vs_each_method_sorted.png --configfile config/config_dataset2.yaml
snakemake --cores 1 results/dataset1/plots/GO_terms_retrieval_percentage_vs_each_method.png --configfile config/config_dataset1.yaml
snakemake --cores 1 results/dataset2/plots/GO_terms_retrieval_percentage_vs_each_method.png --configfile config/config_dataset2.yaml
snakemake --cores 1 results/dataset1/plots/Leiden_ARI_values_vs_each_method.png --configfile config/config_dataset1.yaml
snakemake --cores 1 results/dataset2/plots/Leiden_ARI_values_vs_each_method.png --configfile config/config_dataset2.yaml
snakemake --cores 1 results/dataset1/plots/GMM_ARI_values_vs_each_method.png --configfile config/config_dataset1.yaml
snakemake --cores 1 results/dataset2/plots/GMM_ARI_values_vs_each_method.png --configfile config/config_dataset2.yaml
snakemake --cores 1 results/dataset1/plots/GMM_mean_correlation_values_vs_each_method.png --configfile config/config_dataset1.yaml
snakemake --cores 1 results/dataset2/plots/GMM_mean_correlation_values_vs_each_method.png --configfile config/config_dataset2.yaml
snakemake --cores 1 results/dataset1/plots/combined_metrics_comparison_plot.png --configfile config/config_dataset1.yaml
snakemake --cores 1 results/dataset2/plots/combined_metrics_comparison_plot.png --configfile config/config_dataset2.yaml
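The commands above come in pairs that differ only in the dataset name, so they can be generated rather than typed out. The sketch below prints the command for a few of the targets listed above; `snakemake_cmd` is a hypothetical helper name, and the list of plots in the inner loop is an arbitrary subset chosen for brevity.

```shell
# Print the Snakemake command that builds one target for one dataset.
# $1 = dataset name, $2 = target path
snakemake_cmd() {
    echo "snakemake --cores 1 ${2} --configfile config/config_${1}.yaml"
}

for dataset in dataset1 dataset2; do
    snakemake_cmd "$dataset" "data/${dataset}/df_norm.csv"
    snakemake_cmd "$dataset" "results/${dataset}/GA_results/popfile.pkl"
    # Extend this list with any other plot names from the commands above.
    for plot in GA_vs_Random_plot combined_metrics_comparison_plot; do
        snakemake_cmd "$dataset" "results/${dataset}/plots/${plot}.png"
    done
done
```

Piping the output to `sh` runs the commands in order. Adding Snakemake's `-n` (dry-run) flag to the printed command shows which jobs would run without executing them.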