This repo contains the code for the classification of the MFD Ontology habitats in the Microflora Danica project. It reproduces the results from "Section IV: Convergence of supervised and unsupervised habitat descriptors" of the MFD manuscripts.
Please download the input data from the Zenodo repo into the /data folder and amend the /config/hab_class.yaml file as indicated.
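Before submitting the job, it can help to confirm that the inputs are in place. The following is a minimal sketch, not part of the repository: the function name check_inputs is hypothetical, and it only assumes that the data folder should be non-empty and that config/hab_class.yaml should exist, as described above.

```shell
# Hypothetical pre-flight check (not part of the repository).
# Usage: check_inputs [repo_dir]   (defaults to the current directory)
check_inputs() {
    repo_dir="${1:-.}"
    status=0
    # The input data downloaded from Zenodo is expected in data/.
    if [ -z "$(ls -A "$repo_dir/data" 2>/dev/null)" ]; then
        echo "data/ is missing or empty" >&2
        status=1
    fi
    # The habitat classification config must exist so it can be amended.
    if [ ! -f "$repo_dir/config/hab_class.yaml" ]; then
        echo "config/hab_class.yaml not found" >&2
        status=1
    fi
    return "$status"
}

# Run from the repository root; report problems without aborting the shell.
check_inputs . || echo "Fix the issues above before submitting the job." >&2
```

The check does not validate the contents of either the data or the config file; it only catches a missing download or a misplaced file early.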
The script is meant to run on a SLURM system. It requires mamba and conda, and installs all other required packages via mamba. The partition names and the resource requirements might need to be adjusted in config/config.yaml and in the .rule files in scripts/scripts_python/rules/.
It will also try to detect its own location automatically. If this fails, please amend the file scripts/scripts_bash/hab_class.sh as indicated there.
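For orientation, location detection in a SLURM batch script can be sketched as below. This is an illustration under stated assumptions, not the logic used in hab_class.sh: the function name detect_repo_dir and the variable REPO_DIR are hypothetical, and the sketch assumes that sbatch is invoked from the repository root. SLURM runs a copy of the batch script from its spool directory, so the script's own path is unreliable inside a job; SLURM_SUBMIT_DIR, which holds the directory sbatch was invoked from, is used first.

```shell
# Hypothetical sketch (not the code in hab_class.sh).
detect_repo_dir() {
    if [ -n "${SLURM_SUBMIT_DIR:-}" ]; then
        # Inside a SLURM job: the directory from which sbatch was invoked.
        printf '%s\n' "$SLURM_SUBMIT_DIR"
    else
        # Outside SLURM: assume the script sits in scripts/scripts_bash/,
        # so the repository root is two levels up.
        (cd "$(dirname -- "$0")/../.." && pwd)
    fi
}

REPO_DIR="$(detect_repo_dir)"
echo "Repository root: $REPO_DIR"
```

If the job is submitted from a directory other than the repository root, this kind of detection returns the wrong path, which is one situation where setting the location by hand is needed.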
The results from the paper can be reproduced by running:
sbatch scripts/scripts_bash/hab_class.sh
The results will be collected in the /analysis folder.
The figures related to the habitat classification are generated by the following scripts:
- '/scripts/scripts_R/tree_pr_auc_classes.Rmd' generates Figure 3;
- '/scripts/scripts_R/FN_analysis.Rmd' generates Supplementary Note Figure 1b.
The inputs of these figure-generating scripts are some of the files generated by the first script (scripts/scripts_bash/hab_class.sh); the location where to find them is controlled by the "data.path" variable.