This repository uses Poetry for managing dependencies.
To begin, install Poetry version 1.1.14:
curl -sSL https://install.python-poetry.org | python3 - --version 1.1.14
See the Poetry documentation for up-to-date installation instructions.
To verify that Poetry is installed, run poetry --version; the output should be Poetry version 1.1.14.
Installing all of the necessary dependencies into a new virtual environment should now be as easy as running poetry install from the root directory of this repo.
To activate the virtual environment, run poetry shell (if this fails, one possible cause is a space in the path to the virtual environment).
To run the Jupyter notebooks in VS Code, you will need to change the kernel to point to the Python binary in the virtual environment folder (see the Poetry docs for its location). The binary should be at path_to_venv/.venv/bin/python.
- cadena
- classification
- cs_transformer
- de_transformer
- dense_subgraph
- grid_search_cv
- iidaka_transformer
- important_edges
- nested_grid_search_cv
- percentile_alphas
- pipeline
- replication
- test_local_search_omission
- utils
The files in this directory constitute some of the outdated code that was used during the experiments conducted for this research paper. Some files were deleted, but can likely be recovered from the git history if necessary.
This code may need tweaking to run; it serves only as a record of experiments that were performed but later revised or abandoned.
This directory contains the data used in this study.
- BrainNetViewer - Contains the files used to create the brain diagrams with the BrainNetViewer program. The edge files were generated with the important_edges.ipynb notebook.
- ABIDE - Contains the BOLD timeseries of the ABIDE dataset, downloaded from the Preprocessed Connectomes Project Amazon S3 bucket, and a phenotypic trait file for each of the subjects. The script used to download these files is named download_abide_files.ipynb and can be found in the scripts directory. Note that, with a simple modification to one of its variables, the script can also download timeseries without GSR and bandpass filtering.
- generated_filt_global - Contains the raw correlation matrices generated in this study using the timeseries_to_networks.ipynb notebook in the scripts directory (the sketch after this list illustrates the idea).
- lanciano_datasets_corr_thresh_80 - Contains the thresholded correlation matrices provided by Lanciano et al. in their project repository.
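As a rough illustration of how a BOLD timeseries becomes a correlation matrix for generated_filt_global, here is a minimal sketch (not the notebook's actual code; the shape convention is an assumption):

```python
import numpy as np

# timeseries: assumed shape (n_timepoints, n_regions), one ROI per column.
# np.corrcoef expects variables in rows, hence the transpose.
def timeseries_to_network(timeseries):
    return np.corrcoef(timeseries.T)
```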
Each of the directories containing data files has a unique.json file, generated with the calculate_unique_subjects.py script. This file gives the filepath of each unique subject, since a subject can appear in more than one subdirectory depending on their phenotypic traits.
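For example, a unique.json file could be consumed like this (a sketch; the exact JSON structure is assumed here to map subject IDs to filepaths):

```python
import json

# Assumed structure: {subject_id: filepath, ...}; adjust to the actual file.
with open("data/generated_filt_global/unique.json") as f:  # assumed path
    unique_subjects = json.load(f)

for subject_id, filepath in unique_subjects.items():
    print(subject_id, filepath)
```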
This directory contains the output files from the experiments reported in the study. These files contain details such as the accuracy, runtime, and confusion matrix of each experiment.
In this directory, two files named matlab sdp.npy and python sdp.npy can be found. These files contain example outputs of the Matlab and Python SDP solvers, respectively, for the same input. The output values were found to differ only on the order of 10^−6 to 10^−4, which is likely due to a difference in precision between the libraries used.
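The comparison can be reproduced with a few lines of numpy (a sketch; run it from this directory):

```python
import numpy as np

# Load the example outputs of the two SDP solvers for the same input.
matlab = np.load("matlab sdp.npy")
python = np.load("python sdp.npy")

diff = np.abs(matlab - python)
print(f"max abs difference:  {diff.max():.2e}")   # expected ~1e-6 to 1e-4
print(f"mean abs difference: {diff.mean():.2e}")
```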
The default output directory when running experiments. A gitignore file causes the files placed here to be ignored by git.
Similar to the output directory, but contains generated plots. These plots are ignored by git.
Contains various scripts for downloading and processing files.
cadena - Contains some reused functions from Lanciano et al.'s repository, which are believed to have originated from the work of Cadena et al.
classification - Functions for performing nested and non-nested cross-validation for the experiments, as well as for outputting results.
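For readers unfamiliar with the distinction, the sketch below shows nested versus non-nested cross-validation using plain sklearn; the estimator, data, and parameter grid are placeholders, not those used in the study:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Placeholder data: 40 subjects, 10 features, two balanced classes.
X, y = np.random.rand(40, 10), np.repeat([0, 1], 20)
param_grid = {"C": [0.1, 1, 10]}  # placeholder grid
inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# Non-nested: tune hyperparameters with a single CV loop over all the data.
search = GridSearchCV(SVC(), param_grid, cv=inner).fit(X, y)
print("non-nested best score:", search.best_score_)

# Nested: wrap the tuning search in an outer CV loop for an unbiased estimate.
nested_scores = cross_val_score(GridSearchCV(SVC(), param_grid, cv=inner), X, y, cv=outer)
print("nested score:", nested_scores.mean())
```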
cs_transformer - Transformer class implementing the Contrast Subgraph approach.
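A heavily simplified sketch of what such a transformer can look like; the node-selection step is a naive placeholder for the SDP/local-search machinery in dense_subgraph, and all parameter names are assumptions:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class ContrastSubgraphSketch(BaseEstimator, TransformerMixin):
    def __init__(self, k=10):
        self.k = k  # size of the node set to keep (assumed parameter)

    def fit(self, X, y):
        # X: adjacency matrices with shape (n_graphs, n_nodes, n_nodes).
        X, y = np.asarray(X), np.asarray(y)
        diff = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
        # Placeholder: keep the k nodes with the largest total difference,
        # standing in for the actual contrast-subgraph solvers.
        self.nodes_ = np.argsort(diff.sum(axis=1))[-self.k:]
        return self

    def transform(self, X):
        # Embed each graph as the total edge weight it induces on the
        # selected node set.
        sub = np.asarray(X)[:, self.nodes_][:, :, self.nodes_]
        return (sub.sum(axis=(1, 2)) / 2).reshape(-1, 1)
```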
de_transformer - Transformer class implementing the Discriminative Edges approach.
dense_subgraph - Functions for finding and refining dense subgraphs, including the SDP and QP solvers and the local search algorithms.
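As an illustration, a bare-bones local search over an edge-surplus objective f(S) = w(S) − α·|S|(|S|−1)/2 (the objective used in this line of work) might look like this; the real solvers are considerably more involved:

```python
import numpy as np

def edge_surplus(adj, nodes, alpha):
    # w(S) minus a density penalty; adj is symmetric with zero diagonal.
    s = list(nodes)
    w = adj[np.ix_(s, s)].sum() / 2  # total edge weight induced by S
    return w - alpha * len(s) * (len(s) - 1) / 2

def local_search(adj, nodes, alpha):
    # Toggle single nodes in or out of S while that improves the objective.
    nodes = set(nodes)
    improved = True
    while improved:
        improved = False
        for v in range(adj.shape[0]):
            candidate = nodes ^ {v}
            if candidate and edge_surplus(adj, candidate, alpha) > edge_surplus(adj, nodes, alpha):
                nodes, improved = candidate, True
    return nodes
```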
grid_search_cv - The notebook for performing the cross-validation experiments. Certain lines can be commented or uncommented to run the experiments on different datasets.
iidaka_transformer - Transformer class implementing the Effect Size Thresholding approach.
important_edges - Used to create the edge files for the BrainNetViewer directory. It divides the brain networks into 5 folds and identifies the most discriminative edges by accumulating the difference-network values of the edges that are chosen. This is repeated 10 times to give a robust picture of which edges are the most important overall.
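A rough sketch of that accumulation loop, with the fold counts from the description and an assumed top-k edge-selection rule:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def accumulate_edge_importance(X, y, n_repeats=10, top_k=100, seed=0):
    # X: adjacency matrices (n_graphs, n_nodes, n_nodes); y: binary labels.
    n_nodes = X.shape[1]
    importance = np.zeros((n_nodes, n_nodes))
    for rep in range(n_repeats):
        skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed + rep)
        for train_idx, _ in skf.split(X, y):
            Xt, yt = X[train_idx], y[train_idx]
            diff = Xt[yt == 1].mean(axis=0) - Xt[yt == 0].mean(axis=0)
            # Assumed rule: pick the top_k strongest edges of the difference
            # network and accumulate their values.
            chosen = np.abs(diff).ravel().argsort()[-top_k:]
            mask = np.zeros(diff.size, dtype=bool)
            mask[chosen] = True
            importance += np.where(mask.reshape(diff.shape), diff, 0)
    return importance
```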
nested_grid_search_cv - The notebook for performing the nested cross-validation experiments. Certain lines can be commented or uncommented to run the experiments on different datasets.
percentile_alphas - This file was used to investigate the hypothesis that Lanciano et al. may have used all of the graphs to find the contrast subgraphs. Alpha values were obtained from the entire group of brain networks within each category, based on the best percentiles reported by Lanciano et al., and were then used to obtain contrast subgraphs with the original code provided by Lanciano et al.
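A sketch of deriving an alpha from a whole group of networks at a given percentile; exactly which quantity the percentile is taken over is an assumption here (edge weights of the group summary network):

```python
import numpy as np

def alpha_from_percentile(graphs, percentile):
    # graphs: adjacency matrices (n_graphs, n_nodes, n_nodes).
    summary = np.mean(graphs, axis=0)        # group summary network
    iu = np.triu_indices_from(summary, k=1)  # upper-triangle edge weights
    return np.percentile(summary[iu], percentile)
```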
pipeline - Custom pipeline class similar to the one provided by sklearn, but customized for the purposes of this study.
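In spirit, such a pipeline chains transformers and ends with an estimator, roughly like this minimal sketch (details of the actual class will differ):

```python
class SimplePipeline:
    def __init__(self, steps):
        self.steps = steps  # list of (name, estimator) pairs

    def fit(self, X, y):
        # Fit and apply each transformer in turn, then fit the final estimator.
        for _, step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        self.steps[-1][1].fit(X, y)
        return self

    def predict(self, X):
        # Apply the fitted transformers, then predict with the final estimator.
        for _, step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1][1].predict(X)
```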
replication - Notebook for running the replication experiment with the best parameters reported by Lanciano et al.
test_local_search_omission - The last step of the local search algorithm described in section 3.1.3 appeared to be redundant, and during the translation of the code this step was accidentally omitted. A test was later carried out in which 5 difference networks were each used with 1000 random node sets in the local search algorithm. Over the 5000 iterations, the complement of the improved node set was never superior, so the omission is very unlikely to have affected the replication results.
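A sketch of that test, reusing the local_search and edge_surplus sketches from the dense_subgraph description and mirroring the 5 × 1000 setup:

```python
import numpy as np

def test_complement_never_better(diff_networks, alpha, trials=1000, seed=0):
    # Returns False as soon as the complement of an improved node set beats it.
    rng = np.random.default_rng(seed)
    for adj in diff_networks:  # e.g. the 5 difference networks
        n = adj.shape[0]
        for _ in range(trials):
            start = set(rng.choice(n, size=rng.integers(1, n), replace=False).tolist())
            improved = local_search(adj, start, alpha)
            complement = set(range(n)) - improved
            if complement and edge_surplus(adj, complement, alpha) > edge_surplus(adj, improved, alpha):
                return False  # the complement was superior at least once
    return True
```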
utils - Various helper functions, such as the logic for embedding the brain networks using contrast subgraphs.
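For instance, the contrast-subgraph embedding idea can be sketched as follows (a 2-D embedding using two node sets, following the contrast-subgraph literature; the actual signature in utils may differ):

```python
import numpy as np

def embed(graphs, cs_a, cs_b):
    # Represent each graph by the total edge weight it induces on each
    # contrast subgraph, giving one 2-D point per brain network.
    def induced_weight(adj, nodes):
        return adj[np.ix_(nodes, nodes)].sum() / 2
    return np.array([[induced_weight(g, cs_a), induced_weight(g, cs_b)]
                     for g in graphs])
```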