This file is a revision of the original doc. The current version only contains documentation related to reproducing the results in Chapter 4 of the thesis.
`$ROOT` refers to the repo root.
Everything can be found in the following places on the mind cluster:

- `/user_data/yimengzh/thesis-yimeng-v2`
- `/user_data/yimengzh/strflab-python`
- `/user_data/yimengzh/pytorch-module-in-json`
- `/user_data/yimengzh/gaya-data`
- `/home/yimengzh/toolchain/yimeng-thesis-v2-20200106_2c0c603d8a871cd40d99848371ad443a.simg`
- `/user_data/yimengzh/thesis-yimeng-v2/additional_stuffs`
Raw 8K data can be found in `yuanyuan_8k_neural.hdf5` and `yuanyuan_8k_images.hdf5` under `/user_data/yimengzh/thesis-yimeng-v2/results/datasets/raw`. These two files contain recordings from six days; we used data from three of them. The files are generated from source MATLAB files, which are in turn generated from raw recording data. This repo contains scripts to convert the MATLAB files into the above HDF5 format, as described below; the raw MATLAB files were produced by spike sorting plus format conversion, and Summer/Yuanyuan should have more knowledge about that process.
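To sanity-check these HDF5 files, you can list their contents. A minimal sketch, assuming the standard HDF5 command-line tools (`h5ls`) are available on the machine (they are not required by the pipeline itself):

```bash
# Recursively list groups/datasets in each file to verify the expected structure.
h5ls -r /user_data/yimengzh/thesis-yimeng-v2/results/datasets/raw/yuanyuan_8k_neural.hdf5
h5ls -r /user_data/yimengzh/thesis-yimeng-v2/results/datasets/raw/yuanyuan_8k_images.hdf5
```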
Raw NS 2250 data can be found in `/user_data/yimengzh/gaya-data/data/tang/batch/final/tang_neural.npy` and `/user_data/yimengzh/gaya-data/data/tang/images/all_imags.npy`. Hal has more knowledge about how these NumPy files were generated from the raw recording data.
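A quick way to verify that the NumPy files load correctly. This is a sketch assuming any Python environment with NumPy; `mmap_mode='r'` avoids reading the whole array into memory:

```bash
# Print shape and dtype of each array without loading it fully.
python -c "import numpy as np; a = np.load('/user_data/yimengzh/gaya-data/data/tang/batch/final/tang_neural.npy', mmap_mode='r'); print(a.shape, a.dtype)"
python -c "import numpy as np; a = np.load('/user_data/yimengzh/gaya-data/data/tang/images/all_imags.npy', mmap_mode='r'); print(a.shape, a.dtype)"
```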
The Singularity image `yimeng-thesis-v2-20200106_2c0c603d8a871cd40d99848371ad443a.simg` lives under `~/toolchain/`. It can be obtained by converting the Docker image available via `docker pull leelabcnbc/yimeng-thesis-v2:20200106`; check the section "toolchain 20200106" in the old README. Singularity 2.6.1 as well as 3.0 should work.
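If you prefer to rebuild the image instead of using the prebuilt `.simg`, one option is pulling it straight from Docker Hub. A sketch assuming Singularity 3.x (2.6.x supports a similar `singularity pull docker://...` form, producing a `.simg` instead of a `.sif`):

```bash
# Convert the Docker image into a local Singularity image.
singularity pull yimeng-thesis-v2-20200106.sif docker://leelabcnbc/yimeng-thesis-v2:20200106
```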
- You need some of the dependencies specified in `$ROOT/setup_env_variables.sh`. Those packages are mostly available on the lab GitHub. Only the following are needed to reproduce the results in the paper; click each link below for the dependency's commit that worked with this repo (newer commits should in theory work as well).
  - `pytorch-module-in-json`, which implements the DSL for model definition.
  - `strflab-python`, for computing ccnorm.
  - `gaya-data`, needed to obtain the NS 2250 data. Check with Hal on the location of the data.
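A possible way to check out the dependencies, as a sketch only: the `leelabcnbc` GitHub organization is inferred from the Docker image name above, and `<commit>` stands for the per-dependency commit linked in this README:

```bash
# Hypothetical layout: clone the three dependencies next to the main repo.
cd "$(dirname "$ROOT")"
git clone https://github.com/leelabcnbc/pytorch-module-in-json.git
git clone https://github.com/leelabcnbc/strflab-python.git
git clone https://github.com/leelabcnbc/gaya-data.git
# Pin each repo to the commit known to work with this repo, e.g.:
# (cd pytorch-module-in-json && git checkout <commit>)
```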
The steps should work on the CNBC cluster (mind) and, with some small adaptations, on a single machine.
All the actual computation is done inside the Singularity container.
- For model training, explicit invocation of Singularity is not needed, as my code already handles that.
- For everything else, the code has to be run after the following steps.
- Open the container:

  ```bash
  singularity shell --nv -B /data2/yimengzh:/my_data -B /scratch:/my_data_2 ~/toolchain/yimeng-thesis-v2-20200106_2c0c603d8a871cd40d99848371ad443a.simg
  ```

- Set up environment variables:

  ```bash
  cd /my_data
  # note the starting `.`; you can also do `source ./setup_env_variables.sh`
  . ./setup_env_variables.sh
  ```

- The following is only needed for Jupyter notebooks:

  ```bash
  # XXXX should be replaced by an appropriate port number.
  jupyter notebook --no-browser --port=XXXX
  ```
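To reach the notebook from your local machine you typically need SSH port forwarding. A hypothetical example; replace `XXXX` with the port chosen above, and the user/host with your own (`mind-login-node` is only a placeholder for the cluster login host):

```bash
# Forward local port XXXX to port XXXX on the cluster node running Jupyter.
ssh -L XXXX:localhost:XXXX yimengzh@mind-login-node
```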
- Open the container as described above; note that the download step below must run OUTSIDE the container.
- First, you need to download the ImageNet 8K data. Run the following command OUTSIDE the container:

  ```bash
  $ROOT/setup_private_data.sh
  ```

- Run the following inside the container:

  ```bash
  python $ROOT/scripts/preprocessing/raw_data.py
  python $ROOT/scripts/preprocessing/prepared_data.py
  ```

For the NS 2250 data, ask Hal about it; this code repo uses Hal's code under the hood to obtain the data.
All commands should be run outside the container, in a basic Python 3.6+ environment with no additional dependencies. On the CNBC cluster, such an environment can be established using:

```bash
scl enable rh-python36 bash
```
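For example, one of the training scripts listed below can be launched like this (a sketch; note that `scl enable ... bash` starts a new shell, and the `python` command is then run inside that shell):

```bash
# Enter a Python 3.6 shell on the CNBC cluster, then launch one training script.
scl enable rh-python36 bash
python $ROOT/scripts/training/yuanyuan_8k_a_3day/maskcnn_polished_with_rcnn_k_bl/submit_20200530.py
```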
Run the following files under `$ROOT/scripts/training/yuanyuan_8k_a_3day/maskcnn_polished_with_rcnn_k_bl`. Together they may train some extra models, but they form the minimal set of files required to cover all models used in the paper (see the sketch after the list for running them in sequence).

- `submit_20200530.py`
- `submit_20200530_2.py`
- `submit_20200617.py`
- `submit_20200704.py`
- `submit_20200705.py`
- `submit_20200707.py`
- `submit_20200708.py`
- `submit_20200709.py`
- `submit_20200731.py`
- `submit_20200801.py`
- `submit_20201001.py`
- `submit_20201012.py`
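A sketch for running the whole set sequentially, assuming each `submit_*.py` can simply be invoked with `python` (the file names are exactly those listed above):

```bash
cd $ROOT/scripts/training/yuanyuan_8k_a_3day/maskcnn_polished_with_rcnn_k_bl
for f in submit_20200530.py submit_20200530_2.py submit_20200617.py \
         submit_20200704.py submit_20200705.py submit_20200707.py \
         submit_20200708.py submit_20200709.py submit_20200731.py \
         submit_20200801.py submit_20201001.py submit_20201012.py; do
    python "$f"
done
```

The same pattern applies to the other lists of `submit_*.py` files below.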
Run the following files under `$ROOT/scripts/training/gaya/maskcnn_polished_with_rcnn_k_bl`. Together they may train some extra models, but they form the minimal set of files required to cover all models used in the paper.

- `submit_20201002_tang.py`
- `submit_20201018_tang.py`

Only 8/16/32-channel models were considered; models with more channels run out of memory (OOM) more often, making the results not very useful.
Run the following files under `$ROOT/scripts/training/yuanyuan_8k_a_3day/maskcnn_polished_with_rcnn_k_bl`. Together they may train some extra models, but they form the minimal set of files required to cover all models used in the paper.

- `submit_20201114.py`
- `submit_20201118.py`
Run the following files under `$ROOT/scripts/training/gaya/maskcnn_polished_with_rcnn_k_bl`. Together they may train some extra models, but they form the minimal set of files required to cover all models used in the paper.

- `submit_20201215_tang.py`

Only 16/32-channel, 2-layer models trained using all the data were considered, as these models had the lowest memory requirements and matched the recurrent models best.
Run the following files under `$ROOT/scripts/training/yuanyuan_8k_a_3day/maskcnn_polished_with_rcnn_k_bl`. Together they may train some extra models, but they form the minimal set of files required to cover all models used in the paper.

- `submit_20201205.py`
- `submit_20201205_2.py`
- `submit_20201213.py`
- `submit_20201213_2.py`
Run the following files under `$ROOT/scripts/training/gaya/maskcnn_polished_with_rcnn_k_bl`. Together they may train some extra models, but they form the minimal set of files required to cover all models used in the paper.

- `submit_20201218_tang.py`
Check the files under `results_thesis`.