Move all notebooks to scripts #58

Merged 38 commits on Nov 26, 2024
d5e1942
delete hidden snakemake folder
Nov 20, 2024
beb3110
change pcna result order
Nov 20, 2024
9458d4a
add setup function to prereq to handle SDF/pc differences
Nov 20, 2024
8069271
add classes and labels to results config
ritvikvasan Nov 20, 2024
acf4055
move to run_embeddings and run_features
ritvikvasan Nov 20, 2024
be3fd45
remove hidden snakemake files
ritvikvasan Nov 20, 2024
1689542
use cytodl config path for dataloading
ritvikvasan Nov 20, 2024
c734d7f
debug runs
ritvikvasan Nov 21, 2024
cf470b3
test runs for embeddings and features pass for PCNA dataset
Nov 21, 2024
1292b7f
remove pdb
Nov 21, 2024
34447f6
move gpu stuff to utils
Nov 21, 2024
512ccd2
move fig notebooks into single analysis script
Nov 21, 2024
178a826
working analysis script for punctate structures
ritvikvasan Nov 22, 2024
4faae2b
add sdf analysis utils and merge script
ritvikvasan Nov 22, 2024
5e9e07b
fix nonetype error
ritvikvasan Nov 22, 2024
eceefa4
working sdf analysis runs
ritvikvasan Nov 22, 2024
cfd4eb7
Update data paths in the results config for other_polymorphic.yaml
fatwir Nov 22, 2024
18945d2
Update data paths in the results config for npm1_perturb.yaml
fatwir Nov 22, 2024
a8b4bb8
Update data paths in the results config for other_punctate.yaml
fatwir Nov 22, 2024
06228bb
Update other_punctate.yaml
fatwir Nov 22, 2024
11ac09d
add str2bool check
ritvikvasan Nov 22, 2024
ad6c964
rename npm1 experiment
ritvikvasan Nov 22, 2024
8371227
fix cellpack analysis error (no s key)
Nov 22, 2024
3d50c0d
remove notebooks and make script for drugdata analysis
Nov 22, 2024
d07414c
remvove selected gpu
Nov 22, 2024
e14be8c
Merge branch 'Pre-Commit-analyses' into scripts_features_rebased
pgarrison Nov 23, 2024
103b649
Modified the code to get the GPU ID based on memory utilization
Nov 25, 2024
4d42ed9
Updated the paths in configs (merged from main)
Nov 25, 2024
35e7977
Revert "Updated the paths in configs (merged from main)"
Nov 25, 2024
304d39c
Merge pull request #59 from AllenCell/fix_gpu_id
fatwir Nov 26, 2024
6ec5819
Modified the compute_evolve_dataloaders function to make this work fo…
Nov 26, 2024
dbb20b4
Revert "Modified the compute_evolve_dataloaders function to make this…
Nov 26, 2024
7da2d9c
merge main
Nov 26, 2024
d08ac35
remove leading underscores
Nov 26, 2024
34bc7b9
create dir if doesnt exist + fix cellpack evolve bug
ritvikvasan Nov 26, 2024
c35a1ba
add image path and pc path for polymorphic data
ritvikvasan Nov 26, 2024
77c2986
remove stray prints
ritvikvasan Nov 26, 2024
a4dc4e9
add new scripts to docs
ritvikvasan Nov 26, 2024
16 changes: 9 additions & 7 deletions configs/results/cellpack.yaml
@@ -17,11 +17,13 @@ names:
"Rotation_invariant_pointcloud",
"Rotation_invariant_pointcloud_jitter",
]
data_paths: [
"./configs/data/cellpack/image.yaml",
"./configs/data/cellpack/image.yaml",
"./configs/data/cellpack/pc.yaml",
"./configs/data/cellpack/pc.yaml",
# "./src/br/configs/data/cellpack/pc_jitter.yaml",
"./configs/data/cellpack/pc.yaml",
data_paths:
[
"/data/cellpack/image.yaml",
"/data/cellpack/image.yaml",
"/data/cellpack/pc.yaml",
"/data/cellpack/pc.yaml",
"/data/cellpack/pc.yaml",
]
classification_label: ["rule"]
regression_label:
13 changes: 8 additions & 5 deletions configs/results/npm1.yaml
@@ -19,9 +19,12 @@ names:
]
data_paths:
[
"./configs/data/npm1/pc.yaml",
"./configs/data/npm1/so3_image_sdf.yaml",
"./configs/data/npm1/so3_image_seg.yaml",
"./configs/data/npm1/classical_image_sdf.yaml",
"./configs/data/npm1/classical_image_seg.yaml",
"/data/npm1/pc.yaml",
"/data/npm1/so3_image_sdf.yaml",
"/data/npm1/so3_image_seg.yaml",
"/data/npm1/classical_image_sdf.yaml",
"/data/npm1/classical_image_seg.yaml",
]
classification_label: ["STR_connectivity_cc_thresh"]
regression_label:
["mean_centroid_distances", "mean_nucleolus_volume", "mean_nucleolus_area"]
10 changes: 5 additions & 5 deletions configs/results/npm1_perturb.yaml
@@ -19,9 +19,9 @@ names:
]
data_paths:
[
"./configs/data/npm1_perturb/pc.yaml",
"./configs/data/npm1_perturb/classical_image_sdf.yaml",
"./configs/data/npm1_perturb/classical_image_seg.yaml",
"./configs/data/npm1_perturb/so3_image_sdf.yaml",
"./configs/data/npm1_perturb/so3_image_seg.yaml",
"/data/npm1_perturb/pc.yaml",
"/data/npm1_perturb/classical_image_sdf.yaml",
"/data/npm1_perturb/classical_image_seg.yaml",
"/data/npm1_perturb/so3_image_sdf.yaml",
"/data/npm1_perturb/so3_image_seg.yaml",
]
16 changes: 9 additions & 7 deletions configs/results/other_polymorphic.yaml
@@ -1,6 +1,6 @@
orig_df: ./morphology_appropriate_representation_learning/preprocessed_data/other_polymorphic/manifest.csv
image_path:
pc_path:
image_path: ./morphology_appropriate_representation_learning/preprocessed_data/other_polymorphic/manifest.csv
pc_path: ./morphology_appropriate_representation_learning/preprocessed_data/other_polymorphic/manifest.csv
model_checkpoints:
[
"./morphology_appropriate_representation_learning/model_checkpoints/other_polymorphic/Rotation_invariant_pointcloud_SDF.ckpt",
@@ -19,9 +19,11 @@ names:
]
data_paths:
[
"./configs/data/other_polymorphic/pc.yaml",
"./configs/data/other_polymorphic/so3_image_sdf.yaml",
"./configs/data/other_polymorphic/so3_image_seg.yaml",
"./configs/data/other_polymorphic/classical_image_sdf.yaml",
"./configs/data/other_polymorphic/classical_image_seg.yaml",
"/data/other_polymorphic/pc.yaml",
"/data/other_polymorphic/so3_image_sdf.yaml",
"/data/other_polymorphic/so3_image_seg.yaml",
"/data/other_polymorphic/classical_image_sdf.yaml",
"/data/other_polymorphic/classical_image_seg.yaml",
]
classification_label: ["structure_name"]
regression_label: ["avg_dists", "mean_volume", "mean_surface_area"]
13 changes: 6 additions & 7 deletions configs/results/other_punctate.yaml
@@ -6,22 +6,21 @@ model_checkpoints:
"./morphology_appropriate_representation_learning/model_checkpoints/other_punctate/Classical_image.ckpt",
"./morphology_appropriate_representation_learning/model_checkpoints/other_punctate/Rotation_invariant_image.ckpt",
"./morphology_appropriate_representation_learning/model_checkpoints/other_punctate/Classical_pointcloud.ckpt",
"./morphology_appropriate_representation_learning/model_checkpoints/other_punctate/Rotation_invariant_pointcloud.ckpt",
"./morphology_appropriate_representation_learning/model_checkpoints/other_punctate/Rotation_invariant_pointcloud_structurenorm.ckpt",
]
names:
[
"Classical_image",
"Rotation_invariant_image",
"Classical_pointcloud",
"Rotation_invariant_pointcloud",
"Rotation_invariant_pointcloud_structurenorm",
]
data_paths:
[
"./configs/data/other_punctate/image.yaml",
"./configs/data/other_punctate/image.yaml",
"./configs/data/other_punctate/pc.yaml",
"./configs/data/other_punctate/pc_intensity.yaml",
"./configs/data/other_punctate/pc_intensity_structurenorm.yaml",
"/data/other_punctate/image.yaml",
"/data/other_punctate/image.yaml",
"/data/other_punctate/pc.yaml",
"/data/other_punctate/pc_intensity_structurenorm.yaml",
]
classification_label: ["structure_name", "cell_stage"]
regression_label:
20 changes: 11 additions & 9 deletions configs/results/pcna.yaml
@@ -3,25 +3,27 @@ image_path: ./morphology_appropriate_representation_learning/preprocessed_data/p
pc_path: ./morphology_appropriate_representation_learning/preprocessed_data/pcna/manifest.csv
model_checkpoints:
[
"./morphology_appropriate_representation_learning/model_checkpoints/pcna/Classical_pointcloud.ckpt",
"./morphology_appropriate_representation_learning/model_checkpoints/pcna/Rotation_invariant_pointcloud.ckpt",
"./morphology_appropriate_representation_learning/model_checkpoints/pcna/Classical_image.ckpt",
"./morphology_appropriate_representation_learning/model_checkpoints/pcna/Rotation_invariant_image.ckpt",
"./morphology_appropriate_representation_learning/model_checkpoints/pcna/Classical_pointcloud.ckpt",
"./morphology_appropriate_representation_learning/model_checkpoints/pcna/Rotation_invariant_pointcloud.ckpt",
"./morphology_appropriate_representation_learning/model_checkpoints/pcna/Rotation_invariant_pointcloud_jitter.ckpt",
]
names:
[
"Classical_pointcloud",
"Rotation_invariant_pointcloud",
"Classical_image",
"Rotation_invariant_image",
"Classical_pointcloud",
"Rotation_invariant_pointcloud",
"Rotation_invariant_pointcloud_jitter",
]
data_paths: [
"./configs/data/pcna/pc.yaml",
"./configs/data/pcna/pc_intensity.yaml",
"./configs/data/pcna/image.yaml",
"./configs/data/pcna/image.yaml",
"/data/pcna/image.yaml",
"/data/pcna/image.yaml",
"/data/pcna/pc.yaml",
"/data/pcna/pc_intensity.yaml",
# "./src/br/configs/data/pcna/pc_intensity_jitter.yaml",
"./configs/data/pcna/pc_intensity.yaml",
"/data/pcna/pc_intensity.yaml",
]
classification_label: ["cell_stage_fine", "flag_comment"]
regression_label:
50 changes: 33 additions & 17 deletions docs/USAGE.md
@@ -62,10 +62,10 @@ Coming soon.

## Steps to train models

Training these models can take weeks. We've published our trained models so you don't have to. Skip to the next section if you'd like to just use our models.
Training these models can take days. We've published our trained models so you don't have to. Skip to the next section if you'd like to just use our models.

1. Create a single cell manifest (e.g. csv, parquet) for each dataset with a column corresponding to final processed paths, and create a split column corresponding to train/test/validation split.
2. Update the final single cell dataset path (`SINGLE_CELL_DATASET_PATH`) and the column in the manifest for appropriate input modality (`SDF_COLUMN`/`SEG_COLUMN`/`POINTCLOUD_COLUMN`/`IMAGE_COLUMN`) in each datamodule yaml files. e.g. for PCNA data these yaml files are located here -
2. Update the final single cell dataset path (`SINGLE_CELL_DATASET_PATH`) and the manifest column for the appropriate input modality (`SDF_COLUMN`/`SEG_COLUMN`/`POINTCLOUD_COLUMN`/`IMAGE_COLUMN`) in each [datamodule file](../configs/data/). For example, the yaml files for PCNA data are located here -

```
└── configs
@@ -77,14 +77,16 @@ Training these models can take weeks. We've published our trained models so you
         └── pc_intensity_jitter.yaml <- Datamodule for PCNA point clouds with intensity and jitter
```

3. Train models using cyto_dl. Ensure to run the training scripts from the folder where the repo was cloned (and where all the data was downloaded). Experiment configs for point cloud and image models are located here -
3. Train models using cyto_dl. Make sure to run the training scripts from the folder where the repo was cloned (and where all the data was downloaded). [Experiment configs](../configs/experiment/) for point cloud and image models for the cellpack dataset are located here -

```
└── configs
   └── experiment
      └── cellpack
         ├── image_equiv.yaml <- Rotation invariant image model experiment
         └── pc_equiv.yaml <- Rotation invariant point cloud model experiment
         ├── image_classical.yaml <- Classical image model experiment
         ├── image_so3.yaml <- Rotation invariant image model experiment
         ├── pc_classical.yaml <- Classical point cloud model experiment
         └── pc_so3.yaml <- Rotation invariant point cloud model experiment
```

Here is an example of training a rotation invariant point cloud model
@@ -99,6 +101,14 @@ Override parts of the experiment config via command line or manually in the conf
python src/br/models/train.py experiment=cellpack/pc_so3 model=pc/classical_earthmovers_sphere ++csv.save_dir=[SAVE_DIR]
```

4. To compute embeddings from the trained models, update the data paths in the [datamodule files](../configs/data/) and run

```
python src/br/analysis/run_embeddings.py --save_path "./outputs/" --sdf False --dataset_name "pcna" --batch_size 5 --debug False
```

where dataset_name corresponds to a [result config](../configs/results/). The sdf argument should be set to True for experiments involving SDFs like the [npm1 dataset](../configs/results/npm1.yaml) and [other polymorphic dataset](../configs/experiment/other_polymorphic/).
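
The command above passes `--sdf False` and `--debug False` as text. With plain `argparse`, declaring such a flag with `type=bool` would read the string `"False"` as `True`, because any non-empty string is truthy in Python. The commit list mentions a `str2bool` check; the repository's actual helper is not shown here, so the following is only a minimal sketch of how such a converter might look. The function body, the `build_parser` helper, and the defaults are assumptions made for illustration.

```python
import argparse


def str2bool(value):
    """Convert common textual booleans to bool (hypothetical helper, not the repo's code)."""
    if isinstance(value, bool):
        return value
    text = value.strip().lower()
    if text in {"true", "t", "yes", "y", "1"}:
        return True
    if text in {"false", "f", "no", "n", "0"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


def build_parser():
    # Flag names mirror the run_embeddings.py command above; the defaults are assumptions.
    parser = argparse.ArgumentParser(description="Sketch of the embeddings CLI flags")
    parser.add_argument("--save_path", type=str, required=True)
    parser.add_argument("--sdf", type=str2bool, required=True)
    parser.add_argument("--dataset_name", type=str, required=True)
    parser.add_argument("--batch_size", type=int, default=1)
    parser.add_argument("--debug", type=str2bool, default=False)
    return parser


# Parse the same arguments used in the example command.
args = build_parser().parse_args(
    ["--save_path", "./outputs/", "--sdf", "False",
     "--dataset_name", "pcna", "--batch_size", "5", "--debug", "False"]
)
print(args.sdf, args.debug, args.batch_size)  # prints: False False 5
```

Raising `argparse.ArgumentTypeError` lets `argparse` report an unrecognised value such as `--sdf maybe` as a normal usage error rather than silently treating it as true.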

## Steps to download pre-trained models and pre-computed embeddings

1. To skip model training, download pre-trained models
@@ -110,7 +120,7 @@ python src/br/models/train.py experiment=cellpack/pc_so3 model=pc/classical_eart
* [WTC-11 hIPSc single cell image dataset v1 polymorphic structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/other_polymorphic/)
* [Nucleolar drug perturbation dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/npm1_perturb/)

2. Download pre-computed embeddings
2. To skip computing embeddings, download pre-computed embeddings

* [cellPACK synthetic dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/cellpack/)
* [DNA replication foci dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/pcna/)
@@ -121,16 +131,22 @@ python src/br/models/train.py experiment=cellpack/pc_so3 model=pc/classical_eart

## Steps to run benchmarking analysis

1. Run analysis for each dataset separately via jupyter notebooks
1. To compute benchmarking features from the embeddings and trained models, run

```
python src/br/analysis/run_features.py --save_path "/outputs_cellpack/" --embeddings_path "./morphology_appropriate_representation_learning/model_embeddings/cellpack" --sdf False --dataset_name "cellpack" --debug False
```

where dataset_name corresponds to a [result config](../configs/results/).

2. To run analysis like latent walks and archetype analysis on the embeddings and trained models, run

```
python src/br/analysis/run_analysis.py --save_path "./outputs_cellpack/" --embeddings_path "./morphology_appropriate_representation_learning/model_embeddings/cellpack" --dataset_name "cellpack" --run_name "Rotation_invariant_pointcloud_jitter" --sdf False
```

3. To run drug perturbation analysis, run

```
python src/br/analysis/run_drugdata_analysis.py --save_path "./outputs_npm1_perturb/" --embeddings_path "./morphology_appropriate_representation_learning/model_embeddings/npm1_perturb/" --dataset_name "npm1_perturb"
```
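
Because the analysis scripts take the dataset as an argument, the same command can be generated for several datasets. The sketch below only builds and prints `run_features.py` command lines using the flags shown above; it does not run anything. The `DATASETS` mapping, the `build_features_command` helper, and the assumption that every dataset's embeddings sit under the same local `model_embeddings` folder are illustrative, not part of the repository. The `sdf` values follow the notes in the usage docs (True for npm1 and other polymorphic, False in the cellpack and pcna examples).

```python
import shlex

# Dataset name -> whether the experiment uses SDFs (assumed from the usage notes).
DATASETS = {
    "cellpack": False,
    "pcna": False,
    "npm1": True,
    "other_polymorphic": True,
}

# Assumed local layout of the downloaded embeddings.
EMBEDDINGS_ROOT = "./morphology_appropriate_representation_learning/model_embeddings"


def build_features_command(dataset_name, sdf):
    """Return the argument list for one run_features.py invocation (hypothetical helper)."""
    return [
        "python",
        "src/br/analysis/run_features.py",
        "--save_path", f"./outputs_{dataset_name}/",
        "--embeddings_path", f"{EMBEDDINGS_ROOT}/{dataset_name}",
        "--sdf", str(sdf),
        "--dataset_name", dataset_name,
        "--debug", "False",
    ]


# Print one shell-ready command per dataset.
for name, sdf in DATASETS.items():
    print(shlex.join(build_features_command(name, sdf)))
```

To execute a command rather than print it, the list can be handed to `subprocess.run(cmd, check=True)` from the folder where the repo was cloned.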
└── src
└── br
   └── notebooks
      ├── fig2_cellpack.ipynb <- Reproduce Fig 2 cellPACK synthetic data results
      ├── fig3_pcna.ipynb <- Reproduce Fig 3 PCNA data results
      ├── fig4_other_punctate.ipynb <- Reproduce Fig 4 other puntate structure data results
      ├── fig5_npm1.ipynb <- Reproduce Fig 5 npm1 data results
      ├── fig6_other_polymorphic.ipynb <- Reproduce Fig 6 other polymorphic data results
      └── fig7_drug_data.ipynb <- Reproduce Fig 7 drug data results
```
File renamed without changes.