Move all notebooks to scripts #55

Closed
wants to merge 32 commits into from
32 commits
584d553
Modified the .toml file to include joblib and also created a prereq.p…
Nov 20, 2024
419495f
Resolved merge conflict for the lock and toml files
Nov 20, 2024
51368a6
Resolved merge conflict for the lock and toml files
Nov 20, 2024
3349ab2
Missed a few conflicts in the lock file, updating these
Nov 20, 2024
6ef33d6
run pre-commit
Nov 20, 2024
bcac5be
Merge branch 'scripts_features' of https://github.com/AllenCell/bench…
Nov 20, 2024
7f44cc0
delete hidden snakemake folder
Nov 20, 2024
396b9d0
change pcna result order
Nov 20, 2024
125fa1b
add setup function to prereq to handle SDF/pc differences
Nov 20, 2024
ed4a153
merge main
Nov 20, 2024
6af2d0f
add classes and labels to results config
ritvikvasan Nov 20, 2024
d639d2d
move to run_embeddings and run_features
ritvikvasan Nov 20, 2024
d2867c6
remove hidden snakemake files
ritvikvasan Nov 20, 2024
2318d5c
use cytodl config path for dataloading
ritvikvasan Nov 20, 2024
c1e475c
debug runs
ritvikvasan Nov 21, 2024
4a9bd8e
test runs for embeddings and features pass for PCNA dataset
Nov 21, 2024
450dbdd
remove pdb
Nov 21, 2024
ce9b95a
move gpu stuff to utils
Nov 21, 2024
4f80f71
move fig notebooks into single analysis script
Nov 21, 2024
c66430f
working analysis script for punctate structures
ritvikvasan Nov 22, 2024
d2dbb19
add sdf analysis utils and merge script
ritvikvasan Nov 22, 2024
24a0bc3
fix nonetype error
ritvikvasan Nov 22, 2024
ee4a6b7
working sdf analysis runs
ritvikvasan Nov 22, 2024
3c59c12
Update data paths in the results config for other_polymorphic.yaml
fatwir Nov 22, 2024
2c8e982
Update data paths in the results config for npm1_perturb.yaml
fatwir Nov 22, 2024
25eca16
Update data paths in the results config for other_punctate.yaml
fatwir Nov 22, 2024
11a91bf
Update other_punctate.yaml
fatwir Nov 22, 2024
25376de
add str2bool check
ritvikvasan Nov 22, 2024
ce605cb
rename npm1 experiment
ritvikvasan Nov 22, 2024
209384d
fix cellpack analysis error (no s key)
Nov 22, 2024
e640adb
remove notebooks and make script for drugdata analysis
Nov 22, 2024
ba9f7ef
remvove selected gpu
Nov 22, 2024
2 changes: 1 addition & 1 deletion .github/actions/pdm/action.yml
@@ -4,7 +4,7 @@ runs:
using: composite
steps:
- name: Set up PDM
uses: pdm-project/setup-pdm@568ddd69406b30de1774ec0044b73ae06e716aa4 # v4.1
uses: pdm-project/setup-pdm@568ddd69406b30de1774ec0044b73ae06e716aa4 # v4.1
with:
python-version: "3.10"
version: 2.20.0.post1
5 changes: 2 additions & 3 deletions .github/workflows/test.yml
@@ -12,13 +12,12 @@ concurrency:
cancel-in-progress: true

jobs:

test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: ./.github/actions/pdm

- name: Check that pdm.lock matches pyproject.toml
shell: bash
run: pdm lock --check
run: pdm lock --check
2 changes: 2 additions & 0 deletions ADVANCED_INSTALLATION.md
@@ -1,7 +1,9 @@
# Installation and usage with pdm

1. [Install pdm](https://pdm-project.org/en/latest/#recommended-installation-method)
2. Install dependencies: `pdm sync --no-isolation`. (The `--no-isolation` flag is required for `torch-scatter`.)
3. Prefix every `python` command with `pdm run`. For example:

```
pdm run python src/br/models/train.py experiment=cellpack/pc_equiv
```
51 changes: 31 additions & 20 deletions README.md
@@ -3,16 +3,19 @@
Code for training and benchmarking morphology appropriate representation learning methods.

# Preprocessing

This README gives instructions for running our analysis against preprocessed image data. Because the preprocessing can take a long time, we are publishing preprocessed data along with the original movies. To do the preprocessing yourself, see the instructions in [subpackages/image_preprocessing/README.md](./subpackages/image_preprocessing/README.md).

# Installation

To install and use this software, you need:
* A GPU running CUDA 11.7 (other CUDA versions may work, but they are not officially supported),
* [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html) (or Python 3.10 and [pdm](https://pdm-project.org/)), and
* [git](https://github.com/git-guides/install-git).

- A GPU running CUDA 11.7 (other CUDA versions may work, but they are not officially supported),
- [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html) (or Python 3.10 and [pdm](https://pdm-project.org/)), and
- [git](https://github.com/git-guides/install-git).

First, clone this repository.

```bash
git clone https://github.com/AllenCell/benchmarking_representations
cd benchmarking_representations
@@ -29,11 +32,13 @@ Depending on your GPU set-up, you may need to set the `CUDA_VISIBLE_DEVICES` [en
To achieve this, you will first need to get the Universally Unique IDs for the GPUs and then set `CUDA_VISIBLE_DEVICES` to some/all of those (a comma-separated list), as in the following examples.

**Example 1**

```bash
export CUDA_VISIBLE_DEVICES=0,1
```

**Example 2:** Using one partition of a MIG partitioned GPU

```bash
export CUDA_VISIBLE_DEVICES=MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
```
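For readers scripting around this setting, the value format used in the two examples (a comma-separated list whose entries are either device indices or UUID-style identifiers with a `MIG-` prefix) can be parsed as in the following minimal Python sketch. The helper name `visible_devices` is hypothetical and not part of this repository; it only reads the variable and does not query the GPU driver.

```python
import os


def visible_devices(env=None):
    """Split CUDA_VISIBLE_DEVICES into its entries (hypothetical helper).

    Returns None when the variable is unset and an empty list when it is
    set but empty.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    # Entries stay as strings because they may be indices ("0") or
    # UUID-style identifiers ("MIG-...").
    return [entry.strip() for entry in raw.split(",") if entry.strip()]
```

Passing a plain dictionary instead of relying on `os.environ` keeps the helper easy to exercise without touching the real environment.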
@@ -49,31 +54,36 @@ pip install -e .
For `pdm` users, follow [these installation steps instead](./ADVANCED_INSTALLATION.md).

## Troubleshooting

**Q:** When installing dependencies, pytorch fails to install with the following error message.

```bash
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: device >= 0 && device < num_gpus
```

**A:** You may need to configure the `CUDA_VISIBLE_DEVICES` [environment variable](https://developer.nvidia.com/blog/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/).

## Set env variables

To run the models, you must set the `CYTODL_CONFIG_PATH` environment variable to point to the `br/configs` folder.
Check that your current working directory is the `benchmarking_representations` folder, then run the following command (this will last for only the duration of your shell session).

```bash
export CYTODL_CONFIG_PATH=$PWD/configs/
```
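Because the export lasts only for the current shell session, a script can fail early with a clear message when the variable is missing or points somewhere that does not exist. The following is a minimal Python sketch of such a check; the function name `config_root` is hypothetical and not part of the repository.

```python
import os
from pathlib import Path


def config_root(env=None):
    """Return the folder named by CYTODL_CONFIG_PATH (hypothetical helper)."""
    env = os.environ if env is None else env
    raw = env.get("CYTODL_CONFIG_PATH")
    if not raw:
        raise RuntimeError("CYTODL_CONFIG_PATH is not set in this shell session")
    root = Path(raw)
    if not root.is_dir():
        raise RuntimeError(f"CYTODL_CONFIG_PATH is not an existing folder: {root}")
    return root
```

Calling `config_root()` with no argument reads the real environment; passing a dictionary makes the check testable in isolation.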

# Usage

## Steps to download and preprocess data

1. Datasets are hosted on quilt. Download raw data at the following links

* [cellPACK synthetic dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/cellPACK_single_cell_punctate_structure/)
* [DNA replication foci dataset](https://open.quiltdata.com/b/allencell/packages/aics/nuclear_project_dataset_4)
* [WTC-11 hIPSc single cell image dataset v1](https://staging.allencellquilt.org/b/allencell/tree/aics/hipsc_single_cell_image_dataset/)
* [Nucleolar drug perturbation dataset](https://open.quiltdata.com/b/allencell/tree/aics/NPM1_single_cell_drug_perturbations/)
- [cellPACK synthetic dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/cellPACK_single_cell_punctate_structure/)
- [DNA replication foci dataset](https://open.quiltdata.com/b/allencell/packages/aics/nuclear_project_dataset_4)
- [WTC-11 hIPSc single cell image dataset v1](https://staging.allencellquilt.org/b/allencell/tree/aics/hipsc_single_cell_image_dataset/)
- [Nucleolar drug perturbation dataset](https://open.quiltdata.com/b/allencell/tree/aics/NPM1_single_cell_drug_perturbations/)

> [!NOTE]
> \[!NOTE\]
> Ensure to download all the data in the same folder where the repo was cloned!

2. Once data is downloaded, run preprocessing scripts to create the final image/pointcloud/SDF datasets (this step is not necessary for the cellPACK dataset). For image preprocessing used for punctate structures, install [snakemake](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html) and update the data paths in
@@ -156,21 +166,21 @@ python src/br/models/train.py experiment=cellpack/pc_so3 model=pc/classical_eart

1. To skip model training, download pre-trained models

* [cellPACK synthetic dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/cellpack/)
* [DNA replication foci dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/pcna/)
* [WTC-11 hIPSc single cell image dataset v1 punctate structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/other_punctate/)
* [WTC-11 hIPSc single cell image dataset v1 nucleolus (NPM1)](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/npm1/)
* [WTC-11 hIPSc single cell image dataset v1 polymorphic structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/other_polymorphic/)
* [Nucleolar drug perturbation dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/npm1_perturb/)
- [cellPACK synthetic dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/cellpack/)
- [DNA replication foci dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/pcna/)
- [WTC-11 hIPSc single cell image dataset v1 punctate structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/other_punctate/)
- [WTC-11 hIPSc single cell image dataset v1 nucleolus (NPM1)](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/npm1/)
- [WTC-11 hIPSc single cell image dataset v1 polymorphic structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/other_polymorphic/)
- [Nucleolar drug perturbation dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/npm1_perturb/)

2. Download pre-computed embeddings

* [cellPACK synthetic dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/cellpack/)
* [DNA replication foci dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/pcna/)
* [WTC-11 hIPSc single cell image dataset v1 punctate structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/other_punctate/)
* [WTC-11 hIPSc single cell image dataset v1 nucleolus (NPM1)](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/npm1/)
* [WTC-11 hIPSc single cell image dataset v1 polymorphic structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/other_polymorphic/)
* [Nucleolar drug perturbation dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/npm1_perturb/)
- [cellPACK synthetic dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/cellpack/)
- [DNA replication foci dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/pcna/)
- [WTC-11 hIPSc single cell image dataset v1 punctate structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/other_punctate/)
- [WTC-11 hIPSc single cell image dataset v1 nucleolus (NPM1)](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/npm1/)
- [WTC-11 hIPSc single cell image dataset v1 polymorphic structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/other_polymorphic/)
- [Nucleolar drug perturbation dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/npm1_perturb/)

## Steps to run benchmarking analysis

@@ -188,6 +198,7 @@ python src/br/models/train.py experiment=cellpack/pc_so3 model=pc/classical_eart
```

# Development

## Project Organization

```
2 changes: 1 addition & 1 deletion configs/logger/csv.yaml
@@ -4,4 +4,4 @@ csv:
_target_: lightning.pytorch.loggers.csv_logs.CSVLogger
save_dir: "${paths.output_dir}"
name: "csv/"
prefix: ""
prefix: ""
16 changes: 9 additions & 7 deletions configs/results/cellpack.yaml
@@ -17,11 +17,13 @@ names:
"Rotation_invariant_pointcloud",
"Rotation_invariant_pointcloud_jitter",
]
data_paths: [
"./configs/data/cellpack/image.yaml",
"./configs/data/cellpack/image.yaml",
"./configs/data/cellpack/pc.yaml",
"./configs/data/cellpack/pc.yaml",
# "./src/br/configs/data/cellpack/pc_jitter.yaml",
"./configs/data/cellpack/pc.yaml",
data_paths:
[
"/data/cellpack/image.yaml",
"/data/cellpack/image.yaml",
"/data/cellpack/pc.yaml",
"/data/cellpack/pc.yaml",
"/data/cellpack/pc.yaml",
]
classification_label: ["rule"]
regression_label:
13 changes: 8 additions & 5 deletions configs/results/npm1.yaml
@@ -19,9 +19,12 @@ names:
]
data_paths:
[
"./configs/data/npm1/pc.yaml",
"./configs/data/npm1/so3_image_sdf.yaml",
"./configs/data/npm1/so3_image_seg.yaml",
"./configs/data/npm1/classical_image_sdf.yaml",
"./configs/data/npm1/classical_image_seg.yaml",
"/data/npm1/pc.yaml",
"/data/npm1/so3_image_sdf.yaml",
"/data/npm1/so3_image_seg.yaml",
"/data/npm1/classical_image_sdf.yaml",
"/data/npm1/classical_image_seg.yaml",
]
classification_label: ["STR_connectivity_cc_thresh"]
regression_label:
["mean_centroid_distances", "mean_nucleolus_volume", "mean_nucleolus_area"]
10 changes: 5 additions & 5 deletions configs/results/npm1_perturb.yaml
@@ -19,9 +19,9 @@ names:
]
data_paths:
[
"./configs/data/npm1_perturb/pc.yaml",
"./configs/data/npm1_perturb/classical_image_sdf.yaml",
"./configs/data/npm1_perturb/classical_image_seg.yaml",
"./configs/data/npm1_perturb/so3_image_sdf.yaml",
"./configs/data/npm1_perturb/so3_image_seg.yaml",
"/data/npm1_perturb/pc.yaml",
"/data/npm1_perturb/classical_image_sdf.yaml",
"/data/npm1_perturb/classical_image_seg.yaml",
"/data/npm1_perturb/so3_image_sdf.yaml",
"/data/npm1_perturb/so3_image_seg.yaml",
]
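The `data_paths` entries in these results configs change from `./configs/data/...` to `/data/...`. One plausible reading, based only on the commit message "use cytodl config path for dataloading", is that each entry is now resolved relative to the folder named by `CYTODL_CONFIG_PATH`; the loading code is not shown in this diff, so treat that as an assumption. Under that assumption, the Python sketch below shows one way to join the two parts. The helper `resolve_data_config` and the root `/repo/configs` are hypothetical. The leading slash is stripped because joining a path with an absolute component discards the left-hand side.

```python
from pathlib import PurePosixPath


def resolve_data_config(config_root, entry):
    """Join a results-config entry onto the config root (hypothetical helper)."""
    # PurePosixPath("/a") / "/b" evaluates to "/b", so drop the leading slash
    # before joining to keep the entry underneath config_root.
    return PurePosixPath(config_root) / entry.lstrip("/")


# "/repo/configs" is a placeholder for whatever CYTODL_CONFIG_PATH holds.
resolved = resolve_data_config("/repo/configs", "/data/npm1_perturb/pc.yaml")
print(resolved)  # /repo/configs/data/npm1_perturb/pc.yaml
```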
12 changes: 7 additions & 5 deletions configs/results/other_polymorphic.yaml
@@ -19,9 +19,11 @@ names:
]
data_paths:
[
"./configs/data/other_polymorphic/pc.yaml",
"./configs/data/other_polymorphic/so3_image_sdf.yaml",
"./configs/data/other_polymorphic/so3_image_seg.yaml",
"./configs/data/other_polymorphic/classical_image_sdf.yaml",
"./configs/data/other_polymorphic/classical_image_seg.yaml",
"/data/other_polymorphic/pc.yaml",
"/data/other_polymorphic/so3_image_sdf.yaml",
"/data/other_polymorphic/so3_image_seg.yaml",
"/data/other_polymorphic/classical_image_sdf.yaml",
"/data/other_polymorphic/classical_image_seg.yaml",
]
classification_label: ["structure_name"]
regression_label: ["avg_dists", "mean_volume", "mean_surface_area"]
13 changes: 6 additions & 7 deletions configs/results/other_punctate.yaml
@@ -6,22 +6,21 @@ model_checkpoints:
"./morphology_appropriate_representation_learning/model_checkpoints/other_punctate/Classical_image.ckpt",
"./morphology_appropriate_representation_learning/model_checkpoints/other_punctate/Rotation_invariant_image.ckpt",
"./morphology_appropriate_representation_learning/model_checkpoints/other_punctate/Classical_pointcloud.ckpt",
"./morphology_appropriate_representation_learning/model_checkpoints/other_punctate/Rotation_invariant_pointcloud.ckpt",
"./morphology_appropriate_representation_learning/model_checkpoints/other_punctate/Rotation_invariant_pointcloud_structurenorm.ckpt",
]
names:
[
"Classical_image",
"Rotation_invariant_image",
"Classical_pointcloud",
"Rotation_invariant_pointcloud",
"Rotation_invariant_pointcloud_structurenorm",
]
data_paths:
[
"./configs/data/other_punctate/image.yaml",
"./configs/data/other_punctate/image.yaml",
"./configs/data/other_punctate/pc.yaml",
"./configs/data/other_punctate/pc_intensity.yaml",
"./configs/data/other_punctate/pc_intensity_structurenorm.yaml",
"/data/other_punctate/image.yaml",
"/data/other_punctate/image.yaml",
"/data/other_punctate/pc.yaml",
"/data/other_punctate/pc_intensity_structurenorm.yaml",
]
classification_label: ["structure_name", "cell_stage"]
regression_label:
20 changes: 11 additions & 9 deletions configs/results/pcna.yaml
@@ -3,25 +3,27 @@ image_path: ./morphology_appropriate_representation_learning/preprocessed_data/p
pc_path: ./morphology_appropriate_representation_learning/preprocessed_data/pcna/manifest.csv
model_checkpoints:
[
"./morphology_appropriate_representation_learning/model_checkpoints/pcna/Classical_pointcloud.ckpt",
"./morphology_appropriate_representation_learning/model_checkpoints/pcna/Rotation_invariant_pointcloud.ckpt",
"./morphology_appropriate_representation_learning/model_checkpoints/pcna/Classical_image.ckpt",
"./morphology_appropriate_representation_learning/model_checkpoints/pcna/Rotation_invariant_image.ckpt",
"./morphology_appropriate_representation_learning/model_checkpoints/pcna/Classical_pointcloud.ckpt",
"./morphology_appropriate_representation_learning/model_checkpoints/pcna/Rotation_invariant_pointcloud.ckpt",
"./morphology_appropriate_representation_learning/model_checkpoints/pcna/Rotation_invariant_pointcloud_jitter.ckpt",
]
names:
[
"Classical_pointcloud",
"Rotation_invariant_pointcloud",
"Classical_image",
"Rotation_invariant_image",
"Classical_pointcloud",
"Rotation_invariant_pointcloud",
"Rotation_invariant_pointcloud_jitter",
]
data_paths: [
"./configs/data/pcna/pc.yaml",
"./configs/data/pcna/pc_intensity.yaml",
"./configs/data/pcna/image.yaml",
"./configs/data/pcna/image.yaml",
"/data/pcna/image.yaml",
"/data/pcna/image.yaml",
"/data/pcna/pc.yaml",
"/data/pcna/pc_intensity.yaml",
# "./src/br/configs/data/pcna/pc_intensity_jitter.yaml",
"./configs/data/pcna/pc_intensity.yaml",
"/data/pcna/pc_intensity.yaml",
]
classification_label: ["cell_stage_fine", "flag_comment"]
regression_label:
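The reordering in this file appears intended to keep `model_checkpoints`, `names`, and `data_paths` aligned by position, so that the i-th name refers to the i-th checkpoint and the i-th data config. That positional pairing is inferred from the diff rather than stated in it. A minimal Python sketch of a consistency check follows; `paired_runs` is a hypothetical helper, and the dictionary stands in for the parsed YAML with checkpoint paths shortened for illustration.

```python
def paired_runs(cfg):
    """Zip the three parallel lists of a results config (hypothetical helper)."""
    keys = ("names", "model_checkpoints", "data_paths")
    lengths = {key: len(cfg[key]) for key in keys}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"parallel lists differ in length: {lengths}")
    return list(zip(*(cfg[key] for key in keys)))


# Stand-in for a parsed results config; checkpoint paths are shortened here.
example_cfg = {
    "names": ["Classical_image", "Classical_pointcloud"],
    "model_checkpoints": ["Classical_image.ckpt", "Classical_pointcloud.ckpt"],
    "data_paths": ["/data/pcna/image.yaml", "/data/pcna/pc.yaml"],
}
for name, checkpoint, data_path in paired_runs(example_cfg):
    print(name, checkpoint, data_path)
```

A check like this catches the case where one list gains or loses an entry without the others being updated, which a silent `zip` would otherwise truncate.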
18 changes: 9 additions & 9 deletions pdm.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -33,6 +33,7 @@ dependencies = [
"pip",
"ipython",
"ipykernel>=6.29.5",
"joblib>=1.4.2",
]

[project.urls]
@@ -72,4 +73,3 @@ dev = ["-e file:///${PROJECT_ROOT}/#egg=benchmarking-representations"]
[build-system]
requires = ["pdm-backend"]
build-backend = "pdm.backend"

File renamed without changes.