Commit

Merge pull request #56 from AllenCell/Pre-Commit-analyses
Pre-commit analyses
fatwir authored Nov 25, 2024
2 parents 05a55fd + 8047c10 commit 1056b7f
Showing 27 changed files with 126 additions and 128 deletions.
2 changes: 1 addition & 1 deletion .github/actions/pdm/action.yml
@@ -4,7 +4,7 @@ runs:
using: composite
steps:
- name: Set up PDM
uses: pdm-project/setup-pdm@568ddd69406b30de1774ec0044b73ae06e716aa4 # v4.1
with:
python-version: "3.10"
version: 2.20.0.post1
5 changes: 2 additions & 3 deletions .github/workflows/test.yml
@@ -12,13 +12,12 @@ concurrency:
cancel-in-progress: true

jobs:

test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: ./.github/actions/pdm

- name: Check that pdm.lock matches pyproject.toml
shell: bash
run: pdm lock --check
2 changes: 2 additions & 0 deletions ADVANCED_INSTALLATION.md
@@ -1,7 +1,9 @@
# Installation and usage with pdm

1. [Install pdm](https://pdm-project.org/en/latest/#recommended-installation-method)
2. Install dependencies: `pdm sync --no-isolation`. (The `--no-isolation` flag is required for `torch-scatter`.)
3. Prefix every `python` command with `pdm run`. For example:

```
pdm run python src/br/models/train.py experiment=cellpack/pc_equiv
```
50 changes: 30 additions & 20 deletions README.md
@@ -8,11 +8,13 @@ This README gives instructions for running our analysis against preprocessed ima
# Installation

To install and use this software, you need:
* A GPU running CUDA 11.7 (other CUDA versions may work, but they are not officially supported),
* [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html) (or Python 3.10 and [pdm](https://pdm-project.org/)), and
* [git](https://github.com/git-guides/install-git).

- A GPU running CUDA 11.7 (other CUDA versions may work, but they are not officially supported),
- [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html) (or Python 3.10 and [pdm](https://pdm-project.org/)), and
- [git](https://github.com/git-guides/install-git).

First, clone this repository.

```bash
git clone https://github.com/AllenCell/benchmarking_representations
cd benchmarking_representations
@@ -29,11 +31,13 @@ Depending on your GPU set-up, you may need to set the `CUDA_VISIBLE_DEVICES` [en
To achieve this, you will first need to get the Universally Unique IDs for the GPUs and then set `CUDA_VISIBLE_DEVICES` to some/all of those (a comma-separated list), as in the following examples.

**Example 1**

```bash
export CUDA_VISIBLE_DEVICES=0,1
```

**Example 2:** Using one partition of a MIG partitioned GPU

```bash
export CUDA_VISIBLE_DEVICES=MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
```
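
For scripted setups, the same variable can also be assembled in Python before any CUDA-dependent library is imported. This is only a sketch: the UUIDs below are placeholders, not real devices; obtain actual values from `nvidia-smi -L`.

```python
import os

# Placeholder UUIDs -- substitute the values reported by `nvidia-smi -L`.
uuids = [
    "GPU-11111111-2222-3333-4444-555555555555",
    "MIG-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
]

# CUDA reads a comma-separated list of device indices or UUIDs.
os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(uuids)
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

Set the variable before importing torch (or any other CUDA-aware library), since device visibility is fixed at CUDA initialization.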
@@ -49,31 +53,36 @@ pip install -e .
For `pdm` users, follow [these installation steps instead](./ADVANCED_INSTALLATION.md).

## Troubleshooting

**Q:** When installing dependencies, pytorch fails to install with the following error message.

```bash
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: device >= 0 && device < num_gpus
```

**A:** You may need to configure the `CUDA_VISIBLE_DEVICES` [environment variable](https://developer.nvidia.com/blog/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/).

## Set env variables

To run the models, you must set the `CYTODL_CONFIG_PATH` environment variable to point to the `br/configs` folder.
Check that your current working directory is the `benchmarking_representations` folder, then run the following command (this will last for only the duration of your shell session).

```bash
export CYTODL_CONFIG_PATH=$PWD/configs/
```

# Usage

## Steps to download and preprocess data

1. Datasets are hosted on Quilt. Download the raw data at the following links:

* [cellPACK synthetic dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/cellPACK_single_cell_punctate_structure/)
* [DNA replication foci dataset](https://open.quiltdata.com/b/allencell/packages/aics/nuclear_project_dataset_4)
* [WTC-11 hIPSc single cell image dataset v1](https://staging.allencellquilt.org/b/allencell/tree/aics/hipsc_single_cell_image_dataset/)
* [Nucleolar drug perturbation dataset](https://open.quiltdata.com/b/allencell/tree/aics/NPM1_single_cell_drug_perturbations/)
- [cellPACK synthetic dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/cellPACK_single_cell_punctate_structure/)
- [DNA replication foci dataset](https://open.quiltdata.com/b/allencell/packages/aics/nuclear_project_dataset_4)
- [WTC-11 hIPSc single cell image dataset v1](https://staging.allencellquilt.org/b/allencell/tree/aics/hipsc_single_cell_image_dataset/)
- [Nucleolar drug perturbation dataset](https://open.quiltdata.com/b/allencell/tree/aics/NPM1_single_cell_drug_perturbations/)

> [!NOTE]
> \[!NOTE\]
> Be sure to download all the data into the same folder where the repo was cloned!
2. Once data is downloaded, run preprocessing scripts to create the final image/pointcloud/SDF datasets (this step is not necessary for the cellPACK dataset). For image preprocessing used for punctate structures, install [snakemake](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html) and update the data paths in
@@ -156,21 +165,21 @@ python src/br/models/train.py experiment=cellpack/pc_so3 model=pc/classical_eart

1. To skip model training, download pre-trained models

* [cellPACK synthetic dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/cellpack/)
* [DNA replication foci dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/pcna/)
* [WTC-11 hIPSc single cell image dataset v1 punctate structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/other_punctate/)
* [WTC-11 hIPSc single cell image dataset v1 nucleolus (NPM1)](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/npm1/)
* [WTC-11 hIPSc single cell image dataset v1 polymorphic structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/other_polymorphic/)
* [Nucleolar drug perturbation dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/npm1_perturb/)
- [cellPACK synthetic dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/cellpack/)
- [DNA replication foci dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/pcna/)
- [WTC-11 hIPSc single cell image dataset v1 punctate structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/other_punctate/)
- [WTC-11 hIPSc single cell image dataset v1 nucleolus (NPM1)](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/npm1/)
- [WTC-11 hIPSc single cell image dataset v1 polymorphic structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/other_polymorphic/)
- [Nucleolar drug perturbation dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/npm1_perturb/)

2. Download pre-computed embeddings

* [cellPACK synthetic dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/cellpack/)
* [DNA replication foci dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/pcna/)
* [WTC-11 hIPSc single cell image dataset v1 punctate structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/other_punctate/)
* [WTC-11 hIPSc single cell image dataset v1 nucleolus (NPM1)](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/npm1/)
* [WTC-11 hIPSc single cell image dataset v1 polymorphic structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/other_polymorphic/)
* [Nucleolar drug perturbation dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/npm1_perturb/)
- [cellPACK synthetic dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/cellpack/)
- [DNA replication foci dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/pcna/)
- [WTC-11 hIPSc single cell image dataset v1 punctate structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/other_punctate/)
- [WTC-11 hIPSc single cell image dataset v1 nucleolus (NPM1)](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/npm1/)
- [WTC-11 hIPSc single cell image dataset v1 polymorphic structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/other_polymorphic/)
- [Nucleolar drug perturbation dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/npm1_perturb/)

## Steps to run benchmarking analysis

@@ -188,6 +197,7 @@ python src/br/models/train.py experiment=cellpack/pc_so3 model=pc/classical_eart
```

# Development

## Project Organization

```
2 changes: 1 addition & 1 deletion configs/logger/csv.yaml
@@ -4,4 +4,4 @@ csv:
_target_: lightning.pytorch.loggers.csv_logs.CSVLogger
save_dir: "${paths.output_dir}"
name: "csv/"
prefix: ""
1 change: 0 additions & 1 deletion pyproject.toml
@@ -72,4 +72,3 @@ dev = ["-e file:///${PROJECT_ROOT}/#egg=benchmarking-representations"]
[build-system]
requires = ["pdm-backend"]
build-backend = "pdm.backend"

7 changes: 6 additions & 1 deletion src/br/cellpack/README.md
@@ -14,14 +14,19 @@ pip install quilt3
```

## Usage

1. Get the reference nuclear shapes:

```bash
python get_reference_nuclear_shapes.py
```

2. Generate the synthetic data:

```bash
python generate_synthetic_data.py
```

Additional options can be specified through the command line. Run `python generate_synthetic_data.py --help` for more information.

The generated synthetic data will be saved in `data/packings`
2 changes: 1 addition & 1 deletion src/br/cellpack/__init__.py
@@ -1 +1 @@
"""Methods to generate simulated cellPACK data"""
"""Methods to generate simulated cellPACK data."""
60 changes: 24 additions & 36 deletions src/br/cellpack/generate_synthetic_data.py
@@ -1,14 +1,15 @@
import os
import json
import pandas as pd
import numpy as np
import concurrent.futures
import gc
import json
import multiprocessing
from time import time
import os
import subprocess
from pathlib import Path
import gc
from time import time

import fire
import numpy as np
import pandas as pd

RULES = [
"random",
@@ -33,9 +34,7 @@
RECIPE_TEMPLATE_PATH = DATADIR / "templates"
TEMPLATE_FILES = os.listdir(RECIPE_TEMPLATE_PATH)
TEMPLATE_FILES = [
RECIPE_TEMPLATE_PATH / file
for file in TEMPLATE_FILES
if file.split(".")[-1] == "json"
RECIPE_TEMPLATE_PATH / file for file in TEMPLATE_FILES if file.split(".")[-1] == "json"
]
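
The filtering above (keeping only `.json` files) can equivalently be expressed with `pathlib` globbing. A minimal sketch, using a temporary directory in place of the repository's `DATADIR / "templates"`:

```python
import tempfile
from pathlib import Path

# Temporary directory standing in for the repo's DATADIR / "templates".
template_dir = Path(tempfile.mkdtemp())
(template_dir / "pcna_random.json").write_text("{}")
(template_dir / "README.txt").write_text("not a template")

# Same result as the os.listdir + split(".") filter, via globbing:
template_files = sorted(template_dir.glob("*.json"))
print([p.name for p in template_files])  # ['pcna_random.json']
```

`glob("*.json")` also sidesteps the edge case where a filename contains no dot, which `file.split(".")[-1]` handles only by accident.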

GENERATED_RECIPE_PATH = DATADIR / "generated_recipes"
@@ -59,8 +58,7 @@ def create_rule_files(
shape_angles=ANGLES,
mesh_path=MESH_PATH,
):
"""
Create rule files for each combination of shape IDs and angles.
"""Create rule files for each combination of shape IDs and angles.
Args:
cellpack_rules (list): List of rule file paths.
@@ -72,7 +70,7 @@
"""
for rule in cellpack_rules:
print(f"Creating files for {rule}")
with open(rule, "r") as j:
with open(rule) as j:
contents = json.load(j)
contents_shape = contents.copy()
base_version = contents_shape["version"]
@@ -82,24 +80,22 @@
this_row = this_row.loc[this_row["angle"] == ang]

contents_shape["version"] = f"{base_version}_{this_id}_{ang}"
contents_shape["objects"]["mean_nucleus"]["representations"][
"mesh"
]["name"] = f"{this_id}_{ang}.obj"
contents_shape["objects"]["mean_nucleus"]["representations"][
"mesh"
]["path"] = str(mesh_path)
contents_shape["objects"]["mean_nucleus"]["representations"]["mesh"][
"name"
] = f"{this_id}_{ang}.obj"
contents_shape["objects"]["mean_nucleus"]["representations"]["mesh"][
"path"
] = str(mesh_path)
# save json
with open(
generated_recipe_path
/ f"{base_version}_{this_id}_rotation_{ang}.json",
generated_recipe_path / f"{base_version}_{this_id}_rotation_{ang}.json",
"w",
) as f:
json.dump(contents_shape, f, indent=4)


def update_cellpack_config(config_path=CONFIG_PATH, output_path=DEFAULT_OUTPUT_PATH):
"""
Update the cellPack configuration file with the specified output path.
"""Update the cellPack configuration file with the specified output path.
Args:
config_path (str): The path to the CellPack configuration file.
Expand All @@ -108,16 +104,15 @@ def update_cellpack_config(config_path=CONFIG_PATH, output_path=DEFAULT_OUTPUT_P
Returns:
None
"""
with open(config_path, "r") as j:
with open(config_path) as j:
contents = json.load(j)
contents["out"] = str(output_path)
with open(config_path, "w") as f:
json.dump(contents, f, indent=4)


def get_files_to_use(generated_recipe_path, rules_to_use, shape_rotations):
"""
Retrieves a list of input files to use based on the given rules and shape rotations.
"""Retrieves a list of input files to use based on the given rules and shape rotations.
Args:
generated_recipe_path (str): The path to the directory containing the generated
Expand All @@ -127,7 +122,6 @@ def get_files_to_use(generated_recipe_path, rules_to_use, shape_rotations):
Returns:
input_files_to_use (list): A list of input files to use.
"""
files = os.listdir(generated_recipe_path)
max_num_files = np.inf
@@ -147,8 +141,7 @@ def get_files_to_use(generated_recipe_path, rules_to_use, shape_rotations):


def run_single_packing(recipe_path, config_path=CONFIG_PATH):
"""
Run the packing using the specified recipe and configuration files.
"""Run the packing using the specified recipe and configuration files.
Args:
recipe_path (str): The path to the recipe file.
@@ -189,8 +182,7 @@ def run_workflow(
generated_recipe_path=GENERATED_RECIPE_PATH,
template_files=TEMPLATE_FILES,
):
"""
Runs the workflow for generating synthetic data using cellPack.
"""Runs the workflow for generating synthetic data using cellPack.
Args:
output_path (str): Path to the output directory.
@@ -227,9 +219,7 @@ def run_workflow(
update_cellpack_config(config_path, output_path)

if input_files_to_use is None:
input_files_to_use = get_files_to_use(
generated_recipe_path, rules_to_use, shape_rotations
)
input_files_to_use = get_files_to_use(generated_recipe_path, rules_to_use, shape_rotations)

num_files = len(input_files_to_use)
print(f"Found {num_files} files")
@@ -247,9 +237,7 @@
skipped_count = 0
count = 0
failed_count = 0
with concurrent.futures.ProcessPoolExecutor(
max_workers=num_processes
) as executor:
with concurrent.futures.ProcessPoolExecutor(max_workers=num_processes) as executor:
for file in input_files_to_use:
fname = Path(file).stem
fname = "".join(fname.split("_rotation"))
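
The stem manipulation above removes every `_rotation` substring from the recipe filename. A quick sketch of the equivalence; the filename is illustrative, patterned on the `{version}_{id}_rotation_{angle}.json` recipes generated earlier:

```python
from pathlib import Path

# Illustrative recipe filename, patterned on "{version}_{id}_rotation_{angle}.json".
file = "pcna_random_0_rotation_90.json"
fname = Path(file).stem                    # "pcna_random_0_rotation_90"
fname = "".join(fname.split("_rotation"))  # drops every "_rotation" occurrence

# Joining the pieces from split("_rotation") is equivalent to str.replace:
assert fname == Path(file).stem.replace("_rotation", "")
print(fname)  # pcna_random_0_90
```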
3 changes: 2 additions & 1 deletion src/br/cellpack/get_reference_nuclear_shapes.py
@@ -1,7 +1,8 @@
# %%
import quilt3
import os

import quilt3

# %%
b = quilt3.Bucket("s3://allencell")

@@ -25,4 +25,4 @@
],
"projection_axis": "y"
}
}
@@ -88,4 +88,4 @@
}
}
}
}
@@ -88,4 +88,4 @@
}
}
}
}
@@ -88,4 +88,4 @@
}
}
}
}
2 changes: 1 addition & 1 deletion src/br/data/cellpack/templates/pcna_radial_gradient.json
@@ -75,4 +75,4 @@
}
}
}
}
2 changes: 1 addition & 1 deletion src/br/data/cellpack/templates/pcna_random.json
@@ -63,4 +63,4 @@
}
}
}
}