Preprocessing instructions know there are two environments
pgarrison committed Nov 23, 2024
1 parent 9766a9d commit 396a749
Showing 2 changed files with 5 additions and 56 deletions.
5 changes: 3 additions & 2 deletions docs/USAGE.md
@@ -64,7 +64,8 @@ Coming soon.

Training these models can take weeks. We've published our trained models so you don't have to. Skip to the next section if you'd like to just use our models.

- 1. Update the final single cell dataset path (`SINGLE_CELL_DATASET_PATH`) and the column in the manifest for the appropriate input modality (`SDF_COLUMN`/`SEG_COLUMN`/`POINTCLOUD_COLUMN`/`IMAGE_COLUMN`) in each datamodule yaml file, e.g., for PCNA data these yaml files are located here:
+ 1. Create a single cell manifest (e.g. csv, parquet) for each dataset with a column of final processed paths and a split column assigning each cell to train/test/validation (a sketch follows the listing below).
+ 2. Update the final single cell dataset path (`SINGLE_CELL_DATASET_PATH`) and the column in the manifest for the appropriate input modality (`SDF_COLUMN`/`SEG_COLUMN`/`POINTCLOUD_COLUMN`/`IMAGE_COLUMN`) in each datamodule yaml file, e.g., for PCNA data these yaml files are located here:

```
└── configs
```

@@ -76,7 +77,7 @@ Training these models can take weeks. We've published our trained models so you

```
         └── pc_intensity_jitter.yaml <- Datamodule for PCNA point clouds with intensity and jitter
```
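
For reference, here is a minimal sketch of building such a manifest with pandas. The column names (`cell_id`, `seg_path`, `split`), the file glob, and the 80/10/10 split proportions are assumptions for illustration, not the repo's required schema:

```python
# Minimal manifest-building sketch; schema and paths are assumed, not prescribed.
from pathlib import Path

import numpy as np
import pandas as pd

processed_dir = Path("data/processed")  # hypothetical folder of final processed cells
paths = sorted(processed_dir.glob("*.ome.tiff"))

# Assign each cell to train/validation/test with assumed 80/10/10 proportions.
rng = np.random.default_rng(seed=0)
split = rng.choice(["train", "val", "test"], size=len(paths), p=[0.8, 0.1, 0.1])

manifest = pd.DataFrame(
    {
        "cell_id": [p.stem for p in paths],
        "seg_path": [str(p) for p in paths],  # the column named by SEG_COLUMN in the yaml
        "split": split,
    }
)
manifest.to_parquet("single_cell_manifest.parquet", index=False)
```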

- 2. Train models using cyto_dl. Be sure to run the training scripts from the folder where the repo was cloned (and where all the data was downloaded). Experiment configs for point cloud and image models are located here:
+ 3. Train models using cyto_dl. Be sure to run the training scripts from the folder where the repo was cloned (and where all the data was downloaded). Experiment configs for point cloud and image models are located here:

```
└── configs
   ...
```

56 changes: 2 additions & 54 deletions subpackages/image_preprocessing/README.md
@@ -1,6 +1,6 @@
# Single cell image preprocessing

- Code for preprocessing 3D single cell images
+ Code for alignment, masking, and registration of 3D single cell images.

# Installation

@@ -17,60 +17,8 @@ pip install -r requirements.txt
pip install -e .
```

# Configure input data

1. Datasets are hosted on Quilt. Download the raw data from the following links (a hedged download sketch follows the note below):

* [cellPACK synthetic dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/cellPACK_single_cell_punctate_structure/)
* [DNA replication foci dataset](https://open.quiltdata.com/b/allencell/packages/aics/nuclear_project_dataset_4)
* [WTC-11 hIPSc single cell image dataset v1](https://staging.allencellquilt.org/b/allencell/tree/aics/hipsc_single_cell_image_dataset/)
* [Nucleolar drug perturbation dataset](https://open.quiltdata.com/b/allencell/tree/aics/NPM1_single_cell_drug_perturbations/)

> [!NOTE]
> Be sure to download all the data into the same folder where the repo was cloned!
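
As a convenience, here is a hedged sketch of downloading one of these datasets with the `quilt3` client. Only the DNA replication foci link above is clearly a named Quilt package; the package name is taken from that URL, and the destination folder is an assumption:

```python
# Hedged download sketch using the quilt3 client (pip install quilt3).
import quilt3

# Package name taken from the DNA replication foci URL above; dest is an assumption.
quilt3.Package.install(
    "aics/nuclear_project_dataset_4",
    registry="s3://allencell",
    dest="./nuclear_project_dataset_4",
)
```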
2. For image preprocessing used for punctate structures, update the data paths in:

```
image_preprocessing
└── config
   └── config.yaml <- Data config for image processing workflow
```

Then follow the [installation](src/br/data/preprocessing/image_preprocessing/README.md) steps to run the Snakefile located in:

```
image_preprocessing
└── Snakefile <- Image preprocessing workflow. Combines alignment, masking, registration
```

For point cloud preprocessing of punctate structures, update the data paths and run the workflow in (an illustrative sketch follows the listing below):

```
src
└── br
└── data
   └── preprocessing
      └── pc_preprocessing
         └── punctate_cyto.py <- Point cloud sampling from raw images for punctate structures
```
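
For orientation, here is a minimal sketch of the idea behind sampling a point cloud from an intensity image. The actual logic in `punctate_cyto.py` may differ; the function name and parameters are illustrative:

```python
# Illustrative intensity-weighted point sampling from a 3D image (not the repo's exact logic).
import numpy as np


def sample_point_cloud(img: np.ndarray, n_points: int = 2048, seed: int = 0) -> np.ndarray:
    """Sample voxel coordinates with probability proportional to voxel intensity."""
    rng = np.random.default_rng(seed)
    weights = img.astype(np.float64).ravel()
    weights /= weights.sum()  # assumes non-negative intensities
    idx = rng.choice(img.size, size=n_points, replace=True, p=weights)
    zyx = np.column_stack(np.unravel_index(idx, img.shape))
    return zyx.astype(np.float32)  # (n_points, 3) array of z, y, x coordinates
```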

For SDF preprocessing for polymorphic structures, update the data paths and run the workflows in (a minimal SDF sketch follows the listing below):

```
src
└── br
└── data
   └── preprocessing
      └── sdf_preprocessing
         ├── image_sdfs.py <- Create 32**3 resolution SDF images
         └── pc_sdfs.py <- Sample point clouds from 32**3 resolution SDF images
```
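
For intuition, here is a sketch of turning a binary segmentation into a signed distance field at the 32**3 resolution mentioned above, using `scipy` and `scikit-image`. The resizing and sign conventions in the actual scripts may differ:

```python
# Illustrative SDF computation from a binary mask (the repo's scripts may differ).
import numpy as np
from scipy import ndimage
from skimage.transform import resize


def mask_to_sdf(mask: np.ndarray, shape=(32, 32, 32)) -> np.ndarray:
    """Signed distance field: negative inside the object, positive outside."""
    small = resize(mask.astype(float), shape, order=1) > 0.5
    outside = ndimage.distance_transform_edt(~small)
    inside = ndimage.distance_transform_edt(small)
    return (outside - inside).astype(np.float32)
```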

In all cases, create a single cell manifest (e.g. csv, parquet) for each dataset with a column of final processed paths and a split column assigning each cell to train/test/validation.

# Usage
Once the data is downloaded and the config files are set up, run the preprocessing scripts:
```bash
snakemake -s Snakefile --cores all
```
