Preprocessing instructions know there are two environments
pgarrison committed Nov 23, 2024
1 parent 9766a9d commit 396a749
Showing 2 changed files with 5 additions and 56 deletions.
5 changes: 3 additions & 2 deletions docs/USAGE.md
@@ -64,7 +64,8 @@ Coming soon.

Training these models can take weeks. We've published our trained models so you don't have to. Skip to the next section if you'd like to just use our models.

- 1. Update the final single cell dataset path (`SINGLE_CELL_DATASET_PATH`) and the column in the manifest for the appropriate input modality (`SDF_COLUMN`/`SEG_COLUMN`/`POINTCLOUD_COLUMN`/`IMAGE_COLUMN`) in each datamodule yaml file, e.g., for PCNA data these yaml files are located here:
+ 1. Create a single cell manifest (e.g. csv, parquet) for each dataset with a column of final processed paths and a split column assigning each cell to train/test/validation (a sketch follows the listing below).
+ 2. Update the final single cell dataset path (`SINGLE_CELL_DATASET_PATH`) and the column in the manifest for the appropriate input modality (`SDF_COLUMN`/`SEG_COLUMN`/`POINTCLOUD_COLUMN`/`IMAGE_COLUMN`) in each datamodule yaml file, e.g., for PCNA data these yaml files are located here:

```
└── configs
```

@@ -76,7 +77,7 @@ Training these models can take weeks. We've published our trained models so you

```
         └── pc_intensity_jitter.yaml <- Datamodule for PCNA point clouds with intensity and jitter
```
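
For reference, here is a minimal sketch of building such a manifest with pandas. The column names (`cell_id`, `seg_path`, `split`), the file glob, and the 80/10/10 split proportions are assumptions for illustration, not the repo's required schema:

```python
# Minimal manifest-building sketch; schema and paths are assumed, not prescribed.
from pathlib import Path

import numpy as np
import pandas as pd

processed_dir = Path("data/processed")  # hypothetical folder of final processed cells
paths = sorted(processed_dir.glob("*.ome.tiff"))

# Assign each cell to train/validation/test with assumed 80/10/10 proportions.
rng = np.random.default_rng(seed=0)
split = rng.choice(["train", "val", "test"], size=len(paths), p=[0.8, 0.1, 0.1])

manifest = pd.DataFrame(
    {
        "cell_id": [p.stem for p in paths],
        "seg_path": [str(p) for p in paths],  # the column named by SEG_COLUMN in the yaml
        "split": split,
    }
)
manifest.to_parquet("single_cell_manifest.parquet", index=False)
```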

- 2. Train models using cyto_dl. Be sure to run the training scripts from the folder where the repo was cloned (and where all the data was downloaded). Experiment configs for point cloud and image models are located here:
+ 3. Train models using cyto_dl. Be sure to run the training scripts from the folder where the repo was cloned (and where all the data was downloaded). Experiment configs for point cloud and image models are located here:

```
└── configs
   ...
```

56 changes: 2 additions & 54 deletions subpackages/image_preprocessing/README.md
@@ -1,6 +1,6 @@
# Single cell image preprocessing

- Code for preprocessing 3D single cell images
+ Code for alignment, masking, and registration of 3D single cell images.

# Installation

@@ -17,60 +17,8 @@ pip install -r requirements.txt
pip install -e .
```

# Configure input data

1. Datasets are hosted on Quilt. Download the raw data from the following links (a hedged download sketch follows the note below):

* [cellPACK synthetic dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/cellPACK_single_cell_punctate_structure/)
* [DNA replication foci dataset](https://open.quiltdata.com/b/allencell/packages/aics/nuclear_project_dataset_4)
* [WTC-11 hIPSc single cell image dataset v1](https://staging.allencellquilt.org/b/allencell/tree/aics/hipsc_single_cell_image_dataset/)
* [Nucleolar drug perturbation dataset](https://open.quiltdata.com/b/allencell/tree/aics/NPM1_single_cell_drug_perturbations/)

> [!NOTE]
> Be sure to download all the data into the same folder where the repo was cloned!
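
As a convenience, here is a hedged sketch of downloading one of these datasets with the `quilt3` client. Only the DNA replication foci link above is clearly a named Quilt package; the package name is taken from that URL, and the destination folder is an assumption:

```python
# Hedged download sketch using the quilt3 client (pip install quilt3).
import quilt3

# Package name taken from the DNA replication foci URL above; dest is an assumption.
quilt3.Package.install(
    "aics/nuclear_project_dataset_4",
    registry="s3://allencell",
    dest="./nuclear_project_dataset_4",
)
```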
2. For image preprocessing used for punctate structures, update the data paths in:

```
image_preprocessing
└── config
   └── config.yaml <- Data config for image processing workflow
```

Then follow the [installation](src/br/data/preprocessing/image_preprocessing/README.md) steps to run the Snakefile located in:

```
image_preprocessing
└── Snakefile <- Image preprocessing workflow. Combines alignment, masking, registration
```

For point cloud preprocessing of punctate structures, update the data paths and run the workflow in (an illustrative sketch follows the listing below):

```
src
└── br
└── data
   └── preprocessing
      └── pc_preprocessing
         └── punctate_cyto.py <- Point cloud sampling from raw images for punctate structures
```
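
For orientation, here is a minimal sketch of the idea behind sampling a point cloud from an intensity image. The actual logic in `punctate_cyto.py` may differ; the function name and parameters are illustrative:

```python
# Illustrative intensity-weighted point sampling from a 3D image (not the repo's exact logic).
import numpy as np


def sample_point_cloud(img: np.ndarray, n_points: int = 2048, seed: int = 0) -> np.ndarray:
    """Sample voxel coordinates with probability proportional to voxel intensity."""
    rng = np.random.default_rng(seed)
    weights = img.astype(np.float64).ravel()
    weights /= weights.sum()  # assumes non-negative intensities
    idx = rng.choice(img.size, size=n_points, replace=True, p=weights)
    zyx = np.column_stack(np.unravel_index(idx, img.shape))
    return zyx.astype(np.float32)  # (n_points, 3) array of z, y, x coordinates
```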

For SDF preprocessing for polymorphic structures, update the data paths and run the workflows in (a minimal SDF sketch follows the listing below):

```
src
└── br
└── data
   └── preprocessing
      └── sdf_preprocessing
         ├── image_sdfs.py <- Create 32**3 resolution SDF images
         └── pc_sdfs.py <- Sample point clouds from 32**3 resolution SDF images
```
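
For intuition, here is a sketch of turning a binary segmentation into a signed distance field at the 32**3 resolution mentioned above, using `scipy` and `scikit-image`. The resizing and sign conventions in the actual scripts may differ:

```python
# Illustrative SDF computation from a binary mask (the repo's scripts may differ).
import numpy as np
from scipy import ndimage
from skimage.transform import resize


def mask_to_sdf(mask: np.ndarray, shape=(32, 32, 32)) -> np.ndarray:
    """Signed distance field: negative inside the object, positive outside."""
    small = resize(mask.astype(float), shape, order=1) > 0.5
    outside = ndimage.distance_transform_edt(~small)
    inside = ndimage.distance_transform_edt(small)
    return (outside - inside).astype(np.float32)
```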

In all cases, create a single cell manifest (e.g. csv, parquet) for each dataset with a column of final processed paths and a split column assigning each cell to train/test/validation.

# Usage
Once the data is downloaded and the config files are set up, run the preprocessing scripts:
```bash
snakemake -s Snakefile --cores all
```
