Commit

Merge pull request #56 from AllenCell/Pre-Commit-analyses
Pre-commit analyses
fatwir authored Nov 25, 2024
2 parents 05a55fd + 8047c10 commit 1056b7f
Showing 27 changed files with 126 additions and 128 deletions.
2 changes: 1 addition & 1 deletion .github/actions/pdm/action.yml
@@ -4,7 +4,7 @@ runs:
using: composite
steps:
- name: Set up PDM
uses: pdm-project/setup-pdm@568ddd69406b30de1774ec0044b73ae06e716aa4 # v4.1
with:
python-version: "3.10"
version: 2.20.0.post1
5 changes: 2 additions & 3 deletions .github/workflows/test.yml
@@ -12,13 +12,12 @@ concurrency:
cancel-in-progress: true

jobs:

test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: ./.github/actions/pdm

- name: Check that pdm.lock matches pyproject.toml
shell: bash
run: pdm lock --check
2 changes: 2 additions & 0 deletions ADVANCED_INSTALLATION.md
@@ -1,7 +1,9 @@
# Installation and usage with pdm

1. [Install pdm](https://pdm-project.org/en/latest/#recommended-installation-method)
2. Install dependencies: `pdm sync --no-isolation`. (The `--no-isolation` flag is required for `torch-scatter`.)
3. Prefix every `python` command with `pdm run`. For example:

```
pdm run python src/br/models/train.py experiment=cellpack/pc_equiv
```
50 changes: 30 additions & 20 deletions README.md
@@ -8,11 +8,13 @@ This README gives instructions for running our analysis against preprocessed ima
# Installation

To install and use this software, you need:
* A GPU running CUDA 11.7 (other CUDA versions may work, but they are not officially supported),
* [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html) (or Python 3.10 and [pdm](https://pdm-project.org/)), and
* [git](https://github.com/git-guides/install-git).

- A GPU running CUDA 11.7 (other CUDA versions may work, but they are not officially supported),
- [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html) (or Python 3.10 and [pdm](https://pdm-project.org/)), and
- [git](https://github.com/git-guides/install-git).

First, clone this repository.

```bash
git clone https://github.com/AllenCell/benchmarking_representations
cd benchmarking_representations
@@ -29,11 +31,13 @@ Depending on your GPU set-up, you may need to set the `CUDA_VISIBLE_DEVICES` [en
To achieve this, you will first need to get the Universally Unique IDs for the GPUs and then set `CUDA_VISIBLE_DEVICES` to some/all of those (a comma-separated list), as in the following examples.

**Example 1**

```bash
export CUDA_VISIBLE_DEVICES=0,1
```

**Example 2:** Using one partition of a MIG partitioned GPU

```bash
export CUDA_VISIBLE_DEVICES=MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
```
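
For scripted setups, the same variable can also be assembled in Python before any CUDA-dependent library is imported. This is only a sketch: the UUIDs below are placeholders, not real devices; obtain actual values from `nvidia-smi -L`.

```python
import os

# Placeholder UUIDs -- substitute the values reported by `nvidia-smi -L`.
uuids = [
    "GPU-11111111-2222-3333-4444-555555555555",
    "MIG-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
]

# CUDA reads a comma-separated list of device indices or UUIDs.
os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(uuids)
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

Set the variable before importing torch (or any other CUDA-aware library), since device visibility is fixed at CUDA initialization.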
@@ -49,31 +53,36 @@ pip install -e .
For `pdm` users, follow [these installation steps instead](./ADVANCED_INSTALLATION.md).

## Troubleshooting

**Q:** When installing dependencies, pytorch fails to install with the following error message.

```bash
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: device >= 0 && device < num_gpus
```

**A:** You may need to configure the `CUDA_VISIBLE_DEVICES` [environment variable](https://developer.nvidia.com/blog/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/).

## Set env variables

To run the models, you must set the `CYTODL_CONFIG_PATH` environment variable to point to the `br/configs` folder.
Check that your current working directory is the `benchmarking_representations` folder, then run the following command (this will last for only the duration of your shell session).

```bash
export CYTODL_CONFIG_PATH=$PWD/configs/
```

# Usage

## Steps to download and preprocess data

1. Datasets are hosted on Quilt. Download the raw data at the following links:

* [cellPACK synthetic dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/cellPACK_single_cell_punctate_structure/)
* [DNA replication foci dataset](https://open.quiltdata.com/b/allencell/packages/aics/nuclear_project_dataset_4)
* [WTC-11 hIPSc single cell image dataset v1](https://staging.allencellquilt.org/b/allencell/tree/aics/hipsc_single_cell_image_dataset/)
* [Nucleolar drug perturbation dataset](https://open.quiltdata.com/b/allencell/tree/aics/NPM1_single_cell_drug_perturbations/)
- [cellPACK synthetic dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/cellPACK_single_cell_punctate_structure/)
- [DNA replication foci dataset](https://open.quiltdata.com/b/allencell/packages/aics/nuclear_project_dataset_4)
- [WTC-11 hIPSc single cell image dataset v1](https://staging.allencellquilt.org/b/allencell/tree/aics/hipsc_single_cell_image_dataset/)
- [Nucleolar drug perturbation dataset](https://open.quiltdata.com/b/allencell/tree/aics/NPM1_single_cell_drug_perturbations/)

> [!NOTE]
> \[!NOTE\]
> Be sure to download all the data into the same folder where the repo was cloned!
2. Once data is downloaded, run preprocessing scripts to create the final image/pointcloud/SDF datasets (this step is not necessary for the cellPACK dataset). For image preprocessing used for punctate structures, install [snakemake](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html) and update the data paths in
@@ -156,21 +165,21 @@ python src/br/models/train.py experiment=cellpack/pc_so3 model=pc/classical_eart

1. To skip model training, download pre-trained models

* [cellPACK synthetic dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/cellpack/)
* [DNA replication foci dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/pcna/)
* [WTC-11 hIPSc single cell image dataset v1 punctate structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/other_punctate/)
* [WTC-11 hIPSc single cell image dataset v1 nucleolus (NPM1)](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/npm1/)
* [WTC-11 hIPSc single cell image dataset v1 polymorphic structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/other_polymorphic/)
* [Nucleolar drug perturbation dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/npm1_perturb/)
- [cellPACK synthetic dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/cellpack/)
- [DNA replication foci dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/pcna/)
- [WTC-11 hIPSc single cell image dataset v1 punctate structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/other_punctate/)
- [WTC-11 hIPSc single cell image dataset v1 nucleolus (NPM1)](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/npm1/)
- [WTC-11 hIPSc single cell image dataset v1 polymorphic structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/other_polymorphic/)
- [Nucleolar drug perturbation dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/npm1_perturb/)

2. Download pre-computed embeddings

* [cellPACK synthetic dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/cellpack/)
* [DNA replication foci dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/pcna/)
* [WTC-11 hIPSc single cell image dataset v1 punctate structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/other_punctate/)
* [WTC-11 hIPSc single cell image dataset v1 nucleolus (NPM1)](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/npm1/)
* [WTC-11 hIPSc single cell image dataset v1 polymorphic structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/other_polymorphic/)
* [Nucleolar drug perturbation dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/npm1_perturb/)
- [cellPACK synthetic dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/cellpack/)
- [DNA replication foci dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/pcna/)
- [WTC-11 hIPSc single cell image dataset v1 punctate structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/other_punctate/)
- [WTC-11 hIPSc single cell image dataset v1 nucleolus (NPM1)](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/npm1/)
- [WTC-11 hIPSc single cell image dataset v1 polymorphic structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/other_polymorphic/)
- [Nucleolar drug perturbation dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/npm1_perturb/)

## Steps to run benchmarking analysis

@@ -188,6 +197,7 @@ python src/br/models/train.py experiment=cellpack/pc_so3 model=pc/classical_eart
```

# Development

## Project Organization

```
2 changes: 1 addition & 1 deletion configs/logger/csv.yaml
@@ -4,4 +4,4 @@ csv:
_target_: lightning.pytorch.loggers.csv_logs.CSVLogger
save_dir: "${paths.output_dir}"
name: "csv/"
prefix: ""
1 change: 0 additions & 1 deletion pyproject.toml
@@ -72,4 +72,3 @@ dev = ["-e file:///${PROJECT_ROOT}/#egg=benchmarking-representations"]
[build-system]
requires = ["pdm-backend"]
build-backend = "pdm.backend"

7 changes: 6 additions & 1 deletion src/br/cellpack/README.md
@@ -14,14 +14,19 @@ pip install quilt3
```

## Usage

1. Get the reference nuclear shapes:

```bash
python get_reference_nuclear_shapes.py
```

2. Generate the synthetic data:

```bash
python generate_synthetic_data.py
```

Additional options can be specified through the command line. Run `python generate_synthetic_data.py --help` for more information.

The generated synthetic data will be saved in `data/packings`
2 changes: 1 addition & 1 deletion src/br/cellpack/__init__.py
@@ -1 +1 @@
"""Methods to generate simulated cellPACK data"""
"""Methods to generate simulated cellPACK data."""
60 changes: 24 additions & 36 deletions src/br/cellpack/generate_synthetic_data.py
@@ -1,14 +1,15 @@
import os
import json
import pandas as pd
import numpy as np
import concurrent.futures
import gc
import json
import multiprocessing
from time import time
import os
import subprocess
from pathlib import Path
import gc
from time import time

import fire
import numpy as np
import pandas as pd

RULES = [
"random",
@@ -33,9 +34,7 @@
RECIPE_TEMPLATE_PATH = DATADIR / "templates"
TEMPLATE_FILES = os.listdir(RECIPE_TEMPLATE_PATH)
TEMPLATE_FILES = [
RECIPE_TEMPLATE_PATH / file
for file in TEMPLATE_FILES
if file.split(".")[-1] == "json"
RECIPE_TEMPLATE_PATH / file for file in TEMPLATE_FILES if file.split(".")[-1] == "json"
]
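
The filtering above (keeping only `.json` files) can equivalently be expressed with `pathlib` globbing. A minimal sketch, using a temporary directory in place of the repository's `DATADIR / "templates"`:

```python
import tempfile
from pathlib import Path

# Temporary directory standing in for the repo's DATADIR / "templates".
template_dir = Path(tempfile.mkdtemp())
(template_dir / "pcna_random.json").write_text("{}")
(template_dir / "README.txt").write_text("not a template")

# Same result as the os.listdir + split(".") filter, via globbing:
template_files = sorted(template_dir.glob("*.json"))
print([p.name for p in template_files])  # ['pcna_random.json']
```

`glob("*.json")` also sidesteps the edge case where a filename contains no dot, which `file.split(".")[-1]` handles only by accident.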

GENERATED_RECIPE_PATH = DATADIR / "generated_recipes"
@@ -59,8 +58,7 @@ def create_rule_files(
shape_angles=ANGLES,
mesh_path=MESH_PATH,
):
"""
Create rule files for each combination of shape IDs and angles.
"""Create rule files for each combination of shape IDs and angles.
Args:
cellpack_rules (list): List of rule file paths.
@@ -72,7 +70,7 @@
"""
for rule in cellpack_rules:
print(f"Creating files for {rule}")
with open(rule, "r") as j:
with open(rule) as j:
contents = json.load(j)
contents_shape = contents.copy()
base_version = contents_shape["version"]
@@ -82,24 +80,22 @@
this_row = this_row.loc[this_row["angle"] == ang]

contents_shape["version"] = f"{base_version}_{this_id}_{ang}"
contents_shape["objects"]["mean_nucleus"]["representations"][
"mesh"
]["name"] = f"{this_id}_{ang}.obj"
contents_shape["objects"]["mean_nucleus"]["representations"][
"mesh"
]["path"] = str(mesh_path)
contents_shape["objects"]["mean_nucleus"]["representations"]["mesh"][
"name"
] = f"{this_id}_{ang}.obj"
contents_shape["objects"]["mean_nucleus"]["representations"]["mesh"][
"path"
] = str(mesh_path)
# save json
with open(
generated_recipe_path
/ f"{base_version}_{this_id}_rotation_{ang}.json",
generated_recipe_path / f"{base_version}_{this_id}_rotation_{ang}.json",
"w",
) as f:
json.dump(contents_shape, f, indent=4)


def update_cellpack_config(config_path=CONFIG_PATH, output_path=DEFAULT_OUTPUT_PATH):
"""
Update the cellPack configuration file with the specified output path.
"""Update the cellPack configuration file with the specified output path.
Args:
config_path (str): The path to the CellPack configuration file.
Expand All @@ -108,16 +104,15 @@ def update_cellpack_config(config_path=CONFIG_PATH, output_path=DEFAULT_OUTPUT_P
Returns:
None
"""
with open(config_path, "r") as j:
with open(config_path) as j:
contents = json.load(j)
contents["out"] = str(output_path)
with open(config_path, "w") as f:
json.dump(contents, f, indent=4)


def get_files_to_use(generated_recipe_path, rules_to_use, shape_rotations):
"""
Retrieves a list of input files to use based on the given rules and shape rotations.
"""Retrieves a list of input files to use based on the given rules and shape rotations.
Args:
generated_recipe_path (str): The path to the directory containing the generated
Expand All @@ -127,7 +122,6 @@ def get_files_to_use(generated_recipe_path, rules_to_use, shape_rotations):
Returns:
input_files_to_use (list): A list of input files to use.
"""
files = os.listdir(generated_recipe_path)
max_num_files = np.inf
@@ -147,8 +141,7 @@ def get_files_to_use(generated_recipe_path, rules_to_use, shape_rotations):


def run_single_packing(recipe_path, config_path=CONFIG_PATH):
"""
Run the packing using the specified recipe and configuration files.
"""Run the packing using the specified recipe and configuration files.
Args:
recipe_path (str): The path to the recipe file.
@@ -189,8 +182,7 @@ def run_workflow(
generated_recipe_path=GENERATED_RECIPE_PATH,
template_files=TEMPLATE_FILES,
):
"""
Runs the workflow for generating synthetic data using cellPack.
"""Runs the workflow for generating synthetic data using cellPack.
Args:
output_path (str): Path to the output directory.
@@ -227,9 +219,7 @@ def run_workflow(
update_cellpack_config(config_path, output_path)

if input_files_to_use is None:
input_files_to_use = get_files_to_use(
generated_recipe_path, rules_to_use, shape_rotations
)
input_files_to_use = get_files_to_use(generated_recipe_path, rules_to_use, shape_rotations)

num_files = len(input_files_to_use)
print(f"Found {num_files} files")
@@ -247,9 +237,7 @@
skipped_count = 0
count = 0
failed_count = 0
with concurrent.futures.ProcessPoolExecutor(
max_workers=num_processes
) as executor:
with concurrent.futures.ProcessPoolExecutor(max_workers=num_processes) as executor:
for file in input_files_to_use:
fname = Path(file).stem
fname = "".join(fname.split("_rotation"))
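
The stem manipulation above removes every `_rotation` substring from the recipe filename. A quick sketch of the equivalence; the filename is illustrative, patterned on the `{version}_{id}_rotation_{angle}.json` recipes generated earlier:

```python
from pathlib import Path

# Illustrative recipe filename, patterned on "{version}_{id}_rotation_{angle}.json".
file = "pcna_random_0_rotation_90.json"
fname = Path(file).stem                    # "pcna_random_0_rotation_90"
fname = "".join(fname.split("_rotation"))  # drops every "_rotation" occurrence

# Joining the pieces from split("_rotation") is equivalent to str.replace:
assert fname == Path(file).stem.replace("_rotation", "")
print(fname)  # pcna_random_0_90
```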
3 changes: 2 additions & 1 deletion src/br/cellpack/get_reference_nuclear_shapes.py
@@ -1,7 +1,8 @@
# %%
import quilt3
import os

import quilt3

# %%
b = quilt3.Bucket("s3://allencell")

@@ -25,4 +25,4 @@
],
"projection_axis": "y"
}
}
@@ -88,4 +88,4 @@
}
}
}
}
@@ -88,4 +88,4 @@
}
}
}
}
@@ -88,4 +88,4 @@
}
}
}
}
2 changes: 1 addition & 1 deletion src/br/data/cellpack/templates/pcna_radial_gradient.json
@@ -75,4 +75,4 @@
}
}
}
}
2 changes: 1 addition & 1 deletion src/br/data/cellpack/templates/pcna_random.json
@@ -63,4 +63,4 @@
}
}
}
}