diff --git a/.github/actions/pdm/action.yml b/.github/actions/pdm/action.yml index 6991120..b26795e 100644 --- a/.github/actions/pdm/action.yml +++ b/.github/actions/pdm/action.yml @@ -4,7 +4,7 @@ runs: using: composite steps: - name: Set up PDM - uses: pdm-project/setup-pdm@568ddd69406b30de1774ec0044b73ae06e716aa4 # v4.1 + uses: pdm-project/setup-pdm@568ddd69406b30de1774ec0044b73ae06e716aa4 # v4.1 with: python-version: "3.10" version: 2.20.0.post1 diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml index 9305fa8..dc1e405 100644 --- a/.github/workflows/test.yml +++ b/.github/workflows/test.yml @@ -12,13 +12,12 @@ concurrency: cancel-in-progress: true jobs: - test: runs-on: ubuntu-latest steps: - - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2 + - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2 - uses: ./.github/actions/pdm - name: Check that pdm.lock matches pyproject.toml shell: bash - run: pdm lock --check \ No newline at end of file + run: pdm lock --check diff --git a/ADVANCED_INSTALLATION.md b/ADVANCED_INSTALLATION.md index eebcb5c..65df75f 100644 --- a/ADVANCED_INSTALLATION.md +++ b/ADVANCED_INSTALLATION.md @@ -1,7 +1,9 @@ # Installation and usage with pdm + 1. [Install pdm](https://pdm-project.org/en/latest/#recommended-installation-method) 2. Install dependencies: `pdm sync --no-isolation`. (The `--no-isolation` flag is required for `torch-scatter`.) 3. Prefix every `python` command with `pdm run`. For example: + ``` pdm run python src/br/models/train.py experiment=cellpack/pc_equiv ``` diff --git a/README.md b/README.md index aa9886b..a026a92 100644 --- a/README.md +++ b/README.md @@ -8,11 +8,13 @@ This README gives instructions for running our analysis against preprocessed ima # Installation To install and use this software, you need: -* A GPU running CUDA 11.7 (other CUDA versions may work, but they are not officially supported), -* [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html) (or Python 3.10 and [pdm](https://pdm-project.org/)), and -* [git](https://github.com/git-guides/install-git). + +- A GPU running CUDA 11.7 (other CUDA versions may work, but they are not officially supported), +- [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html) (or Python 3.10 and [pdm](https://pdm-project.org/)), and +- [git](https://github.com/git-guides/install-git). First, clone this repository. + ```bash git clone https://github.com/AllenCell/benchmarking_representations cd benchmarking_representations @@ -29,11 +31,13 @@ Depending on your GPU set-up, you may need to set the `CUDA_VISIBLE_DEVICES` [en To achieve this, you will first need to get the Universally Unique IDs for the GPUs and then set `CUDA_VISIBLE_DEVICES` to some/all of those (a comma-separated list), as in the following examples. **Example 1** + ```bash export CUDA_VISIBLE_DEVICES=0,1 ``` **Example 2:** Using one partition of a MIG partitioned GPU + ```bash export CUDA_VISIBLE_DEVICES=MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx ``` @@ -49,7 +53,9 @@ pip install -e . For `pdm` users, follow [these installation steps instead](./ADVANCED_INSTALLATION.md). ## Troubleshooting + **Q:** When installing dependencies, pytorch fails to install with the following error message. + ```bash torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: device >= 0 && device < num_gpus ``` @@ -57,23 +63,26 @@ torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with **A:** You may need to configure the `CUDA_VISIBLE_DEVICES` [environment variable](https://developer.nvidia.com/blog/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/). ## Set env variables + To run the models, you must set the `CYTODL_CONFIG_PATH` environment variable to point to the `br/configs` folder. Check that your current working directory is the `benchmarking_representations` folder, then run the following command (this will last for only the duration of your shell session). + ```bash export CYTODL_CONFIG_PATH=$PWD/configs/ ``` # Usage + ## Steps to download and preprocess data 1. Datasets are hosted on quilt. Download raw data at the following links -* [cellPACK synthetic dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/cellPACK_single_cell_punctate_structure/) -* [DNA replication foci dataset](https://open.quiltdata.com/b/allencell/packages/aics/nuclear_project_dataset_4) -* [WTC-11 hIPSc single cell image dataset v1](https://staging.allencellquilt.org/b/allencell/tree/aics/hipsc_single_cell_image_dataset/) -* [Nucleolar drug perturbation dataset](https://open.quiltdata.com/b/allencell/tree/aics/NPM1_single_cell_drug_perturbations/) +- [cellPACK synthetic dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/cellPACK_single_cell_punctate_structure/) +- [DNA replication foci dataset](https://open.quiltdata.com/b/allencell/packages/aics/nuclear_project_dataset_4) +- [WTC-11 hIPSc single cell image dataset v1](https://staging.allencellquilt.org/b/allencell/tree/aics/hipsc_single_cell_image_dataset/) +- [Nucleolar drug perturbation dataset](https://open.quiltdata.com/b/allencell/tree/aics/NPM1_single_cell_drug_perturbations/) -> [!NOTE] +> \[!NOTE\] > Ensure to download all the data in the same folder where the repo was cloned! 2. Once data is downloaded, run preprocessing scripts to create the final image/pointcloud/SDF datasets (this step is not necessary for the cellPACK dataset). For image preprocessing used for punctate structures, install [snakemake](https://snakemake.readthedocs.io/en/stable/getting_started/installation.html) and update the data paths in @@ -156,21 +165,21 @@ python src/br/models/train.py experiment=cellpack/pc_so3 model=pc/classical_eart 1. To skip model training, download pre-trained models -* [cellPACK synthetic dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/cellpack/) -* [DNA replication foci dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/pcna/) -* [WTC-11 hIPSc single cell image dataset v1 punctate structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/other_punctate/) -* [WTC-11 hIPSc single cell image dataset v1 nucleolus (NPM1)](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/npm1/) -* [WTC-11 hIPSc single cell image dataset v1 polymorphic structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/other_polymorphic/) -* [Nucleolar drug perturbation dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/npm1_perturb/) +- [cellPACK synthetic dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/cellpack/) +- [DNA replication foci dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/pcna/) +- [WTC-11 hIPSc single cell image dataset v1 punctate structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/other_punctate/) +- [WTC-11 hIPSc single cell image dataset v1 nucleolus (NPM1)](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/npm1/) +- [WTC-11 hIPSc single cell image dataset v1 polymorphic structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/other_polymorphic/) +- [Nucleolar drug perturbation dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_checkpoints/npm1_perturb/) 2. Download pre-computed embeddings -* [cellPACK synthetic dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/cellpack/) -* [DNA replication foci dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/pcna/) -* [WTC-11 hIPSc single cell image dataset v1 punctate structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/other_punctate/) -* [WTC-11 hIPSc single cell image dataset v1 nucleolus (NPM1)](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/npm1/) -* [WTC-11 hIPSc single cell image dataset v1 polymorphic structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/other_polymorphic/) -* [Nucleolar drug perturbation dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/npm1_perturb/) +- [cellPACK synthetic dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/cellpack/) +- [DNA replication foci dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/pcna/) +- [WTC-11 hIPSc single cell image dataset v1 punctate structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/other_punctate/) +- [WTC-11 hIPSc single cell image dataset v1 nucleolus (NPM1)](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/npm1/) +- [WTC-11 hIPSc single cell image dataset v1 polymorphic structures](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/other_polymorphic/) +- [Nucleolar drug perturbation dataset](https://open.quiltdata.com/b/allencell/tree/aics/morphology_appropriate_representation_learning/model_embeddings/npm1_perturb/) ## Steps to run benchmarking analysis @@ -188,6 +197,7 @@ python src/br/models/train.py experiment=cellpack/pc_so3 model=pc/classical_eart ``` # Development + ## Project Organization ``` diff --git a/configs/logger/csv.yaml b/configs/logger/csv.yaml index c524e13..fa028e9 100644 --- a/configs/logger/csv.yaml +++ b/configs/logger/csv.yaml @@ -4,4 +4,4 @@ csv: _target_: lightning.pytorch.loggers.csv_logs.CSVLogger save_dir: "${paths.output_dir}" name: "csv/" - prefix: "" \ No newline at end of file + prefix: "" diff --git a/pyproject.toml b/pyproject.toml index e096bdc..4063134 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -72,4 +72,3 @@ dev = ["-e file:///${PROJECT_ROOT}/#egg=benchmarking-representations"] [build-system] requires = ["pdm-backend"] build-backend = "pdm.backend" - diff --git a/src/br/cellpack/README.md b/src/br/cellpack/README.md index 5ce7c17..b80049f 100644 --- a/src/br/cellpack/README.md +++ b/src/br/cellpack/README.md @@ -14,14 +14,19 @@ pip install quilt3 ``` ## Usage + 1. Get the reference nuclear shapes: + ```bash python get_reference_nuclear_shapes.py ``` + 2. Generate the synthetic data: + ```bash python generate_synthetic_data.py ``` + Additional options can be specified through the command line. Run `python generate_synthetic_data.py --help` for more information. -The generated synthetic data will be saved in `data/packings` \ No newline at end of file +The generated synthetic data will be saved in `data/packings` diff --git a/src/br/cellpack/__init__.py b/src/br/cellpack/__init__.py index 5284942..ebb50f7 100644 --- a/src/br/cellpack/__init__.py +++ b/src/br/cellpack/__init__.py @@ -1 +1 @@ -"""Methods to generate simulated cellPACK data""" +"""Methods to generate simulated cellPACK data.""" diff --git a/src/br/cellpack/generate_synthetic_data.py b/src/br/cellpack/generate_synthetic_data.py index 1cad5b1..f0ddbdf 100644 --- a/src/br/cellpack/generate_synthetic_data.py +++ b/src/br/cellpack/generate_synthetic_data.py @@ -1,14 +1,15 @@ -import os -import json -import pandas as pd -import numpy as np import concurrent.futures +import gc +import json import multiprocessing -from time import time +import os import subprocess from pathlib import Path -import gc +from time import time + import fire +import numpy as np +import pandas as pd RULES = [ "random", @@ -33,9 +34,7 @@ RECIPE_TEMPLATE_PATH = DATADIR / "templates" TEMPLATE_FILES = os.listdir(RECIPE_TEMPLATE_PATH) TEMPLATE_FILES = [ - RECIPE_TEMPLATE_PATH / file - for file in TEMPLATE_FILES - if file.split(".")[-1] == "json" + RECIPE_TEMPLATE_PATH / file for file in TEMPLATE_FILES if file.split(".")[-1] == "json" ] GENERATED_RECIPE_PATH = DATADIR / "generated_recipes" @@ -59,8 +58,7 @@ def create_rule_files( shape_angles=ANGLES, mesh_path=MESH_PATH, ): - """ - Create rule files for each combination of shape IDs and angles. + """Create rule files for each combination of shape IDs and angles. Args: cellpack_rules (list): List of rule file paths. @@ -72,7 +70,7 @@ def create_rule_files( """ for rule in cellpack_rules: print(f"Creating files for {rule}") - with open(rule, "r") as j: + with open(rule) as j: contents = json.load(j) contents_shape = contents.copy() base_version = contents_shape["version"] @@ -82,24 +80,22 @@ def create_rule_files( this_row = this_row.loc[this_row["angle"] == ang] contents_shape["version"] = f"{base_version}_{this_id}_{ang}" - contents_shape["objects"]["mean_nucleus"]["representations"][ - "mesh" - ]["name"] = f"{this_id}_{ang}.obj" - contents_shape["objects"]["mean_nucleus"]["representations"][ - "mesh" - ]["path"] = str(mesh_path) + contents_shape["objects"]["mean_nucleus"]["representations"]["mesh"][ + "name" + ] = f"{this_id}_{ang}.obj" + contents_shape["objects"]["mean_nucleus"]["representations"]["mesh"][ + "path" + ] = str(mesh_path) # save json with open( - generated_recipe_path - / f"{base_version}_{this_id}_rotation_{ang}.json", + generated_recipe_path / f"{base_version}_{this_id}_rotation_{ang}.json", "w", ) as f: json.dump(contents_shape, f, indent=4) def update_cellpack_config(config_path=CONFIG_PATH, output_path=DEFAULT_OUTPUT_PATH): - """ - Update the cellPack configuration file with the specified output path. + """Update the cellPack configuration file with the specified output path. Args: config_path (str): The path to the CellPack configuration file. @@ -108,7 +104,7 @@ def update_cellpack_config(config_path=CONFIG_PATH, output_path=DEFAULT_OUTPUT_P Returns: None """ - with open(config_path, "r") as j: + with open(config_path) as j: contents = json.load(j) contents["out"] = str(output_path) with open(config_path, "w") as f: @@ -116,8 +112,7 @@ def update_cellpack_config(config_path=CONFIG_PATH, output_path=DEFAULT_OUTPUT_P def get_files_to_use(generated_recipe_path, rules_to_use, shape_rotations): - """ - Retrieves a list of input files to use based on the given rules and shape rotations. + """Retrieves a list of input files to use based on the given rules and shape rotations. Args: generated_recipe_path (str): The path to the directory containing the generated @@ -127,7 +122,6 @@ def get_files_to_use(generated_recipe_path, rules_to_use, shape_rotations): Returns: input_files_to_use (list): A list of input files to use. - """ files = os.listdir(generated_recipe_path) max_num_files = np.inf @@ -147,8 +141,7 @@ def get_files_to_use(generated_recipe_path, rules_to_use, shape_rotations): def run_single_packing(recipe_path, config_path=CONFIG_PATH): - """ - Run the packing using the specified recipe and configuration files. + """Run the packing using the specified recipe and configuration files. Args: recipe_path (str): The path to the recipe file. @@ -189,8 +182,7 @@ def run_workflow( generated_recipe_path=GENERATED_RECIPE_PATH, template_files=TEMPLATE_FILES, ): - """ - Runs the workflow for generating synthetic data using cellPack. + """Runs the workflow for generating synthetic data using cellPack. Args: output_path (str): Path to the output directory. @@ -227,9 +219,7 @@ def run_workflow( update_cellpack_config(config_path, output_path) if input_files_to_use is None: - input_files_to_use = get_files_to_use( - generated_recipe_path, rules_to_use, shape_rotations - ) + input_files_to_use = get_files_to_use(generated_recipe_path, rules_to_use, shape_rotations) num_files = len(input_files_to_use) print(f"Found {num_files} files") @@ -247,9 +237,7 @@ def run_workflow( skipped_count = 0 count = 0 failed_count = 0 - with concurrent.futures.ProcessPoolExecutor( - max_workers=num_processes - ) as executor: + with concurrent.futures.ProcessPoolExecutor(max_workers=num_processes) as executor: for file in input_files_to_use: fname = Path(file).stem fname = "".join(fname.split("_rotation")) diff --git a/src/br/cellpack/get_reference_nuclear_shapes.py b/src/br/cellpack/get_reference_nuclear_shapes.py index 7235f36..94e380f 100644 --- a/src/br/cellpack/get_reference_nuclear_shapes.py +++ b/src/br/cellpack/get_reference_nuclear_shapes.py @@ -1,7 +1,8 @@ # %% -import quilt3 import os +import quilt3 + # %% b = quilt3.Bucket("s3://allencell") diff --git a/src/br/data/cellpack/config/pcna_parallel_packing_config.json b/src/br/data/cellpack/config/pcna_parallel_packing_config.json index b08b080..a374a78 100644 --- a/src/br/data/cellpack/config/pcna_parallel_packing_config.json +++ b/src/br/data/cellpack/config/pcna_parallel_packing_config.json @@ -25,4 +25,4 @@ ], "projection_axis": "y" } -} \ No newline at end of file +} diff --git a/src/br/data/cellpack/templates/pcna_planar_gradient_0deg.json b/src/br/data/cellpack/templates/pcna_planar_gradient_0deg.json index cb874b2..e4e507b 100644 --- a/src/br/data/cellpack/templates/pcna_planar_gradient_0deg.json +++ b/src/br/data/cellpack/templates/pcna_planar_gradient_0deg.json @@ -88,4 +88,4 @@ } } } -} \ No newline at end of file +} diff --git a/src/br/data/cellpack/templates/pcna_planar_gradient_45deg.json b/src/br/data/cellpack/templates/pcna_planar_gradient_45deg.json index 6e7b0d5..6e8b826 100644 --- a/src/br/data/cellpack/templates/pcna_planar_gradient_45deg.json +++ b/src/br/data/cellpack/templates/pcna_planar_gradient_45deg.json @@ -88,4 +88,4 @@ } } } -} \ No newline at end of file +} diff --git a/src/br/data/cellpack/templates/pcna_planar_gradient_90deg.json b/src/br/data/cellpack/templates/pcna_planar_gradient_90deg.json index 62df2f7..fb8e1f4 100644 --- a/src/br/data/cellpack/templates/pcna_planar_gradient_90deg.json +++ b/src/br/data/cellpack/templates/pcna_planar_gradient_90deg.json @@ -88,4 +88,4 @@ } } } -} \ No newline at end of file +} diff --git a/src/br/data/cellpack/templates/pcna_radial_gradient.json b/src/br/data/cellpack/templates/pcna_radial_gradient.json index 1e2d962..e9ca5af 100644 --- a/src/br/data/cellpack/templates/pcna_radial_gradient.json +++ b/src/br/data/cellpack/templates/pcna_radial_gradient.json @@ -75,4 +75,4 @@ } } } -} \ No newline at end of file +} diff --git a/src/br/data/cellpack/templates/pcna_random.json b/src/br/data/cellpack/templates/pcna_random.json index 301aa8a..e6feb3c 100644 --- a/src/br/data/cellpack/templates/pcna_random.json +++ b/src/br/data/cellpack/templates/pcna_random.json @@ -63,4 +63,4 @@ } } } -} \ No newline at end of file +} diff --git a/src/br/data/cellpack/templates/pcna_surface_gradient.json b/src/br/data/cellpack/templates/pcna_surface_gradient.json index eaac113..f50dfd9 100644 --- a/src/br/data/cellpack/templates/pcna_surface_gradient.json +++ b/src/br/data/cellpack/templates/pcna_surface_gradient.json @@ -79,4 +79,4 @@ } } } -} \ No newline at end of file +} diff --git a/src/br/features/archetype.py b/src/br/features/archetype.py index 81a10ed..7805afd 100644 --- a/src/br/features/archetype.py +++ b/src/br/features/archetype.py @@ -127,9 +127,7 @@ def __init__( super().__init__(n_archetypes, max_iter, tol, verbose) self.derivative_max_iter = derivative_max_iter - def _computeA( - self, X: np.ndarray, Z: np.ndarray, A: np.ndarray = None - ) -> np.ndarray: + def _computeA(self, X: np.ndarray, Z: np.ndarray, A: np.ndarray = None) -> np.ndarray: A = np.zeros((self.n_samples, self.n_archetypes)) A[:, 0] = 1.0 e = np.zeros(A.shape) @@ -143,9 +141,7 @@ def _computeA( e[range(self.n_samples), argmins] = 0.0 return A - def _computeB( - self, X: np.ndarray, A: np.ndarray, B: np.ndarray = None - ) -> np.ndarray: + def _computeB(self, X: np.ndarray, A: np.ndarray, B: np.ndarray = None) -> np.ndarray: B = np.zeros((self.n_archetypes, self.n_samples)) B[:, 0] = 1.0 e = np.zeros(B.shape) diff --git a/src/br/features/plot.py b/src/br/features/plot.py index 6d0d79e..e436a5d 100644 --- a/src/br/features/plot.py +++ b/src/br/features/plot.py @@ -2,8 +2,9 @@ import matplotlib.pyplot as plt import mitsuba as mi + mi.set_variant("scalar_rgb") -# Use scalar backend for mitsuba rendering +# Use scalar backend for mitsuba rendering # For more detailes refer to # https://mitsuba.readthedocs.io/en/stable/src/key_topics/variants.html import numpy as np @@ -12,8 +13,8 @@ import seaborn as sns import trimesh from mitsuba import ScalarTransform4f as T -from .utils import normalize_intensities_and_get_colormap +from .utils import normalize_intensities_and_get_colormap METRIC_DICT = { "reconstruction": {"metric": ["loss"], "min": [True]}, diff --git a/src/br/models/load_models.py b/src/br/models/load_models.py index 0d7ac5c..e87653d 100644 --- a/src/br/models/load_models.py +++ b/src/br/models/load_models.py @@ -7,9 +7,7 @@ from br.models.utils import get_all_configs_per_dataset -def load_model_from_path( - dataset, results_path, strict=False, split="val", device="cuda:0" -): +def load_model_from_path(dataset, results_path, strict=False, split="val", device="cuda:0"): MODEL_INFO = get_all_configs_per_dataset(results_path) models = MODEL_INFO[dataset] model_sizes = [] diff --git a/src/br/notebooks/fig7_drugdata_analysis.ipynb b/src/br/notebooks/fig7_drugdata_analysis.ipynb index 08b59b1..56b0f59 100644 --- a/src/br/notebooks/fig7_drugdata_analysis.ipynb +++ b/src/br/notebooks/fig7_drugdata_analysis.ipynb @@ -80,9 +80,7 @@ "cell_type": "code", "execution_count": null, "id": "7e8f82b6-ffad-4cf7-942c-efce7bbea688", - "metadata": { - "scrolled": true - }, + "metadata": {}, "outputs": [], "source": [ "# Save embeddings for each model\n", @@ -149,7 +147,7 @@ "metadata": {}, "outputs": [], "source": [ - "all_ret['model'].unique()" + "all_ret[\"model\"].unique()" ] }, { @@ -362,10 +360,14 @@ "metadata": {}, "outputs": [], "source": [ - "rep_dict = {'CNN_sdf_noalign_global': 'Classical_image_SDF', 'CNN_sdf_SO3_global': 'SO3_image_SDF', \n", - " 'CNN_seg_noalign_global': 'Classical_image_seg', 'CNN_seg_SO3_global': 'SO3_image_seg', \n", - " 'vn_so3': 'SO3_pointcloud_SDF'}\n", - "all_rep['model'] = all_rep['model'].replace(rep_dict)" + "rep_dict = {\n", + " \"CNN_sdf_noalign_global\": \"Classical_image_SDF\",\n", + " \"CNN_sdf_SO3_global\": \"SO3_image_SDF\",\n", + " \"CNN_seg_noalign_global\": \"Classical_image_seg\",\n", + " \"CNN_seg_SO3_global\": \"SO3_image_seg\",\n", + " \"vn_so3\": \"SO3_pointcloud_SDF\",\n", + "}\n", + "all_rep[\"model\"] = all_rep[\"model\"].replace(rep_dict)" ] }, { @@ -375,7 +377,12 @@ "metadata": {}, "outputs": [], "source": [ - "ordered_drugs = all_rep.groupby(['Metadata_broad_sample']).mean().sort_values(by='q_value').reset_index()['Metadata_broad_sample']" + "ordered_drugs = (\n", + " all_rep.groupby([\"Metadata_broad_sample\"])\n", + " .mean()\n", + " .sort_values(by=\"q_value\")\n", + " .reset_index()[\"Metadata_broad_sample\"]\n", + ")" ] }, { @@ -431,53 +438,48 @@ }, { "cell_type": "code", - "execution_count": 250, + "execution_count": null, "id": "1f4e4142-b698-4f47-a17a-73a648191720", "metadata": {}, "outputs": [], "source": [ - "df = pd.read_csv('/allen/aics/modeling/ritvik/projects/aws_uploads/morphology_appropriate_representation_learning/cellPACK_single_cell_punctate_structure/reference_nuclear_shapes/manifest.csv')" + "df = pd.read_csv(\n", + " \"/allen/aics/modeling/ritvik/projects/aws_uploads/morphology_appropriate_representation_learning/cellPACK_single_cell_punctate_structure/reference_nuclear_shapes/manifest.csv\"\n", + ")" ] }, { "cell_type": "code", - "execution_count": 253, + "execution_count": null, "id": "692979be-80cb-4d45-bcad-bc6546785178", "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "'./cellPACK_single_cell_punctate_structure/reference_nuclear_shapes/00a2e026-6f81-4bd5-8ab0-c2e12f8c793c_0.obj'" - ] - }, - "execution_count": 253, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ - "df['nucobj_path'].iloc[0]" + "df[\"nucobj_path\"].iloc[0]" ] }, { "cell_type": "code", - "execution_count": 252, + "execution_count": null, "id": "60580552-ed9c-45fc-90fe-4853f38f0a3b", "metadata": {}, "outputs": [], "source": [ - "df['nucobj_path'] = df['nucobj_path'].apply(lambda x: x.replace('./morphology_appropriate_representation_learning', '.'))" + "df[\"nucobj_path\"] = df[\"nucobj_path\"].apply(\n", + " lambda x: x.replace(\"./morphology_appropriate_representation_learning\", \".\")\n", + ")" ] }, { "cell_type": "code", - "execution_count": 254, + "execution_count": null, "id": "3d863644-b8d5-44b2-9f13-9c348fd98256", "metadata": {}, "outputs": [], "source": [ - "df.to_csv('/allen/aics/modeling/ritvik/projects/aws_uploads/morphology_appropriate_representation_learning/cellPACK_single_cell_punctate_structure/reference_nuclear_shapes/manifest.csv')" + "df.to_csv(\n", + " \"/allen/aics/modeling/ritvik/projects/aws_uploads/morphology_appropriate_representation_learning/cellPACK_single_cell_punctate_structure/reference_nuclear_shapes/manifest.csv\"\n", + ")" ] }, { diff --git a/src/pointcloudutils/datamodules/cellpack.py b/src/pointcloudutils/datamodules/cellpack.py index 76a5830..6067150 100644 --- a/src/pointcloudutils/datamodules/cellpack.py +++ b/src/pointcloudutils/datamodules/cellpack.py @@ -504,7 +504,9 @@ def get_packing(tup): this_path = this_path.replace("positions", "figures/voxelized_image") # Remove the original file extension (.json in this case) - this_path = this_path.rsplit(".", 1)[0] # This splits on the last dot and keeps the first part + this_path = this_path.rsplit(".", 1)[ + 0 + ] # This splits on the last dot and keeps the first part # Append the new file extension this_path = this_path + "_seed_0.ome.tiff" diff --git a/src/pointcloudutils/datamodules/shapenet_dataset/fields.py b/src/pointcloudutils/datamodules/shapenet_dataset/fields.py index af25ef0..e0aaae1 100644 --- a/src/pointcloudutils/datamodules/shapenet_dataset/fields.py +++ b/src/pointcloudutils/datamodules/shapenet_dataset/fields.py @@ -185,9 +185,7 @@ def load(self, model_path, idx, category, mode): category (int): index of category """ if mode in ["val", "test"]: # fix the size in evaluation - self.partial_type = ( - "centerz" if "centerz" in self.partial_type else "centery" - ) + self.partial_type = "centerz" if "centerz" in self.partial_type else "centery" self.part_ratio = 0.5 if self.multi_files is None: diff --git a/src/pointcloudutils/datamodules/shapenet_dataset/utils.py b/src/pointcloudutils/datamodules/shapenet_dataset/utils.py index 82e86fa..6d01cfd 100644 --- a/src/pointcloudutils/datamodules/shapenet_dataset/utils.py +++ b/src/pointcloudutils/datamodules/shapenet_dataset/utils.py @@ -1,12 +1,11 @@ -from torchvision import transforms import numpy as np +from torchvision import transforms + from .fields import IndexField, PartialPointCloudField, PointCloudField, PointsField from .transforms import PointcloudNoise, SubsamplePointcloud, SubsamplePoints -def get_data_fields( - mode, points_subsample, input_type, points_file, multi_files, points_iou_file -): +def get_data_fields(mode, points_subsample, input_type, points_file, multi_files, points_iou_file): """Returns the data fields. Args: @@ -58,9 +57,7 @@ def get_inputs_field( transform = transforms.Compose( [SubsamplePointcloud(pointcloud_n), PointcloudNoise(pointcloud_noise)] ) - inputs_field = PointCloudField( - pointcloud_file, transform, multi_files=multi_files - ) + inputs_field = PointCloudField(pointcloud_file, transform, multi_files=multi_files) elif input_type == "partial_pointcloud": transform = transforms.Compose( [ diff --git a/subpackages/image_preprocessing/config/config.yaml b/subpackages/image_preprocessing/config/config.yaml index eb87d98..0c6a25a 100644 --- a/subpackages/image_preprocessing/config/config.yaml +++ b/subpackages/image_preprocessing/config/config.yaml @@ -10,9 +10,9 @@ force: false raise_errors: true samples_per_structure: -output_dir: +output_dir: -input_manifest: +input_manifest: remote_provider: diff --git a/subpackages/image_preprocessing/image_preprocessing/steps/merge.py b/subpackages/image_preprocessing/image_preprocessing/steps/merge.py index af3fa29..4fa91da 100644 --- a/subpackages/image_preprocessing/image_preprocessing/steps/merge.py +++ b/subpackages/image_preprocessing/image_preprocessing/steps/merge.py @@ -137,9 +137,7 @@ def run_step(self, row): ) ).astype("uint16") - channel_names = ( - raw_channel_names + [seg_channel_names[ix] for ix in seg_channels_to_use] - ) + channel_names = raw_channel_names + [seg_channel_names[ix] for ix in seg_channels_to_use] output_path = self.store_image( data_new, channel_names, raw_img.physical_pixel_sizes, cell_id diff --git a/tests/test_import_project.py b/tests/test_import_project.py index 2374faf..04d7ba6 100644 --- a/tests/test_import_project.py +++ b/tests/test_import_project.py @@ -1,8 +1,10 @@ def test_import_br(): import br + def test_import_pointcloudutils(): import pointcloudutils + def test_import_datamodules(): import pointcloudutils.datamodules