Single-cell representation learning (#153)

* Merging code related to figures (#146)
* notes on standard report
* Add code for generating figures
---------
Co-authored-by: Alishba Imran <[email protected]>
* produce a report of useful visualizations to assess the dimensionality and features learned by embeddings (#140)
* notes on standard report
* add lib of computed features
* correlates PCA with computed features
* compute for all timepoints
* compute correlation
* remove cv library usage
* remove edge detection
* convert to dataframe
* for entire well
* add std_dev feature
* fix patch size
---------
Co-authored-by: Soorya Pradeep <[email protected]>
* Remove obsolete scripts for contrastive phenotyping (#150)
* remove obsolete training and prediction scripts
* lint contrastive scripts
* SSL: fix MLP head and remove L2 normalization (#145)
* draft projection head per Update the projection head (normalization and size). #139
* reorganize comments in example fit config
* configurable stem stride and projection dimensions
* update type hint and docstring for ContrastiveEncoder
* clarify embedding_dim
* use the forward method directly for projected
* normalize projections only when fitting; the projected features saved during prediction are now *not* normalized
* remove unused logger
* refactor training code into translation and representation modules
* extract image logging functions
* use AdamW instead of Adam for contrastive learning
* inline single-use argument
* fix normalization
* fix MLP layer order
* fix output dimensions
* remove L2 normalization before computing loss
* compute rank of features and projections
* documentation
---------
Co-authored-by: Shalin Mehta <[email protected]>
* created and updated classify_feb_embeddings.py
* Module and scripts for evaluating representations (#156)
* docstring
* move scripts from contrastive_scripts to viscy/scripts
* organize files in applications/contrastive_phenotyping
* delete unused evaluation code
* more cleanup
* refactor evaluation metrics for translation task
* refactor viscy.evaluation -> viscy.translation.evaluation_metrics and viscy.representation.evaluation
* WIP: representation evaluation module
* WIP: representation eval - docstrings in numpy format
* WIP: more documentation
* refactor: feature_extractor moved to viscy.representation.evaluation
* lint
* bug fix
* refactored common computations and dataset
* add imbalance-learn dependency to metrics
* refactor classification of embeddings
* organize viscy.representation.evaluation
* ruff
* Soorya's plotting script
* WIP: combine two versions of plot_embeddings.py
* simplify representation.viscy.evaluation - move LCA to its own module
* refactor of viscy.representation.evaluation
* refactored and tested PCA and UMAP plots
---------
Co-authored-by: Soorya Pradeep <[email protected]>
* delete duplicate file
* lint
* fix import paths
* rename translation tests
* rename translation metrics
* Sample positive and negative samples with a time offset for the triplet contrastive task (#154)
* wip: sample positive and negative samples from another time point
* configure time interval in triplet data module
* vectorized anchor filtering
* conditional augmentation for anchor: anchor is augmented if the positive is another time point
* example training script for the CTC dataset; this is optimized to run on MPS
* add example CTC prediction config for MPS
* add fig for mitosis
* add script to save image patches
* add save patches as npy
* save figure at 300dpi
* Linear probing (#160)
* refactor linear probing with lightning
* test convenience function
* always convert to long before onehot
* use onehot only during training
* supply trainer through argument to avoid wrapping
* only log per epoch
* example script for linear probing
* add comment about loss curve
* fix sample filtering order for select tracks
* add script to visualize integrated gradients
* plot integrated gradients over time
* Use sklearn's logistic regression for linear probing (#169)
* use binary logistic regression to initialize the linear layer
* plot integrated gradients from a binary classifier
* add cmap to 'visual' requirements
* move model assembling to lca
* rename init argument
* disable feature scaling
* update test and evaluation scripts to use new API
* add docstrings to LCA
* Tweak attribution visualization (#170)
* add matplotlib style sheet for figure making
* add cell division attribution
* add matplotlib style sheet
* move attribution computation to lca
* tweak contrast limits and text
* add captum to optional dependencies
* move attribution function to a method of the classifier
* add script to show organelle dynamics
* add occlusion attribution
* more generic save path
* add uninfected cell
* tweak subplot spacing
* UMAP line plot to assess temporal smoothness in features space (#176)
* add matplotlib style sheet for figure making
* add cell division attribution
* add matplotlib style sheet
* move attribution computation to lca
* tweak contrast limits and text
* add captum to optional dependencies
* move attribution function to a method of the classifier
* add script to show organelle dynamics
* add occlusion attribution
* more generic save path
* add uninfected cell
* tweak subplot spacing
* lower case titles
* reduce UMAP components to 2 and add indices
* add script to make the bridge gaps figure
* fixed import error
* formatted with black
* reduce to single arrow on plot
* remove redundant script
* Fixes on correlation of PCA and UMAP components to computed_feature script (#159)
* reduce initial patch size
* add radial profiling
* add function descriptions
* add umap correlation
* add def comments
* change umap for all data
* add script for 1 chan
* add p-value analysis
* add PCA analysis
* remove duplicate script
* Refactor and format code
* Format code
* Removed umap correlation
* note for future refactor
---------
Co-authored-by: Ziwen Liu <[email protected]>
* updated eval module & cosine sim figures (#168)
* updated files
* format fixed for tests
* updated scripts
* umap dist code
* bug fixes and linting
* logistic regression script
* add infection figure script
* Add script for generating infection figure and perform prediction on the June dataset
* Format code
* Black format evaluation module and fix import in figure_cell_infection script
* Refactor scatterplot colors and markers
* Calculate model accuracy
* Add script for appendix video
* formatted code
* updated displacement funcs for full embeddings
* script for displacement computation
* fix style
* fix docstring format
---------
Co-authored-by: Shalin Mehta <[email protected]>
Co-authored-by: Soorya Pradeep <[email protected]>
Co-authored-by: Ziwen Liu <[email protected]>
* Fixup representation (#180)
* fix docstrings and type hint for the ContrastiveEncoder
* refactor the representation evaluation module into submodules
* move shared image logging into utils
* fix line end
* fix import paths in example notebooks
* Unified CLI entry point (#182)
* remove obsolete metrics script for translation
* move cellpose annotation script
* consolidate CLI documentation
* remove old CLI help
* move translation CLI to its own module
* move contrastive CLI to its own module
* remove old CLI module
* remove global entry script
* share trainer class between tasks
* move cli from init to main
* inherit base CLI class for tasks
* improve type hint and docstring
* restore global CLI entry point
* special case subclass mode for preprocessing
* remove separate entry points
* add CLI description message
* make the setup function private
* fix subclass mode detection
* remove unused arguments from custom subcommands
* use generic path in example
* fix docstring style
* update virtual staining example configs
* update CTC SSL example configs
* update infection SSL example configs
* Remove outdated comment
* updating the dlmbl notebooks
* updating dependencies to allow viscy>0.2 in examples
* updating phase contrast demo notebook.
* updating references to main
* Store UMAP embeddings in SSL predictions (#184)
* extract function for computing umap
* specific return type for predict step
* write umap in prediction
* raise log level for umap computation
* fix key conversion
* Add representation section to readme (#186)
* draft readme
* direct link dynaCLR schematic
* add DynaCLR schematic figure
* add static schematic and link to video
---------
Co-authored-by: Ziwen Liu <[email protected]>
Co-authored-by: Ziwen Liu <[email protected]>
* fix link syntax in readme
---------
Co-authored-by: Shalin Mehta <[email protected]>
Co-authored-by: Alishba Imran <[email protected]>
Co-authored-by: Soorya Pradeep <[email protected]>
Co-authored-by: Alishba Imran <[email protected]>
Co-authored-by: Soorya19Pradeep <[email protected]>
Co-authored-by: Eduardo Hirata-Miyasaki <[email protected]>
mehta-lab · Oct 17, 2024 · ee834ce
1 parent a0dcbde
commit ee834ce
Showing 98 changed files with 7,105 additions and 4,043 deletions.
diff --git a/README.md b/README.md
@@ -102,6 +102,36 @@ The robust virtual staining models (i.e *VSCyto2D*, *VSCyto3D*, *VSNeuromast*),
 
 A full illustration of the virtual staining pipeline can be found [here](https://github.com/mehta-lab/VisCy/blob/dde3e27482e58a30f7c202e56d89378031180c75/docs/virtual_staining.md).
 
+## Image representation learning
+
+We are currently developing self-supervised representation learning to map cell state dynamics in response to perturbations,
+with a focus on cell and organelle remodeling due to viral infection.
+
+See our recent work on a temporally regularized contrastive sampling method
+for representation learning on [arXiv](https://arxiv.org/abs/2410.11281).
+
+<details>
+ <summary> Pradeep, Imran, Liu et al., 2024 </summary>
+
+  <pre><code>
+@misc{pradeep_contrastive_2024,
+      title={Contrastive learning of cell state dynamics in response to perturbations},
+      author={Soorya Pradeep and Alishba Imran and Ziwen Liu and Taylla Milena Theodoro and Eduardo Hirata-Miyasaki and Ivan Ivanov and Madhura Bhave and Sudip Khadka and Hunter Woosley and Carolina Arias and Shalin B. Mehta},
+      year={2024},
+      eprint={2410.11281},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV},
+      url={https://arxiv.org/abs/2410.11281},
+}
+    </code></pre>
+  </details>
+
+### Workflow demo
+
+[Exploration of learned embeddings with napari-iohub](https://drive.google.com/file/d/16WSoTvXJ-siLb7iyOueOag_cKn9Iwckc/view?usp=drive_link)
+
+![DynaCLR](https://github.com/mehta-lab/VisCy/blob/9eaab7eca50d684d8a473ad9da089aeab0e8f6a0/docs/figures/dynaCLR_schematic.png?raw=true)
+
 ## Installation
 
 1. We recommend using a new Conda/virtual environment.
@@ -148,4 +178,3 @@ for reading and writing data in [OME-Zarr](https://www.nature.com/articles/s4159
 The full functionality is tested on Linux `x86_64` with NVIDIA Ampere GPUs (CUDA 12.4).
 Some features (e.g. mixed precision and distributed training) may not be available with other setups,
 see [PyTorch documentation](https://pytorch.org) for details.
-
diff --git a/applications/contrastive_phenotyping/contrastive_cli/fit.yml b/applications/contrastive_phenotyping/contrastive_cli/fit.yml
diff --git a/applications/contrastive_phenotyping/contrastive_cli/fit_ctc_mps.yml b/applications/contrastive_phenotyping/contrastive_cli/fit_ctc_mps.yml
@@ -0,0 +1,117 @@
+# See help here on how to configure hyper-parameters with config files:
+# https://lightning.ai/docs/pytorch/stable/cli/lightning_cli_advanced.html
+seed_everything: 42
+trainer:
+  accelerator: gpu
+  strategy: auto
+  devices: 1
+  num_nodes: 1
+  precision: 32-true
+  logger:
+    class_path: lightning.pytorch.loggers.TensorBoardLogger
+    # Nesting the logger config like this is equivalent to
+    # supplying the following argument to `lightning.pytorch.Trainer`:
+    # logger=TensorBoardLogger(
+    #     "/Users/ziwen.liu/Projects/test-time",
+    #     log_graph=True,
+    #     version="time_interval_1",
+    # )
+    init_args:
+      save_dir: /Users/ziwen.liu/Projects/test-time
+      # `version` is the name of the experiment.
+      # The logs will be saved in `save_dir/lightning_logs/version`.
+      version: time_interval_1
+      log_graph: True
+  callbacks:
+    - class_path: lightning.pytorch.callbacks.LearningRateMonitor
+      init_args:
+        logging_interval: step
+    - class_path: lightning.pytorch.callbacks.ModelCheckpoint
+      init_args:
+        monitor: loss/val
+        every_n_epochs: 1
+        save_top_k: 4
+        save_last: true
+  fast_dev_run: false
+  max_epochs: 100
+  log_every_n_steps: 10
+  enable_checkpointing: true
+  inference_mode: true
+  use_distributed_sampler: true
+  # synchronize batchnorm parameters across multiple GPUs.
+  # important for contrastive learning to normalize the tensors across the whole batch.
+  sync_batchnorm: true
+model:
+  class_path: viscy.representation.engine.ContrastiveModule
+  init_args:
+    encoder:
+      class_path: viscy.representation.contrastive.ContrastiveEncoder
+      init_args:
+        backbone: convnext_tiny
+        in_channels: 1
+        in_stack_depth: 1
+        stem_kernel_size: [1, 4, 4]
+        stem_stride: [1, 4, 4]
+        embedding_dim: 768
+        projection_dim: 32
+        drop_path_rate: 0.0
+    loss_function:
+      class_path: torch.nn.TripletMarginLoss
+      init_args:
+        margin: 0.5
+    lr: 0.0002
+    log_batches_per_epoch: 3
+    log_samples_per_batch: 2
+    example_input_array_shape: [1, 1, 1, 128, 128]
+data:
+  class_path: viscy.data.triplet.TripletDataModule
+  init_args:
+    data_path: /Users/ziwen.liu/Downloads/Hela_CTC.zarr
+    tracks_path: /Users/ziwen.liu/Downloads/Hela_CTC.zarr
+    source_channel:
+      - DIC
+    z_range: [0, 1]
+    batch_size: 16
+    num_workers: 4
+    initial_yx_patch_size: [256, 256]
+    final_yx_patch_size: [128, 128]
+    time_interval: 1
+    normalizations:
+      - class_path: viscy.transforms.NormalizeSampled
+        init_args:
+          keys: [DIC]
+          level: fov_statistics
+          subtrahend: mean
+          divisor: std
+    augmentations:
+      - class_path: viscy.transforms.RandAffined
+        init_args:
+          keys: [DIC]
+          prob: 0.8
+          scale_range: [0, 0.2, 0.2]
+          rotate_range: [3.14, 0.0, 0.0]
+          shear_range: [0.0, 0.01, 0.01]
+          padding_mode: zeros
+      - class_path: viscy.transforms.RandAdjustContrastd
+        init_args:
+          keys: [DIC]
+          prob: 0.5
+          gamma: [0.8, 1.2]
+      - class_path: viscy.transforms.RandScaleIntensityd
+        init_args:
+          keys: [DIC]
+          prob: 0.5
+          factors: 0.5
+      - class_path: viscy.transforms.RandGaussianSmoothd
+        init_args:
+          keys: [DIC]
+          prob: 0.5
+          sigma_x: [0.25, 0.75]
+          sigma_y: [0.25, 0.75]
+          sigma_z: [0.0, 0.0]
+      - class_path: viscy.transforms.RandGaussianNoised
+        init_args:
+          keys: [DIC]
+          prob: 0.5
+          mean: 0.0
+          std: 0.2
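
The `time_interval: 1` and `TripletMarginLoss` (`margin: 0.5`) settings in the config above can be illustrated with a minimal pure-Python sketch. It assumes that an anchor is usable only when the same track also has an observation `time_interval` frames later (which then serves as the positive), and it computes the standard triplet margin loss `max(d(a, p) - d(a, n) + margin, 0)` with Euclidean distance. The helper names `valid_anchors` and `triplet_margin_loss` and the toy tracks are hypothetical; this is not the `TripletDataModule` implementation, and it omits the small epsilon PyTorch adds inside its pairwise distance.

```python
import math


def valid_anchors(tracks, time_interval):
    """Keep (track_id, t) rows whose same-track row at t + time_interval exists.

    Hypothetical helper: the positive for each kept anchor would be
    (track_id, t + time_interval).
    """
    observed = set(tracks)
    return [
        (track_id, t)
        for track_id, t in tracks
        if (track_id, t + time_interval) in observed
    ]


def euclidean(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=0.5):
    """Triplet margin loss for one triplet: max(d(a, p) - d(a, n) + margin, 0)."""
    return max(
        euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0
    )


# Toy tracks as (track_id, time) pairs; track 1 has a gap at t=1.
tracks = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 2)]
anchors = valid_anchors(tracks, time_interval=1)
print(anchors)

# A negative far from the anchor gives zero loss; a negative that is
# no farther than the positive is penalized by the margin.
print(triplet_margin_loss((0, 0), (3, 4), (6, 8)))
print(triplet_margin_loss((0, 0), (3, 4), (0, 5)))
```

Under this assumption, a larger `time_interval` shrinks the set of usable anchors, because every anchor needs a later observation on the same track.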
diff --git a/applications/contrastive_phenotyping/contrastive_cli/predict.yml b/applications/contrastive_phenotyping/contrastive_cli/predict.yml
diff --git a/applications/contrastive_phenotyping/contrastive_cli/predict_ctc_mps.yml b/applications/contrastive_phenotyping/contrastive_cli/predict_ctc_mps.yml
@@ -0,0 +1,48 @@
+seed_everything: 42
+trainer:
+  accelerator: gpu
+  strategy: auto
+  devices: auto
+  num_nodes: 1
+  precision: 32-true
+  callbacks:
+    - class_path: viscy.representation.embedding_writer.EmbeddingWriter
+      init_args:
+        output_path: /Users/ziwen.liu/Projects/test-time/predict/time_interval_1.zarr
+  inference_mode: true
+model:
+  class_path: viscy.representation.engine.ContrastiveModule
+  init_args:
+    encoder:
+      class_path: viscy.representation.contrastive.ContrastiveEncoder
+      init_args:
+        backbone: convnext_tiny
+        in_channels: 1
+        in_stack_depth: 1
+        stem_kernel_size: [1, 4, 4]
+        stem_stride: [1, 4, 4]
+        embedding_dim: 768
+        projection_dim: 32
+        drop_path_rate: 0.0
+    example_input_array_shape: [1, 1, 1, 128, 128]
+data:
+  class_path: viscy.data.triplet.TripletDataModule
+  init_args:
+    data_path: /Users/ziwen.liu/Downloads/Hela_CTC.zarr
+    tracks_path: /Users/ziwen.liu/Downloads/Hela_CTC.zarr
+    source_channel: DIC
+    z_range: [0, 1]
+    batch_size: 16
+    num_workers: 4
+    initial_yx_patch_size: [128, 128]
+    final_yx_patch_size: [128, 128]
+    time_interval: 1
+    normalizations:
+      - class_path: viscy.transforms.NormalizeSampled
+        init_args:
+          keys: [DIC]
+          level: fov_statistics
+          subtrahend: mean
+          divisor: std
+return_predictions: false
+ckpt_path: /Users/ziwen.liu/Projects/test-time/lightning_logs/time_interval_1/checkpoints/last.ckpt
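
The prediction config repeats the encoder `init_args` from the training config, since the checkpoint at `ckpt_path` is loaded into a model built from these arguments. A minimal sketch of a consistency check follows. The dictionaries are copied by hand from the two configs above, and `encoder_mismatches` is a hypothetical helper, not part of VisCy or Lightning.

```python
def encoder_mismatches(fit_args, predict_args):
    """Return {key: (fit_value, predict_value)} for every differing key.

    Hypothetical helper: a key missing from one side is reported with None.
    """
    keys = sorted(set(fit_args) | set(predict_args))
    return {
        key: (fit_args.get(key), predict_args.get(key))
        for key in keys
        if fit_args.get(key) != predict_args.get(key)
    }


# Encoder arguments transcribed from fit_ctc_mps.yml.
FIT_ENCODER = {
    "backbone": "convnext_tiny",
    "in_channels": 1,
    "in_stack_depth": 1,
    "stem_kernel_size": [1, 4, 4],
    "stem_stride": [1, 4, 4],
    "embedding_dim": 768,
    "projection_dim": 32,
    "drop_path_rate": 0.0,
}

# Encoder arguments transcribed from predict_ctc_mps.yml.
PREDICT_ENCODER = dict(FIT_ENCODER)

mismatches = encoder_mismatches(FIT_ENCODER, PREDICT_ENCODER)
if mismatches:
    raise ValueError(f"Encoder arguments differ between configs: {mismatches}")
print("Encoder arguments match.")
```

Running a check like this before prediction would surface a changed `projection_dim` or `stem_stride` as a readable message rather than as a state-dict shape error when the checkpoint is loaded.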