Merge branch 'main' into data-conventions
mzouink authored Nov 12, 2024
2 parents 64a508b + 975b8b8 commit 4a4312b
Showing 4 changed files with 207 additions and 3 deletions.
4 changes: 2 additions & 2 deletions docs/source/conf.py
@@ -19,8 +19,8 @@
# -- Project information -----------------------------------------------------

project = "DaCapo"
copyright = "2024, Caroline Malin-Mayor, Jeff Rhoades, Marwan Zouinkhi, William Patton, David Ackerman, Jan Funke"
author = "Caroline Malin-Mayor, Jeff Rhoades, Marwan Zouinkhi, William Patton, David Ackerman, Jan Funke"
copyright = "2024, William Patton, Jeff Rhoades, Marwan Zouinkhi, David Ackerman, Caroline Malin-Mayor, Jan Funke"
author = "William Patton, Jeff Rhoades, Marwan Zouinkhi, David Ackerman, Caroline Malin-Mayor, Jan Funke"


# -- General configuration ---------------------------------------------------
4 changes: 3 additions & 1 deletion docs/source/index.rst
@@ -10,12 +10,14 @@
overview
install
notebooks/minimal_tutorial
unet_architectures
tutorial
docker
aws
cosem_starter
roadmap
autoapi/index
cli

.. include:: ../../README.md
   :parser: myst_parser.sphinx_
77 changes: 77 additions & 0 deletions docs/source/roadmap.rst
@@ -0,0 +1,77 @@
.. _sec_roadmap:

Road Map
========

Overview
--------

+-----------------------------------+------------------+-------------------------------+
| Task | Priority | Current State |
+===================================+==================+===============================+
| Write Documentation | High | Started with a long way to go |
+-----------------------------------+------------------+-------------------------------+
| Simplify configurations | High | First draft complete |
+-----------------------------------+------------------+-------------------------------+
| Develop Data Conventions | High | First draft complete |
+-----------------------------------+------------------+-------------------------------+
| Improve Blockwise Post-Processing | Low | Not Started |
+-----------------------------------+------------------+-------------------------------+
| Simplify Array handling | High | Almost done (Up/Down sampling)|
+-----------------------------------+------------------+-------------------------------+

Detailed Road Map
-----------------

- [ ] Write Documentation
  - [ ] tutorials: no more than three, simple and continuously tested (with GitHub Actions; a small U-Net on CPU could work)
- [x] Basic tutorial: train a U-Net on a toy dataset
- [ ] Parametrize the basic tutorial across tasks (instance/semantic segmentation).
- [ ] Improve visualizations. Move some simple plotting functions to DaCapo.
- [ ] Add a pure pytorch implementation to show benefits side-by-side
- [ ] Track performance metrics (e.g., loss, accuracy, etc.) so we can make sure we aren't regressing
- [ ] semantic segmentation (LM and EM)
- [ ] instance segmentation (LM or EM, can be simulated)
- [ ] general documentation of CLI, also API for developers (curate docstrings)
- [x] Simplify configurations
  - [x] Deprecate old configs
- [x] Add simplified config for simple cases
- [x] can still get rid of `*Config` classes
- [x] Develop Data Conventions
- [x] document conventions
- [ ] convenience scripts to convert dataset into our convention (even starting from directories of PNG files)
- [ ] Improve Blockwise Post-Processing
- [ ] De-duplicate code between “in-memory” and “block-wise” processing
- [ ] have only block-wise algorithms, use those also for “in-memory”
- [ ] no more “in-memory”, this is just a run with a different Compute Context
- [ ] Incorporate `volara` into DaCapo (embargo until January)
- [ ] Improve debugging support (logging of chain of commands for reproducible runs)
- [ ] Split long post-processing steps into several smaller ones for composability (e.g., support running each step independently if we want to support choosing between `waterz` and `mutex_watershed` for fragment generation or agglomeration)
- [x] Incorporate `funlib.persistence` adaptors.
- [x] all of those can be adapters:
- [x] Binarize Labels into Mask
- [x] Scale/Shift intensities
- [ ] Up/Down sample (if easily possible)
- [ ] DVID source
- [x] Datatype conversions
- [x] everything else
- [x] simplify array configs accordingly
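Several of the adapter operations listed above (binarizing labels into a mask, scaling/shifting intensities, datatype conversion) can be illustrated with a minimal numpy sketch. The function names here are hypothetical stand-ins for illustration only, not the `funlib.persistence` adapter API:

```python
import numpy as np

# Hypothetical stand-ins for the adapter operations named in the roadmap;
# the real funlib.persistence adapters may expose a different API.

def binarize_labels(labels: np.ndarray) -> np.ndarray:
    """Turn an integer label volume into a binary mask (foreground = label > 0)."""
    return (labels > 0).astype(np.uint8)

def scale_shift(intensities: np.ndarray, scale: float, shift: float) -> np.ndarray:
    """Linear intensity transform: out = intensities * scale + shift."""
    return intensities * scale + shift

labels = np.array([[0, 3], [7, 0]])
mask = binarize_labels(labels)  # [[0, 1], [1, 0]]

raw = np.array([0, 128, 255], dtype=np.uint8)
# Datatype conversion plus scale/shift: normalize uint8 intensities to [0, 1].
normalized = scale_shift(raw.astype(np.float32), 1.0 / 255.0, 0.0)
```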

Can Have
--------

- [ ] Support other stats stores. Too much time, effort, and code was put into the stats, and they still don’t provide a very nice interface:
- [ ] defining variables to store
- [ ] efficiently batch writing, storing and reading stats to both files and mongodb
- [ ] visualizing stats.
- [ ] Jeff and Marwan suggest MLFlow instead of WandB
- [ ] Support for slurm clusters
- [ ] Support for cloud computing (AWS)
- [ ] Lazy loading of dependencies (import takes too long)
- [ ] Support bioimage model spec for model dissemination

Non-Goals (for v1.0)
--------------------

- custom dash board
- GUI to run experiments
125 changes: 125 additions & 0 deletions docs/source/unet_architectures.rst
@@ -0,0 +1,125 @@
UNet Models
===========

This section explains how to configure and use UNet models in DaCapo. Several configurations for different types of UNet architectures are demonstrated below.

Overview
--------

UNet is a popular architecture for image segmentation tasks, particularly in biomedical imaging. DaCapo provides support for configuring various types of UNet models with customizable parameters.

Examples
--------

Here are some examples of UNet configurations:

1. **Upsample UNet**

   .. code-block:: python

      from dacapo.experiments.architectures import CNNectomeUNetConfig
      from funlib.geometry import Coordinate

      architecture_config = CNNectomeUNetConfig(
          name="upsample_unet",
          input_shape=Coordinate(216, 216, 216),
          eval_shape_increase=Coordinate(72, 72, 72),
          fmaps_in=1,
          num_fmaps=12,
          fmaps_out=72,
          fmap_inc_factor=6,
          downsample_factors=[(2, 2, 2), (3, 3, 3), (3, 3, 3)],
          constant_upsample=True,
          upsample_factors=[(2, 2, 2)],
      )

2. **Yoshi UNet**

   .. code-block:: python

      yoshi_unet_config = CNNectomeUNetConfig(
          name="yoshi-unet",
          input_shape=Coordinate(188, 188, 188),
          eval_shape_increase=Coordinate(72, 72, 72),
          fmaps_in=1,
          num_fmaps=12,
          fmaps_out=72,
          fmap_inc_factor=6,
          downsample_factors=[(2, 2, 2), (2, 2, 2), (2, 2, 2)],
          constant_upsample=True,
          upsample_factors=[],
      )

3. **Attention Upsample UNet**

   .. code-block:: python

      attention_upsample_config = CNNectomeUNetConfig(
          name="attention-upsample-unet",
          input_shape=Coordinate(216, 216, 216),
          eval_shape_increase=Coordinate(72, 72, 72),
          fmaps_in=1,
          num_fmaps=12,
          fmaps_out=72,
          fmap_inc_factor=6,
          downsample_factors=[(2, 2, 2), (3, 3, 3), (3, 3, 3)],
          constant_upsample=True,
          upsample_factors=[(2, 2, 2)],
          use_attention=True,
      )

4. **2D UNet**

   .. code-block:: python

      architecture_config = CNNectomeUNetConfig(
          name="2d_unet",
          input_shape=(2, 132, 132),
          eval_shape_increase=(8, 32, 32),
          fmaps_in=2,
          num_fmaps=8,
          fmaps_out=8,
          fmap_inc_factor=2,
          downsample_factors=[(1, 4, 4), (1, 4, 4)],
          kernel_size_down=[[(1, 3, 3)] * 2] * 3,
          kernel_size_up=[[(1, 3, 3)] * 2] * 2,
          constant_upsample=True,
          padding="valid",
      )

5. **UNet without Batch Normalization**

   .. code-block:: python

      architecture_config = CNNectomeUNetConfig(
          name="unet_norm",
          input_shape=Coordinate(216, 216, 216),
          eval_shape_increase=Coordinate(72, 72, 72),
          fmaps_in=1,
          num_fmaps=2,
          fmaps_out=2,
          fmap_inc_factor=2,
          downsample_factors=[(2, 2, 2), (3, 3, 3), (3, 3, 3)],
          constant_upsample=True,
          upsample_factors=[],
          batch_norm=False,
      )

Configuration Parameters
------------------------

- **name**: A unique identifier for the configuration.
- **input_shape**: The shape of the input data.
- **eval_shape_increase**: Increase in shape during evaluation.
- **fmaps_in**: Number of input feature maps.
- **num_fmaps**: Number of feature maps in the first layer.
- **fmaps_out**: Number of output feature maps.
- **fmap_inc_factor**: Factor by which feature maps increase in each layer.
- **downsample_factors**: Factors by which the input is downsampled at each layer.
- **upsample_factors**: Factors by which the input is upsampled at each layer.
- **constant_upsample**: Whether to use constant upsampling.
- **use_attention**: Whether to use attention mechanisms.
- **batch_norm**: Whether to use batch normalization.
- **padding**: Padding mode for convolutional layers.
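To make `fmap_inc_factor` concrete: assuming each downsampling level multiplies the channel count by this factor (the usual behavior for this family of U-Nets; check the DaCapo source for the exact semantics), the per-level feature map counts can be computed directly:

```python
def fmaps_per_level(num_fmaps: int, fmap_inc_factor: int, num_levels: int) -> list:
    """Channel count at each encoder level, assuming the channel count
    is multiplied by fmap_inc_factor at every downsampling step."""
    return [num_fmaps * fmap_inc_factor ** level for level in range(num_levels)]

# With num_fmaps=12 and fmap_inc_factor=6, as in the "upsample_unet"
# example (3 downsampling steps -> 4 levels):
levels = fmaps_per_level(12, 6, 4)
print(levels)  # [12, 72, 432, 2592]
```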

This page serves as a reference for configuring UNet models in DaCapo; adjust the parameters to suit your dataset and task.
