Extend the axes fields in multiscales metadata #57

Merged (18 commits) on Jan 27, 2022
106 changes: 82 additions & 24 deletions latest/index.bs
@@ -107,6 +107,8 @@ Images {#image-layout}

The following layout describes the expected Zarr hierarchy for images with
multiple levels of resolutions and optionally associated labels.
Note that the number of dimensions may vary between 2 and 5 and that axis names are arbitrary; see [[#multiscale-md]] for details.
For this example we assume an image with 5 dimensions and axes called `t,c,z,y,x`.

```
. # Root folder, potentially in S3,
@@ -127,7 +129,7 @@ multiple levels of resolutions and optionally associated labels.
│ │ # by the "multiscales" metadata, but is often a sequence starting at 0.
│ │
│ ├── .zarray # All image arrays must be up to 5-dimensional
│ │ # with the "time" axis before the "channel" axis, before the spatial axes.
│ │
│ └─ t # Chunks are stored with the nested directory layout.
│ └─ c # All but the last chunk element are stored as directories.
@@ -205,49 +207,105 @@ Metadata {#metadata}
The various `.zattrs` files throughout the above array hierarchy may contain metadata
keys as specified below for discovering certain types of data, especially images.

"axes" metadata {#axes-md}
--------------------------

"axes" describes the dimensions of a physical coordinate space. It is a list of dictionaries, where each dictionary describes a dimension (axis) and:
- MUST contain the field "name" that gives the name for this dimension. The values MUST be unique across all "name" fields.
- SHOULD contain the field "type". It SHOULD be one of "space", "time" or "channel", but MAY take other values for custom axis types that are not part of this specification yet.
- SHOULD contain the field "unit" to specify the physical unit of this dimension. The value SHOULD be one of the following strings, which are valid units according to UDUNITS-2.
- Units for "space" axes: 'angstrom', 'attometer', 'centimeter', 'decimeter', 'exameter', 'femtometer', 'foot', 'gigameter', 'hectometer', 'inch', 'kilometer', 'megameter', 'meter', 'micrometer', 'mile', 'millimeter', 'nanometer', 'parsec', 'petameter', 'picometer', 'terameter', 'yard', 'yoctometer', 'yottameter', 'zeptometer', 'zettameter'
- Units for "time" axes: 'attosecond', 'centisecond', 'day', 'decisecond', 'exasecond', 'femtosecond', 'gigasecond', 'hectosecond', 'hour', 'kilosecond', 'megasecond', 'microsecond', 'millisecond', 'minute', 'nanosecond', 'petasecond', 'picosecond', 'second', 'terasecond', 'yoctosecond', 'yottasecond', 'zeptosecond', 'zettasecond'

If part of [[#multiscale-md]], the length of "axes" MUST be equal to the number of dimensions of the arrays that contain the image data.
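
Purely as an illustration of the rules above (not part of the specification), the following Python sketch builds an example "axes" list and checks the requirements; the `check_axes` helper, the unit subsets, and the example values are assumptions made for this sketch.

```
# Hypothetical sketch: an example "axes" list and a minimal check of the rules above.
SPACE_UNITS = {"angstrom", "nanometer", "micrometer", "meter"}  # subset of the space units listed above
TIME_UNITS = {"millisecond", "second", "minute", "hour"}        # subset of the time units listed above

axes = [
    {"name": "t", "type": "time", "unit": "millisecond"},
    {"name": "c", "type": "channel"},
    {"name": "z", "type": "space", "unit": "micrometer"},
    {"name": "y", "type": "space", "unit": "micrometer"},
    {"name": "x", "type": "space", "unit": "micrometer"},
]

def check_axes(axes):
    names = [ax["name"] for ax in axes]
    assert len(names) == len(set(names)), "names MUST be unique"
    for ax in axes:
        # "type" and "unit" are only SHOULD-level; custom types are allowed.
        if ax.get("type") == "space" and "unit" in ax:
            assert ax["unit"] in SPACE_UNITS
        if ax.get("type") == "time" and "unit" in ax:
            assert ax["unit"] in TIME_UNITS

check_axes(axes)
```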


"transformations" metadata {#trafo-md}
-------------------------------------

"transformations" describes a series of transformations, e.g. to map discrete data space of an array to the corresponding physical space.
It is a list of dictionaries. Each entry describes a single transformation and MUST contain the field "type".
The value of "type" MUST be one of the elements of the `type` column in the table below.
Additional fields for the entry depend on "type" and are defined by the column `fields`.

| type | fields | description |
| ------------- | ------ |------------ |
| `identity` | | identity transformation; this is the default transformation and is typically not explicitly defined |
| `translation` | one of: `"translation":List[float]`, `"path":str` | translation vector, stored either as a list of floats (`"translation"`) or as binary data at a location in this container (`"path"`). The length of the vector defines the number of dimensions. |
| `scale` | one of: `"scale":List[float]`, `"path":str` | scale vector, stored either as a list of floats (`"scale"`) or as binary data at a location in this container (`"path"`). The length of the vector defines the number of dimensions. |
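
For concreteness (not normative), a "transformations" list containing one entry of each non-identity type from the table could be written as the following Python literals; the numeric values are hypothetical.

```
# Hypothetical example entries matching the table above; values are arbitrary.
transformations = [
    {"type": "scale", "scale": [0.5, 0.5, 0.5]},               # scale vector given inline as a list of floats
    {"type": "translation", "translation": [10.0, 0.0, 0.0]},  # translation vector given inline; "path" could point to binary data instead
]
```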
Member

Maybe just needs at least 3 --- below the header: https://www.markdownguide.org/extended-syntax/

Contributor Author

Thanks for checking. I haven't looked at the rendering at all yet; will give it another look.

Contributor Author

@will-moore I have tried to fix this. How do I find the link for the latest deployment to check?

Member

The URL is in the README. Basically just replace the checksum e.g. http://api.csswg.org/bikeshed/?url=https://raw.githubusercontent.com/ome/ngff/e27bdecc06dedc71c6ec7e20a2bd7dd0755d43ea/latest/index.bs#trafo-md

I think I failed to get this to work, which is why the version table at the bottom uses <table>.


In addition, the field "axisIndices" MAY be given to specify the subset of axes that the transformation is applied to, leaving other axes unchanged. If not given, the transformation is applied to all axes. The length of "axisIndices" MUST be equal to the dimensionality of the transformation. If "axisIndices" are not given, the dimensionality of the transformation MUST be equal to the number of dimensions of the space that the transformation is applied to.
If given, "axisIndices" MUST be given in increasing order. It uses zero-based indexing.
Member

This seems like a lot of rules to handle the case where some axes don't need a scale.
I wonder if instead of
"scale": [0.5, 0.5, 0.5], "axisIndices": [2, 3, 4] we could do "scale": [null, null, 0.5, 0.5, 0.5]. Then we can simply say that the length of the scale list must be equal to the number of dimensions. And we don't need to have all the axisIndices rules (and validation logic). It's also less code to find the scale for e.g. dimension 2.

scale_dim2 = transform["scale"][2]

compared with:

index = transform["axisIndices"].index(2) if 2 in transform["axisIndices"] else -1
if index == -1:
    scale_dim2 = None
else:
    scale_dim2 = transform["scale"][index]


@will-moore - how would you know that those are the spatial axes? While a certain ordering is recommended, it is not mandatory under the spec? The spec currently decouples dimension order from the type of axes.

Contributor

Agree that the above case is shorter, but the code to generate a 3d -> 3d transform would be longer with the
[null, null, 0.5, 0.5, 0.5] case. I.e. the code would have to check which dimensions are spatial, then slice the scale array appropriately.

I don't have much of a preference between the two, honestly. But some things are easier if the transforms always consider dimensions of the same type.

Member

"scale": [null, null, 0.5, 0.5, 0.5] is simply an easier way to store the info that is in "scale": [0.5, 0.5, 0.5], "axisIndices": [2, 3, 4]. In both cases the spatial axes are 2, 3 and 4 (because they're not null in [null, null, 0.5, 0.5, 0.5].
If you ONLY had spatial axes. e.g. zyx then it's simply "scale": [0.5, 0.5, 0.5].
If the dimension order is different. e.g. z, t, y, x then use "scale": [0.5, null, 0.5, 0.5] instead of "scale": [0.5, 0.5, 0.5], "axisIndices": [0, 2, 3]


You don't need null for scale --- you can just use 1. And I agree that axisIndices just adds unnecessary complexity.


@bogovicj Can you explain a bit more why it is so important to be able to specify axisIndices?

Do you imagine frequently writing out this metadata fully manually?

With the current state of this proposal, it appears that affine transforms aren't even supported --- only translation and scale. So there, it is just a matter of specifying 0 or 1 (for translation or scale, respectively) for any axes that are to be left alone --- it seems like axisIndices would really only provide a benefit when manually writing out the metadata for a full affine transform (not yet part of the proposal, but presumably planned to be added).

You mentioned the specific use case of translating x and y only. For the use case you have in mind, is the translation by an integer amount? If so, would that use case perhaps be better addressed by support for a non-zero origin in zarr?
zarr-developers/zarr-specs#122

For programmatic reading and writing of this metadata, a single affine transform would seem to be the most straightforward way to deal with these transformations. I suppose implementations that don't support anything but translation and scaling might instead normalize to a single translation vector and single scale vector. But they could just as well use an affine transform but return an error if the rotation matrix is not diagonal. It seems to me that composing of multiple transforms, and support for things like axisIndices, as specified in the current proposal, could instead be handled by library functions in the program used to generate the OME-zarr metadata.

Likewise, in a graphical user interface a single affine transform would also seem to be the most straightforward way to present the transformation, as otherwise there is a lot of added complexity in displaying possibly multiple different transforms of different types, and making it clear in what order they are applied. In Neuroglancer affine transforms are shown in their normal matrix form but with the rows and columns labeled to indicate to which source/target dimension they correspond and which column corresponds to translation. That makes it fairly easy to understand even if you have more than 3 dimensions.

Member

@bogovicj Sorry, I didn't appreciate how this would work with affine transforms. I agree that your example with axisIndices looks simpler, although I don't actually understand either!

Contributor Author

To quickly revive this conversation: as @bogovicj explains, we have introduced axisIndices to enable specifying transformations for a subset of the axes, e.g. for the use-case of affine transformations.
I think having the option of specifying transformations for a subset of the axes is crucial, but I agree that axisIndices may not be the best solution. I see a few potential options:

  1. change to using named axes for all the transformation parameters, e.g. scale={"x": 2, "y": 2} (as discussed above). Names must be the same as given in the axes definition. In this case we don't need axisIndices
  2. just remove axisIndices from the current proposal and deal with how to address subset of axes later.
  3. keep everything as is and improve how axisIndices are documented

My personal preference would be option 1, because I think that this would be the most elegant solution.

Contributor

Thanks for the nice summary @constantinpape .

I like option (1) as well.

> please don't make me create a 5D non-linear transform to move only the x-y axes around.

I need to clarify this, so let me describe a simpler example as a case study. There is a fairly common use case in which we need to apply a nonlinear 2D transform to every plane of a 3D volume (e.g., lens correction).

This is not covered by an offset or an affine. We like the axisIndices (or similar) idea here because it lets us store the transform in a straightforward way: as a 2D displacement field (a 3D volume: [2,X,Y], where the 2 dimension is a vector describing the displacements at [:,x,y]).

Without axisIndices (or some way to communicate "this transform is applied over a subset of the coordinates"), what is the right way to store the transform / communicate this idea? I guess by making it a 3D displacement field instead? So a [3,X,Y,Z] array where the [2,:,:,:] is full of zeros, and [0:1,:,:,z] all contain identical data. That's wasteful.

I'm not sure how to deal with this use case without being explicit about this "subset idea".


I would absolutely agree that having to store extra zeros would be unacceptable, and agree that something like axisIndices is very useful in the flow field case.

You could accomplish the same thing, in a somewhat more general way, by allowing to specify an affine transform that is applied to every flow vector. However, I'm not sure if that extra generality would actually be helpful.


The transformations in the list are applied sequentially and in order.



It would be good to specify precisely the mathematical formula in terms of transformed space coordinates and original coordinates. Otherwise there is a possibility for confusion as to the precise interpretation of the scale and translation values.

Contributor Author

Thanks for the comment @jbms. I think this is a good point.

Member

Just trying to implement the handling of scale and translation in napari-ome-zarr. Napari supports scale and translate metadata (see add_image). Napari doesn't specify an order, but testing suggests that the translate values are in real-world units (applied after scaling).
Do we want to specify the same for translation in this NGFF spec? Or does this depend on the ordering of translation and scale? If the translation comes before scale then it's in pixels. If after then it's in real space?
Also, just a thought with naming. In napari it's scale and translate (both verbs) but we have scale and translation. Do we want to use translate instead?



"multiscales" metadata {#multiscale-md}
---------------------------------------

Metadata about an image can be found under the "multiscales" key in the group-level metadata. Here, "image" refers to 2- to 5-dimensional data representing image or volumetric data with optional time or channel axes, stored in a multiple resolution representation.

"multiscales" contains a list of dictionaries where each entry describes a multiscale image.

Each "multiscales" dictionary MUST contain the field "axes", see [[#axes-md]].
The length of "axes" must be between 2 and 5 and MUST be equal to the dimensionality of the zarr arrays storing the image data (see "datasets:path").
The "axes" MUST contain 2 or 3 entries of "type:space" and MAY contain one additional entry of "type:time" and MAY contain one additional entry of "type:channel" or a null / custom type.
The order of the entries MUST correspond to the order of dimensions of the zarr arrays. In addition, the entries MUST be ordered by "type" where the "time" axis must come first (if present), followed by the "channel" or custom axis (if present) and the axes of type "space".
If there are three spatial axes where two correspond to the image plane ("yx") and images are stacked along the other (anisotropic) axis ("z"), the spatial axes SHOULD be ordered as "zyx".
The values of the "name" fields must be given as a list in the field "_ARRAY_DIMENSIONS" in the attributes (.zattrs) of the zarr arrays.
This ensures compatibility with the [xarray zarr encoding](http://xarray.pydata.org/en/stable/internals/zarr-encoding-spec.html#zarr-encoding).
E.g. for `"axes": [{"name": "z"}, {"name": "y"}, {"name": "x"}]`, the zarr arrays must contain `{"_ARRAY_DIMENSIONS": ["z", "y", "x"]}` in their attributes.
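
A minimal sketch of how a writer could keep "axes" and "_ARRAY_DIMENSIONS" in sync, assuming the `zarr` Python package (v2 API); the store path, array shape and chunking below are hypothetical.

```
import zarr

axes = [{"name": "z", "type": "space"}, {"name": "y", "type": "space"}, {"name": "x", "type": "space"}]

group = zarr.open_group("example.zarr", mode="w")
arr = group.create_dataset("0", shape=(64, 512, 512), chunks=(16, 128, 128), dtype="uint16")

# Mirror the axis names into the array attributes for xarray compatibility.
arr.attrs["_ARRAY_DIMENSIONS"] = [ax["name"] for ax in axes]
```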

Each "multiscales" dictionary MUST contain the field "datasets", which is a list of dictionaries describing the arrays storing the individual resolution levels.
Each dictionary in "datasets" MUST contain the field "path", whose value contains the path to the array for this resolution relative
to the current zarr group. The "path"s MUST be ordered from largest (i.e. highest resolution) to smallest.

Each "datasets" dictionary MUST have the same number of dimensions and MUST NOT have more than 5 dimensions. The number of dimensions and order MUST correspond to number and order of "axes".
Each dictionary MAY contain the field "transformations", which contains a list of transformations that map the data coordinates to the physical coordinates (as specified by "axes") for this resolution level.
The transformations are defined according to [[#trafo-md]]. In addition, the transformation types MUST only be `identity`, `translation` or `scale`.
The list MUST contain at most one `scale` transformation per axis, which specifies the pixel size in physical units.
It MUST also contain at most one `translation` per axis, which specifies the offset from the origin in physical units.
If both `scale` and `translation` are given, `translation` MUST be listed after `scale` to ensure that it is given in physical coordinates. If "transformations" is not given, the identity transformation is assumed.
The requirements (only `scale` and `translation`, restrictions on order) are in place to provide a simple mapping from data coordinates to physical coordinates while
being compatible with the general transformation spec.

Each "multiscales" dictionary MAY contain the field "transformations", describing transformations that are applied to each resolution level.
The transformations MUST follow the same rules about allowed types, order, etc. as in "datasets:transformations".
These transformations are applied after the per resolution level transformations specified in "datasets". They can for example be used to specify the `scale` for a dimension that is the same for all resolutions.
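
As an illustration of this composition (a sketch built on assumptions, not part of the specification), the helper below collects the effective per-axis scale and translation for one resolution level by applying the dataset-level entries first and the multiscale-level entries afterwards, assuming only `scale` and `translation` occur.

```
def effective_scale_and_translation(multiscale, level, ndim):
    """Compose dataset-level and multiscale-level transformations for one resolution level."""
    scale = [1.0] * ndim
    translation = [0.0] * ndim
    dataset = multiscale["datasets"][level]
    trafos = dataset.get("transformations", []) + multiscale.get("transformations", [])
    for trafo in trafos:
        axes = trafo.get("axisIndices", list(range(ndim)))
        if trafo["type"] == "scale":
            for i, ax in enumerate(axes):
                scale[ax] *= trafo["scale"][i]
                translation[ax] *= trafo["scale"][i]  # earlier translations are scaled as well
        elif trafo["type"] == "translation":
            for i, ax in enumerate(axes):
                translation[ax] += trafo["translation"][i]
    return scale, translation
```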

Each "multiscales" dictionary SHOULD contain the field "name". It SHOULD contain the field "version", which indicates the version of the multiscale metadata of this image (current version is 0.4).

Each "multiscales" dictionary SHOULD contain the field "type", which gives the type of downscaling method used to generate the multiscale image pyramid.
It SHOULD contain the field "metadata", which contains a dictionary with additional information about the downscaling method.

```
{
    "multiscales": [
        {
            "version": "0.4",
            "name": "example",
            "axes": [
                {"name": "t", "type": "time", "unit": "millisecond"},
                {"name": "c", "type": "channel"},
                {"name": "z", "type": "space", "unit": "micrometer"},
                {"name": "y", "type": "space", "unit": "micrometer"},
                {"name": "x", "type": "space", "unit": "micrometer"}
            ],
            "datasets": [
                {
                    "path": "0",
                    "transformations": [{"type": "scale", "scale": [0.5, 0.5, 0.5], "axisIndices": [2, 3, 4]}]  # the voxel size for the first scale level (0.5 micrometer)
                },
                {
                    "path": "1",
                    "transformations": [{"type": "scale", "scale": [1.0, 1.0, 1.0], "axisIndices": [2, 3, 4]}]  # the voxel size for the second scale level (downscaled by a factor of 2 -> 1 micrometer)
                },
                {
                    "path": "2",
                    "transformations": [{"type": "scale", "scale": [2.0, 2.0, 2.0], "axisIndices": [2, 3, 4]}]  # the voxel size for the third scale level (downscaled by a factor of 4 -> 2 micrometer)
                }
            ],
            "transformations": [{"type": "scale", "scale": [0.1], "axisIndices": [0]}],  # the time scale (0.1 milliseconds), which is the same for each scale level
            "type": "gaussian",
            "metadata": {  # the fields in metadata depend on the downscaling implementation
                "method": "skimage.transform.pyramid_gaussian",  # here, the parameters passed to the skimage function are given