
Extract attributes from ome-zarr rather than from metadata (whenever possible) #351

Closed
jluethi opened this issue Mar 15, 2023 · 9 comments · Fixed by #528
Labels
flexibility (Support more workflow-execution use cases) · High Priority (Current Priorities & Blocking Issues)

Comments

@jluethi
Collaborator

jluethi commented Mar 15, 2023

Main part: Read the relevant metadata from the .zattrs files into the metadata dictionary; this should require very little compute.

Having it as a separate task may be the easiest first implementation. But longer term, it would be neat to have this more abstract, e.g. so that any task that takes an OME-Zarr as its input can use this functionality if no metadata dictionary is provided.
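
As a rough sketch of that first step (assuming a local zarr group; load_zattrs is a hypothetical name, not an existing Fractal helper):

import json
from pathlib import Path

def load_zattrs(zarr_url: str) -> dict:
    """Read the .zattrs JSON of an OME-Zarr image group into a plain dict."""
    with (Path(zarr_url) / ".zattrs").open() as f:
        return json.load(f)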

@tcompa tcompa added the Priority Important, but not the highest priority label May 30, 2023
@tcompa tcompa changed the title Add functionality (a task?) to start a workflow with an existing OME-Zarr file Bootstrap metadata from an existing OME-Zarr file Jul 10, 2023
@tcompa tcompa added the flexibility Support more workflow-execution use cases label Sep 15, 2023
@tcompa tcompa changed the title Bootstrap metadata from an existing OME-Zarr file Extract attributes from ome-zarr rathern than from metadata (whenever possible) Sep 15, 2023
@tcompa tcompa changed the title Extract attributes from ome-zarr rathern than from metadata (whenever possible) Extract attributes from ome-zarr rather than from metadata (whenever possible) Sep 15, 2023
@tcompa tcompa added High Priority Current Priorities & Blocking Issues and removed Priority Important, but not the highest priority labels Sep 18, 2023
@tcompa
Collaborator

tcompa commented Sep 18, 2023

Some notes, based on recent discussion of this issue.

Looking at #212 (comment), the current issue concerns this item:

  1. Store dataset information (e.g. coarsening_xy and the number of pyramid levels). This use is now deferred to #351 (Extract attributes from ome-zarr rather than from metadata (whenever possible)); and then this use should be deprecated, and these parameters should not belong to Dataset.meta.

What are the parameters that we'll need to extract?

  • num_levels
  • coarsening_xy

Other metadata keys that are not related to this issue:

  • original_paths
  • image_extension
  • image_glob_patterns
  • copy_ome_zarr
  • Any plate/well/image component list

Note: this issue should also narrow the scope of get_parameters_from_metadata, since that helper function should not be used to retrieve parameters like coarsening_xy and num_levels.

@tcompa
Collaborator

tcompa commented Sep 18, 2023

Another note, about how to extract num_levels and coarsening_xy from an existing ome-zarr image group.

The .zattrs file could look like:

{
    "multiscales": [
        {
            "axes": [
                {
                    "name": "c",
                    "type": "channel"
                },
                {
                    "name": "z",
                    "type": "space",
                    "unit": "micrometer"
                },
                {
                    "name": "y",
                    "type": "space",
                    "unit": "micrometer"
                },
                {
                    "name": "x",
                    "type": "space",
                    "unit": "micrometer"
                }
            ],
            "datasets": [
                {
                    "coordinateTransformations": [
                        {
                            "scale": [
                                1,
                                1.0,
                                0.1625,
                                0.1625
                            ],
                            "type": "scale"
                        }
                    ],
                    "path": "0"
                },
                {
                    "coordinateTransformations": [
                        {
                            "scale": [
                                1,
                                1.0,
                                0.325,
                                0.325
                            ],
                            "type": "scale"
                        }
                    ],
                    "path": "1"
                },
                {
                    "coordinateTransformations": [
                        {
                            "scale": [
                                1,
                                1.0,
                                0.65,
                                0.65
                            ],
                            "type": "scale"
                        }
                    ],
                    "path": "2"
                },
                {
                    "coordinateTransformations": [
                        {
                            "scale": [
                                1,
                                1.0,
                                1.3,
                                1.3
                            ],
                            "type": "scale"
                        }
                    ],
                    "path": "3"
                },
                {
                    "coordinateTransformations": [
                        {
                            "scale": [
                                1,
                                1.0,
                                2.6,
                                2.6
                            ],
                            "type": "scale"
                        }
                    ],
                    "path": "4"
                }
            ],
            "version": "0.4"
        }
    ],
    "omero": {
        "channels": [
            {
                "color": "00FFFF",
                "label": "DAPI",
                "wavelength_id": "A01_C01",
                "window": {
                    "end": 800,
                    "max": 65535,
                    "min": 0,
                    "start": 110
                }
            }
        ],
        "id": 1,
        "name": "TBD",
        "version": "0.4"
    }
}

The parameter extraction should then proceed as follows:

  • Check that there exists a single multiscale entry, otherwise fail with an informative error message
  • Count the datasets entries, to determine num_levels
  • If num_levels=0, do not set coarsening_xy (or set it to some arbitrary value)
  • If num_levels>0, compute the YX rescaling between subsequent dataset entries and extract the coarsening_xy value
  • If coarsening is not homogeneous in X and Y, fail with an informative error message
  • If coarsening is not homogeneous across pyramid levels, fail with an informative error message
  • Since #50 (Support global scale transformations at the multiscale level) is still open, we should also check that no global scale transformations are present, and fail otherwise (with an informative error message)

This kind of function should belong in lib_zattrs_utils.py (or a renamed version of this module).
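
For concreteness, a minimal sketch of such a helper along the lines above (get_num_levels_and_coarsening is a hypothetical name, and it assumes the .zattrs layout shown above):

def get_num_levels_and_coarsening(zattrs: dict) -> tuple[int, int]:
    """Extract num_levels and coarsening_xy from single-multiscale OME-NGFF metadata."""
    multiscales = zattrs["multiscales"]
    if len(multiscales) != 1:
        raise ValueError(f"Expected one multiscale entry, found {len(multiscales)}.")
    multiscale = multiscales[0]
    # Ref #50: global (multiscale-level) scale transformations are not supported yet
    if "coordinateTransformations" in multiscale:
        raise NotImplementedError("Global coordinateTransformations are not supported.")
    datasets = multiscale["datasets"]
    num_levels = len(datasets)
    # YX scale of each pyramid level, taken from its "scale" transformation
    yx_scales = [
        next(t["scale"][-2:] for t in ds["coordinateTransformations"] if t["type"] == "scale")
        for ds in datasets
    ]
    factors = set()
    for (prev_y, prev_x), (cur_y, cur_x) in zip(yx_scales[:-1], yx_scales[1:]):
        factor_y, factor_x = round(cur_y / prev_y), round(cur_x / prev_x)
        if factor_y != factor_x:
            raise ValueError(f"Inhomogeneous XY coarsening: {factor_y} vs {factor_x}.")
        factors.add(factor_x)
    if len(factors) > 1:
        raise ValueError(f"Coarsening varies across pyramid levels: {factors}.")
    # Single-level image: no ratio to compute, fall back to an arbitrary value
    coarsening_xy = factors.pop() if factors else 1
    return num_levels, coarsening_xy

With the example .zattrs above, this sketch would return (5, 2).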

@tcompa
Collaborator

tcompa commented Sep 18, 2023

A final note, which is just a reminder: this kind of operation (reading ome-ngff metadata and extracting a few parameters) would greatly benefit from a structured model, ref

@jluethi
Collaborator Author

jluethi commented Sep 18, 2023

Great summary!

num_levels=0

Wouldn't that be for num_levels=1? num_levels=0 should probably raise an error, no? Then the second case would also be num_levels>1.

And on coarsening: do we round to the nearest integer? I wonder whether there's potential for floating-point imprecision here. Or at least make the check for homogeneous values robust against minor differences.
For its usage afterwards, an integer would be much easier to understand than a float.

@tcompa
Collaborator

tcompa commented Sep 19, 2023

Wouldn't that be for num_levels=1?

Of course, thanks. This comes from the bias of 0-based indexing... ;)

And on coarsening: do we round to the nearest integer? I wonder whether there's potential for floating-point imprecision here. Or at least make the check for homogeneous values robust against minor differences.
For its usage afterwards, an integer would be much easier to understand than a float.

It clearly must be an integer, and I don't see an obvious problem with "nearest" rounding (e.g. via the Python round() function).

If we make some (reasonable, I'd say) assumptions that:

  1. The coarsening factor is expected to be a small integer (e.g. smaller than 10, or 20);
  2. Reasonable physical units (e.g. micrometer) are chosen for the pixel sizes, so that they are numbers close to 1 (say between 0.001 and 1000);
  3. The ground truth is an actual integer (e.g. there was no bug in writing the ome-ngff metadata);

then I don't see room for ambiguity.

Note, for instance, that in #76 we introduced a floating point imprecision of 3e-17 (relative to 0.1625). Let's hugely exaggerate this, and make it 1e-3 (which is a huge 0.6% imprecision on the pixel size); even in this case, we are still safe:

>>> ratio = (0.325 + 1e-3) / (0.1625 - 1e-3)
>>> ratio
2.018575851393189
>>> round(ratio)
2
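
If we nevertheless want an explicit guard, a possible sketch (the rel_tol value is an arbitrary choice, not something prescribed anywhere):

def ratio_to_coarsening(ratio: float, rel_tol: float = 0.05) -> int:
    """Round a pixel-size ratio to the nearest integer, failing if it is not close to one."""
    coarsening = round(ratio)
    if coarsening < 1 or abs(ratio - coarsening) > rel_tol * coarsening:
        raise ValueError(f"Pixel-size ratio {ratio} is not close to a positive integer.")
    return coarsening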

@jluethi
Collaborator Author

jluethi commented Sep 19, 2023

Sounds good. Rules 1 & 3 make a lot of sense.

Reasonable physical units (e.g. micrometer) are chosen for the pixel sizes, so that they are numbers close to 1 (say between 0.001 and 1000);

I have no idea how generalizable this will be in the OME-Zarrs. Would always work for ours. But is there a strong need to have such a constraint? Or can we just have a warning here?

@tcompa
Collaborator

tcompa commented Sep 19, 2023

To clarify: I am not proposing to enforce any of assumptions 1-2-3 as actual constraints (although we could choose to raise warnings when appropriate). I was rather listing a set of assumptions about "typical" use cases that remove any doubt about rounding robustness.

Reasonable physical units (e.g. micrometer) are chosen for the pixel sizes, so that they are numbers close to 1 (say between 0.001 and 1000);

I have no idea how generalizable this will be in the OME-Zarrs. Would always work for ours. But is there a strong need to have such a constraint? Or can we just have a warning here?

This is not a Fractal issue, but a general one. If we express the pixel size with a huge number of significant digits which are all meant to be there (e.g. the pixel size is one light year plus one micrometer), we cannot always expect floating-point arithmetic to work. In our case we perform a single floating-point operation (taking the ratio of two pixel sizes) and then only use integers for comparisons, so I don't expect any issue even in the light-year-plus-micrometer scenario; I was just saying that if pixel sizes are in a "reasonable" range then we are even safer.
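
To make the point concrete, a quick REPL check (values are just for illustration):

>>> px = 9.4607e21 + 1.0  # "one light year plus one micrometer", in micrometers
>>> px == 9.4607e21       # the extra micrometer is below float precision
True
>>> (2 * px) / px         # but the ratio of pixel sizes is still exact
2.0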

@jluethi
Collaborator Author

jluethi commented Sep 19, 2023

Sounds good. Happy with warnings for some of those scenarios :)
