
Extract attributes from ome-zarr rather than from metadata (whenever possible) #351

Closed
jluethi opened this issue Mar 15, 2023 · 9 comments · Fixed by #528
Labels
flexibility (Support more workflow-execution use cases) · High Priority (Current Priorities & Blocking Issues)

Comments

@jluethi
Collaborator

jluethi commented Mar 15, 2023

Main part: Read the relevant metadata from the .zattrs files into the metadata dictionary; this should require very little compute.

Having it as a separate task may be the easiest first implementation. But longer term, it would be neat to have this more abstract, e.g. so that any task that takes an OME-Zarr as its input can use this functionality if no metadata dictionary is provided.
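
As a rough sketch of that first step (assuming a local zarr group; load_zattrs is a hypothetical name, not an existing Fractal helper):

import json
from pathlib import Path

def load_zattrs(zarr_url: str) -> dict:
    """Read the .zattrs JSON of an OME-Zarr image group into a plain dict."""
    with (Path(zarr_url) / ".zattrs").open() as f:
        return json.load(f)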

@tcompa tcompa added the Priority Important, but not the highest priority label May 30, 2023
@tcompa tcompa changed the title Add functionality (a task?) to start a workflow with an existing OME-Zarr file Bootstrap metadata from an existing OME-Zarr file Jul 10, 2023
@tcompa tcompa added the flexibility Support more workflow-execution use cases label Sep 15, 2023
@tcompa tcompa changed the title Bootstrap metadata from an existing OME-Zarr file Extract attributes from ome-zarr rathern than from metadata (whenever possible) Sep 15, 2023
@tcompa tcompa changed the title Extract attributes from ome-zarr rathern than from metadata (whenever possible) Extract attributes from ome-zarr rather than from metadata (whenever possible) Sep 15, 2023
@tcompa tcompa added High Priority Current Priorities & Blocking Issues and removed Priority Important, but not the highest priority labels Sep 18, 2023
@tcompa
Collaborator

tcompa commented Sep 18, 2023

Some notes, based on recent discussion of this issue.

Looking at #212 (comment), the current issue concerns this item:

  1. Store dataset information (e.g. coarsening_xy and the number of pyramid levels). This use is now deferred to #351 (Extract attributes from ome-zarr rather than from metadata (whenever possible)); and then this use should be deprecated, and these parameters should not belong to Dataset.meta.

What are the parameters that we'll need to extract?

  • num_levels
  • coarsening_xy

Other metadata keys that are not related to this issue:

  • original_paths
  • image_extension
  • image_glob_patterns
  • copy_ome_zarr
  • Any plate/well/image component list

Note: this issue should also narrow the scope of get_parameters_from_metadata, since that helper function should not be used to retrieve parameters like coarsening_xy and num_levels.

@tcompa
Collaborator

tcompa commented Sep 18, 2023

Another note, about how to extract num_levels and coarsening_xy from an existing ome-zarr image group.

The .zattrs file could look like:

{
    "multiscales": [
        {
            "axes": [
                {
                    "name": "c",
                    "type": "channel"
                },
                {
                    "name": "z",
                    "type": "space",
                    "unit": "micrometer"
                },
                {
                    "name": "y",
                    "type": "space",
                    "unit": "micrometer"
                },
                {
                    "name": "x",
                    "type": "space",
                    "unit": "micrometer"
                }
            ],
            "datasets": [
                {
                    "coordinateTransformations": [
                        {
                            "scale": [
                                1,
                                1.0,
                                0.1625,
                                0.1625
                            ],
                            "type": "scale"
                        }
                    ],
                    "path": "0"
                },
                {
                    "coordinateTransformations": [
                        {
                            "scale": [
                                1,
                                1.0,
                                0.325,
                                0.325
                            ],
                            "type": "scale"
                        }
                    ],
                    "path": "1"
                },
                {
                    "coordinateTransformations": [
                        {
                            "scale": [
                                1,
                                1.0,
                                0.65,
                                0.65
                            ],
                            "type": "scale"
                        }
                    ],
                    "path": "2"
                },
                {
                    "coordinateTransformations": [
                        {
                            "scale": [
                                1,
                                1.0,
                                1.3,
                                1.3
                            ],
                            "type": "scale"
                        }
                    ],
                    "path": "3"
                },
                {
                    "coordinateTransformations": [
                        {
                            "scale": [
                                1,
                                1.0,
                                2.6,
                                2.6
                            ],
                            "type": "scale"
                        }
                    ],
                    "path": "4"
                }
            ],
            "version": "0.4"
        }
    ],
    "omero": {
        "channels": [
            {
                "color": "00FFFF",
                "label": "DAPI",
                "wavelength_id": "A01_C01",
                "window": {
                    "end": 800,
                    "max": 65535,
                    "min": 0,
                    "start": 110
                }
            }
        ],
        "id": 1,
        "name": "TBD",
        "version": "0.4"
    }
}

The parameter extraction should then proceed as follows:

  • Check that there exists a single multiscale entry, otherwise fail with an informative error message
  • Count the datasets entries, to determine num_levels
  • If num_levels=0, do not set coarsening_xy (or set it to some arbitrary value)
  • If num_levels>0, compute the YX rescaling between subsequent dataset entries and extract the coarsening_xy value
  • If coarsening is not homogeneous in X and Y, fail with an informative error message
  • If coarsening is not homogeneous across pyramid levels, fail with an informative error message
  • Since #50 (Support global scale transformations at the multiscale level) is still open, we should also check that no global scale transformations are present, and fail otherwise (with an informative error message)

This kind of function should belong in lib_zattrs_utils.py (or a renamed version of this module).
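
For concreteness, a minimal sketch of such a helper along the lines above (get_num_levels_and_coarsening is a hypothetical name, and it assumes the .zattrs layout shown above):

def get_num_levels_and_coarsening(zattrs: dict) -> tuple[int, int]:
    """Extract num_levels and coarsening_xy from single-multiscale OME-NGFF metadata."""
    multiscales = zattrs["multiscales"]
    if len(multiscales) != 1:
        raise ValueError(f"Expected one multiscale entry, found {len(multiscales)}.")
    multiscale = multiscales[0]
    # Ref #50: global (multiscale-level) scale transformations are not supported yet
    if "coordinateTransformations" in multiscale:
        raise NotImplementedError("Global coordinateTransformations are not supported.")
    datasets = multiscale["datasets"]
    num_levels = len(datasets)
    # YX scale of each pyramid level, taken from its "scale" transformation
    yx_scales = [
        next(t["scale"][-2:] for t in ds["coordinateTransformations"] if t["type"] == "scale")
        for ds in datasets
    ]
    factors = set()
    for (prev_y, prev_x), (cur_y, cur_x) in zip(yx_scales[:-1], yx_scales[1:]):
        factor_y, factor_x = round(cur_y / prev_y), round(cur_x / prev_x)
        if factor_y != factor_x:
            raise ValueError(f"Inhomogeneous XY coarsening: {factor_y} vs {factor_x}.")
        factors.add(factor_x)
    if len(factors) > 1:
        raise ValueError(f"Coarsening varies across pyramid levels: {factors}.")
    # Single-level image: no ratio to compute, fall back to an arbitrary value
    coarsening_xy = factors.pop() if factors else 1
    return num_levels, coarsening_xy

With the example .zattrs above, this sketch would return (5, 2).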

@tcompa
Collaborator

tcompa commented Sep 18, 2023

A final note, which is just a reminder: this kind of operation (reading ome-ngff metadata and extracting a few parameters) would greatly benefit from a structured model, ref

@jluethi
Collaborator Author

jluethi commented Sep 18, 2023

Great summary!

num_levels=0

Wouldn't that be for num_levels=1? num_levels=0 should probably raise an error, no? Then the second case would also be num_levels>1.

And on coarsening: do we round to the nearest integer? I wonder whether there's potential for floating-point imprecision here. Or at least make the check for homogeneous values robust against minor differences.
For its usage afterwards, an integer would be much easier to understand than a float.

@tcompa
Collaborator

tcompa commented Sep 19, 2023

Wouldn't that be for num_levels=1?

Of course, thanks. This comes from the bias of 0-based indexing... ;)

And on coarsening: do we round to the nearest integer? I wonder whether there's potential for floating-point imprecision here. Or at least make the check for homogeneous values robust against minor differences.
For its usage afterwards, an integer would be much easier to understand than a float.

It clearly must be an integer, and I don't see an obvious problem with "nearest" rounding (e.g. via the Python round() function).

If we make some (reasonable, I'd say) assumptions that:

  1. The coarsening factor is expected to be a small integer (e.g. smaller than 10, or 20);
  2. Reasonable physical units (e.g. micrometer) are chosen for the pixel sizes, so that they are numbers close to 1 (say between 0.001 and 1000);
  3. The ground truth is an actual integer (e.g. there was no bug in writing the ome-ngff metadata);

then I don't see room for ambiguity.

Note, for instance, that in #76 we introduced a floating point imprecision of 3e-17 (relative to 0.1625). Let's hugely exaggerate this, and make it 1e-3 (which is a huge 0.6% imprecision on the pixel size); even in this case, we are still safe:

>>> ratio = (0.325 + 1e-3) / (0.1625 - 1e-3)
>>> ratio
2.018575851393189
>>> round(ratio)
2
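
If we nevertheless want an explicit guard, a possible sketch (the rel_tol value is an arbitrary choice, not something prescribed anywhere):

def ratio_to_coarsening(ratio: float, rel_tol: float = 0.05) -> int:
    """Round a pixel-size ratio to the nearest integer, failing if it is not close to one."""
    coarsening = round(ratio)
    if coarsening < 1 or abs(ratio - coarsening) > rel_tol * coarsening:
        raise ValueError(f"Pixel-size ratio {ratio} is not close to a positive integer.")
    return coarsening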

@jluethi
Collaborator Author

jluethi commented Sep 19, 2023

Sounds good. Rules 1 & 3 make a lot of sense.

Reasonable physical units (e.g. micrometer) are chosen for the pixel sizes, so that they are numbers close to 1 (say between 0.001 and 1000);

I have no idea how generalizable this will be in the OME-Zarrs. Would always work for ours. But is there a strong need to have such a constraint? Or can we just have a warning here?

@tcompa
Collaborator

tcompa commented Sep 19, 2023

To clarify: I am not proposing to enforce any of assumptions 1-2-3 as actual constraints (although we could choose to raise warnings when appropriate). I was rather listing a set of assumptions about "typical" use cases that remove any doubt about rounding robustness.

Reasonable physical units (e.g. micrometer) are chosen for the pixel sizes, so that they are numbers close to 1 (say between 0.001 and 1000);

I have no idea how generalizable this will be in the OME-Zarrs. Would always work for ours. But is there a strong need to have such a constraint? Or can we just have a warning here?

This is not a Fractal issue, but a general one. If we express the pixel size with a huge number of significant digits which are all meant to be there (e.g. the pixel size is one light year plus one micrometer), we cannot always expect floating-point arithmetic to work. In our case we perform a single floating-point operation (taking the ratio of two pixel sizes) and then only use integers for comparisons, so I don't expect any issue even in the light-year-plus-micrometer scenario; I was just saying that if pixel sizes are in a "reasonable" range then we are even safer.
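
To make the point concrete, a quick REPL check (values are just for illustration):

>>> px = 9.4607e21 + 1.0  # "one light year plus one micrometer", in micrometers
>>> px == 9.4607e21       # the extra micrometer is below float precision
True
>>> (2 * px) / px         # but the ratio of pixel sizes is still exact
2.0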

@jluethi
Collaborator Author

jluethi commented Sep 19, 2023

Sounds good. Happy with warnings for some of those scenarios :)
