Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Choose smallest np.dtype that handles labels data #116

Merged
merged 9 commits into from
Jun 14, 2022

Conversation

will-moore
Copy link
Member

@will-moore will-moore commented May 27, 2022

Fixes #115

This PR

  • Uses the ROI ID, subtracting the smallest ROI ID, as the mask value
  • Uses the smallest np.dtype that is required for the maximum mask value.

To test: export labels and check dtype (was previously "<i8"):

$ omero zarr export Image:6001240
$ omero zarr masks Image:6001240

$ cat 6001240.zarr/labels/0/0/.zarray | grep dtype
    "dtype": "<i2",

Napari should still show masks correctly, with omero IDs shown when mousing over shapes:

$ napari --plugin napari-ome-zarr 6001240.zarr

Mousing over label at bottom-left:

Screenshot 2022-05-27 at 12 41 35

Check shape ID:
https://idr.openmicroscopy.org/iviewer/?shape=1664664

src/omero_zarr/masks.py Outdated Show resolved Hide resolved
for count, shapes in enumerate(masks):
for shape in shapes:
# Using ROI ID allows stitching label from multiple images
# into a Plate and not creating duplicates from different iamges.
# All shapes will be the same value (color) for each ROI
shape_value = shape.roi.id.val
shape_value = shape.roi.id.val - min_roi_id + 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation will certainly works for the bulk of the use cases. Depending on how the saving of the ROIs, the main downside is that it might create gaps in the label values. There is no requirement for the label values must be consecutive but since we are reindexing them, this is slightly confusing.

Could we create a sorted list of all ROI IDs and use sorted_roi_ids.index(shape.roi.id.val) here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in cff3363

@sbesson sbesson self-requested a review June 10, 2022 16:57
Copy link
Member

@sbesson sbesson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested image + mask export on a few samples images of different modalities and number of ROIS and uploaded them to a temporary bucket for visualization:

Overall the logic works well and the label images are using a more adapted data type which allowing visualization.

Two outstanding questions:

  • I selected the examples above to have a range of images with different number of ROIs spannign the data type but looking at the metadata, all label images seem to be using int16

    (conversion) [sbesson@pilot-zarr1-dev omero-cli-zarr_116]$ grep dtype 6001247.zarr/labels/0/0/.zarray 
      "dtype": "<i2",
    (conversion) [sbesson@pilot-zarr1-dev omero-cli-zarr_116]$ grep dtype 4496763.zarr/labels/0/0/.zarray 
      "dtype": "<i2",
    (conversion) [sbesson@pilot-zarr1-dev omero-cli-zarr_116]$ grep dtype 10501752.zarr/labels/0/0/.zarray 
      "dtype": "<i2",
    

    Since6001247.zarr only has 48 label values, I would have expected <i1?

  • the code is currently saving the label array as int64 using the combination of masks and label values and then converting into the expected dtype. I have not computed the performance impact of the casting but would it not be preferable to instantiate the mask using the final data type using the knowledge of the maximum label value?

@will-moore
Copy link
Member Author

Thanks @sbesson. Both those issues should be fixed now.

6001247.zarr labels are exported as "dtype": "|i1",

Copy link
Member

@sbesson sbesson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @will-moore, regenerated the three images + labels above with the latest version of the code

Also generated an example with two labels

The dtype are now int8 or int16 depending on the number of label values as expected.

(conversion) [sbesson@pilot-zarr1-dev omero-cli-zarr_116]$ grep dtype 6001247.zarr/labels/0/0/.zarray 
    "dtype": "|i1",
(conversion) [sbesson@pilot-zarr1-dev omero-cli-zarr_116]$ grep dtype 4496763.zarr/labels/0/0/.zarray
    "dtype": "<i2",
(conversion) [sbesson@pilot-zarr1-dev omero-cli-zarr_116]$ grep dtype 10501752.zarr/labels/0/0/.zarray 
    "dtype": "<i2",
(conversion) [sbesson@pilot-zarr1-dev omero-cli-zarr_116]$ grep dtype 5514375.zarr/labels/Cell/0/.zarray 
    "dtype": "|i1",
(conversion) [sbesson@pilot-zarr1-dev omero-cli-zarr_116]$ grep dtype 5514375.zarr/labels/Chromosomes/0/.zarray 
    "dtype": "|i1",

Anything else to review or should we go ahead with an omero-cli-zarr patch release with the mask dtype fix?
Also should we move the samples above to the idr bucket and register them in idr.github.io/ome-ngff-samples/? And are there other sample additions/updates we would like to make?

@will-moore
Copy link
Member Author

#107 would be nice to get it. Not tested yet, but I just tried it on top of this branch and it's looking good.

@sbesson
Copy link
Member

sbesson commented Jun 14, 2022

Make complete sense since it also brings improvments to the label image writing. I'll put in on my review list and make use of the samples I generated above as the ground truth

@sbesson sbesson merged commit cc3fdd5 into ome:master Jun 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

labels exported as dtype i8
3 participants