"standalone" label images #179

Open
bogovicj opened this issue Mar 16, 2023 · 5 comments

@bogovicj
Contributor

In conversations elsewhere @sbesson says:

At the moment, the specification enforces that such data must be stored within a well-defined labels hierarchy but moving forward, I could certainly imagine a relaxation of this constraint.

A typical use case that comes immediately to mind is the one where segmentation / classification is performed against a read-only Zarr dataset, e.g. public data, and the output of this process needs to be stored as a new dataset. At the moment, the structure which is the most compliant with the spirit of the specification is to create an artificial labels/<label_name>/ hierarchy under the root even if there is no multiscales image. Assuming we relaxed this constraint to allow label images to be stored at the root of the Zarr dataset, I would argue the image-label metadata would become a critical element to identify what we are dealing with.

I agree that relaxing this constraint could be a good idea.

In my view, the spec currently uses the hierarchy (that labels belong in a child of a multiscales), to communicate that labels are derived from, or correspond to a particular multiscales image. We might consider using coordinate systems to communicate this idea in the future after #138 is merged, and to reference related "raw" image data explicitly, once we decide how to encode references. See #144
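
To make the two layouts discussed above concrete, here is a rough sketch that lists the paths each one would contain. It assumes Zarr v2 style `.zattrs` files and a single resolution level named `0`; the function names and the label name `cells` are hypothetical and only serve the illustration.

```python
# Sketch only: path listings for the two layouts under discussion.
# Assumptions: Zarr v2 ".zattrs" files, one resolution level named "0".

def nested_layout(label_name):
    """Layout with an artificial labels/<label_name>/ hierarchy under the root."""
    return [
        ".zattrs",                       # root group, no multiscales image here
        "labels/.zattrs",                # group that lists the label names
        f"labels/{label_name}/.zattrs",  # multiscales + image-label metadata
        f"labels/{label_name}/0",        # the label array itself
    ]


def standalone_layout():
    """Relaxed layout: the label image sits at the root of the dataset."""
    return [
        ".zattrs",  # multiscales + image-label metadata
        "0",        # the label array itself
    ]


def looks_like_label_image(attrs):
    """With no hierarchy to rely on, the image-label key is the only marker."""
    return "image-label" in attrs


print(nested_layout("cells"))
print(standalone_layout())
```

In the standalone case nothing in the path says "this is a label image", which is why the `image-label` metadata becomes the element a reader has to check.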

Related PRs by @virginiascarlett that started this conversation:

@d-v-b
Contributor

d-v-b commented Mar 16, 2023

+1 to not using hierarchy to express a relationship like "raw data, segmented data". The space of dependencies between datasets is sufficiently big that we should be using metadata to express this, rather than directly nesting images inside each other.

@virginiascarlett
Contributor

virginiascarlett commented Mar 17, 2023

Yes, sometimes it feels like all these decisions about nesting, hierarchies, and collections are a mere artifact of starting from Zarr. Like, if we were to start from the question, "What should be the fundamental design principle for organizing image data?" we would not necessarily say nested hierarchies. I imagine a more natural fit would be something quite permissive, like (but NOT) the BagIt format, which could essentially mandate two things: a place for data, and a place for metadata.

Regarding label images, it seems like all that's really needed is three things:

  1. Some keyword to indicate that this is a segmentation (or something else)
  2. Some kind of "source" metadata field, which could be a file path, a URI, or something else
  3. Label correspondences
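
As a rough sketch, those three pieces of information could be carried by a small metadata object like the one below. None of the field names (`kind`, `source`, `correspondences`, `label-value`, `name`) are taken from the spec; they are placeholders chosen for this illustration, and the URL is a made-up example.

```python
import json

# Sketch only: every field name below is a placeholder, not part of OME-NGFF.

def make_label_metadata(source, correspondences, kind="segmentation"):
    """Bundle the three items: a keyword, a source reference, and label correspondences."""
    return {
        "kind": kind,      # 1. keyword saying what this image is
        "source": source,  # 2. file path, URI, or some other reference
        "correspondences": [  # 3. which integer value means what
            {"label-value": value, "name": name}
            for value, name in sorted(correspondences.items())
        ],
    }


meta = make_label_metadata(
    "https://example.org/public/raw.zarr",  # hypothetical read-only source
    {1: "nucleus", 2: "membrane"},
)
print(json.dumps(meta, indent=2))
```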

I spend a lot of my time with the DataCite schema, so I am reminded of a couple of interesting mechanisms there: relatedIdentifier and relationshipType to indicate how two items are related to one another, and relatedMetadataScheme to essentially nest one metadata schema within another. To adapt the latter to OME-NGFF would be a more breaking, but more impactful, change.

I see two options:

  1. A new series of optional JSON objects within multiscales conveying the three pieces of information I listed above, with some more flexibility e.g. relatedItem: foo/bar/my_image, relationshipType: isSourceImage.
  2. Create a mechanism for subschemas. Currently, the OME-NGFF solution to the problem of specific use cases, like segmentation, is optional JSON objects embedded within the main schema. A subtle shift would be to something like relatedMetadataSchema, which abstracts away an entire subschema. This would work equally well for a very minimal subschema, like a tiny "labels" subschema, or something quite large and differently encoded, like an entire OME-XML block. Viewers could simply state that they support certain subschemas, and updates to a particular subschema would not require a refresh of the entire spec.
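
A loose sketch of what the two options might look like as metadata follows. Only the names `relatedItem`, `relationshipType`, and `relatedMetadataSchema` come from this comment; the inner keys (`schema`, `version`, `metadata`), the version string, and the helper function are assumptions made up for the example.

```python
# Sketch only: the shapes below are guesses at how the two options could be encoded.

# Option 1: optional objects inside a multiscales entry.
option_1 = {
    "multiscales": [
        {
            "name": "my_labels",
            "relatedItem": "foo/bar/my_image",
            "relationshipType": "isSourceImage",
        }
    ]
}

# Option 2: self-contained subschema blocks (inner keys are hypothetical).
option_2 = {
    "relatedMetadataSchema": [
        {"schema": "labels", "version": "0.1", "metadata": {"source": "foo/bar/my_image"}},
        {"schema": "ome-xml", "version": "2016-06", "metadata": "<OME>...</OME>"},
    ]
}


def readable_subschemas(attrs, supported):
    """Return only the subschema blocks a viewer has declared support for."""
    blocks = attrs.get("relatedMetadataSchema", [])
    return [block for block in blocks if block.get("schema") in supported]


# A viewer that only understands the tiny "labels" subschema skips the rest.
for block in readable_subschemas(option_2, supported={"labels"}):
    print(block["schema"], block["metadata"])
```

Under option 2 a viewer never has to parse a block it does not recognise, which is what would let a subschema evolve without touching the rest of the spec.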

If you couldn't tell, I am partial to option 2), but I am biased, being more of a librarian than a developer.

@will-moore
Member

If/when we decide that we want to identify a stand-alone image as a label, we should probably not use the image-label key but a new key like imageLabel, or even just label or labels, since the NGFF naming style is camelCase: https://ngff.openmicroscopy.org/latest/#naming-style

@imagesc-bot

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/save-a-single-labels-dataset-into-an-ome-zarr/93505/18

@dstansby
Contributor

I'm taking a look at the image-labels part of the spec at the moment, and I don't think there's anything that prevents image-labels from being standalone. In particular:

image-label groups MUST also contain multiscales metadata and the two "datasets" series MUST have the same number of entries.

but there is no definition of what "the two datasets" are here. In addition, specifying a source in the image-labels metadata is optional:

The image-label dictionary MAY contain a source key

So unless I'm missing something, standalone labels datasets are valid?
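
Read that way, a check for a standalone labels group could be as small as the sketch below. It only encodes the two statements quoted above (multiscales metadata is required, `source` is optional) and deliberately skips the "two datasets series" rule, since it is unclear what that rule compares. The function name and the example attributes are hypothetical.

```python
# Sketch only: encodes the reading of the spec text quoted in this comment.

def check_label_group(attrs):
    """Return a list of problems; an empty list means the group passes this reading."""
    problems = []
    if "image-label" not in attrs:
        problems.append("missing image-label metadata")
    if not attrs.get("multiscales"):
        problems.append("missing multiscales metadata")
    # "source" is a MAY, so its absence is not reported as a problem.
    return problems


# A labels group at the root of a dataset, with no source key.
standalone = {
    "multiscales": [{"datasets": [{"path": "0"}]}],
    "image-label": {},
}
print(check_label_group(standalone))
```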
