HCS group layout #9
Comments
Where does the metadata for intermediate levels (plate-acquisition, well) go? For option 2 it can go into |
Following today's discussion and for completeness, pasting here a first list of candidates for specifying HCS metadata as defined in the OME schema:
An advantage of the second layout where each concept ( For the first layout, my assumption is that minimally:
In all cases, as discussed today, the layout + metadata will need to be tested against large HCS datasets, typically several tens of thousands of images and possibly with latency involved, to ensure the proposed extension remains performant for the typical queries/manipulations.
Reading through this with @chris-allan, option 2 seems like the way to go. It might be worth splitting the well index into two levels in the hierarchy (row and column), especially for 1536 well plates. With option 2, we would lose some of the flexibility that Allowing but not parsing or enforcing arbitrary group names sounds fine, since the actual indexes would be described in the |
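To make the row/column split concrete, here is a minimal sketch. It assumes a row-major linear well index and a 1536-well plate arranged as 32 rows by 48 columns; the function names and the `row/column` path format are illustrative only and not part of any agreed layout.

```python
from __future__ import annotations


def well_index_to_row_column(index: int, n_columns: int = 48) -> tuple[int, int]:
    """Split a row-major linear well index into 0-based (row, column)."""
    if index < 0:
        raise ValueError("well index must be non-negative")
    # divmod gives (index // n_columns, index % n_columns)
    return divmod(index, n_columns)


def well_group_path(index: int, n_columns: int = 48) -> str:
    """Hypothetical two-level group path for a well: '<row>/<column>'."""
    row, column = well_index_to_row_column(index, n_columns)
    return f"{row}/{column}"
```

With two levels, a 1536-well plate has at most 48 entries in any one group instead of 1536 in a single well group, which is the motivation for the split mentioned above.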
Thanks for the input @melissalinkert and @chris-allan. Option 2 above was indeed representing the wells as a single group with some 2D indexes (typically
Option 3: plate/acquisition/row/column/well sample
The option 2 hierarchy makes sense, especially from a metadata perspective (note I'm assuming instead of |
Re-mentioning a requirement that occurred to me in the bioformats2raw context: currently the multiscale group name is the series number in a single "fileset" group, roughly equivalent to option 1 here. That allows mapping any of the metadata in OME-XML based on the series index. If we push images down below wells with new naming, we will need a new heuristic, or need to encode the series number in the ngff metadata, or encode all of the OME-XML metadata in the ngff metadata. Just a thought. It does make me wonder though if there isn't an option N which some of this:
We could/would then still use one of the layouts above as the preferred/default but it would provide some flexibility if need be. (The library, I think, would default to consuming the metadata as the definitive source, but the layout would make it more user friendly.) |
I am now wondering if the example above image anywhere could be a mechanism to organise images (not in a plate) in the way the user wants to see them. Such a feature has been requested many times over the years. |
From today's meeting, the decision was to start implementing the third option (Issue description edited to link to the relevant comment) and update https://github.com/ome/omero-cli-zarr/ to create a Zarr representation of the first illumination corrected plate of the |
Updated Option 3
Shows the output that is currently being exported from OMERO by
I guess a couple of questions about the plate metadata. |
A couple of additional thoughts/issues while working on a first draft of the spec:
Also trying to think of these keys in terms of MUST vs SHOULD vs MAY as per RFC 2119. Happy to skip this for a first version, but a naive assumption would be to put anything that can be recomputed from other keys at the SHOULD level rather than the MUST level (e.g. column size, names). |
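As a rough illustration of the MUST vs SHOULD split, the sketch below treats a key as SHOULD when it can be recomputed from a MUST key. All key names (`wells`, `row_count`, `column_count`) are hypothetical placeholders and not taken from the draft spec; the derivation also assumes 0-based indexes and that the last row and column each contain at least one well.

```python
from __future__ import annotations

MUST_KEYS = {"wells"}                        # hypothetical: cannot be derived
SHOULD_KEYS = {"row_count", "column_count"}  # hypothetical: derivable from "wells"


def check_plate_metadata(plate: dict) -> tuple[list[str], list[str]]:
    """Return (errors, warnings): missing MUST keys are errors, missing SHOULD keys warnings."""
    present = set(plate)
    errors = [f"missing MUST key: {k}" for k in sorted(MUST_KEYS - present)]
    warnings = [f"missing SHOULD key: {k}" for k in sorted(SHOULD_KEYS - present)]
    return errors, warnings


def derive_counts(wells: list[dict]) -> dict:
    """Recompute the SHOULD-level counts from the list of wells."""
    max_row = max((w["row"] for w in wells), default=-1)
    max_column = max((w["column"] for w in wells), default=-1)
    return {"row_count": max_row + 1, "column_count": max_column + 1}
```

A reader could then fall back to `derive_counts` whenever a SHOULD-level key is absent, at the cost of scanning the wells once.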
Maybe a bit late but have you looked at the cellH5 format? |
Here's the figure @jkh1 mentioned (there's no direct online link available): |
Definitely not too late and thanks for bringing other known hierarchical representations into the discussion. As a preamble, CellH5 includes several concepts, such as features and objects, which are part of the mid-term goals discussed during today's community call but are outside the scope of this issue/extension. Focusing primarily on the HCS specification, I tried to summarize my understanding of the mappings between the hierarchical structures defined in the current proposal (mentioned at the 2020-10-29 call), in the CellH5 format as well as in the 2016-06 OME schema for reference:
From my side, what this means is that there is no substantial conceptual gap under the |
Following the support for `multiscales` and `masks`, the focus is now shifting to representing HCS data in the NGFF spec. An initial prototype of a plate layout had already been implemented in the context of the OME Community Meeting 2020 - https://github.com/ome/omero-guide-cellprofiler/blob/3a441e5594b80e8e95e5e473baa8da140db03656/notebooks/idr0002_zarr.ipynb.

Overall it feels like the HCS specification should primarily revolve around:
- `multiscales`
- modelling the HCS concept

The effective dimensions currently supported by the OME model and the various HCS datasets produced by the community are: `Plate`, `Plate Acquisition` (also called `Plate Run`), `Well`, `Well Sample` (also called `Field of View`). The first question is how flat vs deep the Zarr folder hierarchy should be to represent these concepts. The two layouts below are put up for discussion.

All names, layout and content are still up for discussion at this stage.
Option 1: single group
This is the closest to the implementation mentioned above, where a series of multiscale images aka Zarr groups (potentially with labels) are collected within a plate Zarr group. Each multiscale image represents a field of view within a well within a plate acquisition, with its metadata specified in a dedicated `well sample` specification.

Pros:

Cons:
Option 2: plate/acquisition/well/well sample
In this proposal, three groups are inserted above the image group: plate, plate acquisition and well. Each multiscale image represents a field of view within a well within a plate acquisition. The full HCS metadata is distributed across the `plate acquisition`, `well` and `well sample` specifications.

Pros:

Cons:
Option 3: plate/acquisition/row/column/well sample
See https://github.com/ome/omero-ms-zarr/issues/73#issuecomment-706770955
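To compare the three layouts side by side, here is a minimal sketch that builds the group path of one field of view under each option. The `plate.zarr` root name, the `FieldOfView` class and the row-major well numbering used for option 2 are assumptions made for illustration; only the nesting order comes from the option titles above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldOfView:
    acquisition: int  # plate acquisition (plate run) index
    row: int          # 0-based well row
    column: int       # 0-based well column
    field: int        # well sample (field of view) index within the well
    series: int       # flat image index, as used by the single-group layout


def option1_path(fov: FieldOfView) -> str:
    # Option 1: every image sits directly under the plate group.
    return f"plate.zarr/{fov.series}"


def option2_path(fov: FieldOfView, n_columns: int) -> str:
    # Option 2: plate/acquisition/well/well sample.
    # Assumption: wells are numbered in row-major order.
    well = fov.row * n_columns + fov.column
    return f"plate.zarr/{fov.acquisition}/{well}/{fov.field}"


def option3_path(fov: FieldOfView) -> str:
    # Option 3: plate/acquisition/row/column/well sample.
    return f"plate.zarr/{fov.acquisition}/{fov.row}/{fov.column}/{fov.field}"
```

For example, a field with `acquisition=0, row=1, column=2, field=3` on a 12-column plate maps to `plate.zarr/0/14/3` under option 2 and `plate.zarr/0/1/2/3` under option 3.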
Group names
In both examples above, `0, 1,...n` are used as the generic group names. Using more explicit, informative names reflecting the acquisition, e.g. `A1, A2, ...` or `A1 Field 1, A1 Field 2...`, is definitely a possibility. Given the number of variants found in the ecosystem, I would avoid trying to enforce these names and/or rely on them. Instead, the corresponding metadata (typically `row`, `column`, `index`) should be unambiguously specified within the `.zattrs` of the relevant group(s).