Define schema for zarr pyramids #91
Comments
Is this in the example below? Would there be
So if there are 4 levels (0, 1, 2, 3),
Instead of just making this an array of strings, I'd suggest one extra level, with name as the one required field. In the example right now, we know that the round is another piece of metadata that matters, and there may be others:
This would be all dimensions, not just the last two? Here, I'd think a string array would be sufficient?
I think you mean the number of channels should be the product of all but the last two dimensions?
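As a small sketch of that reading (the shapes below are made up for illustration, and channel_count is a hypothetical helper name), the count would be the product of every dimension except the trailing y and x:

```python
from math import prod


def channel_count(shape):
    """Number of 2D (y, x) planes: the product of all but the last two dimensions."""
    if len(shape) < 2:
        raise ValueError("expected at least the y and x dimensions")
    return prod(shape[:-2])


# Hypothetical array with dimensions (time, channel, y, x).
print(channel_count((3, 4, 2048, 2048)))  # 12
# A plain 2D image has no leading dimensions, so the empty product gives 1.
print(channel_count((2048, 2048)))  # 1
```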
Does it need to be that strict? If someone had "south" and "east", is that forbidden? I think this is good. I could imagine instead of flat lists of channels, if it's more than 3D, there are extra nested levels... but I'm not sure this would make life easier.
I agree with this currently. A flat array doesn't quite support the functionality we need. With that said, depending on the modality, OME-TIFFs might store this differently. For some datasets, channels might be the same for all other dimensions (i.e., z or time), but some might stain with different antibodies for different time points. We need to keep track of these labels. For example, the CyCIF dataset I've been looking at has Ideally we want an object that for each tiled
This would make data-binding straightforward. Use case: In the UI, a user wants to examine all 'DAPI' stains. We expose the rows of the table above (minus the index information), and allow the user to filter this list. When a selection is made, we render the image at
It doesn't need to be this strict. We just want to ensure that the image data is row-major, with the last two dimensions being y and x coordinates in deck.gl.
The stuff for the metadata above might be somewhat outside the scope of this issue but I think it is a good place to have it.
After talking to @ngehlenborg, I think storing the highest resolution array outside of the pyramid makes a lot of sense:
For a particular URL, we could check
I think this format makes sense and leaves open adding additional data like segmentation in the store underneath the
Thanks for the feedback @NHPatterson !
Agreed. Should we add this as a required field in Viv? The current fields are required in the sense that nothing will render if not provided, but we could have a very pesky error message saying "field not provided cannot create scale bar". I've been adding this metadata to the
All we really need to make a guess about how to render from here is the "max_levels" key. We could make all other fields optional, and then in the UI communicate that there are fields missing, but a user would still be able to look at their data using sliders like in napari. If they forgot to include a piece of metadata, we could ask that they write that to the zarr store or provide an additional object containing this information. That way we can always upgrade someone's use of vitessce, but not block anyone out who hasn't quite gotten their format right yet.
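A rough sketch of that behavior, written in Python for illustration rather than as Viv code: only the level count is treated as required, and missing optional fields are reported instead of blocking rendering. The key is spelled max_level here to match the issue description; check_metadata and the two field tuples are hypothetical names.

```python
# Assumption: only the level count is strictly needed to start rendering.
REQUIRED_FIELDS = ("max_level",)
# Assumption: these are the optional fields from the issue description.
OPTIONAL_FIELDS = ("channel_names", "dimension_names")


def check_metadata(attrs):
    """Raise if a required field is absent; return the missing optional fields."""
    missing_required = [key for key in REQUIRED_FIELDS if key not in attrs]
    if missing_required:
        raise KeyError(f"cannot render without: {missing_required}")
    # The caller can surface these in the UI and fall back to plain sliders.
    return [key for key in OPTIONAL_FIELDS if key not in attrs]


attrs = {"max_level": 3}  # hypothetical .zattrs content with only the level count
print(check_metadata(attrs))  # ['channel_names', 'dimension_names']
```

The returned list is what a UI could use to prompt for the remaining fields, either written back to the zarr store or supplied as a separate object.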
You could default to
Ah that would be great. Allow the option to add fields in a form-like entry or cut and paste JSON.
We should also version this schema in the
@manzt: can this be closed?
Yes, we are doing our best to follow what is being decided by the OME community so we don't roll our own solution for zarr. https://github.com/ome/omero-ms-zarr/blob/master/spec.md
Description
To my knowledge, how people create zarr array pyramids is quite open-ended. I've currently used the following schema, where each array is found in the pyramid group.
But others may store their pyramids in different zarr arrays altogether.
I don't think much focus has gone into standardizing this because many people tiling with zarr are using napari, and that library affords flexibility by requiring users to load dask arrays into a list first. For each example above, we could load the same thing into napari with:
As long as viewer.add_image gets a list of ndarrays, napari knows what to do. We likely can't be as flexible as this, so I would like to iron out some type of standard for creating these arrays. Ultimately in viv we have a very similar pattern because connections for the zarr loader are just an array of zarr objects.
Proposal
In viv, we should make the nested format from above the default. This way we can create pyramids and keep them all in the same named directory. Also, if a dataset has more than one pyramid (e.g. IMS + MxIF), we could create different groups which house these data within the zarr store, keeping together data modalities which will be visualized together.
Metadata
The metadata for the pyramid should be contained in the .zattrs of the 00/ array. This is a JSON file and should have the required fields:
max_level # number of total pyramid levels (zero indexed)
channel_names
dimension_names # last two dimensions should be y, x but might have time, channel, etc...
Providing the max_level will let us know how many connections to establish for the viewer. We should determine imageHeight, imageWidth, and tileSize all from the zarr array chunks and shape data.
Additional flexibility
In case any of the metadata are missing or someone has a more bespoke zarr schema, we should allow a config object to manually set these parameters:
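One hypothetical shape for this, sketched in Python rather than viv's JavaScript: values are first derived from the array's shape and chunks, then any keys present in a user-supplied config take precedence. The function name resolve_params, the config keys, and the assumption that tiles are square (so tileSize is the chunk length along x) are all illustrative and not part of the proposal.

```python
def resolve_params(zattrs, shape, chunks, config=None):
    """Merge .zattrs metadata, values derived from shape/chunks, and manual overrides."""
    params = dict(zattrs)  # start from whatever metadata the store provides
    params.update(
        imageHeight=shape[-2],  # assumes the last two dimensions are y, x
        imageWidth=shape[-1],
        tileSize=chunks[-1],  # assumes square tiles: chunk length along x
    )
    # Manually supplied values win over both stored and derived ones.
    params.update(config or {})
    return params


# Hypothetical store: .zattrs lacks channel_names, so the config supplies them.
zattrs = {"max_level": 3, "dimension_names": ["channel", "y", "x"]}
config = {"channel_names": ["DAPI", "marker_1"]}
params = resolve_params(zattrs, shape=(2, 8192, 4096), chunks=(1, 256, 256), config=config)
print(params["imageHeight"], params["imageWidth"], params["tileSize"])  # 8192 4096 256
print(params["channel_names"])  # ['DAPI', 'marker_1']
```

Applying the config last keeps the default path automatic while still letting a bespoke store be rendered without rewriting its metadata.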