A few misc questions #55

Closed · toloudis opened this issue Sep 2, 2021 · 3 comments


toloudis commented Sep 2, 2021

Let me know if this is the wrong forum and I can move this post.
We are considering making a big move to use ome-zarr. I have some miscellaneous questions/issues on the state of things.

  1. Context: We have lots of on-prem storage but need to move all of it to the cloud. We will then need to keep large images accessible to compute that is more distant from the data. A likely scenario (sketched in code after this list) is:
    a. microscope --> proprietary file format
    b. upload and immediate conversion to an open format using aicsimageio
    c. scientists do compute and vis on the chunked, remote, open format using aicsimageio

  2. Is it possible to store multiscale zarr groups on different storage categories? For example can we say we want the full resolution level on cold storage but downsampled levels on cloudfront/a more "hot" service?

  3. Is there an assumption in ome-ngff that multiscale resolutions are necessarily halved in x,y at each level? Or can I write any downsampling I want at each level (I have some calculations that force it to fit in a certain memory footprint, for example)? If so, key question: how do I get the data shape at each level?

  4. The current ome-ngff document here https://ngff.openmicroscopy.org/latest/#omero-md refers me to https://docs.openmicroscopy.org/omero/5.6.1/developers/Web/WebGateway.html#imgdata. Does that mean the spec is really the full omero spec contained at the latter link? That latter spec provides for physical pixel dimensions and shape information in the top-level metadata, but this is not shown in the example on the ngff doc page.

  5. We capture a lot of large "multi-scene" files (the dreaded 6th dimension). Let's assume they are not separate wells. In ome-zarr, are we supposed to put them in separate root-level groups in the same store? Does ome provide some recommendation for this apart from just treating them as "different" images?
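For concreteness, here is a minimal sketch of steps (b) and (c) of the scenario in question 1, assuming aicsimageio for reading the proprietary file and ome-zarr-py (ome_zarr) for writing, with one root-level group per scene as floated in question 5. The file names, scene handling, and whole-scene in-memory read are illustrative assumptions, not a recommended pipeline:

```python
# Hypothetical conversion sketch: proprietary file -> one OME-Zarr group per scene.
# Requires aicsimageio and ome-zarr-py; all paths are placeholders.
import zarr
from aicsimageio import AICSImage
from ome_zarr.io import parse_url
from ome_zarr.writer import write_image

img = AICSImage("example.czi")                 # proprietary source file (placeholder)
store = parse_url("converted/example.zarr", mode="w").store
root = zarr.group(store=store)

for scene in img.scenes:                       # one root-level group per "scene"
    img.set_scene(scene)
    data = img.get_image_data("TCZYX")         # pulls the whole scene into memory
    group = root.create_group(scene)
    write_image(image=data, group=group, axes="tczyx")
```

(Uploading the resulting store to object storage, or writing directly to a remote store via fsspec, is left out here.)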

will-moore (Member) commented

I can answer some points...

  1. No, the spec doesn't assume that levels are halved in size. Any downsampling factors are allowed, as long as "The paths MUST be ordered from largest (i.e. highest resolution) to smallest." (See the sketch after this list for reading each level's shape.)
  2. No, the spec is currently just what's on the ngff page. We just reused that omero section for convenience to get up and running quickly, but rendering info is likely to evolve based on community discussions.
  3. In the spec, multiscales is a list, with a name for each entry. Would that work? However, I think most viewers will currently just show multiscales[0]. There is an ongoing discussion on a "Collections" spec (Collections Specification #31) for how to group images. Or, if you want to store affine transformations between "scenes", see Transformation Specification #28.
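On the "how do I get the data shape at each level" part of question 3: each level is an ordinary zarr array listed under a multiscales entry's "datasets", so the shapes come from the array metadata without reading any chunk data. A minimal sketch with zarr-python, where the store path is a placeholder:

```python
# Print the shape and dtype of every resolution level in an OME-Zarr image group.
import zarr

root = zarr.open_group("example.zarr", mode="r")

for entry in root.attrs["multiscales"]:        # a list; most tools only use entry 0
    print("multiscale:", entry.get("name"))
    for dataset in entry["datasets"]:
        level = root[dataset["path"]]          # zarr array for this resolution level
        print(" ", dataset["path"], level.shape, level.dtype)
```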

Sorry, don't know about storing different resolution levels on different storage media, but I guess if you can map different file paths to different storage then this should be possible??

toloudis (Author) commented Sep 2, 2021

  1. Regarding different storage media, we know we can do this type of logical mapping with AWS, but the question would be how much we can guide an "ome-zarr writer" API to do it for us (this level goes here, and that level goes there) as opposed to building something from scratch. Maybe this is more of a low-level zarr question.

  2. We have also discussed storing different projections as multiscales. I.e. we will want to have downsampled volume data, but then might also want to store a "middle slice" thumbnail. So maybe the spec is not general enough for that case. It is also incredibly convenient to know something about the data dimensions and type for each of the multiscales as early as possible (in json metadata), to allow a viewer to decide intelligently how much it should load.

  3. When I tried to implement this in my viewer, it was absolutely necessary to have "physical pixel size" in some form, which is missing from the spec. Additionally, I found that stashing the intensity max and min in window.max and window.min was necessary to avoid an extra traversal of the data in some cases.
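For what it's worth, the window min/max mentioned in point 3 does have a slot in the current "omero" rendering block (the part borrowed from the webgateway imgdata format). A minimal sketch of writing it as group attributes with zarr-python; the channel name, color, and intensity values are made up for illustration:

```python
# Attach "omero" rendering metadata (per-channel window min/max) to an image group.
import zarr

root = zarr.open_group("example.zarr", mode="a")
root.attrs["omero"] = {
    "channels": [
        {
            "label": "DNA",        # illustrative channel name
            "color": "0000FF",
            "active": True,
            "window": {"min": 0, "max": 65535, "start": 100, "end": 14000},
        }
    ]
}
```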

joshmoore (Member) commented

g'morning, @toloudis.

Some additions to @will-moore's thoughts below, but generally 👍 for the questions (and future input on the specs).

2.

> Is it possible to store multiscale zarr groups on different storage categories?

In the current spec, no. See #13 for the work to enable it.

> the question would be how much we can guide an "ome-zarr writer" api to do it for us

After 5 seconds of pondering, I could see having something like a "remote-array" which you pass in: write_pyramid([cold_array, warm_array, hot_array]).

> Maybe this is more of a low level zarr question.

Maybe but I've not seen a proposal or discussion on it to date. (The closest might be fsspec-reference-maker)
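To make the write_pyramid idea above concrete (purely hypothetical, and outside the current spec, which keeps all levels under a single group): each level could be written to its own fsspec-mapped store, with the cold/warm/hot tiering handled by the bucket or CDN configuration rather than by zarr itself. The function name, URLs, and chunking below are invented for illustration:

```python
# Hypothetical: write each pyramid level to a different storage target.
# Not part of the OME-NGFF spec (see #13); function name and URLs are invented.
import fsspec
import zarr

def write_pyramid_to_targets(levels, target_urls, chunks=True):
    """levels: numpy arrays, largest first; target_urls: one URL per level."""
    for data, url in zip(levels, target_urls):
        store = fsspec.get_mapper(url)               # e.g. "s3://cold-bucket/img/0"
        arr = zarr.open_array(store=store, mode="w",
                              shape=data.shape, chunks=chunks, dtype=data.dtype)
        arr[:] = data

# write_pyramid_to_targets(
#     [full_res, half_res, quarter_res],
#     ["s3://cold-bucket/img/0", "s3://warm-bucket/img/1", "s3://hot-bucket/img/2"],
# )
```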

3.

> No, the spec doesn't assume that levels are halved in size.

Additional work will come on this with the translations (likely v0.4).

> might also want to store a "middle slice" thumbnail

hmmm... I wonder if the rendering metadata might not be a place to point to this.

> each of the multiscales as early as possible (in json metadata)

hmmm... a bit hesitant to pull this out of the zarr json metadata and duplicate it in the main block. I wonder if "consolidated_metadata" gets you what you need. (If not, happy to see a proposal.)
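For the "know the shapes as early as possible" use case, zarr's consolidated metadata does roughly this today: it copies every .zgroup/.zarray/.zattrs document into a single .zmetadata key, so one request exposes the shape and dtype of every level. A minimal sketch, with the store path as a placeholder (consolidation matters most for remote stores):

```python
# Consolidate zarr metadata, then list every array's shape with a single metadata read.
import zarr

store = zarr.DirectoryStore("example.zarr")    # placeholder; an fsspec store also works
zarr.consolidate_metadata(store)               # writes .zmetadata at the store root

root = zarr.open_consolidated(store)

def show(name, obj):
    if isinstance(obj, zarr.Array):
        print(name, obj.shape, obj.dtype)

root.visititems(show)
```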

ome locked and limited conversation to collaborators Sep 3, 2021

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
