
OME-HCS Compatibility with vizarr #118

Closed · camFoltz opened this issue Sep 9, 2021 · 4 comments · Fixed by #119

camFoltz commented Sep 9, 2021

Hello!

I feel I should properly introduce myself, as I have been active recently on image.sc regarding the OME-Zarr data format.
I am an RA at the Chan Zuckerberg Biohub working on optimizing our computational pipelines, and we are now migrating to the OME-Zarr format for all of our raw/processed data. I would like to help you optimize these viewers (vizarr + ome-zarr-py) as our team starts to use them heavily, so I may start posting issues here more frequently.

One issue I am running into right away, partly due to my lack of insight into how the OME-Zarr HCS metadata is parsed, has to do with inflexible key structures. I'll outline it below.

Consider the OME-Zarr dataset tree below, where the level above Fake_Row is Plate.zarr:

/
 └── Fake_Row
     ├── Fake_Col_0
     │   └── Pos_000
     │       └── array (1, 6, 24, 2048, 2048) uint16
     ├── Fake_Col_1
     │   └── Pos_001
     │       └── array (1, 6, 24, 2048, 2048) uint16
     ├── Fake_Col_2
     │   └── Pos_002
     │       └── array (1, 6, 24, 2048, 2048) uint16
     └── Fake_Col_3
         └── Pos_003
             └── array (1, 6, 24, 2048, 2048) uint16

The metadata at the associated levels is as follows:

"Plate Metadata" at Plate.zarr.attrs

{'plate': {'acquisitions': [{'id': 1,
    'maximumfieldcount': 1,
    'name': 'Dataset',
    'starttime': 0}],
  'columns': [{'name': 'Fake_Col_0'},
   {'name': 'Fake_Col_1'},
   {'name': 'Fake_Col_2'},
   {'name': 'Fake_Col_3'}],
  'field_count': 1,
  'name': 'test',
  'rows': [{'name': 'Fake_Row'}],
  'version': '0.1',
  'wells': [{'path': 'Fake_Row/Fake_Col_0'},
   {'path': 'Fake_Row/Fake_Col_1'},
   {'path': 'Fake_Row/Fake_Col_2'},
   {'path': 'Fake_Row/Fake_Col_3'}]}}

"Well Metadata" at Plate.zarr['Fake_Row']['Fake_Col_{i}'].attrs

{'well': {'images': [{'path': 'Pos_000'}], 'version': '0.1'}} # Fake_Col_0
{'well': {'images': [{'path': 'Pos_001'}], 'version': '0.1'}} # Fake_Col_1
{'well': {'images': [{'path': 'Pos_002'}], 'version': '0.1'}} # Fake_Col_2
{'well': {'images': [{'path': 'Pos_003'}], 'version': '0.1'}} # Fake_Col_3

"omero / multi-scales" metadata at Plate.zarr['Fake_Row']['Fake_Col_0']['Pos_000'].attrs

{'multiscales': [{'datasets': [{'path': 'array'}], 'version': '0.1'}],
 'omero': {'channels': [{'active': True,
    'coefficient': 1.0,
    'color': '808080',
    'family': 'linear',
    'inverted': False,
    'label': 'State0',
    'window': {'end': 1279, 'max': 65535, 'min': 0, 'start': 663}},
   {'active': True,
    'coefficient': 1.0,
    'color': '808080',
    'family': 'linear',
    'inverted': False,
    'label': 'State1',
    'window': {'end': 3718, 'max': 65535, 'min': 0, 'start': 1804}},
   {'active': True,
    'coefficient': 1.0,
    'color': '808080',
    'family': 'linear',
    'inverted': False,
    'label': 'State2',
    'window': {'end': 5128, 'max': 65535, 'min': 0, 'start': 2101}},
   {'active': True,
    'coefficient': 1.0,
    'color': '808080',
    'family': 'linear',
    'inverted': False,
    'label': 'State3',
    'window': {'end': 2595, 'max': 65535, 'min': 0, 'start': 1117}},
   {'active': True,
    'coefficient': 1.0,
    'color': '808080',
    'family': 'linear',
    'inverted': False,
    'label': 'Cy5 - 635~730',
    'window': {'end': 594, 'max': 65535, 'min': 0, 'start': 122}},
   {'active': True,
    'coefficient': 1.0,
    'color': '808080',
    'family': 'linear',
    'inverted': False,
    'label': 'FITC - 474~515',
    'window': {'end': 761, 'max': 65535, 'min': 0, 'start': 141}}],
  'rdefs': {'defaultT': 0,
   'defaultZ': 0,
   'model': 'color',
   'projection': 'normal'},
  'version': 0.1}}

This structure makes the most sense for our data, as we like to keep track of the position index (Pos_000, Pos_001, etc.) as we move through rows and columns. However, I am running into an issue with vizarr: if this indexing doesn't start back at 0 (Pos_000) beneath every "well", the viewer raises this error:

[Screenshot of the error raised by vizarr]

I am really not sure why it is searching for a path (Fake_Row/Fake_Col_1/Pos_000) that isn't referenced in the metadata (or maybe it is and I missed it?). However, when I rename all of the "Pos_00{i}" groups to "Pos_000" and update the "well" metadata to match, the viewer works as expected. Can someone show me where this OME metadata is being parsed, or why this might be the case?
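
To illustrate what I am seeing (a rough sketch only, with placeholder names, not vizarr's actual code):

// Image paths my well metadata declares, versus the paths the viewer appears to request.
// It looks as though the first well's image name ("Pos_000") is reused for every well.
const wells = [
  { path: 'Fake_Row/Fake_Col_0', firstImage: 'Pos_000' },
  { path: 'Fake_Row/Fake_Col_1', firstImage: 'Pos_001' },
  { path: 'Fake_Row/Fake_Col_2', firstImage: 'Pos_002' },
  { path: 'Fake_Row/Fake_Col_3', firstImage: 'Pos_003' },
];
const requested = wells.map((w) => `${w.path}/Pos_000`);
// 'Fake_Row/Fake_Col_1/Pos_000' (and the rest) do not exist in the store, hence the error.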

I am happy to help investigate and solve these issues if you are looking for development help. The same goes for ome-zarr-py + napari-ome-zarr (which seems a bit buggier than vizarr), which would be our preferred viewer.

Thanks for your help here.

Best,
Cam

manzt (Member) commented Sep 9, 2021

Hello there - thank you for your interest in using and improving vizarr! I do not work on ome-zarr-py or napari-ome-zarr directly (pinging: @joshmoore, @will-moore), but I can certainly help investigate this issue.

My guess is that vizarr makes some assumptions about the metadata structure that reflect the OME-NGFF datasets published by the IDR (since those have been the examples we work with primarily), and our HCS metadata traversal can be improved. If you have example data to share, that would help get to the bottom of this quickly.

Here is the part of the code where metadata traversal begins. The node passed to the viewer is inspected to see whether its attrs contain plate, well, or omero/multiscales metadata:

vizarr/src/io.ts, lines 111 to 124 in 1585a03:

if (node instanceof ZarrGroup) {
  const attrs = (await node.attrs.asObject()) as Ome.Attrs;
  if ('plate' in attrs) {
    return loadPlate(config, node, attrs.plate);
  }
  if ('well' in attrs) {
    return loadWell(config, node, attrs.well);
  }
  if ('omero' in attrs) {
    return loadOmeroMultiscales(config, node, attrs);
  }

will-moore (Collaborator) commented:

Hi Cam,
So I guess the loadPlate() code:

export async function loadPlate(config: ImageLayerConfig, grp: ZarrGroup, plateAttrs: Ome.Plate): Promise<SourceData> {

is making some assumptions in order to reduce the number of calls it has to make.

We only load the first Well to get the path from a Well to an Image (instead of loading all the Wells in the plate).

const wellAttrs = (await grp.getItem(wellPaths[0]).then((g) => g.attrs.asObject())) as Ome.Attrs;

That 'imgPath' is then applied to all the other wellPaths that we have for the plate, so that we can load an Image for each Well.

wellPaths.map((p) => [p, join(p, imgPath, resolution)]),

For this to work for your data, we'd need to get an imgPath for each Well.
This is just one more JSON request for each Well, which is probably not going to be a killer for a small-to-medium-sized plate.
Unfortunately, we don't have any way to know whether a plate uses the same image path for every well without loading them all, so we'd have to do this for every plate in vizarr, and I worry that the sheer number of calls for a 384-well plate is going to slow things down. That might depend on the back-end, @joshmoore? A rough sketch of what I mean is below.
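
Something like this untested sketch (placeholder names, error handling omitted):

// Untested sketch: fetch each Well's attrs to get its own first-image path,
// instead of reusing the first Well's image path for every Well in the plate.
const wellImageSources = await Promise.all(
  wellPaths.map(async (p): Promise<[string, string]> => {
    const wellGroup = await grp.getItem(p);
    const attrs = (await wellGroup.attrs.asObject()) as Ome.Attrs;
    // 'well' is assumed to be present here, since these paths come from plateAttrs.wells
    const imgPath = attrs.well.images[0].path;
    return [p, join(p, imgPath, resolution)];
  })
);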

Do you want to try coding this up?
Running vizarr locally is simply a case of $ npm install and $ npm start.
Or I could open a PR and let you test the build from that?

Will.

will-moore (Collaborator) commented:

Part of the problem is the complexity of the HCS spec, which others have also noted.
We are currently discussing the evolution of this into a more generic "Collections" spec at ome/ngff#31
For example, see the outline at ome/ngff#31 (comment)
This should help address the issues we're having here, because all the paths to images and other info you need would be in a single .zattrs blob, instead of being distributed across many individual blobs, one per Well.

If you're able to contribute to that discussion it would be great to find improvements that work for as many use-cases as possible.
Thanks,
Will.

camFoltz (Author) commented:

Thanks @will-moore, I see the PR, but if you still need more help, let me know! I will look over the discussion and contribute our team's input.
