
Load each Well to get path to first image for each #119

Merged (2 commits) on Sep 26, 2021

Conversation

@will-moore (Collaborator)

Fixes #118.

Instead of just using the first Well of a plate to get the path from Well -> Image, this PR loads every Well, to handle the case where each Well has a different path to its Image.

However, this makes 3 extra calls for each Well, e.g. A/1/.zattrs (loads the data we need), A/1/.zgroup (not needed) and A/1/.zarray (404). So maybe there's a better way to load the .zattrs for each Well, @manzt?

This will have performance implications for loading plates, especially large ones, so it shouldn't be merged unless we're happy with that.

cc @joshmoore
cc @camFoltz
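
For reference, a rough before/after sketch of the change described above (the type and helper names are hypothetical stand-ins, not the PR's actual code):

interface WellAttrs {
  well: { images: { path: string }[] };
}

// Hypothetical: stands in for however a Well's .zattrs is actually fetched.
type LoadWellAttrs = (wellPath: string) => Promise<WellAttrs>;

// Before: only the first Well's metadata is fetched, and its image path is
// assumed to apply to every other Well in the plate.
async function firstImagePathsBefore(wellPaths: string[], load: LoadWellAttrs) {
  const first = await load(wellPaths[0]);
  const imgPath = first.well.images[0].path;
  return wellPaths.map((wellPath) => `${wellPath}/${imgPath}`);
}

// After (this PR): each Well's metadata is fetched, so per-Well image paths
// are respected even when Wells use different naming.
async function firstImagePathsAfter(wellPaths: string[], load: LoadWellAttrs) {
  return Promise.all(
    wellPaths.map(async (wellPath) => {
      const attrs = await load(wellPath);
      return `${wellPath}/${attrs.well.images[0].path}`;
    }),
  );
}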

@will-moore (Collaborator, Author)

@camFoltz You should be able to test this PR using the app deployed at https://deploy-preview-119--vizarr.netlify.app/, although the Netlify build seems to be a bit flaky for me.

@manzt (Member) commented on Sep 10, 2021:

> However, this makes 3 extra calls for each Well, e.g. A/1/.zattrs (loads the data we need), A/1/.zgroup (not needed) and A/1/.zarray (404). So maybe there's a better way to load the .zattrs for each Well, @manzt?

Will need to think about this.

@camFoltz: I think I noticed you were using vizarr in a Jupyter notebook. If that is the case, and you want to try out these changes in a notebook, you can change the link here to the Netlify build (or, if developing locally with npm start, "http://localhost:3000"):

type="vizarr", src="https://hms-dbmi.github.io/vizarr"

@manzt (Member) left a comment:


Just an idea. I think it's a fair optimization to avoid requests for .zarray or .zgroup, since we know we are in an OME-NGFF hierarchy.

The inline comment below refers to these lines of the diff:

return join(wellPath, wellAttrs.well.images[0].path);
}
const wellImagePaths = await Promise.all(wellPaths.map(getImgPath));

@manzt (Member) commented on Sep 10, 2021:

Since we don't need the group node other than for the attrs, I think we could make a util to handle the specific use case:

// src/utils.ts
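// Note (assumption): this sketch presumes `join` and the zarr.js `ZarrGroup` /
// `AsyncStore` types are already imported/available in this module. Fetching
// .zattrs directly keeps it to one request per Well, instead of the
// .zattrs + .zgroup + .zarray trio described above.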
const decoder = new TextDecoder();
export function getAttrsOnly<T = unknown>(grp: ZarrGroup, path: string) {
  return (grp.store as AsyncStore<ArrayBuffer>)
    .getItem(join(grp.path, path, ".zattrs"))
    .then((b) => decoder.decode(b))
    .then((text) => JSON.parse(text) as T);
}
  async function getImgPath(wellPath: string) {
    // This loads .zattrs for each well but also tries to load .zarray (404) and .zgroup
    const wellAttrs = await getAttrsOnly<{ well: Ome.Well }>(grp, wellPath);
    return join(wellPath, wellAttrs.well.images[0].path);
  }
  const wellImagePaths = await Promise.all(wellPaths.map(getImgPath));

@manzt (Member):

@will-moore any thoughts on this? I am happy to push this PR through and open a follow-up PR regarding this performance enhancement. My main concern is that I don't have many HCS datasets to experiment with, and the IDR links have been somewhat unstable, so it's difficult to test locally.

@will-moore (Collaborator, Author):

I started to look at this. I know IDR links have been unstable, but https://hms-dbmi.github.io/vizarr/v0.1?source=https://s3.embassy.ebi.ac.uk/idr/zarr/v0.1/plates/5966.zarr was working just now (Firefox).
For that big plate it's already quite slow (over a minute), so this might be a killer.
Just trying now with this PR built locally, and it's taking a while (home internet isn't the fastest)!
Nearly 3 mins before it even starts to load chunks!
OK, so it finally loaded after 7 minutes (4618 requests). With v0.1 vizarr it was 3297 requests and less than 2 mins. But YMMV.
In either case, most users would probably give up, since there's no sign of progress.
This plate is probably a bit too ambitious, so maybe it shouldn't be a blocker if this PR is a critical fix for @camFoltz.
I won't have time to dig any deeper before next week.

Would that performance enhancement help even before this PR?


@camFoltz commented on Sep 17, 2021:

I think if the viewer is to be flexible, then it should also have flexibility in parsing the structure. Perhaps there's a method by which the loader infers whether or not the underlying positions/arrays/resolutions follow the same naming scheme (perhaps after looking at the first 2-3 positions and noticing they're all the same). That way we can have the best of both worlds for now. It is not the most elegant fix, but it could help here.

In my case, I would not have any two groups below the column level with the same name, and I am happy to test the performance locally as the datasets scale up. I can generate pretty large arbitrary HCS datasets at this point (now that I have a writer in place here) so I can give this a go.

In the far future we do plan on hosting data on the IDR, so I agree that the performance should be optimized.


Also happy to share / generate these datasets at will for development purposes
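
A rough sketch of the sampling idea described above, for illustration only (hypothetical helper names, not code from this PR):

// Probe the first few Wells; if they agree, assume the rest follow the same scheme.
async function resolveImagePaths(
  wellPaths: string[],
  // Hypothetical: returns the first image's path relative to the given Well.
  loadFirstImagePath: (wellPath: string) => Promise<string>,
  sampleSize = 3,
): Promise<string[]> {
  const sample = await Promise.all(wellPaths.slice(0, sampleSize).map(loadFirstImagePath));
  if (sample.length > 0 && sample.every((p) => p === sample[0])) {
    // Uniform naming scheme: reuse the sampled path for every Well.
    return wellPaths.map((wellPath) => `${wellPath}/${sample[0]}`);
  }
  // Mixed naming (or nothing sampled): fall back to loading every Well, as this PR does.
  const rest = await Promise.all(wellPaths.slice(sampleSize).map(loadFirstImagePath));
  const all = [...sample, ...rest];
  return wellPaths.map((wellPath, i) => `${wellPath}/${all[i]}`);
}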

@camFoltz

Thank you both! Sorry for the delayed response. I will test this out with my datasets this afternoon.

@camFoltz

I've noticed on the OME-HCS blog post that the embedded viewer can take a while to load large well plates. Is this performance hit based upon metadata reading or querying the pixel data? Reading the JSON .zattrs shouldn't take too long here, correct? If it is based upon calls to .zgroup or .zattrs, then this change could potentially make things a bit worse.

@camFoltz

Can confirm that this fixes the issue that I had mentioned in #118. My dataset was small so I did not notice any performance hit.

@manzt (Member) commented on Sep 13, 2021:

> I've noticed on the OME-HCS blog post that the embedded viewer can take a while to load large well plates. Is this performance hit based upon metadata reading or querying the pixel data? Reading the JSON .zattrs shouldn't take too long here, correct? If it is based upon calls to .zgroup or .zattrs, then this change could potentially make things a bit worse.

Great question, and something we have discussed previously in #75. Currently we "open" each resolution independently, since the HCS specification does not specify that all wells are identically sized. This leads to a substantial overhead in loading a plate, for reasons described in that issue.

We (incorrectly) made the assumption that we could reuse metadata for other parsing, which ultimately led to #118.
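
As a very rough illustration of why that per-node opening adds up (hypothetical callbacks only, not vizarr's actual loader), the number of metadata round trips grows with the number of Wells times the number of resolutions, since nothing is shared between nodes:

// Illustrative only: every resolution of every Well is opened independently,
// so metadata requests scale with (Wells) x (resolutions per image).
async function openAllResolutions<Node>(
  wellImagePaths: string[],
  // Hypothetical: lists resolution paths, e.g. from an image's multiscales metadata.
  listResolutions: (imagePath: string) => Promise<string[]>,
  // Hypothetical: opens a single zarr node (one or more requests per call).
  openNode: (path: string) => Promise<Node>,
): Promise<Node[][]> {
  return Promise.all(
    wellImagePaths.map(async (imagePath) => {
      const resolutions = await listResolutions(imagePath);
      return Promise.all(resolutions.map((r) => openNode(`${imagePath}/${r}`)));
    }),
  );
}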

@will-moore (Collaborator, Author)

With Trevor's getAttrsOnly() performance improvement in 7cc8ba0, things are looking a lot better on the big 5966.zarr plate tested above.
The number of requests was reduced to 3535 (from 4618) and the plate loaded in 1.2 mins, so this is comparable to before this PR. I'm not sure how much of the performance variation is coming from the EBI backend, but this PR isn't causing a problem. I tested the other plates at https://www.openmicroscopy.org/2020/12/01/zarr-hcs.html and they're all looking good, so I think this PR is 👍.

@manzt (Member) commented on Sep 26, 2021:

Apologies, I've had a busy week! Thanks for the contributions and input. Merging to support your use case, @camFoltz, and we can discuss further optimizations/assumptions re: metadata to improve performance separately :)

@manzt manzt merged commit 3558e73 into hms-dbmi:main Sep 26, 2021
Successfully merging this pull request may close: OME-HCS Compatibility with vizarr (#118).