Load each Well to get path to first image for each #119
Conversation
@camFoltz You should be able to test this PR using the app deployed at https://deploy-preview-119--vizarr.netlify.app/ although the netlify build seems to be a bit flaky for me.
Will need to think about this. @camFoltz: I think I noticed you were using vizarr in a Jupyter notebook. If that is the case, you can change the link here to the netlify build (or to your local build if developing locally):

vizarr/example/imjoy_plugin.py Line 47 in 1585a03

if you want to try out these changes in a notebook.
Just an idea. I think it's a fair optimization to avoid requests for .zarray or .zgroup, since we know we are in an OME-NGFF hierarchy.
```ts
  return join(wellPath, wellAttrs.well.images[0].path);
}
const wellImagePaths = await Promise.all(wellPaths.map(getImgPath));
```
Since we don't need the group node for anything other than its attrs, I think we could make a util to handle this specific use case:

```ts
// src/utils.ts
const decoder = new TextDecoder();
export function getAttrsOnly<T = unknown>(grp: ZarrGroup, path: string) {
  return (grp.store as AsyncStore<ArrayBuffer>)
    .getItem(join(grp.path, path, ".zattrs"))
    .then((b) => decoder.decode(b))
    .then((text) => JSON.parse(text) as T);
}
```

```ts
async function getImgPath(wellPath: string) {
  // Loads only the .zattrs for each well, skipping the extra
  // .zgroup and .zarray (404) requests that opening the group would make.
  const wellAttrs = await getAttrsOnly<{ well: Ome.Well }>(grp, wellPath);
  return join(wellPath, wellAttrs.well.images[0].path);
}
const wellImagePaths = await Promise.all(wellPaths.map(getImgPath));
```
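For context, here is the suggested util exercised against a minimal in-memory store. The `join` helper, the mock group, and the store shape are assumptions for illustration only; vizarr's real `ZarrGroup`/`AsyncStore` come from zarr.js.

```typescript
// Minimal stand-ins for the zarr.js types used by the suggestion.
type MockStore = { getItem(key: string): Promise<ArrayBuffer> };
type MockGroup = { path: string; store: MockStore };

// Simple path join, assumed equivalent to vizarr's util.
const join = (...parts: string[]) => parts.filter(Boolean).join("/");

const decoder = new TextDecoder();
function getAttrsOnly<T = unknown>(grp: MockGroup, path: string): Promise<T> {
  return grp.store
    .getItem(join(grp.path, path, ".zattrs"))
    .then((b) => decoder.decode(b))
    .then((text) => JSON.parse(text) as T);
}

// One .zattrs document for well A/1, held in memory.
const encoder = new TextEncoder();
const items = new Map<string, ArrayBuffer>([
  [
    "plate.zarr/A/1/.zattrs",
    encoder.encode(JSON.stringify({ well: { images: [{ path: "0" }] } }))
      .buffer as ArrayBuffer,
  ],
]);
const grp: MockGroup = {
  path: "plate.zarr",
  store: { getItem: async (key) => items.get(key)! },
};

// Exactly one request (for .zattrs) is made per well.
getAttrsOnly<{ well: { images: { path: string }[] } }>(grp, "A/1").then(
  (attrs) => console.log(join("A/1", attrs.well.images[0].path)),
);
```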
@will-moore any thoughts on this? I am happy to push this PR through and open a follow-up PR for this performance enhancement. My main concern is that I don't have many HCS datasets to experiment with, and the IDR links have been somewhat unstable, so it's difficult to test locally.
I started to look at this. I know IDR links have been unstable but https://hms-dbmi.github.io/vizarr/v0.1?source=https://s3.embassy.ebi.ac.uk/idr/zarr/v0.1/plates/5966.zarr was working just now (Firefox).
For that big plate it's already quite slow (over a minute) so this might be a killer.
Just trying now with this PR built locally, and it's taking a while (home internet isn't the fastest)!
Nearly 3 mins before it even starts to load chunks!
OK, so it finally loaded after 7 minutes (4618 requests). With v0.1 vizarr it was 3297 requests and less than 2 mins. But YMMV.
In either case, most users would probably give up since there's no sign of progress.
This plate is probably a bit too ambitious and maybe shouldn't be a blocker if this PR is a critical fix for @camFoltz.
I won't have time to dig any deeper before next week.
Would that performance enhancement help even before this PR?
https://hms-dbmi.github.io/vizarr/v0.1?source=https://s3.embassy.ebi.ac.uk/idr/zarr/v0.1/plates/422.zarr worked too, on 2nd attempt (maybe caching is helping).
I think if the viewer is to be flexible then it should also be flexible in parsing the structure. Perhaps there's a method by which the loader infers whether or not the underlying positions/arrays/resolutions share the same naming scheme (perhaps by looking at the first 2-3 positions and noticing they're all the same). That way we can have the best of both worlds for now. It is not the most elegant fix, but it could help here.
In my case, I would not have any two groups below the column level with the same name, and I am happy to test the performance locally as the datasets scale up. I can generate pretty large arbitrary HCS datasets at this point (now that I have a writer in place here) so I can give this a go.
In the far future we do plan on hosting data on the IDR, so I agree that the performance should be optimized.
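A minimal sketch of that sampling idea. All names here (`inferSharedImagePath`, `loadWellAttrs`) are hypothetical, not vizarr's API: probe the first few wells' `.zattrs`, and only fall back to loading every well when the sampled image paths disagree.

```typescript
type WellAttrs = { well: { images: { path: string }[] } };

// Hypothetical helper: given the plate's well paths and a loader for a
// single well's .zattrs, decide whether one relative image path can be
// reused for the whole plate.
async function inferSharedImagePath(
  wellPaths: string[],
  loadWellAttrs: (wellPath: string) => Promise<WellAttrs>,
  sampleSize = 3,
): Promise<string | null> {
  const sample = wellPaths.slice(0, sampleSize);
  const attrs = await Promise.all(sample.map(loadWellAttrs));
  const imgPaths = attrs.map((a) => a.well.images[0].path);
  // All sampled wells agree -> assume the plate is uniform and skip
  // the remaining per-well requests; otherwise signal "load them all".
  return imgPaths.every((p) => p === imgPaths[0]) ? imgPaths[0] : null;
}

// Example: a uniform plate where every well's first image is "0",
// so only sampleSize requests are needed instead of one per well.
const uniform = async (_: string): Promise<WellAttrs> => ({
  well: { images: [{ path: "0" }] },
});
inferSharedImagePath(["A/1", "A/2", "B/1", "B/2"], uniform).then((shared) =>
  console.log(shared),
);
```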
Also happy to share / generate these datasets at will for development purposes
Thank you both! Sorry for the delayed response. I will test this out with my datasets this afternoon.
I've noticed on the OME-HCS blog post that the embedded viewer can take a while to load large well plates. Is this performance hit based upon metadata reading or querying the pixel data? Reading the JSON .attrs shouldn't take too long here, correct? If it is based upon calls to …
Can confirm that this fixes the issue that I had mentioned in #118. My dataset was small so I did not notice any performance hit.
Great question, and something we have discussed previously in #75. Currently we "open" each resolution independently since the HCS specification does not require that all wells are identically sized. This leads to substantial overhead in loading a plate, for reasons described in that issue. We (incorrectly) made the assumption that we could reuse metadata for other parsing, which ultimately led to #118.
With Trevor's …
Apologies, I've had a busy week! Thanks for the contributions and input. Merging to support your use case @camFoltz, and we can discuss separately further optimizations/assumptions re: metadata to improve performance :)
Fixes #118.
Instead of just using the first Well of a plate to get the path from Well -> Image, this PR loads every Well to handle the case when each Well has a different path to Image.
However, this makes 3 extra calls for each Well, e.g. `A/1/.zattrs` (loads the data we need), `A/1/.zgroup` (not needed) and `A/1/.zarray` (404). So maybe there's a better way to load the `.zattrs` for each Well @manzt?

This will have performance implications for loading Plates, especially large plates, so it shouldn't be merged unless we're happy with that.
cc @joshmoore
cc @camFoltz