
Load each Well to get path to first image for each #119

Merged (2 commits) on Sep 26, 2021

Conversation

@will-moore (Collaborator)

Fixes #118.

Instead of just using the first Well of a plate to get the path from Well -> Image, this PR loads every Well, to handle the case where each Well has a different path to its Image.

However, this makes 3 extra calls for each Well, e.g. A/1/.zattrs (loads the data we need), A/1/.zgroup (not needed) and A/1/.zarray (404). So maybe there's a better way to load the .zattrs for each Well, @manzt?

This will have performance implications for loading plates, especially large ones, so it shouldn't be merged unless we're happy with that.

cc @joshmoore
cc @camFoltz
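
For reference, a rough before/after sketch of the change described above (the type and helper names are hypothetical stand-ins, not the PR's actual code):

interface WellAttrs {
  well: { images: { path: string }[] };
}

// Hypothetical: stands in for however a Well's .zattrs is actually fetched.
type LoadWellAttrs = (wellPath: string) => Promise<WellAttrs>;

// Before: only the first Well's metadata is fetched, and its image path is
// assumed to apply to every other Well in the plate.
async function firstImagePathsBefore(wellPaths: string[], load: LoadWellAttrs) {
  const first = await load(wellPaths[0]);
  const imgPath = first.well.images[0].path;
  return wellPaths.map((wellPath) => `${wellPath}/${imgPath}`);
}

// After (this PR): each Well's metadata is fetched, so per-Well image paths
// are respected even when Wells use different naming.
async function firstImagePathsAfter(wellPaths: string[], load: LoadWellAttrs) {
  return Promise.all(
    wellPaths.map(async (wellPath) => {
      const attrs = await load(wellPath);
      return `${wellPath}/${attrs.well.images[0].path}`;
    }),
  );
}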

@will-moore (Collaborator, Author)

@camFoltz You should be able to test this PR using the app deployed at https://deploy-preview-119--vizarr.netlify.app/, although the Netlify build seems to be a bit flaky for me.

@manzt (Member) commented on Sep 10, 2021:

> However, this makes 3 extra calls for each Well, e.g. A/1/.zattrs (loads the data we need), A/1/.zgroup (not needed) and A/1/.zarray (404). So maybe there's a better way to load the .zattrs for each Well, @manzt?

Will need to think about this.

@camFoltz: I think I noticed you were using vizarr in a Jupyter notebook. If that is the case, and you want to try out these changes in a notebook, you can change the link here to the Netlify build (or, if developing locally with npm start, "http://localhost:3000"):

type="vizarr", src="https://hms-dbmi.github.io/vizarr"

@manzt (Member) left a comment:


Just an idea. I think it's a fair optimization to avoid requests for .zarray or .zgroup, since we know we are in an OME-NGFF hierarchy.

The inline comment below refers to these lines of the diff:

return join(wellPath, wellAttrs.well.images[0].path);
}
const wellImagePaths = await Promise.all(wellPaths.map(getImgPath));

@manzt (Member) commented on Sep 10, 2021:

Since we don't need the group node other than for the attrs, I think we could make a util to handle the specific use case:

// src/utils.ts
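// Note (assumption): this sketch presumes `join` and the zarr.js `ZarrGroup` /
// `AsyncStore` types are already imported/available in this module. Fetching
// .zattrs directly keeps it to one request per Well, instead of the
// .zattrs + .zgroup + .zarray trio described above.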
const decoder = new TextDecoder();
export function getAttrsOnly<T = unknown>(grp: ZarrGroup, path: string) {
  return (grp.store as AsyncStore<ArrayBuffer>)
    .getItem(join(grp.path, path, ".zattrs"))
    .then((b) => decoder.decode(b))
    .then((text) => JSON.parse(text) as T);
}
  async function getImgPath(wellPath: string) {
    // This loads .zattrs for each well but also tries to load .zarray (404) and .zgroup
    const wellAttrs = await getAttrsOnly<{ well: Ome.Well }>(grp, wellPath);
    return join(wellPath, wellAttrs.well.images[0].path);
  }
  const wellImagePaths = await Promise.all(wellPaths.map(getImgPath));

@manzt (Member):

@will-moore any thoughts on this? I am happy to push this PR through and open a follow-up PR regarding this performance enhancement. My main concern is that I don't have many HCS datasets to experiment with, and the IDR links have been somewhat unstable, so it's difficult to test locally.

@will-moore (Collaborator, Author):

I started to look at this. I know IDR links have been unstable, but https://hms-dbmi.github.io/vizarr/v0.1?source=https://s3.embassy.ebi.ac.uk/idr/zarr/v0.1/plates/5966.zarr was working just now (Firefox).
For that big plate it's already quite slow (over a minute), so this might be a killer.
Just trying now with this PR built locally, and it's taking a while (home internet isn't the fastest)!
Nearly 3 mins before it even starts to load chunks!
OK, so it finally loaded after 7 minutes (4618 requests). With v0.1 vizarr it was 3297 requests and less than 2 mins. But YMMV.
In either case, most users would probably give up, since there's no sign of progress.
This plate is probably a bit too ambitious, so maybe it shouldn't be a blocker if this PR is a critical fix for @camFoltz.
I won't have time to dig any deeper before next week.

Would that performance enhancement help even before this PR?


@camFoltz commented on Sep 17, 2021:

I think if the viewer is to be flexible, then it should also have flexibility in parsing the structure. Perhaps there's a method by which the loader infers whether or not the underlying positions/arrays/resolutions follow the same naming scheme (perhaps after looking at the first 2-3 positions and noticing they're all the same). That way we can have the best of both worlds for now. It is not the most elegant fix, but it could help here.

In my case, I would not have any two groups below the column level with the same name, and I am happy to test the performance locally as the datasets scale up. I can generate pretty large arbitrary HCS datasets at this point (now that I have a writer in place here) so I can give this a go.

In the far future we do plan on hosting data on the IDR, so I agree that the performance should be optimized.


Also happy to share / generate these datasets at will for development purposes
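
A rough sketch of the sampling idea described above, for illustration only (hypothetical helper names, not code from this PR):

// Probe the first few Wells; if they agree, assume the rest follow the same scheme.
async function resolveImagePaths(
  wellPaths: string[],
  // Hypothetical: returns the first image's path relative to the given Well.
  loadFirstImagePath: (wellPath: string) => Promise<string>,
  sampleSize = 3,
): Promise<string[]> {
  const sample = await Promise.all(wellPaths.slice(0, sampleSize).map(loadFirstImagePath));
  if (sample.length > 0 && sample.every((p) => p === sample[0])) {
    // Uniform naming scheme: reuse the sampled path for every Well.
    return wellPaths.map((wellPath) => `${wellPath}/${sample[0]}`);
  }
  // Mixed naming (or nothing sampled): fall back to loading every Well, as this PR does.
  const rest = await Promise.all(wellPaths.slice(sampleSize).map(loadFirstImagePath));
  const all = [...sample, ...rest];
  return wellPaths.map((wellPath, i) => `${wellPath}/${all[i]}`);
}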

@camFoltz

Thank you both! Sorry for the delayed response. I will test this out with my datasets this afternoon.

@camFoltz

I've noticed on the OME-HCS blog post that the embedded viewer can take a while to load large well plates. Is this performance hit based upon metadata reading or querying the pixel data? Reading the JSON .zattrs shouldn't take too long here, correct? If it is based upon calls to .zgroup or .zattrs, then this change could potentially make things a bit worse.

@camFoltz

Can confirm that this fixes the issue that I had mentioned in #118. My dataset was small so I did not notice any performance hit.

@manzt (Member) commented on Sep 13, 2021:

> I've noticed on the OME-HCS blog post that the embedded viewer can take a while to load large well plates. Is this performance hit based upon metadata reading or querying the pixel data? Reading the JSON .zattrs shouldn't take too long here, correct? If it is based upon calls to .zgroup or .zattrs, then this change could potentially make things a bit worse.

Great question, and something we have discussed previously in #75. Currently we "open" each resolution independently, since the HCS specification does not specify that all wells are identically sized. This leads to a substantial overhead in loading a plate, for reasons described in that issue.

We (incorrectly) made the assumption that we could reuse metadata for other parsing, which ultimately led to #118.
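
As a very rough illustration of why that per-node opening adds up (hypothetical callbacks only, not vizarr's actual loader), the number of metadata round trips grows with the number of Wells times the number of resolutions, since nothing is shared between nodes:

// Illustrative only: every resolution of every Well is opened independently,
// so metadata requests scale with (Wells) x (resolutions per image).
async function openAllResolutions<Node>(
  wellImagePaths: string[],
  // Hypothetical: lists resolution paths, e.g. from an image's multiscales metadata.
  listResolutions: (imagePath: string) => Promise<string[]>,
  // Hypothetical: opens a single zarr node (one or more requests per call).
  openNode: (path: string) => Promise<Node>,
): Promise<Node[][]> {
  return Promise.all(
    wellImagePaths.map(async (imagePath) => {
      const resolutions = await listResolutions(imagePath);
      return Promise.all(resolutions.map((r) => openNode(`${imagePath}/${r}`)));
    }),
  );
}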

@will-moore (Collaborator, Author)

With Trevor's getAttrsOnly() performance improvement in 7cc8ba0, things are looking a lot better on the big 5966.zarr plate tested above.
The number of requests was reduced to 3535 (from 4618) and the plate loaded in 1.2 mins, so this is comparable to before this PR. I'm not sure how much of the performance variation is coming from the EBI backend, but this PR isn't causing a problem. I tested the other plates at https://www.openmicroscopy.org/2020/12/01/zarr-hcs.html and they're all looking good, so I think this PR is 👍.

@manzt (Member) commented on Sep 26, 2021:

Apologies, I've had a busy week! Thanks for the contributions and input. Merging to support your use case, @camFoltz, and we can discuss further optimizations/assumptions re: metadata to improve performance separately :)

@manzt manzt merged commit 3558e73 into hms-dbmi:main Sep 26, 2021
Successfully merging this pull request may close: OME-HCS Compatibility with vizarr (#118).