Support for consolidated remote zarr #278
Codecov Report
@@            Coverage Diff             @@
##             main     #278      +/-   ##
==========================================
- Coverage   90.75%   90.59%   -0.16%
==========================================
  Files          36       36
  Lines        4630     4670      +40
==========================================
+ Hits         4202     4231      +29
- Misses        428      439      +11
I will test it with napari-spatialdata for the consolidated version at https://dl01.irc.ugent.be/spatial/cosmx/data.zarr/, but someone with an S3 bucket should test it with
Thanks @berombau this is amazing! 🚀
Hey @berombau ! This is super nice. I left a couple of very minor comments below. I think you can wait until @LucaMarconato does a full review to address them. I just wanted to write them down while I was having a look.
Thanks!
cc @cavenel (since we were discussing consolidated metadata today)
fyi
thanks a lot for this @berombau ! Approve in principle but this doesn't work for me,
python -m spatialdata peek https://dl01.irc.ugent.be/spatial/cosmx/data.zarr
Error: .zarr storage not found at https://dl01.irc.ugent.be/spatial/cosmx/data.zarr. Please specify a valid OME-NGFF spatial data (.zarr) file. Example "python -m spatialdata peek data.zarr"
although I can click on the link and take a look at the HTML. Do you happen to have any idea why? I will try from another network tomorrow, just in case.
I committed a change to the CLI and forgot to push it. This should work now, and it also takes only 10 seconds instead of minutes:
TL;DR: I think it's ready to merge; I'll just wait for someone to have a look at my changes. @berombau thanks for the PR (and sorry for the long silence on this). I tried it with the two URLs you gave and the functionality is super cool! Extended comments below.
Minor
Bug with .zmetadata
I tried to use this on the mibitof dataset after I re-uploaded it to S3 with the consolidated metadata, but it didn't work (it showed an empty dataset without any error). The reason was a bug in
Bug in
Reported the bug in the
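For readers unfamiliar with `.zmetadata`: in the zarr v2 consolidated-metadata layout it is a single JSON document that gathers the per-node `.zgroup`, `.zarray` and `.zattrs` files under one `metadata` key, so a remote reader can discover the hierarchy with one request instead of one per node. The following is only a stdlib sketch of that layout on a toy store; `consolidate` and `build_demo_store` are hypothetical helpers written for illustration, not functions from spatialdata or zarr, and the sketch does not reproduce the bug discussed above. In practice one would presumably rely on zarr's own consolidation routine.

```python
import json
import os
import tempfile
from pathlib import Path

# File names that carry metadata in a zarr v2 store.
META_NAMES = {".zgroup", ".zarray", ".zattrs"}


def consolidate(root: Path) -> dict:
    """Collect every metadata file under `root` into one mapping (hypothetical helper)."""
    metadata = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name in META_NAMES:
                file_path = Path(dirpath) / name
                # Keys are store-relative and always use forward slashes.
                key = file_path.relative_to(root).as_posix()
                metadata[key] = json.loads(file_path.read_text())
    return {"zarr_consolidated_format": 1, "metadata": metadata}


def build_demo_store(root: Path) -> None:
    """Write a tiny toy hierarchy: a root group and one `points` subgroup."""
    (root / "points").mkdir(parents=True)
    (root / ".zgroup").write_text(json.dumps({"zarr_format": 2}))
    (root / "points" / ".zgroup").write_text(json.dumps({"zarr_format": 2}))
    (root / "points" / ".zattrs").write_text(json.dumps({"note": "toy attributes"}))


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "data.zarr"
    store.mkdir()
    build_demo_store(store)
    consolidated = consolidate(store)
    # The consolidated document lives next to the root group's own metadata.
    (store / ".zmetadata").write_text(json.dumps(consolidated, indent=4))

print(sorted(consolidated["metadata"]))
```

If a store is re-uploaded without regenerating this file, a reader that trusts `.zmetadata` only sees the nodes listed in it, which is one plausible way to end up with an apparently empty dataset and no error.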
minor nitpick on type and a suggestion that might not be the exact same 😅
path = os.path.join(f._store.path, f.path, "points.parquet")
# cache on remote file needed for parquet reader to work
# TODO: allow reading in the metadata without caching all the data
table = read_parquet("simplecache::" + path if "http" in path else path)
Googled this because I didn't know what it should do, and found these options: catalyst-cooperative/pudl#1496 (comment) and intake/intake-parquet#18 (comment). Maybe better?
I think we would need to benchmark that. I'd do that in a follow up PR. @berombau WDYT?
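As a side note on the reviewed snippet: the `"http" in path` test also matches local paths that merely contain the substring `http`, and `os.path.join` uses backslashes on Windows, which is not what a URL needs. A minimal sketch of a stricter variant is below. `with_local_cache`, `points_path` and the `REMOTE_SCHEMES` set are hypothetical names chosen for illustration, not part of this PR; the only external behaviour assumed is fsspec's URL chaining, where a `simplecache::` prefix asks for the remote file to be cached locally before it is opened.

```python
import posixpath
from urllib.parse import urlparse

# Assumption: the schemes that should be treated as remote. Adjust as needed.
REMOTE_SCHEMES = {"http", "https", "s3", "gs"}


def points_path(store_path: str, group_path: str) -> str:
    """Join with forward slashes so the result is valid for URLs on any OS."""
    return posixpath.join(store_path, group_path, "points.parquet")


def with_local_cache(path: str) -> str:
    """Prefix remote URLs with fsspec's `simplecache::`; leave local paths alone."""
    scheme = urlparse(path).scheme
    return f"simplecache::{path}" if scheme in REMOTE_SCHEMES else path


remote = points_path("https://example.org/data.zarr", "points/cells")
local = points_path("/data/http_dump/data.zarr", "points/cells")

print(with_local_cache(remote))
print(with_local_cache(local))
```

The second path contains `http` but has no URL scheme, so it is returned unchanged, whereas the substring check would have routed it through the cache. Whether caching the whole file is the right trade-off is the benchmarking question raised above and is not settled by this sketch.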
Fixed. Thanks for the review Giovanni, merging now.
Closes #275
._store.path
and Path to get to subelements.