[Feature]: S3 streaming support via fsspec #134

Closed
3 tasks done
oruebel opened this issue Oct 17, 2023 · 1 comment · Fixed by #138
Assignees: oruebel
Labels: category: enhancement (improvements of code or code behavior); priority: high (impacts proper operation or use of feature important to most users)
Milestone: Next Release

Comments

oruebel (Contributor) commented Oct 17, 2023

What would you like to see added to HDMF-ZARR?

Support streaming using fsspec

Is your feature request related to a problem?

NeurodataWithoutBorders/helpdesk#56 (comment)

What solution would you like?

Add FSStore to support fsspec-based streaming from S3, and/or allow passing a zarr.Group (with the parent group as a read-only store) as the path argument of ZarrIO.__init__(path=...).

https://github.com/hdmf-dev/hdmf-zarr/blob/70bf35b60ed2c2eaae7b12080bc1f4cc3d89ba3e/src/hdmf_zarr/backend.py#L66C1-L69C47
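A rough sketch of how the `path` argument could be dispatched under this proposal. The helper `choose_store_kind` and its return values are hypothetical, not part of hdmf-zarr; it only decides which kind of store would be used, assuming zarr 2.x class names (`FSStore` for fsspec-style URLs, `DirectoryStore` for local paths) and assuming any non-string argument is an already-open `zarr.Group`.

```python
from urllib.parse import urlsplit


def choose_store_kind(path) -> str:
    """Hypothetical helper: decide how ZarrIO could open its ``path`` argument."""
    if not isinstance(path, str):
        # Assumption: a non-string (e.g. an already-open zarr.Group) is used directly,
        # and the caller is responsible for having opened it read-only.
        return "existing-group"
    parts = urlsplit(path)
    if parts.scheme and parts.netloc:
        # fsspec-style URL (s3://, gcs://, https://, ...): stream through zarr's FSStore.
        return "FSStore"
    # Plain filesystem path; Windows drive letters like "C:\\..." parse with an
    # empty netloc, so they also land here.
    return "DirectoryStore"
```

Usage: `choose_store_kind("s3://bucket/file.nwb.zarr")` gives `"FSStore"`, while `choose_store_kind("/data/file.nwb.zarr")` gives `"DirectoryStore"`.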

Do you have any interest in helping implement the feature?

Yes.

Code of Conduct

oruebel added the category: enhancement and priority: high labels Oct 17, 2023
oruebel added this to the Next Release milestone Oct 17, 2023
oruebel self-assigned this Oct 17, 2023
alejoe91 (Collaborator) commented

@oruebel this came up elsewhere (see comment). Note that zarr natively supports reading from the cloud. On my side, this works:

import zarr

remote_zarr_location = "s3://aind-open-data/ecephys_625749_2022-08-03_15-15-06_nwb_2023-05-16_16-34-55/ecephys_625749_2022-08-03_15-15-06_nwb/ecephys_625749_2022-08-03_15-15-06_experiment1_recording1.nwb.zarr/"

zarr_root = zarr.open(remote_zarr_location)

When I tried the NWBZarrIO wrapper, some links failed to resolve because the link-resolution function assumes the file/folder is on local disk. It would probably be an easy fix to make it work! I can give it a try!
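One way link resolution could be made location-agnostic, sketched under assumptions: `resolve_link_source` is a hypothetical helper (not hdmf-zarr's actual function), and it assumes a link's source is stored as a path relative to the root of the Zarr store. The underlying problem is that on-disk path helpers can mangle URLs; for example, on POSIX `os.path.normpath("s3://bucket/x")` returns `"s3:/bucket/x"`. The sketch therefore applies POSIX path rules only to the path component of a remote URL.

```python
import os
import posixpath
from urllib.parse import urlsplit, urlunsplit


def resolve_link_source(file_location: str, source: str) -> str:
    """Hypothetical helper: resolve a relative link ``source`` against ``file_location``.

    ``file_location`` may be a local path or a URL with a network location,
    such as ``s3://bucket/key``.
    """
    parts = urlsplit(file_location)
    if parts.scheme and parts.netloc:
        # Remote URL: normalize only the path component with POSIX rules so the
        # "scheme://bucket" prefix is never touched by os.path functions.
        joined = posixpath.normpath(posixpath.join(parts.path, source))
        return urlunsplit((parts.scheme, parts.netloc, joined, "", ""))
    # Local path: the usual on-disk resolution.
    return os.path.normpath(os.path.join(file_location, source))
```

The resolved string can then be opened the same way as the main file, since `zarr.open` accepts both local paths and `s3://` URLs.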
