Nested cloud storage #395

Closed
joshmoore opened this issue Jan 21, 2019 · 11 comments
Comments

@joshmoore
Member

Following on from #177, is it possible to also use the NestedDirectoryStorage with s3fs? I'm currently looking at code roughly equivalent to:

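import s3fs

s3 = s3fs.S3FileSystem()  # credentials are resolved from the environment
store = s3fs.S3Map(
    root='example.zarr',
    s3=s3, check=True, create=True
)

# ds is an xarray.Dataset; name is the target group
ds.to_zarr(store=store, mode='w', group=name)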

and I can't tell whether there's a parameter and/or a delegation pattern that I could use to go from . to / notation.

see also:

@rabernat
Contributor

This is a good idea.

In theory, nesting and the storage medium are orthogonal. Nesting has to do with the structure of the keys. We should be able to make any store a nested store.

Just out of curiosity, what's your motivation here? In my understanding, object stores like S3 don't actually have directories...so what would be the advantage of nesting?

@jakirkham
Member

Right, the NestedDirectoryStore merely reinterprets the . separators as nested directories. The keys remain the same as in DirectoryStore.

@joshmoore
Member Author

@jakirkham: primarily Chunk storage zarr-developers/zarr-python#4 from zarr-developers/zarr-specs#3

@jakirkham
Member

@joshmoore, I'm afraid I still don't follow. What is gained by using a . instead of a / given the key/value pair is more or less the same?

@joshmoore
Member Author

joshmoore commented Jan 29, 2019

@jakirkham: sorry, I'm trying to use / instead of . to be a step closer to the N5 spec. (In general, I also think it's a sane default since it reduces the number of objects in a directory when working with the data locally, e.g. via aws s3 sync)

@alimanfoo
Member

It should be pretty straightforward to pull out the functionality to transform chunk keys to use / instead of . into a separate mapping, which you could then layer on top of any store. E.g., you'd then need to do something like:

store = zarr.NestedChunkStore(zarr.DirectoryStore('/path/to/data.zarr'))
# or ...
store = zarr.NestedChunkStore(s3fs.S3Map('my.bucket', ...))

...a bit verbose, but illustrates the idea of composing the mappings.
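A minimal sketch of what such a composable layer could look like, assuming the wrapped store is any MutableMapping. The class name NestedChunkStore is taken from the comment above and is hypothetical here, not an existing zarr API. The reverse mapping also assumes that group and array names are never purely numeric.

```python
import re
from collections.abc import MutableMapping

# A chunk key's last path segment looks like "0.1.2" (integers joined by ".").
_CHUNK_NAME = re.compile(r"^\d+(\.\d+)*$")


class NestedChunkStore(MutableMapping):
    """Hypothetical wrapper that stores chunk keys with '/' instead of '.'."""

    def __init__(self, inner):
        self.inner = inner  # any MutableMapping, e.g. a dict or an S3Map

    def _to_inner(self, key):
        # "foo/0.1.2" -> "foo/0/1/2"; metadata keys such as ".zarray" pass through.
        prefix, _, name = key.rpartition("/")
        if _CHUNK_NAME.match(name):
            name = name.replace(".", "/")
        return f"{prefix}/{name}" if prefix else name

    def _from_inner(self, key):
        # Assumption: a trailing run of all-numeric segments is a chunk index.
        parts = key.split("/")
        i = len(parts)
        while i > 0 and parts[i - 1].isdigit():
            i -= 1
        if i == len(parts):
            return key
        return "/".join(parts[:i] + [".".join(parts[i:])])

    def __getitem__(self, key):
        return self.inner[self._to_inner(key)]

    def __setitem__(self, key, value):
        self.inner[self._to_inner(key)] = value

    def __delitem__(self, key):
        del self.inner[self._to_inner(key)]

    def __iter__(self):
        return (self._from_inner(k) for k in self.inner)

    def __len__(self):
        return len(self.inner)
```

Wrapping a plain dict shows the translation: writing store["foo/0.1.2"] places the value under "foo/0/1/2" in the backing mapping, while store["foo/.zarray"] is stored unchanged.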

(...and now I'm wondering if the N5 store being implemented in #309 should be implemented as a transformer layer over any other store, rather than fixed to file system storage. That would allow zarr to access N5 data stored in S3 or GCS, for example, with no extra work.)

@joshmoore
Member Author

joshmoore commented Feb 1, 2019

I'll give this a try next week while traveling if no one else has started.

See #395 (comment) (2020-02-11)

@jakirkham
Member

FWIW this came up in issue ( https://github.com/zarr-developers/zarr/issues/410 ) as well.

@chrisroat

I'd like to understand what the roadmap is here.

I came across this while assuming everything worked with cloud buckets, but painted myself into a corner in using N5Store (for neuroglancer viz), which does not. My first inclination was that the storage.py layer could be adjusted to replace os calls with fsspec-type code. But in reading this and zarr-developers/n5py#9, it looks like something bigger is in the works.

How can I help?

@joshmoore
Member Author

Sorry, this got backburner-ed from my side some time ago. Nothing substantial to report.

@joshmoore
Member Author

With the dimension_separator epic complete (#707), nested cloud storage is now possible with the FSStore implementation. The default value has not been changed, but a different separator can be passed on creation. The setting will be stored in the .zarray metadata.
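As a rough illustration of how the stored setting determines chunk keys, the sketch below reads the separator from a .zarray JSON document and builds the key for a chunk index. The helper name chunk_key is hypothetical. The sketch assumes the metadata field is called dimension_separator and that a missing field means the unchanged default of ".".

```python
import json


def chunk_key(zarray_json, chunk_index):
    """Build a chunk key from .zarray metadata (hypothetical helper)."""
    meta = json.loads(zarray_json)
    # Assumption: an absent field falls back to the default "." separator.
    sep = meta.get("dimension_separator", ".")
    return sep.join(str(i) for i in chunk_index)


nested = chunk_key('{"dimension_separator": "/"}', (0, 1, 2))
flat = chunk_key("{}", (0, 1, 2))
```

Here nested is "0/1/2" and flat is "0.1.2", so a reader that honors the metadata can open either layout without being told which one was used.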
