[v3] Design and implement storage transformer API #1718

Open
jhamman opened this issue Mar 20, 2024 · 0 comments
Labels
V3 Affects the v3 branch

jhamman commented Mar 20, 2024

Summary

The V3 specification introduced a new Zarr abstraction: the storage transformer. A storage transformer modifies a request to read or write data before passing that request to the next transformer or to the store. Transformers can be sequenced to form a pipeline of operations, as shown in the following diagram:

[image: storage transformer pipeline diagram]

The initial implementation of the v3 spec in Zarr-Python included a first pass at storage transformers (#1096), but a fresh start is likely needed given how the spec and the internal design of Zarr-Python have evolved since.

Will Zarr-Python 3 support any storage transformers? Initially, probably not, but the intent is to support them eventually, even if only via plugins.

Initial storage transformers

Designing the storage transformer API without any target transformers is probably not a good idea. In fact, there have been a few proposed spec extensions that would fit well here.

Are there others that have been discussed that this list misses? Is there v3 data in the wild that utilizes storage transformers?

Design

The basic flow of the storage transformers is fairly obvious:

  1. array metadata is decoded to produce a pipeline of zero or more transformers (0->N)
  2. when the array goes to fetch data, the key is transformed by each element of the transformer pipeline in turn, then passed through to the store
  3. when the array goes to write data, the key and data are passed through each element of the transformer pipeline in turn, then through to the store
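
As a starting point for discussion, the three steps above could be sketched roughly as follows. This is a minimal illustration, not a proposed API: `MemoryStore`, `StorageTransformer`, `PrefixTransformer`, and `build_pipeline` are all hypothetical names, the `name`/`configuration` shape of each metadata entry is assumed, and treating the first metadata entry as the outermost transformer is an assumption about ordering that would need to be checked against the spec.

```python
from __future__ import annotations


class MemoryStore:
    """Toy in-memory store standing in for a real Zarr store (hypothetical)."""

    def __init__(self) -> None:
        self.data: dict[str, bytes] = {}

    def get(self, key: str) -> bytes | None:
        return self.data.get(key)

    def set(self, key: str, value: bytes) -> None:
        self.data[key] = value


class StorageTransformer:
    """Hypothetical base class: wraps the next transformer or the store."""

    def __init__(self, inner) -> None:
        self.inner = inner

    def get(self, key: str) -> bytes | None:
        return self.inner.get(key)

    def set(self, key: str, value: bytes) -> None:
        self.inner.set(key, value)


class PrefixTransformer(StorageTransformer):
    """Illustrative transformer that rewrites keys by prepending a prefix."""

    def __init__(self, inner, prefix: str) -> None:
        super().__init__(inner)
        self.prefix = prefix

    def get(self, key: str) -> bytes | None:
        return self.inner.get(self.prefix + key)

    def set(self, key: str, value: bytes) -> None:
        self.inner.set(self.prefix + key, value)


def build_pipeline(configs: list[dict], store):
    """Step 1: decode metadata entries into a chain of transformers.

    Assumption: the first entry is the outermost transformer (closest to
    the array) and the last entry sits directly on top of the store.
    """
    known = {"prefix": PrefixTransformer}  # stand-in for a real registry
    wrapped = store
    for config in reversed(configs):
        cls = known[config["name"]]
        wrapped = cls(wrapped, **config.get("configuration", {}))
    return wrapped


store = MemoryStore()
pipeline = build_pipeline(
    [
        {"name": "prefix", "configuration": {"prefix": "a/"}},
        {"name": "prefix", "configuration": {"prefix": "b/"}},
    ],
    store,
)

# Step 3: a write passes through each transformer, then reaches the store.
pipeline.set("c/0/0", b"chunk")
# Step 2: a read transforms the key the same way before hitting the store.
chunk = pipeline.get("c/0/0")
```

With an empty list, `build_pipeline` returns the store itself, which covers the 0-transformer case without special handling.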

From here, we need to settle on an internal API (e.g. StorageTransformerPipeline) and a position on how new storage transformers will be developed and/or registered with Zarr-Python.
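
One possible shape for registration, sketched only to make the question concrete: a name-to-class registry populated by a decorator. The names `register_storage_transformer`, `get_storage_transformer`, and `NoOpTransformer` are hypothetical and do not exist in Zarr-Python. Discovery through Python package entry points would be another option for third-party plugins.

```python
from __future__ import annotations

# Hypothetical module-level registry mapping metadata names to classes.
_REGISTRY: dict[str, type] = {}


def register_storage_transformer(name: str):
    """Class decorator that records a transformer under a metadata name."""

    def decorator(cls: type) -> type:
        if name in _REGISTRY:
            raise ValueError(f"storage transformer already registered: {name!r}")
        _REGISTRY[name] = cls
        return cls

    return decorator


def get_storage_transformer(name: str) -> type:
    """Look up a transformer class by the name found in array metadata."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise KeyError(f"unknown storage transformer: {name!r}") from None


@register_storage_transformer("example_noop")
class NoOpTransformer:
    """Illustrative transformer that forwards every request unchanged."""

    def __init__(self, inner) -> None:
        self.inner = inner

    def get(self, key: str):
        return self.inner.get(key)

    def set(self, key: str, value: bytes) -> None:
        self.inner.set(key, value)
```

A registry like this keeps metadata decoding independent of where a transformer is defined, at the cost of requiring the defining module to be imported before the array is opened.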

Projects
Status: Todo