ZEP0001 Review #227
Is the version of zarrita linked above intended to be compliant with the spec linked above? The way that sharding is handled seems slightly different in a way which is confusing to me (although not necessarily to anyone else 😅). Is the intention that the array metadata specifies the shape of chunks (of which several make up a shard) and the sharding transformer specifies the number of shards within it? So the array would make a request to its transformers for, say, chunk (2, 2); if there were no transformers, that request would go to the store, but if there were a sharding transformer with chunks_per_shard (2, 2), then that transformer would make a partial read request to the store for the end of shard (1, 1) to find where the shard's chunk (2, 2) was and then read that?

As an aside, does that mean a dumb store without a partial read implementation would do 2 full shard reads (possibly over a network, if it didn't have a caching layer)? I suppose there's no point in sharding if you don't have a partial read implementation.

I'm also being thrown off a bit as I think some of the explanatory text in the sharding transformer is out of sync with some other parts of the spec - it refers to the […]

The other thing which isn't clear to me is what codecs ingest and produce - some more explanation here would be useful. The compression codecs seem to go bytes -> bytes, the endianness codec goes value -> value (as it's an optional codec, it can't be the thing which goes bytes -> value), and the transpose codec goes ndarray -> ndarray. How does an implementation know when to interrupt the codec chain to convert bytes to linear values and linear values to an ndarray? The explanation which works in my head is that every codec ingests and produces a tuple of […]

Apologies if I've missed the answers to any of these questions in previous discussions!
Thanks for your review.
To be clear, this review does not cover sharding --- there will be a separate review for ZEP2. However, the way sharding fits in with the base v3 spec is certainly relevant. The sharding proposal has changed recently --- see the updated version here:
The new sharding specification is defined a bit differently but ultimately works the same in terms of I/O. Yes, sharding was designed specifically for cases where partial I/O is supported. Without partial I/O it would still allow you to decode just a single sub-chunk, but that isn't likely to be particularly useful.
That should be fixed in the updated sharding specification.
This section attempts to explain the process: […] This PR (which needs to be rebased) tries to clarify the language: […] But we do need to clarify for each codec whether it is "bytes -> bytes" or "array -> bytes" or "array -> array". As you noted, transpose is "array -> array".

Essentially, when writing (i.e. encoding), we can think of the codec as a simple function that maps: (bytestring, element_size) -> (bytestring, element_size). The reason we have to associate an element_size with the bytestring is […]

If we are trying to apply a codec that expects a bytestring as input, but we still have an array, then we convert the array to a (bytestring, element_size) pair by using the default binary encoding of the data type (little endian for all currently defined data types), in lexicographical (row major) order. This is equivalent to inserting […]

The spec only deals with "logical" arrays, that have a logical data type and shape. The physical layout in memory is up to the implementation. In practice I expect implementations to use a representation similar to numpy, that allows any arbitrary memory order, so that […] Implementations may also wish to use a streaming interface for handling the bytestring input/output of a codec.

When reading, we have the additional complication of supporting partial I/O: logically, an array -> array codec receives as input a "lazy array" interface provided by the next codec in the chain, and returns a "lazy array" interface representing the decoded array. The "lazy array" interface should provide operations to read portions of an array, similar to the array interface provided by zarr itself. For example, the lazy array returned by the "transpose" codec would need to translate read requests that it receives into transposed read requests, perform them on the lazy array provided by the next codec in the chain, and then return the result after transposing it.
An array -> bytes codec receives as input a "file handle" interface provided by the next codec in the chain (or the store/storage transformer itself), and returns a "lazy array" interface representing the decoded array. For example, the "endian" codec, if it supported partial I/O, could translate a request to read a region of the array into a request for a particular byte range, or set of byte ranges, on the "file handle", and then perform the endianness conversion after receiving the response.

A bytes -> bytes codec receives as input a "file handle" interface provided by the next codec in the chain, and returns a "file handle" interface representing the decoded data.

Codecs don't have to support partial I/O, and in practice most implementations probably won't support partial I/O except for specific codecs like the sharding codec. A codec that does not support partial I/O can simply immediately read the entire array or file handle it receives from the next codec in the chain, decode the result into an in-memory byte string or array, and then return a lazy file handle / lazy array interface that simply accesses the in-memory byte string or array.
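The all-at-once fallback described above (read everything, decode once, then serve partial reads from memory) can be sketched in a few lines of Python. The handle and codec names here are illustrative, not part of any spec:

```python
# A minimal sketch, assuming a bytes -> bytes codec (zlib here) that does not
# support partial I/O: it consumes the whole underlying handle up front, then
# exposes the decoded bytes behind the same lazy "file handle" interface.
import zlib

class MemoryHandle:
    """Lazy 'file handle': supports partial reads by byte range."""
    def __init__(self, data: bytes):
        self._data = data
    def read(self, offset=0, length=None):
        end = len(self._data) if length is None else offset + length
        return self._data[offset:end]
    def size(self):
        return len(self._data)

class ZlibCodec:
    """bytes -> bytes codec without partial-I/O support."""
    def encode(self, data: bytes) -> bytes:
        return zlib.compress(data)
    def decode(self, source) -> MemoryHandle:
        # No partial decoding: read the entire handle, decompress once,
        # and serve all subsequent partial reads from memory.
        return MemoryHandle(zlib.decompress(source.read()))

store = MemoryHandle(ZlibCodec().encode(b"hello, zarr!"))
decoded = ZlibCodec().decode(store)
print(decoded.read(7, 4))  # partial read served from memory -> b'zarr'
```

A partial-I/O-capable codec would instead return a handle that translates byte-range requests before touching the underlying store.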
Thank you for the explanation and for merging the PRs to update the spec! That does make more sense. As you say, sharding does seem to impact the design of the core spec so forgive me if I keep asking questions about it! And also apologies for not getting involved in the discussion earlier.
With the inclusion of sharding, "codec" seems to cover half a dozen different APIs (bytes -> bytes, array -> bytes, array -> array, plus partial variants which in the array case need to cover relatively complex slicing logic), where the partial variants need to be able to pass through each other's arguments and so on. This is more of a problem in stricter languages (like Rust) where those interfaces need to be quite explicit.
Additionally, for the blosc codec, rather than giving a shuffle mode of -1 and then having to pass that and the item size down the codec chain, could we instead just store the concrete shuffle mode, and have clients which add codecs to an array convert a -1 into whatever concrete mode is appropriate? This still means that the codec constructor needs to know the array context, but it's a lot easier to pass it in there than to change the whole codec API to support a serialisation format which doesn't need that flexibility.
Servers that support HTTP range requests typically allow you to request a suffix of a specified length, so you could request just the last N bytes from the server. This includes the main cloud storage systems such as S3 and GCS. Since the shard index size can be determined just by looking at the JSON metadata, you're down to 2 requests to get an inner chunk.
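A sketch of why the suffix request brings it down to 2 reads. The binary layout here (little-endian uint64 offset/length pairs appended after the chunk data) is a simplification for illustration, not the exact ZEP 2 shard layout:

```python
# Simplified shard: concatenated chunks followed by an index of
# (offset, nbytes) pairs. Reading one inner chunk takes exactly
# 2 store requests: a suffix read of the index, then a ranged read.
import struct

def make_shard(chunks):
    """Concatenate chunks and append an index of (offset, nbytes) pairs."""
    data, index = b"", []
    for c in chunks:
        index.append((len(data), len(c)))
        data += c
    return data + b"".join(struct.pack("<QQ", o, n) for o, n in index)

class SuffixStore:
    def __init__(self, blob):
        self.blob = blob
        self.requests = 0
    def read_suffix(self, n):            # request 1: last n bytes
        self.requests += 1
        return self.blob[-n:]
    def read_range(self, offset, n):     # request 2: chunk byte range
        self.requests += 1
        return self.blob[offset:offset + n]

def read_chunk(store, num_chunks, i):
    # Index size is known from metadata: 16 bytes per chunk entry.
    index = store.read_suffix(16 * num_chunks)
    offset, nbytes = struct.unpack_from("<QQ", index, 16 * i)
    return store.read_range(offset, nbytes)

store = SuffixStore(make_shard([b"AAAA", b"BB", b"CCCCCC"]))
print(read_chunk(store, 3, 2), store.requests)  # b'CCCCCC' 2
```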
Probably the array should pass in some description of the region that is being requested, e.g. a bounding box, or perhaps something analogous to a numpy selection tuple or a tensorstore index transform.
Definitely the intention is not to require always downloading the entire shard --- that would defeat the whole purpose of sharding. I'd suggest that your "store" abstraction provide a way to request specific byte ranges, and as part of that byte range request interface, allow you to request the last N bytes.
Yes, you can think of it as 3 types of codecs --- you have to apply the "array -> array" codecs, then at most one "array -> bytes" codec, then the "bytes -> bytes" codecs. If there is no "array -> bytes" codec specified, then you can use the default one implied by the data type, which is the default binary encoding (little endian for all currently defined data types).

As far as the distinction between all-at-once and partial I/O codecs, you could, for each of the 3 codec types, have one all-at-once interface and one partial I/O interface, and create an adapter for each of the 3 types that implements the partial I/O interface in terms of the all-at-once interface. That way when applying the codecs you only need to worry about the partial I/O interface. Or you might find that you don't have any "bytes -> bytes" codecs for which you want to support partial I/O, and therefore don't need to bother defining a partial I/O interface for "bytes -> bytes" codecs.

I think as part of the implementation you will definitely want an in-memory array type that allows the data type, number of dimensions, and memory layout to all be determined at runtime. Using static typing for any of those 3 things isn't going to work very well since an "array -> array" codec might use any intermediate data type.
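The parsing step described here might look like the following Python sketch. The codec-kind table and the defaulted "endian" codec configuration are illustrative assumptions, not a registry defined by the spec:

```python
# Partition a "codecs" list into the 3 phases: array->array codecs,
# at most one array->bytes codec, then bytes->bytes codecs, rejecting
# out-of-order lists and defaulting the array->bytes codec when absent.
KINDS = {
    "transpose": "array->array",
    "endian": "array->bytes",
    "sharding_indexed": "array->bytes",
    "gzip": "bytes->bytes",
    "blosc": "bytes->bytes",
}

def partition_codecs(codecs):
    aa, ab, bb = [], None, []
    for c in codecs:
        kind = KINDS[c["name"]]
        if kind == "array->array":
            if ab is not None or bb:
                raise ValueError("array->array codec after array->bytes phase")
            aa.append(c)
        elif kind == "array->bytes":
            if ab is not None or bb:
                raise ValueError("at most one array->bytes codec, before bytes->bytes")
            ab = c
        else:
            bb.append(c)
    # Fall back to the default binary encoding implied by the data type.
    return aa, ab or {"name": "endian", "endian": "little"}, bb

aa, ab, bb = partition_codecs(
    [{"name": "transpose", "order": [1, 0]}, {"name": "gzip", "level": 5}])
print(ab["name"])  # endian (defaulted)
```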
Yes, that is indeed what I originally proposed in #225. Perhaps we can continue the discussion in that PR.
So here the Array class can't treat the codec chain as a black box which reads from the store and returns a chunk's worth of decoded data - the Array needs to know what each of the codecs is so that it can decide whether to pass any region descriptions into any of them, which may be affected by what order they're defined in and which other codecs are present. So the Array will end up very tightly bound to codec implementations, including how they combine with each other (e.g. keeping track of offsets between different codecs supporting region descriptions) and how to map from the voxels it needs to the regions it needs from each codec.

I guess my concern is that if sharding is going to slot into a generic extension point (as a codec or storage transformer), then zarr implementations need to be ready for any number of extensions which (ab)use the gates opened to allow sharding to fit. If other extensions ought not to use those features, sharding would need to be its own extension point, even if it's really a few existing extension points sharing a trenchcoat.
It is true that the codec interface is definitely more complicated, but I think it can nonetheless still be an abstraction layer. For example, let's say that our Array class will support reading arbitrary rectangular regions. Then we might define the following interfaces, using Rust pseudo-syntax (probably a lot of details wrong here, as I am not super familiar with Rust, but I think this conveys the idea): trait ArrayReader {
    fn read(&self, region: &Region) -> Result<NDArray, Error>;
}
trait ArrayToArrayCodec {
    fn decode(&self, source: Box<dyn ArrayReader>) -> Result<Box<dyn ArrayReader>, Error>;
    fn encode(&self, source: NDArray) -> Result<NDArray, Error>;
}
trait ArrayToBytesCodec {
    fn decode(&self, source: Box<dyn BytesReader>) -> Result<Box<dyn ArrayReader>, Error>;
    fn encode(&self, source: NDArray) -> Result<Vec<u8>, Error>;
}
trait BytesToBytesCodec {
    fn decode(&self, source: Box<dyn BytesReader>) -> Result<Box<dyn BytesReader>, Error>;
    fn encode(&self, source: Vec<u8>) -> Result<Vec<u8>, Error>;
}
trait SimpleArrayToArrayCodec {
    fn encode(&self, source: NDArray) -> Result<NDArray, Error>;
    fn decode(&self, source: NDArray) -> Result<NDArray, Error>;
}
// Blanket adapter: any all-at-once codec also works as a partial-I/O codec
// by reading the full region up front and serving reads from memory.
impl<T: SimpleArrayToArrayCodec> ArrayToArrayCodec for T {
    fn decode(&self, source: Box<dyn ArrayReader>) -> Result<Box<dyn ArrayReader>, Error> {
        let full = source.read(&Region::full())?;
        Ok(Box::new(SimpleArrayReader(SimpleArrayToArrayCodec::decode(self, full)?)))
    }
    fn encode(&self, source: NDArray) -> Result<NDArray, Error> {
        SimpleArrayToArrayCodec::encode(self, source)
    }
}
struct SimpleArrayReader(NDArray);
impl ArrayReader for SimpleArrayReader {
    fn read(&self, region: &Region) -> Result<NDArray, Error> {
        self.0.get_region(region)
    }
}
struct TransposeCodec {
    permutation: Vec<i32>,
}
struct TransposeCodecArrayReader {
    permutation: Vec<i32>,
    inv_permutation: Vec<i32>,
    source: Box<dyn ArrayReader>,
}
impl ArrayReader for TransposeCodecArrayReader {
    fn read(&self, region: &Region) -> Result<NDArray, Error> {
        // Translate the request into the untransposed space, then
        // transpose the result back.
        self.source.read(&region.transpose(&self.inv_permutation)?)?.transpose(&self.permutation)
    }
}
impl ArrayToArrayCodec for TransposeCodec {
    fn decode(&self, source: Box<dyn ArrayReader>) -> Result<Box<dyn ArrayReader>, Error> {
        Ok(Box::new(TransposeCodecArrayReader {
            permutation: self.permutation.clone(),
            inv_permutation: invert_permutation(&self.permutation)?,
            source,
        }))
    }
    fn encode(&self, source: NDArray) -> Result<NDArray, Error> {
        source.transpose(&self.permutation)
    }
}
Similar to what @jbms just posted, in zarrita instead of passing arrays or bytes to the codecs, I pass a value handle (like the Bytes/ArrayReader from above) that holds a reference to a memory buffer, file or object and provides a uniform interface for reading and writing. Essentially, this defers the IO operation until all the details are known (e.g. byte ranges from codecs). This allowed me to make the codec abstraction work cleanly (see https://github.com/scalableminds/zarrita/blob/v3/zarrita/array.py#L265-L267).
First of all it's great to see all this effort going into the V3 specification and I think this is a great improvement over V2. Officially I will vote with […]
Hi everyone. I just wanted to re-surface the deadline for the ZEP0001 review, i.e. 6th May 2023, 23:59:59 AoE. If you have any questions/issues, feel free to ask them here or create a new issue as mentioned in the description of this issue. CC: @zarr-developers/implementation-council
I vote YES for tensorstore and Neuroglancer. |
I should note that the above proposed interface to codecs is painfully difficult to implement in C.
This statement in the V3 spec is a really bad idea: […]
It completely violates the "read-broadly, write-narrowly" principle. Instead, it should read something like this: […]
The intended purpose of that statement is to allow spec evolution as follows. For example, say we wish to add support for a non-zero origin of the coordinate space, or some other coordinate space transformation, as an optional feature in the future. Implementations that don't support this feature should fail when trying to open an array that uses this feature. With the current restriction that implementations should fail if they encounter an unknown field, we can simply add an […] If we instead say that unknown fields are ignored, we will need some other way to cause implementations that don't support the feature to fail. There are various options: […]
Any of these options is potentially valid. The drawback to incrementing […] Introducing a separate […] I assume that you are worried about adding optional features that implementations don't need to know about. What do you see as the problem with using must_understand?
But the must-understand feature solves this problem, right? If the implementation encounters […]
Yes, that is right, except that currently […] The use of must_understand as a nested member does introduce one other unfortunate limitation: we can't add any new members of object type where the keys are arbitrary user-defined strings, since we would need to exclude "must_understand" as a valid string.
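A sketch of the must_understand check being discussed, assuming the current fail-on-unknown behaviour (i.e. must_understand effectively defaults to true). The KNOWN_FIELDS set is abbreviated and the coordinate_origin extension is hypothetical:

```python
# Scan metadata members; unrecognized members cause failure unless they
# carry "must_understand": false, in which case they are safely ignored.
KNOWN_FIELDS = {"zarr_format", "shape", "data_type", "chunk_grid", "codecs"}

def check_metadata(metadata):
    for key, value in metadata.items():
        if key in KNOWN_FIELDS:
            continue
        # Unknown member: honor a nested must_understand flag, defaulting
        # to True (fail) to match the current draft's behaviour.
        must = value.get("must_understand", True) if isinstance(value, dict) else True
        if must:
            raise ValueError(f"unsupported required extension: {key}")

# Ignorable extension: passes without error.
check_metadata({"zarr_format": 3, "shape": [10],
                "dimension_hints": {"names": ["x"], "must_understand": False}})
```

An implementation that wanted the opposite default (as argued above) would only need to flip the `True` fallback to `False`.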
You are making this way too complex. I believe you should change the default for must-understand to false. |
I definitely agree that the codec interface is pretty complicated if you want to support partial reading, but are there issues specific to C? If we want to allow partial reading and hierarchical chunking such as sharding, I think the complexity may be unavoidable.
Actually the issue isn't C per se, but rather compiled vs interpreted languages.
Is that in regards to the fact that there are 3 different types of codecs (array -> array, array -> bytes, and bytes -> bytes)? I haven't yet implemented the spec in tensorstore, but I plan to parse the codecs list into a list of array -> array codecs, an array -> bytes codec, and a list of bytes -> bytes codecs. At that point they can be used in sequence without needing the equivalent of an algebraic sum type.
This is what I've tried to do in rust; raising an error if they're defined in the wrong order and inserting the default endian codec if it's not there. However, I do start from the algebraic sum type representation because that makes it easier to deserialise.
The storage transformer specification for sharding (i.e. super-chunks and chunks, where the codec formulation is chunks and sub-chunks) seems simpler to me, at least in terms of the additional features it requires all codecs to implement. However, that could easily be because I've spent less time thinking about / trying to implement it. I appreciate it leaves some possible features on the table, though, based on previous discussions.
Another question - where is caching expected to happen? Codecs which implement […] Is the idea that sharding codecs' […]
The general idea is that shards are the unit of writing. However, if the underlying storage permits partial writes and the shard file contains enough space to hold the chunks to write, partial writes might be feasible. |
There are multiple ways of solving the issue of IO overload. In zarrita, I am planning to coalesce requests in the storage layer. If multiple byte ranges are requested at the same time (or within a short time window), they are coalesced into one request. I believe fsspec has a similar mechanism. Reading full shards and caching them is another way of solving this issue. Some applications may actually benefit from the increased parallelization of reads against the IO layer. It is up to the implementations to evaluate these tradeoffs and make internal design decisions.
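The coalescing idea can be sketched as follows; the gap threshold is an arbitrary illustrative value, not something mandated anywhere:

```python
# Merge byte ranges requested together into fewer store requests: two
# ranges are combined when the gap between them is at most max_gap bytes
# (fetching a small amount of extra data is cheaper than an extra request).
def coalesce(ranges, max_gap=1024):
    """ranges: list of (offset, length) pairs; returns merged pairs."""
    merged = []
    for offset, length in sorted(ranges):
        if merged and offset - (merged[-1][0] + merged[-1][1]) <= max_gap:
            last_off, last_len = merged[-1]
            merged[-1] = (last_off, max(last_len, offset + length - last_off))
        else:
            merged.append((offset, length))
    return merged

print(coalesce([(0, 100), (150, 50), (5000, 10)]))
# -> [(0, 200), (5000, 10)]
```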
In general, for caching reads, I'd say you want to cache at the finest granularity at which partial I/O is supported. At the same time, you want to cache the decoded arrays, after performing any non-trivial decoding steps, so that read requests can be satisfied from the cache directly without any further decoding. For example, if we have:

{...,
  "codecs": [
    {"name": "transpose", "order": [2, 0, 1]},
    {"name": "sharding_indexed",
     "chunk_shape": [100, 100, 100],
     "codecs": [{"name": "transpose", "order": [2, 1, 0]}, {"name": "gzip", "level": 5}]
    }
  ]
}

then you'd want to cache at the level of the individual decoded gzip sub-chunks within each shard, and also cache the shard indices. If the transpose is done virtually with no actual transpose of the in-memory representation, then there is no need to cache the result of the transpose, because it can be handled while reading at zero cost. If transpose is implemented as a physical transpose in memory, then it would be better to cache the decoded inner transpose result rather than the decoded inner gzip result, but supporting partial I/O through the outer transpose would then be problematic.
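Caching decoded sub-chunks at that granularity might look like this sketch, where fetch_and_decode stands in for the two ranged reads plus decompression; all names are illustrative:

```python
# Cache decoded inner chunks keyed by (shard_key, chunk_coord), so repeated
# reads skip both the store requests and the decode step.
from functools import lru_cache

class ChunkCache:
    def __init__(self, fetch_and_decode, maxsize=128):
        # fetch_and_decode(shard_key, chunk_coord) performs the index read,
        # the chunk byte-range read, and decompression; we memoize its result.
        self._get = lru_cache(maxsize=maxsize)(fetch_and_decode)
    def get(self, shard_key, chunk_coord):
        return self._get(shard_key, chunk_coord)

calls = []
def fetch(shard_key, chunk_coord):
    calls.append((shard_key, chunk_coord))
    return b"decoded"

cache = ChunkCache(fetch)
cache.get("c/0/0", (2, 2))
cache.get("c/0/0", (2, 2))
print(len(calls))  # 1: the second read was served from the cache
```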
Registering the Unidata vote of […]
I'll admit I'm still struggling with what seems like a huge step up in complexity required by sharding, passing partial reads through unpredictable codec chains etc., but not enough to block the great work by everyone involved in pushing this over the line. I'll take your collective word that it's feasible and beg your forgiveness if I can't drive a rust implementation to completion any time soon. I am still figuring things out!
I vote YES for Zarr-Python |
Thanks for the general support! Just for clarification: ZEP 1 and the v3 core spec don't require implementations to support partial reads or writes, nor sharding. Sharding is being introduced in ZEP 2 and I'd recommend moving larger discussions of it to the respective review in #152. A v3-compatible implementation does not necessarily need to support sharding. (However, it would of course be nice to consider this in the future.) The same holds for other extensions which might come. The focus of v3 is to specifically describe a common denominator (v3 core / ZEP 1), and at the same time allow extensibility, so that each implementation can define the extensions they support, and it's clear for written data which extensions it relies on. If you see room for making any of those points more clear (either in the ZEP, or in the spec), or for reducing complexity anywhere, please feel free to suggest changes (also after the vote, especially if something comes up during implementation).
I vote YES for sci-rs/zarr. I considered abstention because I've not yet implemented changes since the original V3 draft and share some of @clbarnes concerns (and greatly value the discussion he's raised here since he has been hands on with the current spec). However, a settled spec is likely necessary for many implementers (myself included) to invest the work to build production implementations and discover how significant these potential issues are in practice.
Thanks a lot for preparing this and taking into account the concerns that were raised also from the Julia side (1-based and column-major). I vote YES for JuliaIO/Zarr.jl.
Apologies for casting the vote later than the stated deadline. I vote […]
FYI GDAL master has just been updated to the latest Zarr V3 spec (sharding not implemented, which will be quite tricky to implement due to not being a regular "black box" codec) per OSGeo/gdal#7706
Hi all, sincere congratulations for all the work on ZEP1, it's very exciting to be reaching this point. As a member of the ZSC I vote […]
YES for n5-zarr
YES (as a member of ZSC)
YES (as a member of ZSC), very appreciative of everyone's work on this 🙏
And with a final […], now the exciting implementation phase begins. 👷♀️🤝👷♂️ For the @zarr-developers/implementation-council members and their collaborators, please don't hesitate to raise issues that arise. Clarification and minor adjustment PRs are of course still within scope. If significant issues arise, further ZEPs may be necessary in order to align all the implementations, but with ZEP0001 behind us I imagine that will be a breeze. 😉
A bit late, but I vote […]
(Thanks, @davidbrochart. And sorry for having overlooked you! 😨) |
Can this issue be closed now? And can the online version of the docs be updated to reflect that the spec as documented by ZEP0001 is no longer in DRAFT status? |
Came here to say this. |
Definitely closing this issue now 🎉 Currently the "draft" watermark functionality is an all-or-nothing kind of thing. @jstriebel was looking into having it on a per page basis. |
Thanks @joshmoore! Two PRs to make this clear: |
Hi everyone,
I hope you’re doing well.
After working continuously for several months, the ZEP0001 - V3 Specification is ready for review by you, the Zarr implementations! 🎉 The goal of this process is to move towards supporting the V3 specification in as many implementations as possible.
There have been significant changes to the V3 specification since it was proposed in July 2022. We have maintained a GitHub project board to keep track of the progress. We're pleased to confirm that all crucial issues have been marked completed after holding several ZEP meetings.
Let me review how the process will work. We have created this issue to track the approvals from the ZSC, ZIC and the broader Zarr community.
Specific technical feedback about the specification should be made via narrowly scoped issues on the Zarr specs repo that link to this issue.
Now, according to the section 'How does a ZEP becomes accepted' in ZEP0000, a ZEP must satisfy three conditions for approval: […]
As an implementation council member, you have three options for your vote: […]
We request that you, the ZIC, review the new version of the Zarr specification and let us know your thoughts. We've listed some steps to read and understand the new specification version more quickly. They are as follows: […]
We understand that the tasks mentioned above take time, accounting for the daily work life of the council members. Therefore, we'd greatly appreciate it if you could cast your vote no later than 6th May 2023, 23:59:59 AoE. And that's it! Once we're past the voting phase, the V3 Specification will be officially adopted!
Feel free to check out an experimental Python implementation of V3 here.
Please let us know if there are any questions. Thank you for your time.
@zarr-developers/implementation-council