-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
endian codec always required? #8
Comments
After receiving feedback on ZEP1 from implementors, @jbms proposed to make one array->bytes codec mandatory zarr-developers/zarr-specs#249. I implemented that change in zarrita. It is true that the endian codec doesn't make sense for 1-byte data types, but it is the only array->bytes codec at the moment. Looking forward to seeing v3 in zarrita.js! And, don't forget to vote on ZEP2 :-) |
It is true that it is a bit unfortunate that for 1-byte types like bool and int8 the codec is still called endian and requires that an endianness be specified even though it does not have an effect. Arguably it would be better to have a different codec that is used for 1-byte types, e.g. "byte" or "raw", that behaves the same as "endian" but is only valid for 1-byte types and does not have any configuration options. |
I don't think it is a good idea to have 2 codecs that essentially do the same. I would rather rename the "endian" codec into some thing else (maybe "byte" or "raw" or "continuous") and make the "endian" configuration only required for data types with > 1 byte. |
Perhaps an "data_type" codec that is an EDIT: Ah yes, similar in spirit to @normanrz idea, which I think I like better since endianess is related to the interpretation of the bytes, and maybe a leaky abstraction at the "array" level. |
# bytes -> array
arr = np.array(bytes, dtype="int16", shape=shape)
# array -> array
arr.dtype.byteorder = "<" if config["endian"] === "little" else ">" EDIT: oops, |
I think given how arrays and data types are specified for zarr v3, as abstract types rather than specific byte representations in memory, that the suggestion from @normanrz would work better. "raw" isn't very descriptive for bool but that is not too big of a deal. |
We could call it "bytes". |
I think |
Thanks for all your work on zarr v3. I am using this project to generate fixtures for zarrita.js and ensure it is compatible moving forward.
I noticed that the endian codec is always used, even for single byte datatypes that do no have endianness.
I guess right now the endian codec is the pass through to generate the array from bytes, but I think the v3 metadata for this array should ideally be an empty list of codecs? The endian codec only makes sense for data types > 1 byte.
The text was updated successfully, but these errors were encountered: