
NCZarr - Netcdf Support for Zarr #41

Open
DennisHeimbigner opened this issue Jan 12, 2019 · 82 comments
@DennisHeimbigner

I am moving the conversation about NCZarr to its own issue. See issue https://github.com/zarr-developers/zarr/issues/317 for the initial part of this discussion.

@DennisHeimbigner
Author

Naming issue:
I have about convinced myself that, rather than creating KVP-level objects
like .zdimensions, I should just use the existing Zarr attribute mechanism.
In order to do this, it is necessary to set up some naming conventions for such
attributes. Basically, we need to identify that an attribute is special (and probably
hidden) and to which extension(s) it applies.
For NCZarr, let me propose this:

  1. All such attributes start with two underscores.
  2. Next is a 2-4 character tag specific to the extension: "NCZ" for NCZarr.
  3. Another underscore.
  4. The rest of the attribute name.

So, we might have "__NCZ_dimensions" instead of .zdimensions.
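For illustration, a variable's .zattrs following this convention might then look like the example below (the layout of the dimensions value is purely hypothetical, just to show the naming scheme):

  {
    "units": "degrees_north",
    "__NCZ_dimensions": ["lat", "lon"]
  }

Ordinary attributes are untouched; only double-underscore names are reserved for extensions.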

@jakirkham
Member

Thanks for opening this @DennisHeimbigner.

Encountered issue zarr-developers/zarr-python#280 again recently, so figured that might interest you given some of this discussion about how to manage specs. Though each issue has its own place, I think.

If we do go down the attribute road, I agree that having some non-conflicting naming convention is important. The other option might be to widen the spec of things like .zarray to allow specs subclassing Zarr's spec to add additional relevant content here, as others have mentioned. A third option, similar to what you have done, would be to add something like .zsubspec, which users can fill as needed. We might need certain keys in there like subspec name, subspec version, etc., but otherwise leave it to users to fill these out as needed.
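A hypothetical .zsubspec along those lines (all key names invented for illustration) could be as small as:

  {
    "subspec_name": "nczarr",
    "subspec_version": "1.0.0",
    "content": { }
  }

with everything under "content" left for the subspec to define.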

@alimanfoo
Member

Thanks @DennisHeimbigner.

Just to add that, on the question of whether to pack everything into attributes (.zattrs) or whether to store metadata separately under other store-level keys (.zdims, .ztypdefs, etc.), I think both are reasonable and personally I have no objection to either.

I lean slightly towards using attributes (.zattrs) because it plays nicely with some existing API features. E.g., the NCZ metadata can be accessed directly via the attributes API. And, e.g., the NCZ metadata would get included if using consolidated metadata, which is an experimental approach to optimising cloud access, available in the next release of Zarr Python. But neither of these are blockers to the alternative approach, because it is straightforward to read and decode JSON objects directly from a store, and it would also be straightforward to modify the consolidated metadata code to include other objects.

@DennisHeimbigner
Author

DennisHeimbigner commented Jan 14, 2019

We have learned from the existing netcdf-4 that datasets exist with
very large (~14 MB) metadata.
I was looking at the Amazon S3 query capabilities and they are extremely limited.
So the idea of consolidated metadata seems like a very good idea.
This reference:
https://zarr.readthedocs.io/en/latest/tutorial.html#consolidating-metadata
does not provide any details of the form of the proposed consolidated metadata.
Note that there may not be any point in storing all of the metadata, especially
if lazy reading of metadata is being used (as it is in the netcdf-4 over hdf5
implementation).
Rather I think that what is needed is just a skeleton so that query is never needed:
we would consolidate the names and kinds (group, variable, dimension, etc.)
and leave out e.g. attributes and variable types and shapes.
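A minimal sketch of such a skeleton, assuming we record only names and kinds (all identifiers hypothetical):

  {
    "groups": ["/", "/forecast"],
    "variables": ["/t", "/p", "/forecast/t"],
    "dimensions": ["/time", "/lat", "/lon"]
  }

Attributes, types, and shapes would then be fetched lazily, per object, only when needed.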

@DennisHeimbigner
Author

Here is a proposed consolidated metadata structure for NCZarr.
It would be overkill for standard Zarr, which is simpler.
Sorry if it is a bit opaque since it is a partial Antlr grammar.
nczmetadata.txt

@alimanfoo
Member

alimanfoo commented Jan 14, 2019 via email

@DennisHeimbigner
Author

That was a typo. The correct size is 14 MB.

@alimanfoo
Member

That was a typo. The correct size is 14 MB.

Ah, OK! Although 14 MB is still pretty big, it's probably not unmanageable.

@DennisHeimbigner
Author

Depends on what manageable means, I suppose. We have situations where
projects are trying to load a small part of the metadata from thousands of files,
each of which has that amount of metadata. Needless to say, this is currently
very slow. We are trying various kinds of optimizations around lazy loading
of metadata, but the limiting factor will be HDF5. A similar situation
is eventually going to occur here, so thinking about various optimizations
is important.

@alimanfoo
Member

alimanfoo commented Jan 15, 2019 via email

@jakirkham
Member

Are you able to provide data on where most of the time is being spent, @DennisHeimbigner?

@DennisHeimbigner
Author

Issue: Attribute Typing
I forgot to address one important difference between the netcdf-4 model and Zarr:
attribute typing. In netcdf-4, attributes have a defined type. In Zarr, attributes are
technically untyped, although in some cases it is possible to infer a type from the value
of the attribute.

This is most important with respect to the _FillValue attribute for a variable.
There is an implied constraint (in netcdf-4 anyway) that the type of the attribute
must be the same as the type of the corresponding variable. There is no way to
guarantee this for Zarr except by doing inference.

Additionally, if the variable is of a structured type, there is currently no standardized
way to define the fill value for such a type nor is there a way to use structured types
with other, non-fillvalue, attributes.

Sadly, this means that NCZarr must add yet another attribute that specifies
the types of other attributes associated with a group or variable.
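As an illustration (the attribute name and type notation here are hypothetical), a group's or variable's .zattrs could carry one reserved attribute mapping the other attributes to their netcdf-4 types:

  {
    "_FillValue": -999,
    "valid_range": [0, 100],
    "__NCZ_attr_types": {"_FillValue": "int32", "valid_range": "int32"}
  }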

@alimanfoo
Member

Hi @DennisHeimbigner,

Regarding the fill value specifically, the standard metadata for a zarr array includes a fill_value key. There are also rules about how to encode fill values to deal with values that do not have a natural representation in JSON, including fill values for arrays with a structured dtype. If possible, I would suggest using this feature of standard array metadata rather than adding a separate _FillValue attribute. If not, please do let us know what's missing; that would be an important piece of information to carry forward when considering spec changes.

Regarding attributes in general, we haven't tried to standardise any method to encode values that do not have a natural JSON representation. Currently it is left to the application developer to decide their own method for encoding and decoding values as JSON, e.g., I believe xarray has some logic for encoding values in zarr attributes. There has also been some discussion of this at #354 and #156.

Ultimately it would be good to standardise some conventions (or at least define some best practices) for representing various common value types in JSON, such as typed arrays. I'm more than happy for the community to lead on that.

@DennisHeimbigner
Author

This reference -- https://zarr.readthedocs.io/en/stable/spec/v2.html#fill-value-encoding --
does not appear to address fill values for structured types. Did you get the reference wrong?

@alimanfoo
Member

If an array has a fixed length byte string data type (e.g., "|S12"), or a structured data type, and if the fill value is not null, then the fill value MUST be encoded as an ASCII string using the standard Base64 alphabet.

I.e., use base 64 encoding.
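As a concrete example (array name and values hypothetical), a .zarray for a structured dtype with one int32 and one float64 field has a 12-byte item size, so an all-zeros fill value is the Base64 encoding of 12 zero bytes:

  {
    "zarr_format": 2,
    "shape": [100],
    "chunks": [10],
    "dtype": [["a", "<i4"], ["b", "<f8"]],
    "fill_value": "AAAAAAAAAAAAAAAA",
    "compressor": null,
    "filters": null,
    "order": "C"
  }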

@DennisHeimbigner
Author

So it would be nice if we had a defined language-independent algorithm
that defines how to construct the fill value for all possible struct types
(including recursion for nested structs). This should be pretty straightforward.
Also, why force a string (base64) encoding? Why not make the fill value
be just another JSON structure?
It worries me how Python-specific much of the spec around types is.

@alimanfoo
Member

So it would be nice if we had a defined language-independent algorithm
that defines how to construct the fill value for all possible struct types
(including recursion for nested structs). This should be pretty straightforward.

That would be good. I believe numpy mimics C structs, further info here.

Looking again at the numpy docs, there is support for an align keyword when constructing a structured dtype, which changes the itemsize and memory layout. This hasn't been accounted for in the zarr spec; I suspect that things are currently broken if someone specifies align=True (the default is False).

Also, why force a string (base64) encoding? Why not make the fill value
be just another JSON structure?

That's a nice idea, would fit with the design principle that metadata are human-readable/editable.

It worries me how Python-specific much of the spec around types is.

The zarr spec does currently defer to numpy as much as possible, assuming that much of the hard thinking around things like types has been done there already.

If there are clarifications that we could make to the v2 spec that would help people develop compatible implementations in other languages then I'd welcome suggestions.

Thinking further ahead to the next iteration on the spec, it would obviously be good to be as platform-agnostic as possible, however it would also be good to build on existing work rather than do any reinvention. The work on ndtypes may be relevant/helpful there.

@alimanfoo alimanfoo transferred this issue from zarr-developers/zarr-python Jul 3, 2019
@alimanfoo
Member

Surfacing here notes on the NetCDF NCZarr implementation, thanks @DennisHeimbigner for sharing.

@alimanfoo
Member

Also relevant here, documentation of xarray zarr encoding conventions, thanks @rabernat.

@rsignell-usgs

rsignell-usgs commented May 5, 2021

@DennisHeimbigner: It looks like Unidata's Netcdf C library can now read data with the xarray zarr encoding conventions, right?

@rabernat, should I raise an issue for xarray to also support the Unidata NcZarr conventions?

@WardF

WardF commented May 5, 2021

The ability to read xarray is in the main branch, and will be in the upcoming 4.8.1 release. I am shaving the yak to get our automated regression and integration test infrastructure back up and running but we hope to have 4.8.1 out shortly.

@rabernat
Contributor

rabernat commented May 5, 2021

@rabernat, should I raise an issue for xarray to also support the Unidata NcZarr conventions?

I see this as very difficult. The reason is that the ncZarr conventions use files outside of the zarr hierarchy. We would probably need to implement a compatibility layer as a third-party package, similar to h5netcdf.

p.s. but yes, please open an xarray issue to keep track of it.

@shoyer

shoyer commented May 5, 2021

One thing I'll note on Xarray's convention for Zarr is that we will likely change things in the near future to always write and expect "consolidated metadata" (see pydata/xarray#5251). This is almost completely backwards compatible, but if NcZarr writes these consolidated metadata fields in Xarray compat mode, we could load these Zarr stores much more quickly in Xarray.

Consolidated metadata would probably be a nice feature for NcZarr, too, because it reduces the number of files that need to be queried for metadata down to only one. I think there was a similar intent behind the .nczgroup JSON field. Consolidated metadata is sort of a super-charged version of that.

@DennisHeimbigner
Author

NCZarr gets a similar improvement by doing lazy reads of metadata objects. That is one problem with _ARRAY_DIMENSIONS -- it requires us to read all attributes even if they are otherwise unneeded. NCZarr avoids this by keeping the dimension names separate.
As for consolidated metadata, I assume you are NOT saying that any pure zarr container that does not contain the consolidated metadata will be unreadable by Xarray.

@shoyer

shoyer commented May 6, 2021

NCZarr gets a similar improvement by doing lazy reads of metadata objects. That is one problem with _ARRAY_DIMENSIONS -- it requires us to read all attributes even if they are otherwise unneeded. NCZarr avoids this by keeping the dimension names separate.

In Xarray, we have to read nearly all the metadata eagerly to instantiate xarray.Dataset objects.

As for consolidated metadata, I assume you are NOT saying that any pure zarr container that does not contain the consolidated metadata will be unreadable by Xarray.

This is correct, you don't need to write consolidated metadata. But if you do, Xarray will be able to read the data much faster.

As for whether netCDF users would notice a difference with consolidated metadata, I guess it would depend on their use-cases. Lazy metadata reads are great, but for sure it is faster to download a single small file than to download multiple files in a way that cannot be fully parallelized, even if they add up to the same total size.

@DennisHeimbigner
Author

faster to download a single small file than to download multiple files

true, but we have use cases where the client code is walking a large set of netcdf files, reading a few pieces of information out of each of them, and where the total metadata is large (14 megabytes). This can occur when one has a large collection of netcdf files covering some time period and each netcdf file is a time slice (or slices).
Perhaps Rich Signell would like to comment with his experience.

@joshmoore
Member

joshmoore commented May 6, 2021

#41 (comment) I see this as very difficult. The reason is that the ncZarr conventions use files outside of the zarr hierarchy. We would probably need to implement a compatibility layer as a third-party package, similar to h5netcdf.

For what it's worth, I could see making some movement (June-ish?) on #112 (comment) to permit the additional files. But either way, certainly ome/ngff#46 (review) (related issue) would suggest hammering out a plan for this difference before another package introduces a convention.

#41 (comment) One thing I'll note on Xarray's convention for the Zarr is that we will likely change things in the near future to always write and expect "consolidated metadata" (see pydata/xarray#5251). This is almost completely backwards compatible, but if NcZarr writes these consolidated metadata fields in Xarray compat mode we could load these Zarr stores much quicker in Xarray.

Having gone through pydata/xarray#5251 I'm slightly less worried about this than when I first read it (I had assumed it meant that only consolidated metadata would be supported), but having just spent close to 2 months trying to get dimension_separator "standardized", I'd like to raise a flag that consolidated metadata is a similar gray area. It'd be nice to get it nailed down.

@rsignell-usgs

rsignell-usgs commented May 6, 2021

@DennisHeimbigner, just a quick comment that I too always use consolidated metadata when writing Zarr. A recent example with coastal ocean model output we are publishing showed consolidated metadata to be an order of magnitude faster to open.

@DennisHeimbigner
Author

Note that the issue for me is: for what use-cases is lazy metadata download better than consolidated metadata? The latter is better in cases where you know that you need to access almost all of the metadata, or where the total size of the metadata is below some (currently unknown) size threshold. My speculation is that the access patterns vary all over the place and are highly domain-dependent. I infer that Rich's use case is one where all the metadata is going to be accessed.

In any case, once .zmetadata is well-defined (see Josh's previous comment) I will be adding it to nczarr. However, we will probably give the user the choice to use it or not if lazy download makes more sense for their use-case.

On the other side, it seems to me that zarr-python might profitably explore lazy download of the metadata.
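For reference, the consolidated object Zarr Python writes under the .zmetadata key simply inlines every metadata document in the hierarchy, roughly as follows (group and array names hypothetical):

  {
    "zarr_consolidated_format": 1,
    "metadata": {
      ".zgroup": {"zarr_format": 2},
      ".zattrs": {},
      "t/.zarray": { ... },
      "t/.zattrs": { ... }
    }
  }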

@shoyer

shoyer commented Aug 1, 2021 via email

@rouault
Contributor

rouault commented Aug 1, 2021

The consensus seemed to be that extra keys would be allowed, but must be ignored if they are not recognized by the implementation.

ok, thanks for the clarification

@joshmoore
Member

For reference: zarr-developers/zarr-python#715 (comment)

That didn't make it into a zarr-specs issue (neither v2 nor v3) as far as I can tell. Anyone up for shepherding that?

@joshmoore
Member

joshmoore commented Mar 24, 2022

See the related conversation in pydata/xarray#6374 ("Should the [xarray-]zarr backend support NCZarr conventions?")

@halehawk

halehawk commented Apr 8, 2022

@DennisHeimbigner does NCZarr support any filter now?

@DennisHeimbigner
Author

Yes, although there are some complications: the code uses HDF5 filters to perform
the actual filtering, but it needs extra code to convert the Zarr codec JSON format
to the HDF5 unsigned integer parameters.
What specific filter(s) do you need?

@halehawk

halehawk commented Apr 8, 2022

@DennisHeimbigner Do you have documentation about how to enable and use filters through NCZarr? Also, we have a new codec which does not have any binding yet. Do you have a suggestion on how to enable it in your NCZarr?

@shaomeng

shaomeng commented Apr 8, 2022

@DennisHeimbigner @halehawk Maybe I should jump in now ;)

I have a lossy compressor product (SPERR: https://github.com/shaomeng/SPERR) that I'm looking at paths to integrate into the Zarr format. I haven't spent too much time on it, but my understanding is that I need to make it a Zarr filter. Our immediate application of it, an ASD run of MURam, has decided to use NCZarr to output Zarr files, so the question arose whether Zarr filters are supported by NCZarr.

I guess the most direct question to @DennisHeimbigner as the NCZarr developer is, what approach do you recommend to integrate a lossy compressor to an NCZarr output?

@DennisHeimbigner
Author

If the compressor is (or easily could be) written in Python, then see the NumCodecs web page.
If the compressor is in C or C++, and you decide to use netcdf-c NCZarr, then you need
to build an HDF5 compressor wrapper plus the corresponding codecs API.
I have attached the relevant documentation. If this compressor is similar to some existing
compressor such as bzip2 or zstandard, then you can copy and modify the corresponding
wrapper in the netcdf-c/plugins directory -- H5Zzstd.c, for example.
filters.md
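For orientation, a minimal sketch of the HDF5 side of such a wrapper, assuming a placeholder filter name and an id from HDF5's unregistered range (the codec-JSON conversion entry points are described in the attached filters.md):

  #include <stdlib.h>
  #include "H5PLextern.h"

  #define H5Z_FILTER_SPERR 32768  /* placeholder id in the unregistered range */

  /* Filter callback: HDF5 sets H5Z_FLAG_REVERSE when reading (decompress). */
  static size_t
  H5Z_filter_sperr(unsigned flags, size_t cd_nelmts, const unsigned cd_values[],
                   size_t nbytes, size_t *buf_size, void **buf)
  {
      if (flags & H5Z_FLAG_REVERSE) {
          /* decompress the nbytes in *buf, replace *buf, return the new size */
      } else {
          /* compress the nbytes in *buf, replace *buf, return the new size */
      }
      return 0; /* 0 signals failure to HDF5; a real filter returns the output size */
  }

  const H5Z_class2_t H5Z_SPERR[1] = {{
      H5Z_CLASS_T_VERS,               /* H5Z_class_t version */
      (H5Z_filter_t)H5Z_FILTER_SPERR, /* filter id */
      1, 1,                           /* encoder and decoder present */
      "sperr",                        /* filter name */
      NULL, NULL,                     /* optional can_apply / set_local callbacks */
      (H5Z_func_t)H5Z_filter_sperr,   /* the filter function above */
  }};

  /* Entry points HDF5 uses to discover a dynamically loaded filter plugin. */
  H5PL_type_t H5PLget_plugin_type(void) { return H5PL_TYPE_FILTER; }
  const void *H5PLget_plugin_info(void) { return H5Z_SPERR; }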

@shaomeng

shaomeng commented Apr 8, 2022 via email

@DennisHeimbigner
Author

Yes, IF the filters are available in NumCodecs.

@halehawk

halehawk commented Apr 9, 2022 via email

@DennisHeimbigner
Author

Sorry, I wasn't clear.
Suppose you use nczarr to write a Zarr file where some of its arrays apply a filter.
Then you can obviously read that file with nczarr.
However, suppose you write the array with nczarr and then want
others to read it using python-zarr. In that case, you will need to create
a NumCodecs-compatible version of your filter written in Python so that
python-zarr users can read the data written by nczarr.

@shaomeng

Hi @DennisHeimbigner, there is some confusion on our team that we would love your comment on.

The confusion is: do we even need to make an HDF5 filter for the SPERR compressor? Because NCZarr supports NumCodecs filters, isn't it the case that once we make a NumCodecs filter for SPERR, both NCZarr and Python-Zarr can read and write SPERR-compressed Zarr files? More generally, are there any advantages/disadvantages to producing an HDF5 filter for SPERR, if all we want is SPERR-compressed Zarr files?

@DennisHeimbigner
Author

There are two pieces here, and I am sorry I was unclear.
The first piece is the declaration of the compressor for a variable
in the Zarr metadata. This is specified in the "compressor" key of
the .zarray metadata object for the variable. The format for this
is defined by NumCodecs and generally has the form

{"id": "<compressor name>", "parameter1": <value>, ... "parametern": <value>}

So for zstd, we might have this: {"id": "zstd", "level": 5}
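In context, that declaration sits alongside the other required keys of the .zarray object, e.g. (shape, chunking, and codec parameters hypothetical):

  {
    "zarr_format": 2,
    "shape": [1000, 1000],
    "chunks": [100, 100],
    "dtype": "<f4",
    "compressor": {"id": "zstd", "level": 5},
    "fill_value": 0.0,
    "filters": null,
    "order": "C"
  }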

The second part is the actual code that implements the compressor.

NCZarr supports the first part so that it can read/write legal Zarr metadata.
BUT, NCZarr requires its filter code to be written in C (or C++).
More specifically, it does not support the Python compressor code implementations.
Sorry for the confusion.

@halehawk

halehawk commented Apr 12, 2022 via email

@shaomeng

NCZarr requires its filter code to be written in C (or C++).

Just to clarify, did you mean that NCZarr requires its filter code to be in C AND also exposed to NCZarr as HDF5 filters? E.g., NumCodecs filters won't work.

Sorry for the back and forth in this github thread. I think this is my last try and if there's still confusion, I'll try to set up a meeting and resolve it more directly :)

@jbms
Contributor

jbms commented Apr 12, 2022

I don't know the details of how codecs are defined for NCZarr, but in general you will need to provide a separate implementation of a codec for each zarr implementation in which you want it supported.

Zarr-python provides a mechanism by which codecs can be registered --- numcodecs defines many codecs, and zarr-python pulls in numcodecs as a dependency, but it is actually possible to define a codec for zarr-python outside of the numcodecs package --- see for example the imagecodecs Python package.

@DennisHeimbigner
Author

I still need you to clarify something here. So I looked at your H5Zzstd.c, which is an HDF5 plugin for zstd and supports numcodec zstd read/write. Then I got this idea: Samuel's new compressor need not get a formal HDF5 filter ID, but should add a similar H5Zsperr.c?

That is correct. The HDF Group reserves ids 32768 to 65535 for unregistered use.
So Samuel can pick a number in that range for his filter; later, if desired, a formal
HDF5 filter id can be assigned.

@joshmoore
Member

First a big 💯 for the discussion, since this is exactly what we want to see happening for cross-implementation support of codecs. @shaomeng & @halehawk, don't hesitate to keep asking.

I do wonder, @DennisHeimbigner, if we don't want to establish the channel you'd like for more nczarr questions. If so, I'd say we update the start and end of this thread with that URL and close this issue.

Others may want to express an opinion, but if it's useful, we can have a no-code location like github.com/zarr-developers/nczarr for people to find a README pointing to the netcdf-c implementation's resources.

cc: @WardF

@WardF

WardF commented Jun 1, 2022

Sorry for the late comment on this; I would agree that maybe a 'Github Discussions' post would be a better place for this, instead of the issue we are working within. We can create that over at the netcdf-c repository, or we could create one here in the appropriate zarr-* repositories. There are arguments to be made for either, so I am happy to go with what makes the most sense for the broader group :).

@briannapagan

21-050r1_Zarr_Community_Standard.pdf
Adding this here for reference in the convo.

@dblodgett-usgs

Pertinent text from @briannapagan's link above...

Beginning with NetCDF-C version 4.8.0, Unidata introduced experimental Zarr support
into the NetCDF-C library. This was accomplished via creating a new specification -
NCZarr - which is “similar to, but not identical with the Zarr Version 2 Specification.”
Specifically, NCZarr adds two additional metadata files (“.nczarray" and ".nczattr”),
which are not part of the Zarr V2 Spec. Since NCZarr stores are not fully compatible and
interoperable with Zarr V2, this community standard excludes NCZarr. Work is ongoing
to reconcile NCZarr and the architectural reasons that motivated its development with the
forthcoming Zarr V3 Specification.
Fortunately, the NetCDF-C library also supports reading / writing of data using the
simpler Named Dimension convention described in 4.1.

@DennisHeimbigner
Author

That information is out-of-date in a couple of ways.

  1. the metadata files (“.nczarray" and ".nczattr”) are no longer used; they were replaced with special dictionary entries.
  2. I believe the spec was changed to specify that unrecognized elements (objects and dictionary entries) should be ignored by any implementation that does not recognize them.
  3. With point 2 in effect, nczarr-created files can be read by pure zarr implementations and nczarr can read pure zarr files.

@dblodgett-usgs

Thanks for calling that out, @DennisHeimbigner. This came out of a conversation over at zarr-developers/geozarr-spec#22.

There are very few people who have a deep enough understanding of the moving parts here to answer all the questions. It's good to hear that we basically have interoperability.

Two questions:

  1. Do you feel like we even need to worry about the distinction right now?
  2. Is there a current document we should be using to learn about the nuances between "pure zarr" and "nczarr"?
