Implicit vs explicit groups #184

Closed
jbms opened this issue Nov 28, 2022 · 3 comments
Labels
core-protocol-v3.0 Issue relates to the core protocol version 3.0 spec

Comments

@jbms
Contributor

jbms commented Nov 28, 2022

Currently the zarr v3 spec does not require that every group have a metadata document, and instead allows a group to be defined implicitly by the presence of a descendant.

Pros:

  • No need to write an empty metadata document just to have a group.
  • Multiple machines can create distinct arrays in parallel within the same, possibly not-yet-existing group without any coordination or store-level locking/atomic operations to create the group metadata document.
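To make the implicit-group idea concrete, here is a minimal sketch. It assumes a flat key/value store modelled as a Python dict and a hypothetical key layout in which each node's metadata lives at `<path>/zarr.json`; the helper name `is_implicit_group` is made up for illustration and is not part of any Zarr library.

```python
def is_implicit_group(store: dict, path: str) -> bool:
    """Return True if `path` (non-root) has no metadata document of its own
    but at least one key exists beneath it (assumed layout: <path>/zarr.json)."""
    prefix = path.strip("/") + "/"
    if prefix + "zarr.json" in store:
        return False  # the node is explicit: it has its own metadata document
    return any(key.startswith(prefix) for key in store)


# Two writers create distinct arrays under "a" without ever touching "a" itself.
store = {}
store["a/b/zarr.json"] = '{"node_type": "array"}'  # writer 1
store["a/c/zarr.json"] = '{"node_type": "array"}'  # writer 2

print(is_implicit_group(store, "a"))
```

Neither writer needs to know about the other, and "a" is treated as a group purely because descendants exist.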

Cons:

  • Possibility of simultaneously creating a group and array with the same name. This could be fixed by allowing arrays to also be groups.
  • If group-level storage transformers are allowed, there is a possible race condition: one machine may attempt to create "a/b" as an array, and another machine may simultaneously create "a" as a group with a storage transformer. We may end up with a corrupted result, where "a/b" does not correctly take into account the storage transformer.
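The storage-transformer race can be replayed deterministically as a single interleaving. This is only a sketch under stated assumptions: the store is a plain dict, the `storage_transformer` field and its `"reverse"` value are hypothetical stand-ins for a real transformer, and the key names are illustrative.

```python
import json


def transform(data: bytes, group_meta) -> bytes:
    # Hypothetical transformer: reverse the bytes when the group requests it.
    # Reversal is its own inverse, so the same function encodes and decodes.
    if group_meta and group_meta.get("storage_transformer") == "reverse":
        return data[::-1]
    return data


def load_group_meta(store: dict, group: str):
    doc = store.get(group + "/zarr.json")
    return json.loads(doc) if doc is not None else None


store = {}

# Machine 1 looks for metadata on "a" before creating "a/b": nothing is there yet.
meta_seen_by_writer = load_group_meta(store, "a")

# Machine 2 now creates "a" explicitly, with a storage transformer.
store["a/zarr.json"] = json.dumps(
    {"node_type": "group", "storage_transformer": "reverse"}
)

# Machine 1 proceeds with its stale view and writes a chunk untransformed.
store["a/b/c/0"] = transform(b"abc", meta_seen_by_writer)

# A later reader honours the transformer and decodes the chunk incorrectly.
decoded = transform(store["a/b/c/0"], load_group_meta(store, "a"))
print(decoded)
```

The reader gets back bytes that differ from what machine 1 wrote, which is the corrupted result described in the second bullet.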

An alternative to implicit groups is to use a separator other than "/", such as ".", if the equivalent of implicit groups is desired.

@jbms
Contributor Author

jbms commented Nov 29, 2022

Also, even if racy due to a lack of support for atomic read/modify/write, writing the empty group files from multiple machines concurrently could still be fine; since the contents would be the same from all writers, it doesn't matter which writer wins.
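A small sketch of that point, using threads in place of machines and a dict in place of the store. The metadata fields and the `<path>/zarr.json` key are assumptions for illustration; the argument only holds if every writer serializes the document to identical bytes, which is why the sketch fixes the key order.

```python
import json
import threading

# Every writer produces byte-identical content (sort_keys pins the serialization).
EMPTY_GROUP = json.dumps(
    {"zarr_format": 3, "node_type": "group"}, sort_keys=True
).encode()


def create_group(store: dict, path: str) -> None:
    # Unconditional overwrite: no read/modify/write and no locking.
    store[path + "/zarr.json"] = EMPTY_GROUP


store = {}
writers = [
    threading.Thread(target=create_group, args=(store, "a")) for _ in range(8)
]
for t in writers:
    t.start()
for t in writers:
    t.join()

# Whichever writer was last, the stored document is the same.
print(store)
```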

@jstriebel jstriebel added the core-protocol-v3.0 Issue relates to the core protocol version 3.0 spec label Nov 29, 2022
@jstriebel jstriebel added this to ZEP1 Nov 29, 2022
@jstriebel jstriebel moved this to In Discussion in ZEP1 Nov 29, 2022
@jstriebel
Member

  • Possibility of simultaneously creating a group and array with the same name. This could be fixed by allowing arrays to also be groups.

Since the metadata document will be named zarr.json for both arrays and groups as of #200, this case seems to be eliminated.
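A rough illustration of why a shared document name removes the conflict, assuming a dict-backed store. The `x.group.json` / `x.array.json` keys are only a hypothetical stand-in for a layout with separate names per node type.

```python
import json

# Separate names per node type: both documents can coexist for the same node.
separate = {}
separate["x.group.json"] = json.dumps({"node_type": "group"})
separate["x.array.json"] = json.dumps({"node_type": "array"})
conflicting_docs = [k for k in separate if k.startswith("x.")]

# One shared name: the two creations target the same key, so the last write
# simply replaces the first and "x" ends up as exactly one kind of node.
shared = {}
shared["x/zarr.json"] = json.dumps({"node_type": "group"})
shared["x/zarr.json"] = json.dumps({"node_type": "array"})
final_type = json.loads(shared["x/zarr.json"])["node_type"]

print(len(conflicting_docs), len(shared), final_type)
```

Which writer wins is still undetermined, but the store can no longer describe "x" as both a group and an array at once.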

  • If group-level storage transformers are allowed, there is a possible race condition: one machine may attempt to create "a/b" as an array, and another machine may simultaneously create "a" as a group with a storage transformer. We may end up with a corrupted result, where "a/b" does not correctly take into account the storage transformer.

True, but I don't see how this can be circumvented. Even if implicit groups used a delimiter other than "/", both nodes could still be created simultaneously; one of the two would win, and the other would probably not exist. I don't see a way around this race condition.

@jbms Do you think there is something to do as part of the v3 core spec? Can we close this?

@jbms
Contributor Author

jbms commented Feb 9, 2023

Yes, we can close this.

@github-project-automation github-project-automation bot moved this from In Discussion to Done in ZEP1 Feb 9, 2023
Projects
Status: Done
Development

No branches or pull requests

2 participants