-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
consolidate_metadata for tables #12
consolidate_metadata for tables #12
Conversation
consolidate_metadata isn't in the v2 spec and isn't supported by all implementations. I would not suggest we go that route unless absolutely necessary. |
My main thinking here is that I don't want to be on the hook for essentially maintaining a copy of information already encoded by the
Did this come up as a possible backport to v2? It seems quite important to
What's the support like among implementations that are currently used for OME-NGFF? IIRC netcdf/ xarray's zarr format doesn't require something like this, right? I'm wondering if it just didn't come up, or if they just implemented consolidated metadata. |
a08c54d
to
866242d
Compare
OK, so I've removed the This now writes
which is a small change that allows me to at least find the tables, and is consistent with the approach taken with labels. |
866242d
to
d2ce7de
Compare
I agree it's not ideal but the other alternatives I don't see as improvements.
There's zarr-developers/zarr-specs#113 but it's a pretty major undertaking. (I refer to my oft cited metric of 6 months to introduce
Python only as far as I know.
xarray relies heavily on consolidated_metadata. I don't think netcdf does, but they have other extensions in "nczarr" that may be similar. |
@will-moore, responding to your comment: ome/ngff#64 (comment) Your suggestion
# write group names into .zattrs
table_path = os.path.join(group.store.dir_path(), group.path, table_group_name)
for sub_dir in ["layers", "obsm", "obsp", "raw", "uns"]:
sub_path = os.path.join(table_path, sub_dir)
if not os.path.exists(sub_path):
continue
children = [f.name for f in os.scandir(sub_path) if f.is_dir()]
sub_group = table_group[sub_dir]
sub_group.attrs[sub_dir] = children
E.g. obsm/.zattrs now looks like this:
{
"encoding-type": "dict",
"encoding-version": "0.1.0",
"obsm": [
"X_scanorama",
"X_umap",
"spatial"
]
} You could instead do: group: zarr.Group
keys = []
group.visit(lambda x: keys.append(x))
group.attrs["keys"] = keys But this is basically just a worse version of consolidated metadata. I think it would make more sense to just add consolidated metadata to the spec than having to design an alternative. Especially since there aren't that many cases in the anndata specification where you know whether the child nodes are Groups or Arrays.
My read here is that interest has dropped in favor of instead solving it in v3?
It looks like they have an alternative: https://docs.unidata.ucar.edu/nug/current/nczarr_head.html#autotoc_md22 They store an bunch of extra jsons with information like a list of variables. These can be turned off with
I think "relying on the hierarchy provided by zarr" would be an improvement. It's just that for some implementations, with some stores, the hierarchy doesn't work. To me, this reads as an issue with zarr. Perhaps one that needs a temporary work around, but ultimately one which should be fixed in zarr. From this perspective, I would propose minimizing what needs to be done – and making it easy to revert and rely on the zarr hierarchy instead. So maybe just saying tables should store the a consolidated metadata json or list of keys at their root in OME-NGFF? Then removing this once the spec moves up to v3 or v2 gets consolidated metadata, which ever happens first. side notes
|
From the POV of the other implementors, I think this would count as a significant change (and related implementation burden).
I think it's fair to say that it's something that needs to be clarified at the Zarr Spec level, but if there remains a possibility that there's a fileset which is not listable, then the design of NGFF specifications will be forced to deal with that. |
Are the implementors up for this being solved in v3? As a broader question, how common are unlistable stores and why are people hosting data in them? |
I think there will definitely be work on this at the spec level in v3. I don't want to put words in the implementors' mouths though :)
|
Data generated with This includes adding child group names to parent group
Also, an extra step required for viewing data in the browser via zarr.js is to convert
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks @will-moore !
Also, happy new year!
See discussion at https://github.com/ome/ngff/pull/64/files#r1025073730
In order to allow full access to all the
tables
metadata when the ability tols
subdirectories is not available (e.g. over https / s3),consolidate_metadata()
can be used to add metadata to theimage.zarr/tables
group.This would also provide a work-around for the listing of tables within that top-level
tables
group. See https://github.com/ome/ngff/pull/64/files#r1022858025cc @ivirshup @joshmoore