Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experimental chunk manifest support #79

Closed
wants to merge 3 commits into from
Closed

Experimental chunk manifest support #79

wants to merge 3 commits into from

Conversation

LDeakin
Copy link
Owner

@LDeakin LDeakin commented Sep 22, 2024

Work in progress to enable stores that reference bytes created outside Zarr (via virtualizarr)

Notes

  1. python3 virtualizarr_gen.py
  2. Apply corrections for non-conformant zarr.json metadata
  3. cargo test chunk_manifest
  • Chunk manifests with URIs would make this much more complicated
    • URI parsing and store registration
    • Store caching
    • Auth?
    • ...
  • Should absolute paths / relative paths outside of the store be permitted?

TODO

virtualizarr

manifest.json

  • virtualizarr should write paths relative to the hierarchy root. Currently they are relative to the working directory.
    • Or is there a more general way of handling this (via metadata changes)? The store must be opened at the same location as the CWD when virtualize.to_zarr was called. Can't just open a store pointing directly at the hierarchy and then open an array at the root store prefix...
  • Absolute paths Paths as URIs zarr-developers/VirtualiZarr#243

zarr.json

  • Add variable length chunks support

    Cannot concatenate arrays with inconsistent chunk shapes: (...) vs (...) .Requires ZEP003 (Variable-length Chunks)

    • zarrs already supports variable length chunks
  • chunk key encoding should match the path encoding ("0.0" etc).
    • the chunk-manifest-json storage transformer should not need to be aware of the chunk key encoding
    "chunk_key_encoding": {
        "configuration": {
            "separator": "."
        },
        "name": "v2"
    },
    "codecs": [
        {
            "name": "bytes",
            "configuration": {
                "endian": "little"
            }
        }
    ],

zarrs

  • Test on real/multifile netCDF data
  • Support NetCDF/HDF5 compression
    • ZLIB codec
    • SZIP codec

@LDeakin LDeakin added the waiting on spec waiting on Zarr spec/ZEP label Sep 22, 2024
Copy link

codecov bot commented Sep 22, 2024

Codecov Report

Attention: Patch coverage is 51.68067% with 115 lines in your changes missing coverage. Please review.

Project coverage is 82.75%. Comparing base (46bd880) to head (43a9fa3).
Report is 69 commits behind head on main.

Files with missing lines Patch % Lines
...anifest_json/chunk_manifest_storage_transformer.rs 44.11% 114 Missing ⚠️
...c/array/storage_transformer/chunk_manifest_json.rs 95.65% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #79      +/-   ##
==========================================
- Coverage   82.97%   82.75%   -0.23%     
==========================================
  Files         162      166       +4     
  Lines       22500    22738     +238     
==========================================
+ Hits        18670    18816     +146     
- Misses       3830     3922      +92     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@LDeakin
Copy link
Owner Author

LDeakin commented Nov 15, 2024

The manifest storage transformer is unlikely to proceed. Supplanted by https://icechunk.io/spec/

@LDeakin LDeakin closed this Nov 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
waiting on spec waiting on Zarr spec/ZEP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant