Feature comparison vs kerchunk #80

Closed
TomNicholas opened this issue Apr 13, 2024 · 0 comments · Fixed by #81
Labels: `documentation` (Improvements or additions to documentation), `Kerchunk` (Relating to the kerchunk library / specification itself)


@TomNicholas
Member

I made this comparison table, which should go in the docs somewhere.

**In-memory layer**

| Component / Feature | Kerchunk | VirtualiZarr |
| --- | --- | --- |
| In-memory representation of byte ranges for a single array | Part of a "reference dict" with keys for each chunk in the array | `ManifestArray` instance (wrapping a `ChunkManifest` instance) |
| In-memory representation of actual data values | Encoded bytes serialized directly into the "reference dict", created on a per-chunk basis using the `inline_threshold` kwarg | `numpy.ndarray` instances, created on a per-variable basis using the `loadable_variables` kwarg |
| In-memory representation of entire file / store | Nested "reference dict" with keys for each array in the file | `xarray.Dataset` with variables wrapping `ManifestArray` instances (or `numpy.ndarray` instances) |
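
To make the first and third rows concrete, here is a minimal sketch using only plain Python dicts, not either library. It assumes kerchunk's version-1 reference layout (one flat mapping from Zarr keys to metadata strings or `[url, offset, length]` triples) and regroups the chunk entries per array, which is roughly the per-array view a `ChunkManifest` holds. The helper `refs_to_manifests`, the `path`/`offset`/`length` field names, and the sample URLs are illustrative assumptions.

```python
import json

# Toy kerchunk-style reference dict (version 1 layout). The metadata is
# deliberately incomplete and the URLs are made up for illustration.
refs = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "air/.zarray": json.dumps({"shape": [4, 4], "chunks": [2, 4], "dtype": "<f4"}),
        "air/0.0": ["s3://bucket/air.nc", 100, 32],
        "air/1.0": ["s3://bucket/air.nc", 132, 32],
        "lat/0": ["s3://bucket/air.nc", 20, 16],
    },
}


def refs_to_manifests(refs):
    """Group chunk references by array name (hypothetical helper)."""
    manifests = {}
    for key, value in refs["refs"].items():
        name, _, chunk_key = key.rpartition("/")
        # Skip store-level keys (".zgroup"), array metadata (".zarray", ".zattrs"),
        # and anything inlined as a string rather than stored as a byte range.
        if not name or chunk_key.startswith(".") or not isinstance(value, list):
            continue
        path, offset, length = value
        manifests.setdefault(name, {})[chunk_key] = {
            "path": path,
            "offset": offset,
            "length": length,
        }
    return manifests


manifests = refs_to_manifests(refs)
print(sorted(manifests))  # ['air', 'lat']
```

Keeping one manifest per array is what lets each one sit behind its own xarray variable, instead of one nested dict describing the whole store.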

**On-disk serialization**

| Component / Feature | Kerchunk | VirtualiZarr |
| --- | --- | --- |
| Kerchunk reference format as JSON | `f.write(ujson.dumps(h5chunks.translate()).encode())`, then read using an `fsspec.filesystem` mapper | `ds.virtualize.to_kerchunk('combined.json', format='json')`, then read using an `fsspec.filesystem` mapper |
| Kerchunk reference format as parquet | `df.refs_to_dataframe(out_dict, "combined.parq")`, then read using an `fsspec.implementations.reference.ReferenceFileSystem` mapper | `ds.virtualize.to_kerchunk('combined.parq', format='parquet')`, then read using an `fsspec.implementations.reference.ReferenceFileSystem` mapper |
| Zarr v3 store with `manifest.json` files | n/a | `ds.virtualize.to_zarr()`, then read via any Zarr v3 reader which implements the manifest storage transformer ZEP |
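
For the first two rows, both tools produce the same kerchunk reference format, and on read the fsspec mapper looks up a key and fetches that byte range. The sketch below mimics that round trip with the standard library only: it writes a reference dict to JSON, reads it back, and resolves keys against a local file. `resolve` is a hypothetical stand-in for what `ReferenceFileSystem` does, not its implementation, and the file contents are made up.

```python
import json
import os
import tempfile


def resolve(refs, key):
    """Return the bytes stored under `key` (toy stand-in for an fsspec reference mapper)."""
    value = refs["refs"][key]
    if isinstance(value, str):
        # Inlined content lives directly in the reference dict
        return value.encode()
    url, offset, length = value
    # Otherwise fetch exactly `length` bytes starting at `offset` from the target file
    with open(url, "rb") as f:
        f.seek(offset)
        return f.read(length)


with tempfile.TemporaryDirectory() as tmp:
    # A fake "archival" file: a 6-byte header followed by two 7-byte chunks
    data_path = os.path.join(tmp, "data.bin")
    with open(data_path, "wb") as f:
        f.write(b"HEADER" + b"chunk-a" + b"chunk-b")

    refs = {
        "version": 1,
        "refs": {
            ".zgroup": json.dumps({"zarr_format": 2}),
            "x/0": [data_path, 6, 7],
            "x/1": [data_path, 13, 7],
        },
    }

    # Serialize to JSON on disk, then load it back as a reader would
    json_path = os.path.join(tmp, "combined.json")
    with open(json_path, "w") as f:
        json.dump(refs, f)
    with open(json_path) as f:
        loaded = json.load(f)

    chunks = [resolve(loaded, "x/0"), resolve(loaded, "x/1")]
    group_meta = resolve(loaded, ".zgroup")

print(chunks)  # [b'chunk-a', b'chunk-b']
```

Switching to parquet changes how this mapping is stored on disk, not what a reader does with each entry.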

**Manipulation of in-memory references**

| Component / Feature | Kerchunk | VirtualiZarr |
| --- | --- | --- |
| Combining references to multiple arrays representing different variables | `kerchunk.combine.MultiZarrToZarr` | `xarray.merge` |
| Combining references to multiple arrays representing the same variable | `kerchunk.combine.MultiZarrToZarr` using the `concat_dims` kwarg | `xarray.concat` |
| Combining references in coordinate order | `kerchunk.combine.MultiZarrToZarr` using the `coo_map` kwarg | `xarray.combine_by_coords` with in-memory xarray indexes created by loading coordinate variables first |
| Combining along multiple dimensions without coordinate data | n/a | `xarray.combine_nested` |
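
At the level of chunk references, concatenating along a dimension amounts to shifting chunk indices: the second input's chunk `0.0` becomes `n.0` when the first input has `n` chunks along that axis. A minimal sketch, assuming plain-dict manifests with complete chunk grids and identical chunk shapes, and assuming each input carries a known first coordinate value (a hypothetical `time0` field) used to order the inputs, roughly the information `xarray.combine_by_coords` gets from indexes built by loading coordinate variables. `concat_manifests` is a hypothetical helper, not part of kerchunk or VirtualiZarr.

```python
def parse_key(key):
    """Turn a Zarr chunk key like "1.0" into a tuple of ints."""
    return tuple(int(i) for i in key.split("."))


def concat_manifests(manifests, axis):
    """Concatenate plain-dict manifests along `axis` by offsetting chunk indices.

    Assumes every input has a complete chunk grid with identical chunk shapes.
    """
    result = {}
    shift = 0
    for manifest in manifests:
        for key, entry in manifest.items():
            idx = list(parse_key(key))
            idx[axis] += shift
            result[".".join(str(i) for i in idx)] = entry
        # The next input starts after the last chunk index of this one along `axis`
        shift += max(parse_key(key)[axis] for key in manifest) + 1
    return result


# Two single-file inputs supplied out of order; `time0` is a hypothetical
# first coordinate value, standing in for a loaded index.
inputs = [
    {"time0": 24, "manifest": {"0.0": {"path": "day2.nc", "offset": 0, "length": 64}}},
    {"time0": 0, "manifest": {"0.0": {"path": "day1.nc", "offset": 0, "length": 64}}},
]

# Order by coordinate value first (the combine_by_coords idea), then concatenate
ordered = sorted(inputs, key=lambda item: item["time0"])
combined = concat_manifests([item["manifest"] for item in ordered], axis=0)
print(combined["1.0"]["path"])  # day2.nc
```

Merging different variables (the `xarray.merge` row) needs no index arithmetic: each variable simply keeps its own manifest.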

**Parallelization**

| Component / Feature | Kerchunk | VirtualiZarr |
| --- | --- | --- |
| Parallelized generation of references | Wrapping kerchunk's opener inside `dask.delayed` | Wrapping `open_virtual_dataset` inside `dask.delayed`, but eventually using `xarray.open_mfdataset(..., parallel=True)` instead |
| Parallelized combining of references (tree-reduce) | `kerchunk.combine.auto_dask` | Wrapping `ManifestArray` objects within `dask.array.Array` objects inside `xarray.Dataset` to use dask's `concatenate` |
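
Both rows follow a map-then-reduce pattern: generate references for each file independently, then combine neighboring results pairwise so that the combine step is also parallel (a tree-reduce). The sketch below uses the standard library's `ThreadPoolExecutor` in place of dask, purely to show the shape of the computation. `generate_refs` is a hypothetical stand-in for scanning one file, and `combine` concatenates two plain-dict manifests along axis 0, assuming complete chunk grids.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_refs(i):
    """Hypothetical stand-in for scanning file i into a one-chunk manifest."""
    return {"0.0": {"path": f"file_{i}.nc", "offset": 0, "length": 100}}


def combine(a, b, axis=0):
    """Concatenate manifest b after manifest a along `axis` (complete chunk grids assumed)."""
    shift = max(int(key.split(".")[axis]) for key in a) + 1
    out = dict(a)
    for key, entry in b.items():
        idx = [int(i) for i in key.split(".")]
        idx[axis] += shift
        out[".".join(str(i) for i in idx)] = entry
    return out


def tree_reduce(items, func, executor):
    """Combine neighboring pairs level by level; pairs within a level run concurrently."""
    while len(items) > 1:
        pairs = [items[i:i + 2] for i in range(0, len(items), 2)]
        # An odd item out is carried up to the next level unchanged
        items = list(executor.map(lambda p: func(*p) if len(p) == 2 else p[0], pairs))
    return items[0]


with ThreadPoolExecutor() as pool:
    manifests = list(pool.map(generate_refs, range(5)))  # "map": one task per file
    combined = tree_reduce(manifests, combine, pool)      # "reduce": pairwise tree

print([combined[f"{i}.0"]["path"] for i in range(5)])
```

Pairing only neighbors keeps the inputs in order, which matters because concatenation is not commutative.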

**Generation of references from archival files**

| Component / Feature | Kerchunk | VirtualiZarr |
| --- | --- | --- |
| From a netCDF4/HDF5 file | `kerchunk.hdf.SingleHdf5ToZarr` | `open_virtual_dataset`, via `kerchunk.hdf.SingleHdf5ToZarr` or potentially hidefix |
| From a netCDF3 file | `kerchunk.netCDF3.NetCDF3ToZarr` | `open_virtual_dataset`, via `kerchunk.netCDF3.NetCDF3ToZarr` |
| From a COG / TIFF file | `kerchunk.tiff.tiff_to_zarr` | `open_virtual_dataset`, via `kerchunk.tiff.tiff_to_zarr` or potentially cog3pio |
| From a Zarr v2 store | `kerchunk.zarr.ZarrToZarr` | `open_virtual_dataset`, via `kerchunk.zarr.ZarrToZarr` ? |
| From a GRIB2 file | `kerchunk.grib2.scan_grib` | `open_virtual_datatree`, via `kerchunk.grib2.scan_grib` ? |
| From a FITS file | `kerchunk.fits.process_file` | `open_virtual_dataset`, via `kerchunk.fits.process_file` ? |
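
Every reader in this part of the table does the same basic job: parse a file's metadata to learn where each chunk's bytes live, without reading the data itself, and emit `[url, offset, length]` references. A minimal sketch of that idea follows. It uses a hypothetical toy binary format (magic bytes, a variable count, then a fixed-size table of name/offset/length entries) rather than netCDF, HDF5, TIFF, GRIB2, or FITS, whose real layouts are far more involved; `write_toy_file` and `scan_toy_file` are illustrative names only.

```python
import os
import struct
import tempfile

ENTRY = struct.Struct("<8sII")  # name (8 bytes, NUL-padded), offset, length


def write_toy_file(path, variables):
    """Write the hypothetical toy format: b"TOY1", a uint32 count, an entry table, then raw data."""
    offset = 8 + ENTRY.size * len(variables)  # data starts right after the header
    table, blobs = b"", b""
    for name, payload in variables.items():
        table += ENTRY.pack(name.encode(), offset, len(payload))
        blobs += payload
        offset += len(payload)
    with open(path, "wb") as f:
        f.write(b"TOY1" + struct.pack("<I", len(variables)) + table + blobs)


def scan_toy_file(path):
    """Read only the header and return kerchunk-style references to each variable's bytes."""
    with open(path, "rb") as f:
        if f.read(4) != b"TOY1":
            raise ValueError(f"{path} is not a toy file")
        (count,) = struct.unpack("<I", f.read(4))
        refs = {}
        for _ in range(count):
            raw_name, offset, length = ENTRY.unpack(f.read(ENTRY.size))
            name = raw_name.rstrip(b"\0").decode()
            # One chunk per variable, as for contiguous (unchunked) storage
            refs[f"{name}/0"] = [path, offset, length]
    return {"version": 1, "refs": refs}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.toy")
    write_toy_file(path, {"temp": b"\x01\x02\x03\x04", "lat": b"\x05\x06"})
    refs = scan_toy_file(path)

    # Following a reference retrieves the original bytes
    url, offset, length = refs["refs"]["lat/0"]
    with open(url, "rb") as f:
        f.seek(offset)
        lat_bytes = f.read(length)

print(refs["refs"]["temp/0"][1:])  # [40, 4]
```

A real scanner also has to record each chunk's compression and filter settings in the array metadata, which this toy format omits.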

@TomNicholas added the `documentation` and `Kerchunk` labels on Apr 13, 2024