JSKenyon changed the title from `daskms.experimental.xds_to_zarr` to `daskms.experimental.xds_to_zarr` may fail due to large chunks on Aug 14, 2023.
Description

When attempting to write an `xarray.Dataset` with large chunks to zarr, I got the following error:

```
ValueError: Column FLAG has a chunk of dimension (132750, 4096, 4) that will exceed zarr's 2GiB chunk limit
```
What I Did
Possible solution

This problem can be solved in a number of ways:

1. In user code: I could manually figure out appropriate chunks and rechunk before writing.
2. As part of `maybe_rechunk` in `xds_to_zarr`: this functionality currently forces chunks to match those already on disk. It would not be a stretch to extend it to cover the case where a new zarr output is being written.
3. As a utility function: a function could be added, e.g. `rechunk_for_disk`, which a user could call prior to writing an `xarray.Dataset` to disk.

Practically, a combination of 2 and 3 will likely be best, i.e. write a utility function and then call it as part of `xds_to_zarr`. I will take a stab at this this week.
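A minimal sketch of what such a utility could look like. The name `rechunk_for_disk` comes from the proposal above, but everything else here is an assumption of this sketch, not the eventual dask-ms implementation: it works on plain chunk shapes rather than an `xarray.Dataset`, hard-codes the 2 GiB limit, and subdivides only the first (row-like) axis.

```python
import math
import numpy as np

# Assumed limit: zarr's 2 GiB maximum per-chunk size.
ZARR_CHUNK_LIMIT = 2 * 1024**3

def rechunk_for_disk(chunks, dtype, limit=ZARR_CHUNK_LIMIT):
    """Return a chunk shape whose byte size fits under ``limit``.

    ``chunks`` is a per-dimension chunk shape, e.g. (132750, 4096, 4).
    Only the first axis is subdivided here, mirroring the common
    pattern of chunking measurement set data over rows. This is a
    hypothetical helper, not the dask-ms API.
    """
    itemsize = np.dtype(dtype).itemsize
    nbytes = math.prod(chunks) * itemsize
    if nbytes <= limit:
        return tuple(chunks)  # Already small enough; leave untouched.
    # Split axis 0 into enough pieces that each piece fits under the limit.
    factor = math.ceil(nbytes / limit)
    new_rows = math.ceil(chunks[0] / factor)
    return (new_rows,) + tuple(chunks[1:])

# The failing FLAG chunk from this issue (boolean, one byte per element):
print(rechunk_for_disk((132750, 4096, 4), np.bool_))  # (66375, 4096, 4)
```

In the combined approach described above, `xds_to_zarr` would call something like this per data variable before writing, so existing user code gains the protection without changes.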