You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Parquet does not allow for on-disk chunking of non-row dimensions. In the interests of providing as uniform an interface as possible to the casa, zarr and parquet backends, I propose that we support chunking in non-row dimensions using dask.Array.rechunk functionality. This will have some obvious limitations, as we will still end up with a row-only chunked array in memory i.e. we will read all channels for a number of rows, even if we subsequently process them in smaller chunks. This will will also introduce a shared root in resulting graph (although we may be able to circumvent this with inlining/caching).
An alternative would be to re-read and slice the data for non-row chunks. This would result in a large amount of memory and disk overhead as we would need to repeatedly allocate memory and read from disk.
My instinct is to go with the first option for now as this is still highly experimental functionality.
The text was updated successfully, but these errors were encountered:
Description
Parquet does not allow for on-disk chunking of non-row dimensions. In the interests of providing as uniform an interface as possible to the casa, zarr and parquet backends, I propose that we support chunking in non-row dimensions using
dask.Array.rechunk
functionality. This will have some obvious limitations, as we will still end up with a row-only chunked array in memory i.e. we will read all channels for a number of rows, even if we subsequently process them in smaller chunks. This will will also introduce a shared root in resulting graph (although we may be able to circumvent this with inlining/caching).An alternative would be to re-read and slice the data for non-row chunks. This would result in a large amount of memory and disk overhead as we would need to repeatedly allocate memory and read from disk.
My instinct is to go with the first option for now as this is still highly experimental functionality.
The text was updated successfully, but these errors were encountered: