I would like users to be able to open the generated kerchunked files without using much memory. I reached the point where I could no longer load all the individual JSON files into memory for the MultiZarrToZarr function, so I had to create more than one combined file, let's say combined1.parq and combined2.parq. Users could then use a catalog entry which would use code like:
However, this requires gigabytes of memory. I don't know what exactly for, but it is not practical. It also creates another level of potential confusion: why are there multiple kerchunks of multiple kerchunks?
So the other option would be to reduce the memory usage of merging with MultiZarrToZarr. However, a lot of kerchunk functions require JSON inputs. For example, I also use the merge_vars function after MultiZarrToZarr.
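As a rough illustration of the batching idea, here is a minimal sketch that folds reference files into one result while loading only a few inputs at a time. It is not kerchunk's API: `combine_in_batches`, `load_json`, and the `combine` callable are hypothetical names, and `combine` merely stands in for whatever real merge step (such as a MultiZarrToZarr call) would be used. Note that it only limits how many inputs are loaded at once; the accumulated result still grows with every batch.

```python
import json
from typing import Any, Callable, Dict, Iterator, List, Optional

Refs = Dict[str, Any]


def load_json(path: str) -> Refs:
    """Load one reference set from a JSON file."""
    with open(path) as f:
        return json.load(f)


def batched(paths: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `paths` with at most `size` items each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(paths), size):
        yield paths[start:start + size]


def combine_in_batches(
    paths: List[str],
    batch_size: int,
    combine: Callable[[List[Refs]], Refs],
    load: Callable[[str], Refs] = load_json,
) -> Optional[Refs]:
    """Fold all inputs into one result, one batch at a time.

    `combine` is a placeholder (an assumption of this sketch) for the real
    merge step. Each call receives the result so far plus the newly loaded
    batch, so at most `batch_size` inputs are held in memory together with
    the accumulated result.
    """
    combined: Optional[Refs] = None
    for batch in batched(paths, batch_size):
        refs = [load(p) for p in batch]
        if combined is not None:
            # Put the accumulated result first so input order is preserved.
            refs.insert(0, combined)
        combined = combine(refs)
    return combined
```

Whether the real merge step can accept an already combined reference set as one of its inputs is exactly the open question here, so this shows the desired shape of the workflow rather than a working kerchunk recipe.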
So what are your recommendations? Wouldn't it be nice to have an append kwarg for MultiZarrToZarr? Or is there already something for that?
Best,
Fabi
Final comment here: of course kerchunk can't magically get xarray to do the right thing, and I don't know why it used so much memory in the first place. However, we should be able to combine multiple datasets and save new metadata so that it becomes unnecessary to call open_mfdataset in the general case. We can leave that as aspiration.