cannot use rechunker starting from 0.4.0
#92
Comments
Thanks for reporting this. Indeed it looks like 6cc0f26 (part of #77) caused this problem.

@TomAugspurger - this sounds very similar to the problems we are having in Pangeo Forge, which we are attempting to fix in pangeo-forge/pangeo-forge-recipes#153. The basic theory is that the way we are building dask graphs post #77 results in objects that are much too large.

Aurelien, if you wanted to help debug, you could run the same steps that Tom ran in pangeo-forge/pangeo-forge-recipes#116 (comment) and report the results back here.

From my point of view, we could simply revert #77. The main point of that PR was to have a representation of pipelines we could share between rechunker and pangeo-forge-recipes. But since we are likely abandoning the rechunker dependency in pangeo-forge-recipes, this no longer seems necessary.
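For reference, here is a minimal sketch (not from the original comment) of the kind of size measurement being asked for: build the rechunk plan on a toy source with many small chunks and see how large it is once serialized. Store paths, shapes, and chunk sizes are placeholders, and plain `pickle` may need to be swapped for `cloudpickle` depending on what the plan contains.

```python
# Sketch of measuring how large the serialized rechunk plan is, as a rough
# proxy for what the scheduler has to ingest before any tasks can run.
# Store paths, shapes, and chunk sizes are placeholders.
import pickle  # swap for cloudpickle if the plan does not pickle cleanly

import zarr
from rechunker import rechunk

# A source with many small chunks, similar to the problematic use cases
source = zarr.ones((4000, 4000), chunks=(1, 4000), store="source.zarr")

plan = rechunk(
    source,
    target_chunks=(4000, 1),
    max_mem="100MB",
    target_store="target.zarr",
    temp_store="temp.zarr",
)

# Compare this number between 0.3.3 and 0.4.x
print("serialized plan size (bytes):", len(pickle.dumps(plan)))
```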
Here is the result with version 0.3.3:
and with the latest version (73ef80b):
Please let me know if I need to adjust anything; you can find the full code in the following notebook.
Ok, that definitely confirms my suspicion that the way we are generating dask graphs is not efficient. 🤦 Looks like we will have to revert quite a few changes. @apatlpo - how urgent is this problem for you? Can you just keep using 0.3.0 for now?
Absolutely no rush on my side, I am perfectly happy with 0.3.3 (sorry, corrected typo in earlier post), thanks for your concern.
I just stumbled across this, having a similar use case. For now, using 0.3.3 seems to be fine for me as well.
I also wanted to second this - I was having problems rechunking a variable that had many small chunks. Dask would get stuck in the first few moments of trying to rechunk. Reverting to 0.3.3 solved this issue for me.
+1 on this issue. Reverting to 0.3.3 seems to solve it, though unrelatedly I'm getting workers using 2-3x the specified max_mem. That being said, anecdotally I seem to see workflows using more memory than expected with recent updates to dask/distributed (starting around 2021.6, I believe), so it might be unrelated to rechunker. Going to do some more investigating.
I haven't had a chance to look at 6cc0f26, but pangeo-forge/pangeo-forge-recipes#116 was determined to be some kind of serialization problem and was fixed by pangeo-forge/pangeo-forge-recipes#160. Just to confirm: when people say "using more memory", is that on the workers or the scheduler?
@TomAugspurger for me it's the workers, and it's usually just one or two workers that quickly balloon up to that size while the remainder seem to respect max_mem. Let me see if I can come up with an example (might put it in a different issue since it's separate from the main thread of this one).
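Not from the thread, but one quick way to separate the two cases Tom asks about would be something along these lines; the scheduler address is a placeholder and `psutil` is assumed to be available on the cluster.

```python
# Rough sketch (not from the thread): report resident memory for the
# scheduler and every worker, to tell which side is ballooning.
import psutil
from dask.distributed import Client

client = Client("tcp://scheduler-address:8786")  # placeholder address


def rss_mb():
    """Resident set size of the current process, in MB."""
    return psutil.Process().memory_info().rss / 1e6


print("scheduler (MB):", client.run_on_scheduler(rss_mb))
print("workers (MB):  ", client.run(rss_mb))
```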
Apologies for the slow pace here. I've been on vacation much of the past month.

@TomAugspurger - I'd like to chart a path towards resolving this. As discussed in today's Pangeo Forge meeting, this issue has implications for Pangeo Forge architecture. We have to decide whether to maintain the Pipelines abstraction. I'd personally like to keep the Pipelines framework and even break it out into its own package. But that only makes sense if we can get it working properly within rechunker first.

I understand that you (Tom) probably don't have lots of time to dig into this right now. If that's the case, it would be great if you could at least help outline a work plan to get to the bottom of this issue. We have several other developers (e.g. me, @cisaacstern, @alxmrs, @TomNicholas) who could potentially be working on this. But your input would be really helpful to get started.

In pangeo-forge/pangeo-forge-recipes#116 (comment), Tom made a nice diagnosis of the serializability issues in Pangeo Forge recipes. Perhaps we could do the same here. We would need a minimum reproducible example for this problem; a sketch of what that could look like is below.
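Purely as an assumption of what such a reproducer might look like (not an agreed-upon example from the thread): scale the number of source chunks and time how long it takes to build and execute the plan, since the reported symptom is that nothing reaches the dashboard while the scheduler spins.

```python
# Hypothetical skeleton for a minimum reproducible example. Shapes, chunk
# counts, and store paths are placeholders; the point is to watch how plan
# construction and execution time scale with the number of chunks.
import time

import zarr
from dask.distributed import Client, LocalCluster
from rechunker import rechunk

client = Client(LocalCluster(n_workers=4))

for n in (100, 1_000, 10_000):
    source = zarr.ones((n, 1000), chunks=(1, 1000), store=f"source_{n}.zarr")

    t0 = time.time()
    plan = rechunk(
        source,
        target_chunks=(n, 1),
        max_mem="100MB",
        target_store=f"target_{n}.zarr",
        temp_store=f"temp_{n}.zarr",
    )
    t1 = time.time()
    plan.execute()
    t2 = time.time()
    print(f"{n:>6} chunks: build {t1 - t0:6.1f}s, execute {t2 - t1:6.1f}s")
```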
+1 & thanks!
@jmccreight - would you be able to provide any more details? Were you using Xarray or Zarr inputs? Which executor? Code would be even better. Thanks so much for your help.
Hi @rabernat! The quick overview:
I just hit this again -- one of my USGS colleagues had
Sorry for all the friction everyone! This package needs some maintenance. In the meantime, should we just pull the 0.4.2 release from PyPI?
I'm not the OP, but I had a similar issue. I tried the latest version on master and the rechunking went smoothly.
Was this fixed in recent releases / can we close?
I apologize in advance for posting an issue that may be incomplete.
After a recent library update I am no longer able to use rechunker for my use case.
This is on an HPC platform.
Symptoms are that nothing happens on the dask dashboard when launching the actual rechunking with `execute`. Using the `top` command on the scheduler node indicates 100% CPU usage and slowly increasing memory usage. On the other hand, action takes place right away on the dask dashboard with the older version of rechunker (0.3.3).
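The full notebook is linked later in the thread rather than reproduced here; purely as an illustration of the call pattern described above, something along these lines, with placeholder stores, chunk sizes, and scheduler address:

```python
# Illustrative sketch only -- not the reporter's actual notebook. Placeholder
# stores, chunk sizes, and scheduler address.
import zarr
from dask.distributed import Client
from rechunker import rechunk

client = Client("tcp://scheduler-address:8786")  # dask cluster on the HPC system

source = zarr.open("input.zarr", mode="r")  # existing source array

plan = rechunk(
    source,
    target_chunks=(8760, 100, 100),  # placeholder target chunking
    max_mem="2GB",
    target_store="output.zarr",
    temp_store="temp.zarr",
)

# With rechunker 0.3.3, tasks appear on the dashboard as soon as this runs;
# with 0.4.0 and later the scheduler reportedly sits at 100% CPU instead.
plan.execute()
```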
git bisecting versions indicates that the problem first appears with commit 6cc0f26.
I am not sure what I could do in order to investigate further and would welcome suggestions.
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.8.10 | packaged by conda-forge | (default, May 11 2021, 07:01:05)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 3.12.53-60.30-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.18.2
pandas: 1.2.4
numpy: 1.20.3
scipy: 1.6.3
netCDF4: 1.5.6
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.8.3
cftime: 1.5.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.06.0
distributed: 2021.06.0
matplotlib: 3.4.2
cartopy: 0.19.0.post1
seaborn: 0.11.1
numbagg: None
pint: None
setuptools: 49.6.0.post20210108
pip: 21.1.2
conda: None
pytest: None
IPython: 7.24.1
sphinx: None