Help debugging workflow with numba + xarray + dask #6
Thanks for the detailed write-up and notebook @rabernat -- I'll try it out on Pangeo Cloud. I'm also looking briefly through the linked issues. cc @gjoseph92
Looking briefly at some of the associated issues, it looks like Tom may have a solution. I think the question is whether we want to wait for a numpy and numba release, or whether we want to do a workaround. Guido seems to have a workaround that we could implement in Dask if necessary.

@rabernat, what is the time pressure here? Are you looking for something fast, or for something that might take a few months but will have universal effect? If the latter, then it looks like Tom may already have done all of the work here and we just need to wait for the gears of OSS to move. If the former, then we probably need to add a special case inside of Dask.

(Also, I see that James beat me to the punch on this one.)
Thanks for the quick replies! My hunch is that there are really two distinct issues here:

1. the serialization / PicklingError problem with the compiled ufunc, and
2. the performance difference between the two ways of applying it.
More generally, it would be nice to know whether the code we have in https://github.com/xgcm/fastjmd95/blob/master/fastjmd95/jmd95wrapper.py is considered "good practice" or not.
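Roughly, what's in there is a wrapper that routes a compiled numba ufunc to numpy, dask, or xarray machinery. A hedged sketch of that general pattern (this is illustrative only, not the actual jmd95wrapper.py code, and the polynomial is fake):

```python
# Hedged sketch of a type-dispatching wrapper around a numba ufunc;
# not the actual jmd95wrapper.py code, and the polynomial below is fake.
import numpy as np
import dask.array as dsa
import xarray as xr
from numba import vectorize


@vectorize(["float64(float64, float64, float64)"], nopython=True)
def _rho_compiled(s, t, p):
    # Placeholder for the real JMD95 equation-of-state polynomial.
    return 1000.0 + 0.8 * s - 0.2 * t + 0.04 * p


def rho(s, t, p):
    """Dispatch on the input type so the same call works on numpy, dask, and xarray data."""
    if isinstance(s, xr.DataArray):
        return xr.apply_ufunc(
            _rho_compiled, s, t, p,
            dask="parallelized",
            output_dtypes=[np.float64],
        )
    if isinstance(s, dsa.Array):
        return dsa.map_blocks(_rho_compiled, s, t, p, dtype=np.float64)
    return _rho_compiled(s, t, p)
```

Whether this kind of isinstance-based dispatch is the right approach, or whether numpy's protocol machinery should make it unnecessary, is basically the question.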
FWIW I don't think this is possible to do on the Dask end. dask/distributed#3450 (comment) is a good explanation: by the time dask is seeing the ufunc, ….

Additionally, Guido's workaround in numba/numba#4314 (comment) will only work for ….

Interestingly, I would have expected your ….

I'm going to dig into this to see how the compiled ufunc is getting into the graph, which may somehow be related to … (but if it isn't, then I'll look into …).
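As a rough illustration of the serialization trap (a sketch, not the exact failure from the notebook): pickle serializes numpy ufuncs by module-and-name reference, so a ufunc object created at runtime, as numba's @vectorize does, generally can't round-trip.

```python
# Sketch: why a runtime-generated ufunc doesn't survive pickling.
# Ufuncs are pickled by reference (module name + attribute name), so an object
# that isn't importable under that name fails either at dump or at load time,
# depending on the pickle/numpy versions involved.
import pickle
import numpy as np


def _plus_one(x):
    return x + 1


# np.frompyfunc builds a brand-new ufunc object at runtime, loosely analogous
# to what numba's @vectorize produces; it is not an attribute of any module.
plus_one_ufunc = np.frompyfunc(_plus_one, 1, 1)

try:
    blob = pickle.dumps(plus_one_ufunc)
    pickle.loads(blob)  # the by-name lookup is what blows up
except Exception as err:
    print(type(err).__name__, ":", err)
```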
It looks like @rabernat isn't too concerned about the serializability issue. We can probably just wait on that (although I encourage @gjoseph92 and @jrbourbeau to subscribe to the various upstream issues).
Yeah, it seems like broader investigation is called for here. Thanks @gjoseph92.
Just to clarify, I am concerned about the serializability issue to the extent that it relates to the following central question: what is the best practice for integrating numba code seamlessly with xarray + dask.distributed processing? Our goal in several packages we maintain is to provide functions (possibly implemented in numba) that "just work" with:

- numpy arrays
- dask arrays
- xarray data (numpy-backed)
- xarray + dask data

With Python lacking a robust mechanism for multiple dispatch, this may be unrealistic. I don't understand the deeper issues well enough to know for sure.
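That said, for true ufuncs numpy's __array_ufunc__ protocol (NEP 13) already provides a form of this dispatch, which is presumably what lets a direct call work without a wrapper. A toy sketch (toy_rho is a stand-in, not fastjmd95.rho):

```python
# Sketch: a single compiled ufunc dispatching across array types via numpy's
# __array_ufunc__ protocol. toy_rho is a stand-in, not fastjmd95.rho.
import numpy as np
import dask.array as dsa
from numba import vectorize


@vectorize(["float64(float64, float64, float64)"])
def toy_rho(s, t, p):
    return 1000.0 + 0.8 * s - 0.2 * t + 0.04 * p


s_np = np.full((4, 4), 35.0)
t_np = np.full((4, 4), 10.0)
print(toy_rho(s_np, t_np, 0.0).mean())  # eager numpy result

# Dask arrays implement __array_ufunc__, so the same call builds a lazy,
# blockwise graph instead of computing immediately.
s_dask = dsa.from_array(s_np, chunks=2)
t_dask = dsa.from_array(t_np, chunks=2)
lazy = toy_rho(s_dask, t_dask, 0.0)
print(type(lazy), float(lazy.mean().compute()))
```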
I support this overarching focus. My guess is that the problem comes down to serialization, and that things will get better when this is resolved, but the fact that you're seeing differences based on the API path that you use is curious and warrants investigation. I suspect that if @gjoseph92 spends a bit of time running things and generating performance reports then we should be able to find some insight here.
Great! Sounds like a good plan that will be useful for everyone involved. Thanks a lot.
Totally agree that it makes sense for @gjoseph92 to dive deeper here. As a seed for future exploration, yesterday I ran through the notebook Ryan provided on Pangeo Cloud and captured a couple of performance reports (see below) so we can compare the Dask and xarray versions. The primary thing that jumps out to me is that we spend a lot more time transferring dependencies in one of the two cases.
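For anyone who wants to reproduce these, the reports can be captured with distributed's performance_report context manager. A minimal sketch with a stand-in workload (the filename and computation are placeholders):

```python
# Sketch: capturing a performance report for a computation so that the two
# code paths can be compared. The workload and filename are placeholders.
import dask.array as dsa
from dask.distributed import Client, performance_report

client = Client()  # or the client attached to a Dask Gateway cluster

x = dsa.random.random((4000, 4000), chunks=(1000, 1000))

with performance_report(filename="report-case-a.html"):
    x.mean().compute()
```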
It would be useful to see a visualization of the high-level graph that's involved in each case. I wonder if there is some rechunking going on with ufuncs. This would happen if we used gufunc and there were core dimensions, but I'm not sure that's what's happening here.
Oh, another possible cause for excess communication would be that high-level blockwise fusion is working in one case but not the other. This again seems like a good thing for @gjoseph92 to think about.
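One possible way to check both of these (a sketch; exact layer names and availability depend on the dask version):

```python
# Sketch: inspect the high-level graph layers to see how many separate layers
# each code path produces and whether blockwise fusion can apply.
import dask
import dask.array as dsa

x = dsa.ones((1000, 1000), chunks=(250, 250))
y = (x + 1) * 2  # two elementwise (blockwise) operations

hlg = y.__dask_graph__()   # HighLevelGraph
print(list(hlg.layers))    # one entry per high-level layer

# Render the (optimized) low-level task graph for a side-by-side comparison;
# requires graphviz to be installed.
dask.visualize(y, filename="graph.png", optimize_graph=True)
```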
Good news: it turns out the serialization error you're seeing is actually already fixed, by you! Pangeo is still running fastjmd95 v0.1, which didn't have the fix yet.

It seems like the last Pangeo auto-update might have failed (xref pangeo-data/pangeo-cloud-federation#947), so Pangeo is still running versions of packages from a few months ago. I'll look into the performance discrepancy next (and see whether it also has anything to do with older versions).
Well that's just embarrassing. 🤪 Very sorry for diverting so much effort before checking on this ourselves.
Not a problem at all. It's just an unintentional side effect of your code that it happens to sidestep the problem. As you said, this should be seamless; ideally you shouldn't even have to write extra wrapper code at all.
So we have the updated package on https://staging.us-central1-b.gcp.pangeo.io/ (thanks @jrbourbeau!). It looks like things are pretty much working! I can do `sigma0 = fastjmd95.rho(ds.SALT, ds.THETA, 0).mean()` and have it "just work" reasonably well. There are still some small differences between the xarray vs. dask graphs, but no significant differences in execution time. @stb2145 - to close the loop, can you try your workflow on https://staging.us-central1-b.gcp.pangeo.io/ and see whether things are working more smoothly? Don't use any apply_ufunc or map_blocks, just call fastjmd95.rho directly on the xarray DataArrays.
Successfully ran my workflow on staging Pangeo by just calling fastjmd95.rho directly.
Here is a notebook that I think reproduces a problem that @stb2145 is having in her workflow.
https://gist.github.com/rabernat/a794cc1f58aa1c515e9c1c2c385c24cb
We have a package called fastjmd95 that implements the ocean equation of state in numba:
https://github.com/xgcm/fastjmd95/
We would like to be able to call it on numpy, dask, xarray, and xarray+dask data and have it "just work". Our attempts to do that are in this module: https://github.com/xgcm/fastjmd95/blob/master/fastjmd95/jmd95wrapper.py (thanks @cspencerjones!)
In the notebook, we are calling this function on zarr-backed data in the cloud via a Dask Gateway cluster. It runs through various scenarios that do and don't work well. I won't repeat them here. Hopefully the notebook is self-explanatory. (It can be run from Pangeo Cloud or anywhere else with Google Cloud requester-pays credentials.)
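For readers without the notebook handy, the setup is roughly as follows (a sketch; the bucket path, cluster size, and storage options are placeholders):

```python
# Rough sketch of the notebook's environment: a Dask Gateway cluster plus a
# zarr store on requester-pays Google Cloud Storage. Paths are placeholders.
import xarray as xr
import gcsfs
from dask_gateway import Gateway

gateway = Gateway()
cluster = gateway.new_cluster()
cluster.scale(20)
client = cluster.get_client()

fs = gcsfs.GCSFileSystem(requester_pays=True)
ds = xr.open_zarr(fs.get_mapper("gs://some-bucket/some-dataset.zarr"), consolidated=True)
```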
Unfortunately, it doesn't work as we expect. In particular, we are having problems with using xarray apply_ufunc. In some settings, we get

```
PicklingError: Can't pickle <ufunc 'rho'>: attribute lookup rho on __main__ failed
```

In others, the computation launches but performs very slowly compared to the equivalent dask.map_blocks version.

Aspects of this problem have been raised elsewhere on GitHub:

- xr.apply_ufunc() and jmd95 (xgcm/fastjmd95#9)

But we have never gotten to the bottom of it.
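For concreteness, the two code paths being compared look roughly like this (a sketch with toy stand-ins; rho_ufunc is not the real compiled function and the dataset is synthetic):

```python
# Sketch of the two ways of applying a compiled ufunc that behave differently
# in the notebook. rho_ufunc and the dataset are toy stand-ins.
import numpy as np
import xarray as xr
import dask.array as dsa
from numba import vectorize


@vectorize(["float64(float64, float64, float64)"])
def rho_ufunc(s, t, p):
    return 1000.0 + 0.8 * s - 0.2 * t + 0.04 * p


ds = xr.Dataset(
    {
        "SALT": (("y", "x"), np.full((64, 64), 35.0)),
        "THETA": (("y", "x"), np.full((64, 64), 10.0)),
    }
).chunk({"y": 16})

# Path 1: xarray.apply_ufunc -- the path that hits the PicklingError in some
# settings and runs slowly in others.
result_apply_ufunc = xr.apply_ufunc(
    rho_ufunc, ds.SALT, ds.THETA, 0,
    dask="parallelized",
    output_dtypes=[np.float64],
)

# Path 2: dask.array.map_blocks on the underlying dask arrays.
result_map_blocks = dsa.map_blocks(
    rho_ufunc, ds.SALT.data, ds.THETA.data, 0, dtype=np.float64
)

print(result_apply_ufunc.mean().compute().item(), result_map_blocks.mean().compute())
```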
IMO this is a very useful issue to explore. In theory, the combination of numba + dask + xarray + zarr should be a killer stack for scientific computing in the cloud. But this issue reveals the challenges we have had in integrating these tools.
Any help you could provide, @jrbourbeau and co., would be very much appreciated.