problems with numba ufunc + distributed #3450
And I just found numba/numba#4314, which seems to be the dual of this issue in the numba repo. That issue suggests that the problem is limited to dynamically defined functions (i.e. functions defined in the notebook interpreter), as in my minimal example. But I am still having the same problem with my full example, where the functions are defined in a module in an installed package.
cc @seibert from numba and @TomAugspurger for the Anaconda/Pangeo connection
Via gitter, @mrocklin pointed me to the official dask example with numba. I have confirmed that the official example only works with a single multi-threaded worker, so the problem is related to multiprocessing and serialization. More specifically, the cluster in that example is created like this:

```python
from dask.distributed import Client, progress

client = Client(threads_per_worker=4,
                n_workers=1,
                processes=False,
                memory_limit='4GB')
```

If instead I change to

```python
client = Client(n_workers=4,
                memory_limit='4GB')
```

the workers die with the error message.
My conclusion from this is that numba functions are generally incompatible with distributed because they don't serialize correctly. Can anyone from numba confirm this?
Related issue in numba, closed a while back, suggesting this should be resolved: numba/numba#2943
Looking into this now.
@rabernat as a quick workaround, you might be able to fix this by importing the functions like

```python
from mypackage import test_numba
test_numba(...)
```

rather than

```python
import mypackage
mypackage.test_numba(...)
```

Will figure out a proper fix.
I have a slightly better understanding of the situation now. The call order is something like …

The `test_numba` object is a `DUFunc` wrapping a real `numpy.ufunc`:

```python
In [2]: b.test_numba
Out[2]: <numba._DUFunc 'test_numba'>

In [3]: b.test_numba.ufunc
Out[3]: <ufunc 'test_numba'>

In [4]: type(b.test_numba.ufunc)
Out[4]: numpy.ufunc
```

And that's what chokes up dask's serialization:

```python
In [9]: pickle.loads(pickle.dumps(b.test_numba))
Out[9]: <numba._DUFunc 'test_numba'>

In [10]: pickle.loads(pickle.dumps(b.test_numba.ufunc))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-10-cff7e39b4aa8> in <module>
----> 1 pickle.loads(pickle.dumps(b.test_numba.ufunc))

~/Envs/dask-dev/lib/python3.7/site-packages/numpy/core/__init__.py in _ufunc_reconstruct(module, name)
    130     # scipy.special.expit for instance.
    131     mod = __import__(module, fromlist=[name])
--> 132     return getattr(mod, name)
    133
    134 def _ufunc_reduce(func):

AttributeError: module '__main__' has no attribute 'test_numba'
```

where `b.py` is

```python
# b.py
from numba import vectorize, float64, float32

@vectorize([float64(float64), float32(float32)], nopython=True)
def test_numba(a):
    return a**2
```

Will start looking for solutions now.
Thanks so much for looking into this. Before you dig too deep, it's worth confirming whether the problem in my toy example is indeed the same one as in the full example. The key difference is that in the full example, the function is defined in a module in a package, which is installed with pip.
Thanks. My hope is that my file

```python
import pickle
import fastjmd95

if __name__ == "__main__":
    pickle.loads(pickle.dumps(fastjmd95.rho.ufunc))
```

reproduces the problem in the full example.
@sklam if we wanted to support this, it seems like we'd need to ensure that `DUFunc.ufunc` is pickleable. IIUC, that's created at https://github.com/numba/numba/blob/fd8c232bb37a1945e8dc8becef9fe05fdd78c4cf/numba/np/ufunc/_internal.c#L227-L229. Does that sound right? Or perhaps there's a deeper issue with the generated ufunc. Do you know why it isn't pickleable?
FWIW, I can't reproduce the issue going through the ufunc tutorial in https://docs.scipy.org/doc/numpy/user/c-info.ufunc-tutorial.html.
FYI @rabernat, as a workaround you / your users can use … (that of course sacrifices writing generic code that works on ndarrays, DataArrays, & dask arrays, so I'll continue to look into this).
Right now, I think handling this is either on Numba or (unfortunately) Numba's users. I have a hack at https://gist.github.com/TomAugspurger/38c68595a91387926907a2436305c8c2 that nobody should really use, but it is possibly an option for libraries like fastjmd95 (though it will need way more vetting). That gist includes a description of what's going on and why it does what it does. Ideally, Numba could make things better by implementing custom pickling handlers for the `DUFunc.ufunc` objects. I'm not sure if this is possible though, since these are real `numpy.ufunc` instances, which numpy controls. They aren't associated with a `__module__`, so the usual pickle mechanisms fail. And I don't know if it'd be possible to make some kind of ufunc subclass, or otherwise override `__reduce__`, since this is all done in C against the CPython API. Hopefully @sklam has some thoughts on whether there are options in numba.
Can we change this in Numpy itself? Can we change this in cloudpickle? @llllllllll might have thoughts on how to serialize dynamically generated Numpy ufuncs.
I think if Numba is unable to handle this, then it's worth proposing a change to NumPy to avoid the patching in https://gist.github.com/TomAugspurger/38c68595a91387926907a2436305c8c2.
Would it make sense to change Numpy anyway? I haven't looked too deeply at your solution, but at least by the "number of lines of code" metric it seems simple? cc @seberg
Interesting issue; I have never thought about pickling of user ufuncs. It seems like NumPy would have to know the fully qualified module+name of the desired ufunc. Tom's solution seems reasonable to me, but it's not something that can be done in NumPy, since NumPy knows nothing about numba ufuncs. It seems like something to keep in mind when we revise the UFunc API, which I assume we have to do sooner rather than later, hopefully. I guess we could add API right now in principle, although the solution from within numba seems simpler? It would have to be new C-API. I wonder if it breaks strange things like unpickling of …
Just to clarify, this is only for user-defined ufuncs that don't have an importable `__module__`. That's the main issue from NumPy's side. My …
Ah OK, interesting, although I do not quite see how it actually knows the correct module :). Is there a way that the …
I'm not sure. AFAICT, we need to somehow tell the generated ufunc which module it was defined in, something like

```python
class DUFunc:
    def __init__(self, ...):
        self.ufunc = generate_ufunc(module=self.__class__.__module__)


def generate_ufunc(..., module):
    ufunc = ...
    ufunc.__reduce__ = custom_reduce(module)
    return ufunc
```

But the …
Right, with the wrapping logic, I have no idea if there is any chance of that working. Ufuncs cannot be subclassed right now.
Thanks. I'll attempt to summarize the current state of things. It'd be nice to write … As someone who doesn't know C and doesn't know how ufuncs are implemented, it'd be nice if NumPy used something like …
I guess it's tricky, since the numpy ufunc does the … UFuncs do not have reasonable state you can store as such; they are more like builtin classes in that regard, which are also singletons with state set at import time. I think we would have to either provide only a way to load it as a fully qualified path (basically just allowing to set …
This issue came up again on the pangeo call today. @sklam or @stuartarchibald, assuming this is difficult / impossible to solve on the NumPy side, do you have any guesses on how Numba could avoid creating the dynamically generated ufunc at https://github.com/numba/numba/blob/fd8c232bb37a1945e8dc8becef9fe05fdd78c4cf/numba/np/ufunc/_internal.c#L227-L229? The summary of the issue is at #3450 (comment).
@seberg I looked into this again today. I'm afraid that doing things properly by implementing some kind of overrideable … The following patch against NumPy sketches the idea:
```diff
diff --git a/numpy/core/__init__.py b/numpy/core/__init__.py
index c77885954..6f816747c 100644
--- a/numpy/core/__init__.py
+++ b/numpy/core/__init__.py
@@ -117,6 +117,20 @@ __all__ += einsumfunc.__all__
 # Here are the loading and unloading functions
 # The name numpy.core._ufunc_reconstruct must be
 # available for unpickling to work.
+
+
+_ufunc_modules = {}  # Dict[ufunc, Tuple[Callable, Tuple]]
+
+def _register_ufunc(ufunc, reconstruct_function, reconstruct_args):
+    _ufunc_modules[ufunc] = (reconstruct_function, reconstruct_args)
+
+
+def _ufunc_reconstruct_registered(module, name):
+    import operator
+    mod = __import__(module)
+    return operator.attrgetter(name)(mod)
+
+
 def _ufunc_reconstruct(module, name):
     # The `fromlist` kwarg is required to ensure that `mod` points to the
     # inner-most module rather than the parent package when module name is
@@ -128,7 +142,11 @@ def _ufunc_reconstruct(module, name):
 
 def _ufunc_reduce(func):
     from pickle import whichmodule
     name = func.__name__
-    return _ufunc_reconstruct, (whichmodule(func, name), name)
+    if func in _ufunc_modules:
+        reconstruct_func, args = _ufunc_modules[func]
+        return reconstruct_func, args
+    else:
+        return _ufunc_reconstruct, (whichmodule(func, name), name)
 
 import copyreg
```

Then numba could use it like

```python
import copyreg
import numpy as np
from numba import vectorize, float64, float32


@vectorize([float64(float64), float32(float32)], nopython=True)
def test_numba(a):
    return a ** 2


# This would be done as part of numba's DUFunc.__init__, not by the user
def _reconstruct_func(module, name):
    import importlib
    import operator
    module = importlib.import_module(module)
    return operator.attrgetter(name)(module)


np.core._register_ufunc(test_numba.ufunc, _reconstruct_func, (__name__, "test_numba.ufunc"))
```

Do you see any hope for something like that being merged into NumPy? Or would we need to wait for a proper solution doing things in C?
@TomAugspurger, that seems hackish, but maybe a band-aid is better than nothing. However, I tried around a bit and I think we are missing that pickle got better, or how reliable it actually is. I.e. I think NumPy's approach is over-engineered, and that makes the solution harder than necessary. I tried modifying NumPy like this, but you can also do it manually:
Now you need one more ingredient, and that is that … Overriding the ufunc pickling outside of NumPy seems pretty extreme, but I am not actually sure it's all that bad. I did not check, but I think the above replacement is effectively identical to what NumPy does, except that it supports attributes in a …
> This also allows, at least in principle, numba dynamically generated ufuncs to be pickled (with some hacking), see: dask/distributed#3450. If the name of the ufunc is set to a qualname, using this method, pickle should be able to unpickle the ufunc correctly. We may want to allow setting the module and qualname explicitly on the ufunc object to remove the need for the custom pickler completely.
I'm not sure I follow your example, sorry. Is there an else block missing? But if I understand the spirit of the example, a (mutable?) …
@TomAugspurger, sorry, I put the … A (semi) mutable …
Thanks for numpy/numpy#17289. So IIUC, we'd update somewhere around https://github.com/numpy/numpy/blob/7e9d603664edc756f555fecf8649bf888a46d47c/numpy/core/src/umath/ufunc_object.c#L6117-L6119 to make it settable? And when numba generates the ufunc dynamically, they'd set …
I am thinking that in theory, numba could set a different name here: https://github.com/numba/numba/blob/fd8c232bb37a1945e8dc8becef9fe05fdd78c4cf/numba/np/ufunc/_internal.c#L213, at least for now. It is all a bit problematic, I admit, if you want to not leak the name. (Numba can probably hack this, but I am not quite sure; it's all pretty brittle, and since Numba copies some NumPy code, in the end numba probably just needs to always make sure to keep up with any NumPy change.) But yes, adding …
> waiting to be fixed upstream. dask/distributed#3450 numba/numba#6234
We have created a new software package called fastjmd95 that uses numba to accelerate computation of the ocean equation of state. Everything works fine with dask and a local scheduler. Now I want to run this code on a distributed dask cluster. It isn't working, I think because the workers are not able to deserialize the numba functions properly.
Original Full Example
This example with real data can be run on any pangeo cluster
Minimal Example
I believe this reproduces the core problem
At this point I get a `KilledWorker` error. In the worker log, I can see the following error (sorry for the lack of formatting; that's how it comes out of the worker error logs). The basic error appears to be the same as in the full example.
This seems like a pretty straightforward use of numba + distributed, and I assumed this sort of usage was supported. Am I missing something obvious?
Installed versions
I'm on dask 2.9.0 and numba 0.48.0.