More cloudpickle executor #767
Pull Request Test Coverage Report for Build 5535218469
💛 - Coveralls |
I did some speed testing and don't see a significant slowdown from using the more complete executor that cloudpickles everything instead of just the called function. For simplicity, I'll just get rid of the more restrictive class.
Test:
from functools import partialmethod
from sys import getsizeof
import time
from pyiron_contrib.executors.executors import CloudProcessPoolExecutor, CloudpickleProcessPoolExecutor
class Foo:
    """
    A base class to be dynamically modified for testing CloudpickleProcessPoolExecutor.
    """
    def __init__(self, fnc: callable):
        self.fnc = fnc
        self.result = None

    @property
    def run(self):
        return self.fnc

    def process_result(self, future):
        self.result = future.result()
def dynamic_foo():
    """
    A decorator for dynamically modifying the Foo class to test
    CloudpickleProcessPoolExecutor.

    Overrides the `fnc` input of `Foo` with the decorated function.
    """
    def as_dynamic_foo(fnc: callable):
        return type(
            "DynamicFoo",
            (Foo,),  # Define parentage
            {
                "__init__": partialmethod(
                    Foo.__init__,
                    fnc
                )
            },
        )

    return as_dynamic_foo
@dynamic_foo()
def fnc(x):
    return x

f = fnc()
arg = {n: "foo"*n for n in range(10000)}  # or arg = 1
# Note: getsizeof counts only the dict container, not the strings it holds
print(getsizeof(arg)/1000000, "MB")
>>> 0.294992 MB

%%timeit
ex = CloudpickleProcessPoolExecutor()  # Just the callable
fs = ex.submit(f.run, arg)
fs.result()
>>> 13.6 s ± 168 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
ex = CloudProcessPoolExecutor()  # Callable, args, and return value
fs = ex.submit(f.run, arg)
fs.result()
>>> 14 s ± 134 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

So the two timings fall just outside agreement within error. When the argument is small ( |
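To put a number on "just outside agreement within error", here is a small sketch using only the two `%%timeit` results quoted above. It assumes the reported standard deviations can be treated as independent uncertainties and combined in quadrature, which is a simplification (they are spreads over 7 runs, not standard errors of the mean). The variable names are illustrative only.

```python
from math import sqrt

# Timings copied from the two %%timeit cells (seconds)
mean_callable_only, std_callable_only = 13.6, 0.168  # only the callable is cloudpickled
mean_everything, std_everything = 14.0, 0.134        # callable, args, and return value

# One-sigma intervals: upper edge of the faster run vs. lower edge of the slower run
gap_between_intervals = (mean_everything - std_everything) - (
    mean_callable_only + std_callable_only
)

# Assumption: independent uncertainties, combined in quadrature
difference = mean_everything - mean_callable_only
combined_std = sqrt(std_callable_only**2 + std_everything**2)
n_sigma = difference / combined_std
relative_slowdown = difference / mean_callable_only

print(f"gap between one-sigma intervals: {gap_between_intervals:.3f} s")
print(f"difference: {difference:.2f} s = {n_sigma:.2f} combined std. dev.")
print(f"relative slowdown: {relative_slowdown:.1%}")
```

Under these assumptions the one-sigma intervals miss each other by roughly 0.1 s, and the difference is a bit under two combined standard deviations, or about 3% of the run time.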
linux 3.8 error reads like this in a few places:
This is not the end of the world. I'm directly breaking into |
A quick draft of the idea discussed in #765 to use cloudpickle for everything being passed to the ProcessPoolExecutor (callable, args, and return value). I've got to stop for the day already, but the first attack got working so I wanted to push it.
Todo:
Should this just supplant the simpler executor that only cloudpickles the callable? I lean towards "no", since I suspect this is slower and may be overkill a lot of the time, but I'm open to dissent and want to do some timing tests before a final decision.
EDIT: Speed tests didn't show any significant slowdown using the more extensive cloud-pickler, so I just replaced the old executor. This PR now gives a ProcessPoolExecutor that can handle anything cloudpickle can.
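For context on why the standard pickling used by a plain ProcessPoolExecutor falls short here, the following sketch reuses the dynamic-class pattern from the timing test and checks what the standard-library `pickle` can serialize. It is only an illustration: `Foo` is a trimmed stand-in for the test class, `can_pickle` is a hypothetical helper, and the builtin `len` is used as the wrapped function to keep the example self-contained. It does not use cloudpickle or the executors from this PR.

```python
import pickle
from functools import partialmethod


class Foo:
    """Trimmed stand-in for the Foo base class from the timing test."""

    def __init__(self, fnc):
        self.fnc = fnc


def as_dynamic_foo(fnc):
    # Build a subclass at runtime. It is never bound to a module-level name
    # "DynamicFoo", so pickle cannot locate it by reference.
    return type(
        "DynamicFoo",
        (Foo,),
        {"__init__": partialmethod(Foo.__init__, fnc)},
    )


def can_pickle(obj) -> bool:
    """Hypothetical helper: True if the standard pickle can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


dynamic_class = as_dynamic_foo(len)
instance = dynamic_class()

print(can_pickle(len))            # True: builtins are pickled by reference
print(can_pickle(dynamic_class))  # False: no importable name for the class
print(can_pickle(instance))       # False: pickling an instance needs its class
```

Standard pickle stores classes and functions by reference to an importable name, which dynamically built classes lack. cloudpickle serializes such objects by value instead, which is why routing the callable, its arguments, and the return value through it lets the executor handle them.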