Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More cloudpickle executor #767

Merged
merged 17 commits into from
Jul 12, 2023
Merged

More cloudpickle executor #767

merged 17 commits into from
Jul 12, 2023

Conversation

liamhuber
Copy link
Member

@liamhuber liamhuber commented Jul 11, 2023

A quick draft of the idea discussed in #765 to use cloudpickle for everything being passed to the ProcessPoolExecutor (callable, args, and return value). I've got to stop for the day already, but the first attack got working so I wanted to push it.

Todo:

Should this just supplant the simpler executor that only cloudpickles the callable? I lean towards "no", since I suspect this is slower and may be overkill a lot of the time, but I'm open to dissent and want to do some timing tests before a final decision.

EDIT: Speed tests didn't show any significant slowdown using the more extensive cloud-pickler, so I just replaced the old executor. This PR now gives a ProcessPoolExecutor that can handle anything cloudpickle can.

@github-actions
Copy link
Contributor

Binder 👈 Launch a binder notebook on branch pyiron/pyiron_contrib/more_cloudpickle

@liamhuber liamhuber mentioned this pull request Jul 11, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Jul 11, 2023

Pull Request Test Coverage Report for Build 5535218469

  • 46 of 78 (58.97%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.2%) to 12.523%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pyiron_contrib/executors/cloudpickleprocesspool.py 45 77 58.44%
Totals Coverage Status
Change from base Build 5525529556: 0.2%
Covered Lines: 1800
Relevant Lines: 14373

💛 - Coveralls

@liamhuber
Copy link
Member Author

I did some speed testing and don't see a significant slowdown from using the more complete executor that cloudpickles everything instead of just the called function. For simplicity, I'll just get rid of the more restrictive class.

Test:

from functools import partialmethod
from sys import getsizeof
import time

from pyiron_contrib.executors.executors import CloudProcessPoolExecutor, CloudpickleProcessPoolExecutor

class Foo:
    """
    A base class to be dynamically modified for testing CloudpickleProcessPoolExecutor.
    """
    def __init__(self, fnc: callable):
        self.fnc = fnc
        self.result = None

    @property
    def run(self):
        return self.fnc

    def process_result(self, future):
        self.result = future.result()


def dynamic_foo():
    """
    A decorator for dynamically modifying the Foo class to test
    CloudpickleProcessPoolExecutor.

    Overrides the `fnc` input of `Foo` with the decorated function.
    """
    def as_dynamic_foo(fnc: callable):
        return type(
            "DynamicFoo",
            (Foo,),  # Define parentage
            {
                "__init__": partialmethod(
                    Foo.__init__,
                    fnc
                )
            },
        )

    return as_dynamic_foo

@dynamic_foo()
def fnc(x):
    return x
f = fnc()

arg = {n: "foo"*n for n in range(10000)}  # or arg = 1
print(getsizeof(arg)/1000000, "MB")
>>> 0.294992 MB
%%timeit
ex = CloudpickleProcessPoolExecutor()  # Just the callable
fs = ex.submit(f.run, arg)
fs.result()
>>> 13.6 s ± 168 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
ex = CloudProcessPoolExecutor()  # Callable, args, and return value
fs = ex.submit(f.run, arg)
fs.result()
>>> 14 s ± 134 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

So just outside agreeing within error. When the argument is small (1), they agree within error and sometimes the more extensive cloudpickler comes out ahead.

@liamhuber
Copy link
Member Author

linux 3.8 error reads like this in a few places:

----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/pyiron_contrib/pyiron_contrib/tests/unit/executor/test_cloudprocesspoolexecutor.py", line 99, in test_unpickleable_return
    fs = executor.submit(dynamic_dynamic.run)
  File "/home/runner/work/pyiron_contrib/pyiron_contrib/pyiron_contrib/executors/executors.py", line 58, in submit
    return self._submit(
  File "/home/runner/work/pyiron_contrib/pyiron_contrib/pyiron_contrib/executors/executors.py", line 81, in _submit
    self._executor_manager_thread_wakeup.wakeup()
AttributeError: 'CloudProcessPoolExecutor' object has no attribute '_executor_manager_thread_wakeup'

----------------------------------------------------------------------
Ran 93 tests in 11.308s

FAILED (errors=5, skipped=40)
Error: Process completed with exit code 1.

This is not the end of the world. I'm directly breaking into concurrent.futures.ProcessPoolExecutor.submit and replacing f = _base.Future() with our future and just copy-pasting the rest of the method. They've simply got different attributes in different versions. It would be nice to find a more elegant solution than copying the rest of the method directly, but it is at least simple to fix this by having a method for each version.

@liamhuber liamhuber marked this pull request as ready for review July 12, 2023 18:35
@liamhuber liamhuber added the format_black Invoke a black formatting commit label Jul 12, 2023
@liamhuber liamhuber merged commit a746fe6 into main Jul 12, 2023
13 of 14 checks passed
@delete-merged-branch delete-merged-branch bot deleted the more_cloudpickle branch July 12, 2023 19:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
format_black Invoke a black formatting commit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants