
Executor submodule #765

Merged · 3 commits into main · Jul 11, 2023
Conversation

@liamhuber (Member) commented Jul 10, 2023:

Introduce a new executor submodule for our custom children of concurrent.futures.Executor. Once it's wrapped up in class form I'd like to stick @jan-janssen's hash-and-serialize executor here so I can start playing with it in the workflows. Jan, if I get to packaging it that way first (maybe tomorrow or Wednesday if life goes smoothly?) I'll open a draft PR right away so you can see what's going on.

In the meantime, here is a little executor I found on Stack Overflow that exploits cloudpickle so that we can handle dynamically defined classes. I did a little MWE of how workflow node decorators are running and tested against that.
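As an aside, the failure mode that motivates reaching for cloudpickle here can be reproduced with the standard library alone. The sketch below is illustrative only (`make_node` is a hypothetical stand-in for the workflow node decorators, not code from this PR): instances of dynamically created classes defeat plain pickle because the class cannot be looked up by name.

```python
import pickle


def make_node(name):
    # Mimic a decorator/factory that creates a node class at call time.
    # The class is not importable under its name, so plain pickle cannot
    # serialize instances of it by reference.
    return type(name, (), {"run": lambda self: 42})()


node = make_node("DynamicNode")

node_is_picklable = True
try:
    pickle.dumps(node)
except Exception:
    node_is_picklable = False
```

Running this, `node.run()` works fine in-process, but `node_is_picklable` ends up `False` — which is exactly why a vanilla `ProcessPoolExecutor` chokes on decorator-generated nodes.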

@github-actions (Contributor):

Binder 👈 Launch a binder notebook on branch pyiron/pyiron_contrib/executor_submodule

@liamhuber mentioned this pull request Jul 10, 2023
@github-actions bot commented Jul 10, 2023:

Pull Request Test Coverage Report for Build 5524794387

  • 8 of 8 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.05%) to 12.319%

Totals (Coverage Status):
  • Change from base Build 5509197271: +0.05%
  • Covered Lines: 1762
  • Relevant Lines: 14303

💛 - Coveralls


def _apply_cloudpickle(fn, /, *args, **kwargs):
    fn = cloudpickle.loads(fn)
    return fn(*args, **kwargs)
Member:
I guess you also want to cloudpickle the output on the way back.

Member Author:
I'm not sure what you mean. Can you suggest a modification to the test that makes it fail without this?

Member:

I added a suggestion below. The other part I am surprised about is that you do not pickle the arguments, only the function. Shouldn't we pickle everything together?

Member Author:

I guess this is only necessary when the args or return values are themselves things that cannot be normally pickled (e.g. instances of classes created by a decorator). Tomorrow I'll come back and make the test case much crueler. Hopefully then we'll see the tests fail without your suggestion and similar treatment for the args.

@jan-janssen (Member):

Just for future reference https://github.com/pyiron-dev/pysqa-executor


def _apply_cloudpickle(fn, /, *args, **kwargs):
    fn = cloudpickle.loads(fn)
    return fn(*args, **kwargs)
Member:
Suggested change:

-    return fn(*args, **kwargs)
+    return cloudpickle.dumps(fn(*args, **kwargs))

@pmrv mentioned this pull request Jul 11, 2023
@pmrv (Contributor) left a comment:

Looks OK to me to get the ball rolling. Can we think about making this dynamic somehow? Like, I give you any executor and you give me back one that cloudpickles? OTOH, we'll have on the order of a dozen "official" executors at most, so wrapping them once is also not too much hassle.

Since we have this now, I'll go ahead and rename tinybase's executors, since they are slightly different and I didn't like the name clash to begin with.
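One way to read the "give me any executor" idea above is an instance-level wrapper around submit. A rough, hypothetical sketch (the name `with_pickling` is invented here, and stdlib pickle stands in for cloudpickle so the snippet is self-contained):

```python
import pickle
from concurrent.futures import Executor, ThreadPoolExecutor


def _run_serialized(fn_serial, /, *args, **kwargs):
    # Deserialize the callable on the worker side before running it.
    return pickle.loads(fn_serial)(*args, **kwargs)


def with_pickling(executor: Executor) -> Executor:
    # Wrap any executor *instance* so its submit() serializes the callable
    # first; stdlib pickle is a stand-in for cloudpickle here.
    original_submit = executor.submit

    def submit(fn, /, *args, **kwargs):
        return original_submit(_run_serialized, pickle.dumps(fn), *args, **kwargs)

    executor.submit = submit
    return executor


with with_pickling(ThreadPoolExecutor(max_workers=1)) as pool:
    result = pool.submit(abs, -3).result()
```

Monkeypatching `submit` like this is fragile compared to subclassing, but it shows the shape of a generic adapter; the per-class wrapping discussed above avoids that fragility at the cost of one subclass per executor.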

@liamhuber (Member Author):

I'm working on this currently. I agree that for non-pickleable callables that take pickleable arguments and have a pickleable return value, it's fine. Jan's suggestion fixes the case of non-pickleable return values, but then the future result needs to be unpickled post-facto, i.e. type(fs.result()) is <class 'bytes'> while cloudpickle.loads(fs.result()) is the thing we actually want.

I see a potential solution where we really replace the underlying Future class with one whose .result() calls the parent class's implementation but then un-cloud-pickles any bytes found. But this is getting quite extreme. I'm going to go have lunch and think about it a bit.
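The bytes-in-the-future problem being described can be reproduced with the standard library alone. In this sketch, `_apply` is a hypothetical stand-in for `_apply_cloudpickle` with the suggested change applied, and stdlib pickle stands in for cloudpickle:

```python
import pickle
from concurrent.futures import ThreadPoolExecutor


def _apply(fn_serial, /, *args, **kwargs):
    # Deserialize the callable, run it, and serialize the return value too,
    # mirroring the suggested cloudpickle.dumps(fn(*args, **kwargs)).
    fn = pickle.loads(fn_serial)
    return pickle.dumps(fn(*args, **kwargs))


with ThreadPoolExecutor(max_workers=1) as pool:
    raw = pool.submit(_apply, pickle.dumps(abs), -42).result()

# The future hands back serialized bytes, not the value itself:
unpickled = pickle.loads(raw)
```

Here `raw` is a `bytes` object and only `pickle.loads(raw)` yields the actual result, which is exactly the interface wart under discussion.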

@liamhuber (Member Author) commented Jul 11, 2023:

So I'd like to merge this one as-is. I added a bit of stuff so that both the docstrings and tests are more explicit that this handles unpickleable callables but that arguments and return values must still be pickleable. This is already totally sufficient for #760, where our callables are methods of dynamically created nodes, but the input and output is (so far) always just regular data.

Jan's suggestion above does extend the class to handle unpickleable return values, and one can imagine a similar attack on arguments. The problem is that it also messes with the interface, since fs.result() is now a bytecode object in need of cloudpickle.loads instead of just the actual result we want.

I have a solution in mind (sketched below), but I would much rather get the current code -- which is working fine and tested -- merged, and either replace it or add a new more powerful executor in a future PR. This will keep the PRs smaller, give Jan's serializing and hashing executor a logical place to live on contrib, and remove barriers to working on #760 in case my sketch of an idea doesn't pan out. @jan-janssen, are you amenable to such a split/delay?

The idea:

We can get things working ok, but the future result is ugly, so we'll need a new class to handle this, like

import cloudpickle
from concurrent.futures import Future

class CloudLoadsFuture(Future):
    def result(self, timeout=None):
        result = super().result(timeout)
        if isinstance(result, bytes):
            result = cloudpickle.loads(result)
        return result

Then we apply a wrapper and subclass the subprocess executor to make sure everything is getting serialized before it gets seen by the multiprocessing library:

import cloudpickle
from concurrent.futures import ProcessPoolExecutor

class CloudPickledCallable:
    def __init__(self, fnc: callable):
        self.fnc_serial = cloudpickle.dumps(fnc)

    def __call__(self, /, dumped_args, dumped_kwargs):
        fnc = cloudpickle.loads(self.fnc_serial)
        args = cloudpickle.loads(dumped_args)
        kwargs = cloudpickle.loads(dumped_kwargs)
        return cloudpickle.dumps(fnc(*args, **kwargs))

    @classmethod
    def dumps(cls, stuff):
        return cloudpickle.dumps(stuff)


class CloudpickleProcessPoolExecutor(ProcessPoolExecutor):
    def submit(self, fn, /, *args, **kwargs):
        return self._submit(
            CloudPickledCallable(fn),
            CloudPickledCallable.dumps(args),
            CloudPickledCallable.dumps(kwargs),
        )

    def _submit(self, fn, /, *args, **kwargs):
        # This is just exactly ProcessPoolExecutor.submit, but we hack in
        # f = CloudLoadsFuture()
        # instead of
        # f = _base.Future()
        # for the future instantiation
        raise NotImplementedError

Of course this may still not do what I want and is probably not the most elegant solution, and I want to take another peek around to see if there is an existing tool that does this (there's at least multiprocess, a multiprocessing replacement that swaps pickle for dill...), but these are all reasons I'd rather defer this more complex (and as yet unnecessary) task to another PR.
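For what it's worth, the CloudLoadsFuture half of the sketch can be exercised in isolation, without touching executor internals. Here stdlib pickle stands in for cloudpickle and the result is set by hand rather than by an executor:

```python
import pickle
from concurrent.futures import Future


class LoadsFuture(Future):
    # Stdlib-pickle stand-in for the CloudLoadsFuture sketch:
    # transparently deserialize a bytes payload on the way out.
    def result(self, timeout=None):
        result = super().result(timeout)
        if isinstance(result, bytes):
            result = pickle.loads(result)
        return result


f = LoadsFuture()
f.set_result(pickle.dumps([1, 2, 3]))
unwrapped = f.result()
```

One caveat of this approach: a callable whose legitimate return value is itself `bytes` would get spuriously deserialized, since the `isinstance(result, bytes)` check cannot distinguish payload from pickle.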

@liamhuber mentioned this pull request Jul 11, 2023
@liamhuber (Member Author):

UPDATE: Actually that plan of attack worked basically flawlessly (at least as far as my current tests can see), and is drafted over in #767

@jan-janssen (Member):

> @jan-janssen, are you amenable to such a split/delay?

As it is pyiron_contrib, I am fine with merging this as is. @pmrv and I had a similar discussion in #766, and we both agreed that the Executors still need some extensions before we can use them as the core for the next generation of pyiron_base. Nevertheless, this should not block development here in pyiron_contrib, and it helps us clarify what we still need to work on.

@liamhuber liamhuber merged commit 4a35eb2 into main Jul 11, 2023
13 of 14 checks passed
@delete-merged-branch delete-merged-branch bot deleted the executor_submodule branch July 11, 2023 22:34