HTEX Protocol update [DRAFT] #3586
Comments
I'd really like to see incremental changes made to the existing protocol to address these issues before a massive protocol redesign. Right now, I see the inability of our current Parsl contributors to adapt the current protocol to address these issues as some evidence that those same contributors are not in a position to design a new protocol that addresses them. I'm also very aware of the allure of building a first-iteration Grand Solution over the grunt work of actually fixing small problems by sustained boring effort.
We've had interchange protocol issues open for some time, and it's clear no one has the motivation to fix them. Those same people then probably do not have time to fix the issues that will inevitably arise with a Grand Solution rewrite.
Moving down one protocol layer, I've repeatedly heard Globus Compute people talk about messagepack, but without describing concrete benefits over our existing framing protocols (json and pickle). This issue is a pretty good place to start fleshing those out more concretely: pickle and json are both not particularly nice protocols here, but they are both well supported and have not caused a lot of problems at this layer.
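To give that comparison something concrete to start from, here is a minimal sketch of how the two existing framings treat a message that carries a binary payload. The message layout (`task_id`, `buffer`) is a hypothetical stand-in, not Parsl's actual wire format. messagepack is not exercised here because it is a third-party dependency. The only claim made about it is that its specification defines a native binary type, which is one property that could be weighed against the workaround shown for json.

```python
import json
import pickle

# Hypothetical task message; field names are illustrative only,
# not the format HTEX actually puts on the wire.
task = {"task_id": 7, "buffer": b"\x80\x04serialized-function-and-args"}

# pickle round-trips the dict, raw bytes included, but the frame is
# Python-specific and unpickling untrusted data can execute code.
pickled = pickle.dumps(task)
assert pickle.loads(pickled) == task

# json refuses bytes outright, so a binary payload needs a text encoding.
try:
    json.dumps(task)
    json_accepts_bytes = True
except TypeError:
    json_accepts_bytes = False

# One possible workaround: hex-encode the payload inside the JSON frame.
json_frame = json.dumps(
    {"task_id": task["task_id"], "buffer": task["buffer"].hex()}
)
decoded = json.loads(json_frame)
restored = {
    "task_id": decoded["task_id"],
    "buffer": bytes.fromhex(decoded["buffer"]),
}
```

The hex step doubles the payload size on the wire, which is the kind of cost a concrete messagepack proposal would need to compare against.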
One way to move forward with this is to document the current protocols to the level that you want the final protocol to also be documented: without any further work, that in itself is valuable for onboarding new people. Then make the corresponding description of the desired target protocol. Then build a series of supportable, reviewable and justifiable steps to get from one to the other, and proceed on that path.
In talking about other things with @yadudoc, I get the sense that point 3 above (or something similar to it) has a background context of Globus Compute wanting to provide a much richer cross-version compatibility story for Globus Compute users, with something along the lines of up to four different GC installations contributing towards task execution. Roughly, in Parsl terms: (1) the preparation of a function for remote execution (e.g. serialization of a function object), (2) the serialization of arguments and other preparation of a task invocation, (3) task dispatch around the interchange (around the endpoint in Globus Compute terms), and (4) execution on a worker. I said to @yadudoc in that discussion that I think that story needs to be much better fleshed out on the Globus Compute side of things before placing requirements on Parsl-level protocols.
Problem statement
Currently the HighThroughputExecutor does not have a clearly defined protocol to communicate tasks and results internally. This has a few different issues:
Logic that packages results is duplicated all over the HTEX interchange. Here are some examples:
Interchange reporting a version mismatch error: link
Interchange reporting ManagerLost exception: link
Interchange repeating heartbeat: link
Reporting a drained manager: link
Interchange pickling task dict before shipping to the manager: link
There is a lot of cruft that needs cleanup
Globus Compute is interested in shipping task objects that HTEX can understand. Currently, HTEX is given a wrapper function that unpacks GC's task object and executes it. I'm not 100% sold on this, but I would like some discussion on what this would look like.
Recently, @matthewc2003 ran into issues adding the resource_specification to the task package so that it can be used by the interchange for better scheduling decisions. Lack of metadata limited where MPIExecutor could handle decisions based on resource_specification. Ideally, some rework here could make it easier to update the MPIExecutor to use resource_specification info on the interchange rather than leaving this logic to the worker.
Describe the solution you'd like
Evaluate messagepack and see if we can use that; alternatively, we design a new protocol.
Describe alternatives you've considered
We could do nothing, but our current model makes it harder for new work to be done on HTEX.
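As a rough illustration of the problem statement above, the sketch below shows one way a single, explicitly defined message format could cover both concerns: packaging results in one place, and carrying resource_specification alongside the task so the interchange can read it. Every name here (`TaskMessage`, `ResultMessage`, `pack`, `unpack_task`, the `type` strings, and the `num_nodes` key) is a hypothetical choice made for illustration and does not describe existing HTEX code. pickle is used only because it is one of the current framings.

```python
import pickle
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class TaskMessage:
    """Hypothetical task envelope; scheduling metadata sits next to the payload."""
    task_id: int
    buffer: bytes  # serialized function and arguments, opaque to the interchange
    resource_specification: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ResultMessage:
    """Hypothetical envelope for everything flowing back from the interchange."""
    type: str  # illustrative values: "result", "heartbeat", "manager_lost"
    task_id: Optional[int] = None
    payload: Optional[bytes] = None


def pack(message) -> bytes:
    # Single packaging point, instead of ad hoc dicts pickled at each call site.
    return pickle.dumps(asdict(message))


def unpack_task(frame: bytes) -> TaskMessage:
    # The interchange can read the metadata without touching the task buffer.
    return TaskMessage(**pickle.loads(frame))


frame = pack(TaskMessage(task_id=1, buffer=b"opaque", resource_specification={"num_nodes": 2}))
task = unpack_task(frame)
heartbeat = pickle.loads(pack(ResultMessage(type="heartbeat")))
```

With a shape like this, reporting a lost manager or a version mismatch would differ only in the `type` field and payload, and the interchange could inspect `task.resource_specification` when making scheduling decisions.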