
Launch interchange as a fresh process #3463

Merged: 20 commits into master from benc-interchange-fork, Jun 14, 2024

Conversation

benclifford (Collaborator) commented May 28, 2024

This PR removes a use of multiprocessing fork-without-exec.

At heart, this is how the interchange has wanted to be launched for some time (because of earlier remote interchange work).

Launching with multiprocessing fork caused a bunch of problems related to inheriting state from the parent submitting process, which go away with this change (jumbled logging topics, race conditions around at least logging-while-forking, inherited signal handlers).

The configuration dictionary, previously passed in memory over a fork, is now sent in pickled form over stdin. Using pickle here rather than (eg.) JSON keeps the path open for sending richer configuration objects, beyond what can be encoded in JSON. This isn't something needed right now, but at least configurable monitoring radios (the immediate driving force behind this PR) are modelled around passing arbitrary configuration objects around to configure things - and so it seems likely that if interchange monitoring configuration is exposed to the user, richer objects would be passed here. See PR #3315 for monitoring radio prototype.
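
As a rough illustration of the mechanism (a minimal sketch, not Parsl's actual launch code; the `interchange.py` invocation and the configuration keys shown are assumptions):

```python
import pickle
import subprocess
import sys

# Submit side: start the interchange as a fresh Python process and feed it
# a pickled configuration dictionary over stdin.
config = {"worker_ports": (54000, 54001)}          # hypothetical keys/values
proc = subprocess.Popen([sys.executable, "interchange.py"],
                        stdin=subprocess.PIPE)
proc.stdin.write(pickle.dumps(config))
proc.stdin.close()

# Interchange side (inside interchange.py's __main__ block):
#     config = pickle.load(sys.stdin.buffer)
```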

there are default values in the interchange code, but they are all specified in the executor code too, so these defaults will
never be used; remove them as misleading.

see similar changes to process worker pool, PR #2973, for more detailed justification

the zmq sockets test needs to change because it assumes those arbitrary defaults are present,
which is no longer the case. if you want to initialize an interchange, which now requires you to specify all this stuff,
and want some arbitrary values, then make those arbitrary values yourself.

the client address parameter is now supplied by the executor - it was not before, so the default/hard-coded value
now lives in the executor, not the interchange.

any downstream packaging will need to be aware of the presence of interchange.py as a new command-line invocable script,
and this might break some build instructions which do not configure installed scripts onto the path.
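
For packagers, a hedged sketch of one way such a script might be exposed on the path with setuptools (hypothetical; the real Parsl packaging may declare this differently):

```python
# Hypothetical setup.py fragment: install interchange.py as an executable
# script alongside the existing process_worker_pool.py. Paths shown are
# assumptions for illustration only.
from setuptools import setup

setup(
    name="parsl",
    scripts=[
        "parsl/executors/high_throughput/process_worker_pool.py",
        "parsl/executors/high_throughput/interchange.py",  # new with this PR
    ],
)
```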

this PR replaces keyword arguments with argparse command line parameters. it does not attempt to make those
command line arguments differently optional from the parameters of the Interchange class constructor (for example, worker_ports and
worker_port_range are both mandatory, because they are both specified before this PR)

i'm somewhat uncomfortable with this seeming like an ad-hoc serialise/deserialise protocol for what was previously,
in effect, a dict of typed python objects... but it's what the process worker pool does.

see issue #3373 for interchange specific issue

see issue #2343 for parsl general fork vs threads issue

see possibly issue #3378?
benclifford (Collaborator, Author)

commit e7d18aa swaps this PR from using argparse to passing a base64 pickled config dict on the command line. that removes a lot of code around "we have to define a schema here", at the expense of a much more opaque interchange command line. (although prior to this PR, there was no visible interchange command line at all, in that sense: the same configuration dictionary was passed by remaining in memory across a fork)

I did it so that people can compare the argparse approach (before this commit) and the pickle approach (after this commit)
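
For comparison, the command-line variant from that commit boils down to something like the following (a sketch under assumptions, not the exact code):

```python
import base64
import pickle

config = {"worker_ports": (54000, 54001)}   # hypothetical configuration

# Submit side: encode the pickled dict so it survives as a single argv entry,
# e.g. subprocess.Popen([sys.executable, "interchange.py", arg]).
arg = base64.b64encode(pickle.dumps(config)).decode("ascii")

# Interchange side: reverse the encoding to recover the dict.
recovered = pickle.loads(base64.b64decode(arg))
assert recovered == config
```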

benclifford (Collaborator, Author)

(@rjmello @yadudoc @khk-globus are the people that probably care about my last comment, #3463 (comment).)

benclifford (Collaborator, Author)

@khk-globus suggested using stdin rather than a command line parameter, and I have made that change.

We also briefly discussed what protocol to use over stdin (primarily: pickle vs JSON). This PR sticks with pickle and I added a note about that into the PR description:

Using pickle here rather than (eg.) JSON keeps the path open for sending richer configuration objects, beyond what can be encoded in JSON. This isn't something needed right now, but at least configurable monitoring radios (the immediate driving force behind this PR) are modelled around passing arbitrary configuration objects around to configure things - and so it seems likely that if interchange monitoring configuration is exposed to the user, richer objects would be passed here. See PR #3315 for monitoring radio prototype.

benclifford marked this pull request as ready for review on June 12, 2024 at 10:13
khk-globus (Collaborator)

"@khk-globus suggested using stdin rather than a command line parameter, and I have made that change."

Hmm; I appear to have commented in a weird place; pulling up from the commit-comment so the PR has an easily discoverable record of it:

Merely looking at the code removed, this is a beautiful addition. The key insight being that we are the consumers of this interface, so we don't have to support debuggability-for-humans, per se. That's good.

As discussed in Slack; I wonder if using pipe-to-stdin would be a better approach than faffing about with the CLI, "security concerns" thereabouts, and potential (if nominal) size limits. A secondary question is whether pickle (over base64) is the correct serialization, as opposed to a "simple" repr/ast.literal_eval cycle, or even JSON.
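
For reference, the repr/ast.literal_eval cycle mentioned above would look roughly like this (hypothetical configuration values); it only round-trips Python literals, which is the trade-off against pickle discussed in the PR description:

```python
import ast

config = {"worker_ports": (54000, 54001), "heartbeat_threshold": 120}  # hypothetical
text = repr(config)                       # human-readable on the wire
assert ast.literal_eval(text) == config   # safe to evaluate: literals only

# Arbitrary objects (e.g. a configurable monitoring radio) would not survive
# this cycle, whereas pickle can carry them.
```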

khk-globus (Collaborator) left a comment


Love it. Nice work.

benclifford merged commit 5e03e1f into master on Jun 14, 2024 (7 checks passed)
benclifford deleted the benc-interchange-fork branch on June 14, 2024 at 09:18
khk-globus added a commit that referenced this pull request Jun 26, 2024
Per the analysis in #3495, defining the `ManagerLost` and `VersionMismatch`
errors in `interchange.py` became a problem in #3463, where the interchange
now runs as `__main__`. This makes it difficult for Dill to get the serde
correct.

The organizational fix is simply to move these classes to an importable
location, following the expectation that classes are available in both
local and remote locations - something that defining them in `__main__`
can't easily guarantee.

Fixes: #3495
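
A minimal sketch of the underlying serialization issue (illustrative only; the module path suggested is a stand-in):

```python
import pickle

class ManagerLost(Exception):     # imagine this defined in interchange.py's __main__
    pass

blob = pickle.dumps(ManagerLost("manager lost"))
# The pickle stream records the class by reference as "__main__.ManagerLost",
# so a different process can only unpickle it if *its* __main__ happens to
# define the same class. Moving the class into an importable module (say, a
# hypothetical parsl/executors/high_throughput/errors.py) makes the reference
# resolvable on both sides.
```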
benclifford added a commit that referenced this pull request Jul 1, 2024
This was introduced in PR #3463 and at the time I incorrectly assumed that
interchange exit would close both ends of the pipe. That is untrue.

For example: `pytest parsl/tests/test_htex/ --config local`
ends with 341 file descriptors open before this change,
and 327 after.
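
A sketch of the leak pattern being fixed (the script name and config keys are illustrative assumptions):

```python
import pickle
import subprocess
import sys

# hypothetical launch, mirroring the stdin-config mechanism from this PR
proc = subprocess.Popen([sys.executable, "interchange.py"],
                        stdin=subprocess.PIPE)
proc.stdin.write(pickle.dumps({"worker_ports": (54000, 54001)}))

# When the interchange later exits it closes only *its* (read) end of the
# pipe; the submit process still holds the write end, so it must be closed
# here or one file descriptor leaks per executor start.
proc.stdin.close()
```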
benclifford added a commit that referenced this pull request Oct 14, 2024
This was introduced in PR #2629 to guard against the submit process
installing a SIGTERM handler and then that handler being unexpectedly
inherited by the interchange via multiprocessing fork.

PR #3463 changed the interchange to run as a fresh Python
process, which will not inherit SIGTERM handlers, so since
then this line has been vestigial.

Fixes issue #3588
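
A small POSIX-only sketch of the difference in handler inheritance (not Parsl code):

```python
import os
import signal
import subprocess
import sys

def handler(signum, frame):
    pass

signal.signal(signal.SIGTERM, handler)       # submit-process SIGTERM handler

pid = os.fork()
if pid == 0:
    # A forked child inherits the custom handler installed above.
    assert signal.getsignal(signal.SIGTERM) is handler
    os._exit(0)
os.waitpid(pid, 0)

# A fresh Python process starts with the default SIGTERM disposition instead.
out = subprocess.run(
    [sys.executable, "-c",
     "import signal; print(signal.getsignal(signal.SIGTERM) is signal.SIG_DFL)"],
    capture_output=True, text=True)
print(out.stdout.strip())   # prints: True
```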
benclifford added a commit that referenced this pull request Oct 24, 2024
Prior to PR #3463, the interchange process was launched
with multiprocessing fork and inherited the log configuration
of the parent process.

To give the interchange its own log file, a specific logger
called "interchange" was used. In PR #2307, this logger was
configured to not propagate entries upwards, so that user
defined root handlers in the parent process do not see
interchange logs.

Since #3463, this special configuration is no longer
necessary. See #3635 for a related change to signal handlers.

Co-authored-by: Kevin Hunter Kesling <[email protected]>
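
For context, the pattern being retired looks roughly like this (a sketch, not Parsl's exact configuration; the log filename and format are assumptions):

```python
import logging

# Pre-#3463 workaround: give the forked interchange its own logger and stop
# records from propagating up to any root handlers the submit process set up.
logger = logging.getLogger("interchange")
logger.setLevel(logging.DEBUG)
logger.propagate = False                              # the PR #2307 behaviour
handler = logging.FileHandler("interchange.log")      # hypothetical filename
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
logger.addHandler(handler)

logger.info("interchange started")   # written only to interchange.log
```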