-
Notifications
You must be signed in to change notification settings - Fork 199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace DFK and JobStatusPoller monitoring zmq channels with Queue radio #3338
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…ugin Before this PR, the DataFlowKernel and each JobStatusPoller PolledExecutorFacade each opened a ZMQ connection to the monitoring router. These connections are not threadsafe, but (especially in the DFK case) no reasoning or checking stops the DFK's connection being used in multiple threads. Before this PR, the MonitoringHub presented a 'send' method and stored the DFK's ZMQ connection in self._dfk_channel, and each PolledExecutorFacade contained a copy of the ZMQ channel code to open its own channel, configured using parameters from the DFK passed in at construction. This PR: * moves the above uses to use the MonitoringRadio interface (which was originally designed for remote workers to send monitoring information, but seems ok here too) * has MonitoringHub construct an appropriate MonitoringRadio instance for use on the submit side, exposed as self.radio; * replaces the implementation of send with a new MultiprocessingQueueRadio which is thread safe but only works in the same multiprocessing environment as the launched monitoring database manager process (which is true on the submit side, but for example means this radio cannot be used on most remote workers) This work aligns with the prototype in #3315 (which pushes on monitoring radio configuration for remote workers) and pushes in the direction (without getting there) of allowing other submit-side hooks. This work removes some monitoring specific code from the JobStatusPoller, replacing it with a dependency injection style. This is part of work to move JobStatusPoller facade state into other classes, as part of job handling rearrangements in PR #3293 This PR will change how monitoring messages are delivered from the submitting process, and the most obvious thing I can think of that will change is how this will behave under load: heavily loaded messaging causing full buffers and other heavy-load symptoms will now behave as multiprocessing Queues do, rather than as ZMQ connections do. I have not attempted to characterise either of those modes.
khk-globus
reviewed
Apr 11, 2024
Comment on lines
+186
to
+187
def __init__(self, queue: Queue) -> None: | ||
self.queue = queue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion to leave the namespace on the type declaration. Would have saved this reviewer at least one (more) look at the top of the file (and later on when rereading this in N months) to confirm that this is, in fact, an MP queue. No biggie, but a suggestion.
khk-globus
approved these changes
Apr 11, 2024
rjmello
added a commit
to globus/globus-compute
that referenced
this pull request
Jun 27, 2024
As of Parsl PR Parsl/parsl#3338, the `dfk` argument for `JobStatusPoller` initialization is no longer supported.
rjmello
added a commit
to globus/globus-compute
that referenced
this pull request
Jun 28, 2024
As of Parsl PR Parsl/parsl#3338, the `dfk` argument for `JobStatusPoller` initialization is no longer supported.
rjmello
added a commit
to globus/globus-compute
that referenced
this pull request
Jun 28, 2024
As of Parsl PR Parsl/parsl#3338, the `dfk` argument for `JobStatusPoller` initialization is no longer supported.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Before this PR, the DataFlowKernel and each JobStatusPoller PolledExecutorFacade each opened a ZMQ connection to the monitoring router. These connections are not threadsafe, but (especially in the DFK case) no reasoning or checking stops the DFK's connection being used in multiple threads.
Before this PR, the MonitoringHub presented a 'send' method and stored the DFK's ZMQ connection in self._dfk_channel, and each PolledExecutorFacade contained a copy of the ZMQ channel code to open its own channel, configured using parameters from the DFK passed in at construction.
This PR:
moves the above uses to use the MonitoringRadio interface (which was originally designed for remote workers to send monitoring information, but seems ok here too)
has MonitoringHub construct an appropriate MonitoringRadio instance for use on the submit side, exposed as self.radio;
replaces the implementation of send with a new MultiprocessingQueueRadio which is thread safe but only works in the same multiprocessing environment as the launched monitoring database manager process (which is true on the submit side, but for example means this radio cannot be used on most remote workers)
This work aligns with the prototype in #3315 (which pushes on monitoring radio configuration for remote workers) and pushes in the direction (without getting there) of allowing other submit-side hooks.
This work removes some monitoring specific code from the JobStatusPoller, replacing it with a dependency injection style.
This is part of work to move JobStatusPoller facade state into other classes, as part of job handling rearrangements in PR #3293
This PR will change how monitoring messages are delivered from the submitting process, and the most obvious thing I can think of that will change is how this will behave under load: heavily loaded messaging causing full buffers and other heavy-load symptoms will now behave as multiprocessing Queues do, rather than as ZMQ connections do. I have not attempted to characterise either of those modes.
Description
Please include a summary of the change and (optionally) which issue is fixed. Please also include
relevant motivation and context.
Changed Behaviour
Which existing user workflows or functionality will behave differently after this PR?
Fixes
Fixes # (issue)
Type of change
Choose which options apply, and delete the ones which do not apply.