Make result queue poll for shutdown, and tidy up at shutdown #3709
Conversation
Love the .close() and .join() additions. Good call.
```diff
-    msgs = self.incoming_q.get()
+    msgs = self.incoming_q.get(timeout_ms=self.poll_period)
+    if msgs is None:  # timeout
+        continue
```
Oh good. This exact procedure has been on my mind for quite a while, so thank you for this!
I'm thinking at some point we can go further and simply block completely, but I haven't fully worked through the shutdown logic for that route yet. But it's "on my mind," at some point.
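For readers following along, here is a minimal, self-contained sketch of the poll-for-shutdown pattern shown in the diff above. It uses a plain queue.Queue stand-in rather than Parsl's ZMQ-backed incoming queue, and the names (result_queue_worker, shutdown_event, POLL_PERIOD_MS) are invented for this example; the real queue's get() takes timeout_ms, while queue.Queue takes seconds.

```python
import queue
import threading
import time

# Illustrative sketch only: names are made up for this example and are not
# Parsl's exact identifiers.
POLL_PERIOD_MS = 10

def result_queue_worker(result_queue: queue.Queue, shutdown_event: threading.Event) -> None:
    while not shutdown_event.is_set():
        try:
            msgs = result_queue.get(timeout=POLL_PERIOD_MS / 1000)
        except queue.Empty:
            continue  # timed out: go around the loop and re-check the shutdown flag
        print("processing result batch:", msgs)

if __name__ == "__main__":
    q: queue.Queue = queue.Queue()
    stop = threading.Event()
    worker = threading.Thread(target=result_queue_worker, args=(q, stop))
    worker.start()
    q.put(["result-1"])
    time.sleep(0.1)
    stop.set()     # the worker notices within roughly one poll period...
    worker.join()  # ...so join() returns promptly instead of hanging forever
```

The trade-off is the one measured in the PR description below: the loop now wakes up once per poll period even when idle, in exchange for noticing a shutdown request promptly.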
This was introduced in PR #2965 to help debug ZMQ hangs. PR #3709 changed this loop to run every poll period (by default, every 10ms) rather than once per result message, which turns this log line into a huge source of log noise and volume. This PR removes the log line - ZMQ debugging must proceed without it.
Changed Behaviour
parsl.log size from:
pytest parsl/tests/ --config parsl/tests/configs/htex_local.py
before this PR: 7 megabytes
after this PR: 2 megabytes
Type of change
Bug fix
This poll happens at the configured htex poll period, which defaults to 10ms.
Under heavy result load, this shouldn't result in much additional load: the poll loop will already be iterating frequently to process the results.
Under lower result load, there is a slight observable increase in CPU usage; a task that sleeps for 30 seconds shows this.
Before this PR:
real 0m37.451s
user 0m2.160s
sys 0m0.376s
run 2: user 2.160s
run 3: user 2.116s
After this PR:
real 0m37.473s
user 0m2.400s
sys 0m0.557s
run 2: user 2.457s
run 3: user 2.452s
At shutdown, the ZMQ socket for incoming results is closed.
This reduces both the number of threads and the number of file descriptors left behind by the --config local tests. For example:
$ pytest parsl/tests/test_monitoring/ --config local
Before this PR, at end of test: 32 threads, 451 fds open.
After this PR, at end of test: 1 thread, 48 fds open.
This is part of PR #3397 shutdown tidyup.
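As a rough, hedged illustration of the shutdown tidy-up described above (closing the ZMQ socket for incoming results), something along these lines with pyzmq releases the lingering file descriptors; the class and attribute names here are invented for the example and are not Parsl's exact implementation.

```python
import zmq

# Illustrative sketch only: ResultsIncoming and its attributes are stand-ins,
# not Parsl's actual code.
class ResultsIncoming:
    def __init__(self, ip_address: str, port: int):
        self.context = zmq.Context()
        self.results_receiver = self.context.socket(zmq.DEALER)
        self.results_receiver.bind(f"tcp://{ip_address}:{port}")

    def get(self, timeout_ms: int):
        # Poll so that a quiet socket returns None after timeout_ms instead of
        # blocking forever, letting the caller re-check for shutdown.
        if self.results_receiver.poll(timeout=timeout_ms):
            return self.results_receiver.recv_multipart()
        return None

    def close(self) -> None:
        # Closing the socket and terminating the context releases the file
        # descriptors and background IO that would otherwise linger after the
        # executor shuts down.
        self.results_receiver.close()
        self.context.term()
```

At executor shutdown the sequence would then be roughly: signal the result-queue thread to stop, join() it, then close() the incoming queue, which is what brings the leftover thread and fd counts down as shown above.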
Changed Behaviour
Nothing should really be visible to normal users, apart from the increased CPU usage documented above.
Type of change