Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make result queue poll for shutdown, and tidy up at shutdown #3709

Merged
merged 1 commit into from
Dec 2, 2024

Conversation

benclifford
Copy link
Collaborator

This poll happens at the configured htex poll period, which defaults to 10ms.

Under heavy result load, this shoudn't result in much additional load: the poll loop will already be looping a lot to process the results.

Under lower result load, there is a slight observable increase in CPU usage: a 30second sleep task shows this before this PR:

before:
real 0m37.451s
user 0m2.160s
sys 0m0.376s

run 2, user 2.160s
run 3, user 2.116s

and this after this PR:

real 0m37.473s
user 0m2.400s
sys 0m0.557s

Run 2, 2.457s
Run 3, 2.452s

At shutdown, the ZMQ socket for incoming results is closed.

This reduces both the number of threads and number of file descriptors left behind by the --config local tests. For example:

$ pytest parsl/tests/test_monitoring/ --config local

Before this PR, at end of test: 32 threads, 451 fds open.

After this PR, at end of test: 1 thread, 48 fds open.

This is part of PR #3397 shutdown tidyup.

Description

Please include a summary of the change and (optionally) which issue is fixed. Please also include
relevant motivation and context.

Changed Behaviour

nothing should be really visible to normal users. Increased CPU usage in the above documented situations.

Type of change

  • New feature
  • Code maintenance/cleanup

This poll happens at the configured htex poll period, which defaults to
10ms.

Under heavy result load, this shoudn't result in much additional load:
the poll loop will already be looping a lot to process the results.

Under lower result load, there is a slight observable increase in CPU
usage: a 30second sleep task shows this before this PR:

before:
real    0m37.451s
user    0m2.160s
sys     0m0.376s

run 2, user 2.160s
run 3, user 2.116s

and this after this PR:

real    0m37.473s
user    0m2.400s
sys     0m0.557s

Run 2, 2.457s
Run 3, 2.452s

At shutdown, the ZMQ socket for incoming results is closed.

This reduces both the number of threads and number of file descriptors
left behind by the `--config local` tests. For example:

$ pytest parsl/tests/test_monitoring/ --config local

Before this PR, at end of test: 32 threads, 451 fds open.

After this PR, at end of test: 1 thread, 48 fds open.

This is part of PR #3397 shutdown tidyup.
Copy link
Collaborator

@khk-globus khk-globus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love the .close() and .join() additions. Good call.

Comment on lines -460 to +465
msgs = self.incoming_q.get()
msgs = self.incoming_q.get(timeout_ms=self.poll_period)
if msgs is None: # timeout
continue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh good. This exact procedure has been on my mind for quite awhile, so thank you for this!

I'm thinking at some point we can go farther and simply block completely, but I haven't fully gotten through the shutdown logic for this route yet. But "on my mind," at some point.

@khk-globus khk-globus added this pull request to the merge queue Dec 2, 2024
Merged via the queue into master with commit 7622caa Dec 2, 2024
7 checks passed
@khk-globus khk-globus deleted the benc-shutdown-polling branch December 2, 2024 19:11
benclifford added a commit that referenced this pull request Jan 14, 2025
This was introduced in PR #2965 to help debug ZMQ hangs.

PR #3709 changed this loop to run every poll period (by default,
every 10ms) rather than every result message.

That turns this log line into a huge source of log noise/volume.

This PR removes that log line - ZMQ debugging must proceed without
it.

parsl.log size from:
pytest parsl/tests/ --config parsl/tests/configs/htex_local.py

before this PR: 7 megabytes
after this PR: 2 megabytes
github-merge-queue bot pushed a commit that referenced this pull request Jan 14, 2025
This was introduced in PR #2965 to help debug ZMQ hangs.

PR #3709 changed this loop to run every poll period (by default, every
10ms) rather than every result message.

That turns this log line into a huge source of log noise/volume.

This PR removes that log line - ZMQ debugging must proceed without it.

# Changed Behaviour

parsl.log size from:
pytest parsl/tests/ --config parsl/tests/configs/htex_local.py

before this PR: 7 megabytes
after this PR: 2 megabytes

## Type of change

- Bug fix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants