
Handle init_blocks in scaling strategy, rather than special-casing it #3283

Merged
benclifford merged 4 commits into master from benc-scaling-init-in-strategy on Mar 27, 2024

Conversation

benclifford
Collaborator

This is part of issue #3278, tidying up job and block management.

Changed Behaviour

The init_blocks scale out now happens on the first strategy poll, not at executor start; that will often delay init_blocks scaling by one strategy poll period compared to the behaviour before this PR.
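
In outline, the strategy keeps a per-executor flag recording whether it has polled that executor yet, and performs the init_blocks scale out on that first poll. A minimal sketch of the idea follows; the names here (`ExecutorState`, `poll_executor`, `scale_out`) are illustrative stand-ins, not Parsl's actual internals:

```python
# Illustrative sketch only: how a scaling strategy can perform the
# init_blocks scale out on its first poll of an executor, instead of the
# executor doing it at start time. Names are stand-ins, not Parsl's code.
from dataclasses import dataclass


@dataclass
class ExecutorState:
    first: bool = True  # True until the strategy has polled this executor once


def poll_executor(executor, state: ExecutorState) -> None:
    if state.first:
        # First strategy poll: submit the provider's configured init_blocks.
        init_blocks = executor.provider.init_blocks
        if init_blocks > 0:
            executor.scale_out(init_blocks)
        state.first = False
        return
    # Subsequent polls: regular scale out / scale in decisions go here.
```

With strategy "none", the per-poll body after the first branch would simply do nothing, which is presumably why init_blocks still gets exercised under that strategy (see the review discussion below).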

Type of change

  • Code maintenance/cleanup

@benclifford benclifford force-pushed the benc-scaling-init-in-strategy branch from f86d403 to 8127310 Compare March 22, 2024 16:43
@benclifford
Collaborator Author

Failing a precondition in test_drain for exactly the "Changed Behaviour" reason: that test is now reliant on the strategy poll period, so it can probably be made to use a smaller strategy period.
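
For example, a test could shrink the poll period in its config so the first strategy poll (and hence the init_blocks scale out) happens almost immediately. A hedged sketch, assuming the Config object exposes the poll period via a `strategy_period` keyword (this may differ by Parsl version):

```python
# Hedged example: shrink the strategy poll period so the init_blocks scale
# out (now done on the first strategy poll) happens quickly in tests.
# The strategy_period keyword is an assumption about the Config API and may
# not exist in all Parsl versions.
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import LocalProvider

config = Config(
    executors=[
        HighThroughputExecutor(
            label="htex_local",
            provider=LocalProvider(init_blocks=1, min_blocks=0, max_blocks=1),
        )
    ],
    strategy="simple",
    strategy_period=0.1,  # poll every 0.1 s instead of the multi-second default
)
```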

benclifford added a commit that referenced this pull request Mar 25, 2024
This will now scale in blocks using the job status poller scale in code,
which means the DFK does not need to send its own BLOCK_INFO monitoring
messages.



Minimalish change of which blocks get scaled in at shutdown: they now come from the jobstatuspoller list. That will get pending blocks scaled in at shutdown, I think, but will now push the dynamically updated list to the cached side of the cache poll... what does that change? We will now be delayed in seeing ended jobs, but the executor.status data is already out of date in that sense the moment the call returns (but *less* out of date).

This patch is deliberately minimalist in that it does not attempt to move the scale down code: this is a PR about changing behaviour, not about rewriting the scale down strategy more seriously. The behaviour change is to move towards treating the jobstatuspoller pollitem status as the source of best-estimated truth.

Other work should probably do that moving, to complement the recent init_blocks handling PR #3283.
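
Read as a sketch, the shutdown path described above might look roughly like this; the `poller_status` mapping, the `terminal` flag, and the `scale_in` call shape are assumptions made for illustration, not Parsl's actual API:

```python
# Illustrative sketch only: scale in at shutdown using the job status
# poller's view of blocks rather than a list the DFK keeps itself.
def scale_in_at_shutdown(executor, poller_status: dict) -> None:
    # poller_status: block_id -> last status seen by the job status poller.
    # Cancelling everything not yet terminal is what also catches blocks
    # that are still pending at shutdown.
    live_blocks = [
        block_id for block_id, status in poller_status.items()
        if not status.terminal
    ]
    if live_blocks:
        executor.scale_in(len(live_blocks))
```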
benclifford added a commit that referenced this pull request Mar 26, 2024
benclifford added a commit that referenced this pull request Mar 26, 2024
benclifford added a commit that referenced this pull request Mar 27, 2024
@benclifford benclifford marked this pull request as ready for review March 27, 2024 13:29
Collaborator

@khk-globus khk-globus left a comment


Looks good, with a couple of minor comments. A good addition to this would be a test verifying use of the .first branch.

parsl/jobs/strategy.py (2 resolved review threads)
@benclifford benclifford merged commit cb34643 into master Mar 27, 2024
6 checks passed
@benclifford benclifford deleted the benc-scaling-init-in-strategy branch March 27, 2024 14:02
@benclifford
Collaborator Author

A good addition to this would be a test verifying use of the .first branch

From an end functionality perspective, that's checking that init_blocks actually gets used to start blocks, rather than a later part of the strategy code: with strategy none that's pretty straightforward (and it's what happens by chance in parsl/tests/test_monitoring/test_htex_init_blocks_vs_monitoring.py already).

With the other two strategies that's less observable from the outside, I think: the distinction between init_blocks and min_blocks doesn't necessarily even make sense, because of that difficulty in observing the difference...
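
For the strategy-"none" case, such a test might look roughly like the hedged sketch below. It is not the test from the repository; the executor label, the timeout, and the use of `connected_managers()` as the observable signal are choices made here:

```python
# Hedged test sketch: under strategy "none", the only way a block can start
# is the init_blocks scale out performed on the first strategy poll.
import time

import parsl
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import LocalProvider


def test_init_blocks_used_with_strategy_none():
    config = Config(
        executors=[
            HighThroughputExecutor(
                label="htex_init",
                provider=LocalProvider(init_blocks=1, min_blocks=0, max_blocks=1),
            )
        ],
        strategy="none",  # no scaling beyond whatever init_blocks provides
    )
    parsl.load(config)
    try:
        executor = parsl.dfk().executors["htex_init"]
        deadline = time.time() + 20
        # Wait for a manager to connect; with strategy "none" it can only
        # have come from the init_blocks scale out.
        while time.time() < deadline and not executor.connected_managers():
            time.sleep(0.5)
        assert executor.connected_managers(), "init_blocks did not start a block"
    finally:
        parsl.dfk().cleanup()
        parsl.clear()
```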

rjmello added a commit to globus/globus-compute that referenced this pull request Jun 27, 2024
As of Parsl PR Parsl/parsl#3283, the initial
HTEX block scale out occurs on the first strategy poll, not at HTEX
start. Thus, our tests should use a small strategy period to speed them
up and avoid timeouts.
rjmello added a commit to globus/globus-compute that referenced this pull request Jun 28, 2024
rjmello added a commit to globus/globus-compute that referenced this pull request Jun 28, 2024