Handle init_blocks in scaling strategy, rather than special-casing it #3283
This is part of issue #3278 tidying up job and block management. Now init_blocks scale out happens on the first strategy poll, not at executor start - that will often delay init_blocks scaling by one strategy poll period compared to before this PR.
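The behaviour change can be illustrated with a toy model. This is only a sketch under stated assumptions, not Parsl's actual implementation: `ToyExecutor`, `ToyStrategy` and the `first` flag are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyExecutor:
    """Hypothetical stand-in for an executor; not Parsl's real class."""
    init_blocks: int
    blocks: List[str] = field(default_factory=list)

    def start(self) -> None:
        # In this model, start() does not scale out init_blocks itself.
        pass

    def scale_out(self, n: int) -> None:
        for _ in range(n):
            self.blocks.append(f"block-{len(self.blocks)}")


class ToyStrategy:
    """Hypothetical strategy that handles init_blocks on its first poll."""

    def __init__(self) -> None:
        self.first = True  # True until the first poll has run

    def poll(self, executor: ToyExecutor) -> None:
        if self.first:
            # First poll: launch the initial blocks exactly once.
            executor.scale_out(executor.init_blocks)
            self.first = False
        # Ordinary load-based scaling decisions would follow here.


executor = ToyExecutor(init_blocks=2)
strategy = ToyStrategy()

executor.start()
blocks_after_start = len(executor.blocks)        # nothing launched yet

strategy.poll(executor)
blocks_after_first_poll = len(executor.blocks)   # init_blocks launched

strategy.poll(executor)
blocks_after_second_poll = len(executor.blocks)  # no repeat of init_blocks
```

In this model no blocks exist between `start()` and the first `poll()`, which is the window that the strategy poll period now controls.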
This is failing a precondition in test_drain for exactly the "Changed Behaviour" reason. That test now depends on the strategy poll period, so it can probably be made to use a smaller strategy period.
This will now scale in blocks using the job status poller scale-in code, which means the DFK does not need to send its own BLOCK_INFO monitoring messages.

This is a minimal-ish change to which blocks get scaled in at shutdown: they now come from the jobstatuspoller list. That will get pending blocks scaled in at shutdown, I think, but will now push the dynamically updated list to the cached side of the cache poll... what does that change? We will now be delayed in seeing ended jobs, but the executor.status data is already out of date in that sense the moment the call returns (but *less* out of date).

This patch is deliberately minimalist in that it does not attempt to move the scale-down code. This is a PR about changing behaviour, not about rewriting the scale-down strategy more seriously. The behaviour change is to move towards treating the jobstatuspoller pollitem status as the source of best-estimated truth. Other work should probably do that moving, to complement the recent init_blocks handling PR #3283.
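The trade-off described above (scaling in from a cached status list that can lag behind reality) can be sketched with a toy model. Everything here is hypothetical and only illustrative: `ToyState`, `ToyPollItem` and `scale_in_all` are not Parsl names, and the real poller's behaviour may differ.

```python
from enum import Enum
from typing import Dict, List


class ToyState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    CANCELLED = "cancelled"


TERMINAL = {ToyState.COMPLETED, ToyState.CANCELLED}


class ToyPollItem:
    """Hypothetical poll item holding a cached view of block status."""

    def __init__(self, live_status: Dict[str, ToyState]) -> None:
        self._live = live_status               # what a provider would report now
        self.status: Dict[str, ToyState] = {}  # cached copy, refreshed by poll()

    def poll(self) -> None:
        # Refresh the cache from the live view.
        self.status = dict(self._live)

    def scale_in_all(self) -> List[str]:
        # Cancel every block the cache believes is not yet finished,
        # which includes blocks that are still pending.
        to_cancel = [b for b, s in self.status.items() if s not in TERMINAL]
        for b in to_cancel:
            self.status[b] = ToyState.CANCELLED
        return to_cancel


live = {"0": ToyState.RUNNING, "1": ToyState.PENDING, "2": ToyState.COMPLETED}
item = ToyPollItem(live)
item.poll()

# Block "0" ends after the last poll, so the cache is now stale.
live["0"] = ToyState.COMPLETED

cancelled = item.scale_in_all()
```

In this model the pending block is scaled in at shutdown, and the block that ended after the last poll is still cancelled because the cached view has not caught up.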
Looks good, with a couple of minor comments. A good addition to this would be a test verifying use of the .first branch.
From an end functionality perspective, that's checking that init_blocks actually gets used to start blocks, rather than a later part of the strategy code: with strategy With the other two strategies that's less observable from the outside, I think:
As of Parsl PR Parsl/parsl#3283, the initial HTEX block scale out occurs on the first strategy poll, not at HTEX start. Thus, our tests should use a small strategy period to speed them up and avoid timeouts.
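Why a small strategy period matters for tests can be shown with a toy periodic poller. This is a sketch under assumptions, not Parsl or HTEX code: `ToyPoller` and `wait_for_blocks` are hypothetical, and the model assumes the poller waits one full period before its first poll.

```python
import threading
import time


class ToyPoller:
    """Hypothetical periodic poller: waits one period, then polls, repeatedly."""

    def __init__(self, period: float, init_blocks: int) -> None:
        self.period = period
        self.init_blocks = init_blocks
        self.blocks = 0
        self._first = True
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self.period):
            if self._first:
                self.blocks += self.init_blocks
                self._first = False

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()


def wait_for_blocks(poller: ToyPoller, expected: int, timeout: float) -> bool:
    """Poll until the expected block count appears or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if poller.blocks >= expected:
            return True
        time.sleep(0.01)
    return poller.blocks >= expected


# A short period lets the initial scale out happen well within the test timeout.
fast = ToyPoller(period=0.05, init_blocks=1)
fast.start()
fast_ok = wait_for_blocks(fast, expected=1, timeout=2.0)
fast.stop()

# A long period means the test times out before the first poll ever runs.
slow = ToyPoller(period=5.0, init_blocks=1)
slow.start()
slow_ok = wait_for_blocks(slow, expected=1, timeout=0.2)
slow.stop()
```

In this model the wait for the first block is bounded below by the poll period, so a test timeout shorter than the period cannot pass.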