-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC21: add new PRIORITY state #271
Conversation
Oh man, no idea what this means. Readthedocs is working fine on my laptop (
|
1f1d147
to
3dab6b4
Compare
Looks like we smacked into a pip dependency resolution bug: pypa/pip#9031 I don't quite grok their proposed solution of removing Sphinx as a requirement. I'm guess they mean Sphinx is already included as an intermediate/implicit dependency thanks to sphinx-rtd-theme and/or sphinxcontrib-spelling. So maybe dropping sphinx from requirements.txt will work for us 🤷 . Unfortunately, I don't see a way to have RTD not use the 2020 resolver 🤦 |
@garlick : if you run |
Thanks for that hint. It didn't reproduce but for some reason some extraneous warnings went away and I now noticed that one of the JSON examples is called out as invalid JSON and could not be syntax highlighted. I tacked on a commit to fix that. |
Darn. The only thing that I can think to try currently is to drop the explicit |
Ah thanks. Well that seems to have worked. Are we OK with that change getting merged or does this need more thought? On re-reviewing this proposed change, one other thing I should have pointed out for discussion is I defined the priority as a Would it be better to go with an integer value here? how about the range? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just caught one typo on the first pass through.
spec_21.rst
Outdated
Prioritize Event | ||
^^^^^^^^^^^^^^^^ | ||
|
||
Job has been (re-)assigned a priority by the job shell priority plugin: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Job has been (re-)assigned a priority by the job shell priority plugin: | |
Job has been (re-)assigned a priority by the job-manager priority plugin: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll fix that.
Also, based on our discussions offline about the confusion of the PRIORITY state vs priority
event that would affect it indirectly, not directly, I think I'll propose changing the priority
event to admin-priority
and prioritize
to priority
.
Problem: sphinx build fails with the following assertion Successfully built pyyaml pyrsistent ERROR: Exception: Traceback (most recent call last): File "/home/docs/checkouts/readthedocs.org/user_builds/flux-rfc/envs/271/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 228, in _main status = self.run(options, args) File "/home/docs/checkouts/readthedocs.org/user_builds/flux-rfc/envs/271/lib/python3.7/site-packages/pip/_internal/cli/req_command.py", line 182, in wrapper return func(self, options, args) File "/home/docs/checkouts/readthedocs.org/user_builds/flux-rfc/envs/271/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 379, in run requirement_set File "/home/docs/checkouts/readthedocs.org/user_builds/flux-rfc/envs/271/lib/python3.7/site-packages/pip/_internal/resolution/resolvelib/resolver.py", line 191, in get_installation_order weights = get_topological_weights(graph) File "/home/docs/checkouts/readthedocs.org/user_builds/flux-rfc/envs/271/lib/python3.7/site-packages/pip/_internal/resolution/resolvelib/resolver.py", line 242, in get_topological_weights assert len(weights) == len(graph) AssertionErrorr This is apparently a pip dependency error that some users have worked around in the case of sphinx by removing the explicit requirement. Let's give that a try.
Problem: sphinx cannot parse one of the JSON examples: spec_21.rst:245: WARNING: Could not lex literal_block as "json". Highlighting skipped. Add a missing double quote.
Problem: the TBD entries under state synchronization are not useful. Also there's a missing period following a footnote reference. Drop the TBD entries and add the missing period.
Problem: a PRORITY state is being added, and the name of the event that changes the administrative priority is called "priority" which will lead to confusion. Rename the priority event to admin-priority. A new priority event will be defined that affects the overall priority in a subsequent commit.
f9045dc
to
f8650e6
Compare
If we were to model our multi-factor priority calculation in a similar manner to Slurm, I believe they define their job priority value as an |
@@ -195,6 +195,20 @@ Example: | |||
{"timestamp":1552593348.073045,"name":"submit","context":{"priority":16,"userid":5588,"flags":0}} | |||
|
|||
|
|||
Depend Event |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commit message typo "mising" -> "missing"
my first thought is that when priority plugin developers are debugging, it may be visually easier to see integers vs. floats. Like I may not want to see "0.83738937" in logs all over the place. |
Thanks @cmoussa1! An integer with range of 0 to 4294967295 sounds completely reasonable to me, and presumably it will make comparing slurm and flux priority algorithms slightly easier if we don't introduce arbitrary changes here. |
6ef85ce
to
3e5c61c
Compare
Forced a push with the priority redefined as an integer from 0 - 2^32-1, and fixed the spelling error @chu11 pointed out in my commit message. I also reworded that commit message, and added one to the main commit that adds the new state. |
an aside, should we update all language in flux-core (and sched?) about priority to "administrative priority"? e.g. On the one hand, consistency. On the other hand, presumably at the user/admin level, "administrative priority" is the only priority they know, so it shouldn't matter. |
This is the way I was leaning. Where it might get confusing is if we want to let users see what their current (caculated) priority value is. What do we call that? |
queue priority? |
Ah sure, that makes sense. So for example if we want flux job info to be able to list it in a custom formatted output, |
The PRIORITY state should also be added to the list for the ACTIVE virtual state |
Add a new PRIORITY state, between DEPEND and SCHED. A job will remain in PRIORITY state while its priority is being calculated by a job manager priority plugin. In many cases this may be instantaneous, but we leave open the possibility that a plugin may need to contact another service to obtain user data before assigning a priority. If this service is slow or unresponsive, the amount of time spent in PRIORITY state will provide some visibility into what is going on. A 'priority' event is posted when the priority is assigned, and later when the priority is updated. This event is not posted to the job eventlog to avoid the extra noise, but would appear in the job manager journal so that external services like job-info can reorder their queue.
Problem: A depend event that causes the job to transition out of DEPEND state is discussed, and is present in the state diagram, but is not defined. Define the depend event, with empty context matching the current implementation.
3e5c61c
to
66b93d4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks!
Thanks, I'll set MWP |
As discussed in flux-framework/flux-core#3311, this PR adds a new PRIORITY job state that a job passes through when it is assigned its initial priority by a job manager priority plugin.
This should make integration of a priority plugin a bit easier, and also provide some user visibility in case the plugin takes a long time to assign the initial priority (for example if it has to fetch data from flux-accounting before it can answer).
Events normally drive job state transitions. I added a
prioritize
event for this purpose, but footnoted it with the caveat that, unlike the other primary events, it will not be posted to the job eventlog. This is because we surely do not want to log the repeated priority adjustments of a fair share priority plugin for posterity, and a fresh priority can easily be recalculated if the instance restarts with jobs in SCHED state.That does mean that a job can regress from SCHED to PRIORITY state on an instance restart. It does not mean that the job repeatedly does that as the priority is updated, since the job may have an outstanding alloc request to the scheduler that we do not want to cancel every time the priority is updated. (Updating the scheduler's priority info will be the subject of another PR to RFC 27)