-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rfc27: document job feasibility service #425
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some quick comments
spec_27.rst
Outdated
@@ -99,6 +99,8 @@ Design Criteria | |||
|
|||
- Allow the expiration time of a resource allocation to be adjusted. | |||
|
|||
- Allow out of band detection of unsatisfiable job requests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be more to the point to say "Detect unsatisfiable job requests at submission time"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion. How about "Allow the detection of unsatisfiable job requests at submission time" since the feasibility service and corresponding validation plugin are optional? If you'd prefer the shorter version and don't think it implies the detection of unsatisfiable jobs is required by the RFC, then I'm fine with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure that works for me.
spec_27.rst
Outdated
|
||
A scheduler or other entity MAY register a generic ``feasibility`` | ||
service name through which the feasibility or satisfiability of jobs may | ||
be determined out of band, for example during job submission. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like my previous comment, maybe "may be determined during job submission"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Note that the reason I left this statement more open is that there is nothing preventing use of the feasibility service and validator plugin outside of job submission (e.g. via flux job-validator --jobspec-only
), so I was being a bit wary of specifying only at job submission in the RFC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whatever you think is best. My first thought was that it's clearer if we get directly to the primary intended purpose but you make a good point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just took your suggestions verbatim. I think shorter is better at this point, and if gets confusing in the future (which it probably won't) then it is easy to update the RFC.
spec_27.rst
Outdated
If the job could ever be satisfied, then success SHALL be indicated via | ||
a response with the following REQUIRED key: | ||
|
||
errnum | ||
(integer) An integer error number zero (0). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh weird, I'm guessing the existing protocol requires errnum
in the success response payload? I wonder why we did that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it was copying the requirements for the validator workers. Shall we change it for the feasibilty.check
service? (Also if you have another suggestion for the name of the service, feel free to share it. I was just throwing something up to get this moving)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eh, looking at the feasibility
validator plugin, I don't think the errnum
is actually required. It just happens to be in the sched-simple
implementation for some reason.
I'll remove it here since it doesn't seem necessary, and open a PR to remove it in sched-simple's implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup sounds good.
Ok, I've pushed changes that I believe address @garlick's comments (also had to add a URL to linkcheck ignore). This could use another pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
I did have one thought but I'm not sure how to resolve it. You mentioned that feasibility might run on more than one rank for scalability but since the feasibility client isn't discussed it's left open ended how that would get used. Would it make sense to just tack on an explanatory sentence that mentions that feasibility is normally called by the ingest validator which runs on all ranks and makes the RPC with FLUX_NODEID_ANY so it would work whether loaded on all ranks or just rank 0? Or maybe as a boxed note?
Or maybe not at all! It just seemed a little open ended to me. I leave it to you to decide.
I was wondering about that too. I wasn't sure if it was appropriate to go into that much detail in this RFC. I like your suggestion. |
Side note: in the case of the Fluxion feasibility service, there will be a non-trivial memory hit for running this service module, so I think the intent is not to run it on every rank, but possibly rank 0 and the login nodes? I wonder if the sentence "all ranks or just rank 0" should be modified to state something along the lines of
Or something? |
Works for me. |
Uhh, I took a stab at it, but not too happy with the result. Feel free to suggest editing. |
maybe instead
|
Problem: The feasibility service, through which the satisifiability of submitted jobs may be determined out of band, is not documented. Add a Feasibility section to RFC 27 describing an optional feasibility service.
Problem: A linked mpich commit gets "406 Client not acceptable for URL" during linkcheck. Add this URL to the ignore list.
Problem: On Ubuntu 22.04 "unsatisfiable" is not in the spelling dictionary. Add it to spell.en.pws.
Thanks, I took your suggestion verbatim (sorry, having trouble with my prose today) and force-pushed the result. |
I guess I'll set MWP here since this was approved. We can always update with future PRs. |
This is a first cut at documenting the
feasibilty
service proposed in #384.Fixes #384