Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: job-manager: add job validation #3192

Closed
wants to merge 2 commits into from

Conversation

garlick
Copy link
Member

@garlick garlick commented Sep 3, 2020

Here's a first cut at a job-manager.validate RPC as discussed in #3133

This implementation handles inactive jobs by replaying a job's KVS eventlog if it's already been purged from the job manager's active job hash. Thus the response can definitively identify an invalid job ID.

The basic job info that job-manager keeps is returned (t_submit, priority, userid, flags, state). The request is open to guests (hopefully that is OK?)

I also added a flux job validate ID command. It doesn't display any of the returned info but probably it should at least have an option to do that.

Thought I'd park it as a WIP to get some feedback before proceeding with tests.

Add job-manager.validate RPC that can look up a job ID and
return the basic info that the job manager keeps for it:
  userid - job owner
  t_submit - submit timestamp
  priority - job priority
  state - job state
  flags - submit flags

Inactive jobs are handled too.  If the job is not found in the
job manager's active job hash, its eventlog is fetched from the
KVS and replayed.

If the KVS lookup fails with ENOENT, the RPC returns with ENOENT
and an "invalid job ID" human-readable message.  This response is
definitive since the submit RPC does not return a job ID to the
user until the job manager is aware of it.

This RPC is currently open to guests.

Fixes flux-framework#3191
Add 'flux job validate ID'.

This command just indicates whether the ID is valid through its exit code.
If there is a failure, a message is displayed on stderr.
@codecov-commenter
Copy link

Codecov Report

Merging #3192 into master will decrease coverage by 0.12%.
The diff coverage is 18.82%.

@@            Coverage Diff             @@
##           master    #3192      +/-   ##
==========================================
- Coverage   81.24%   81.12%   -0.13%     
==========================================
  Files         295      296       +1     
  Lines       44456    44541      +85     
==========================================
+ Hits        36120    36133      +13     
- Misses       8336     8408      +72     
Impacted Files Coverage Δ
src/cmd/flux-job.c 84.27% <0.00%> (-1.03%) ⬇️
src/modules/job-manager/validate.c 21.21% <21.21%> (ø)
src/modules/job-manager/job-manager.c 54.05% <50.00%> (-0.24%) ⬇️
src/broker/publisher.c 79.59% <0.00%> (-2.05%) ⬇️
src/modules/job-info/list.c 68.31% <0.00%> (-1.65%) ⬇️
src/modules/job-info/guest_watch.c 75.00% <0.00%> (-0.58%) ⬇️
src/broker/module.c 74.94% <0.00%> (-0.46%) ⬇️
src/modules/job-archive/job-archive.c 59.68% <0.00%> (+0.79%) ⬆️
src/modules/job-info/watch.c 72.53% <0.00%> (+1.55%) ⬆️
... and 1 more

@chu11
Copy link
Member

chu11 commented Sep 4, 2020

at a basic level looks good.

I tried to see if it could be seamlessly dropped into job-info, and the answer was I think so, and clean up some code

it should speed up the code on average, by reducing the number of eventlog lookups on active jobs. There would be cases that are slower b/c we'd be doing lookups of the eventlog twice (i.e. user lookups eventlog of inactive job, jobid ccheck does an eventlog lookup too).

@garlick garlick closed this Sep 9, 2020
@garlick garlick deleted the job_validate branch September 9, 2020 18:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants