Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

job-info: filter job list inactive results by name #2818

Merged
merged 4 commits into from
Mar 14, 2020

Conversation

chu11
Copy link
Member

@chu11 chu11 commented Mar 10, 2020

I just thought I'd throw this up for comments before I write tests. (For Issue #2817)

  • it's clearly just a hack for the need mentioned in Flux tree helper flux-sched#621

  • There's nothing wrong with it per se, but am concerned with the long term. What if we need to filter on more than this. Perhaps there should be something more extendable for filtering, for this and normal job list? Hypothetically something like filter="name:hostname,userid:13543"?

  • But the above would take time. Do we consider this an acceptable short term solution?

@SteVwonder
Copy link
Member

Thanks for putting this together @chu11! Generally LGTM!

but am concerned with the long term.

Just an idea, I don't know how good it is: if you want to reduce the probability of people using the name parameter and then removing it out from under them when a more general solution comes along, maybe we just add the parameter to the service and expose it via Python but not C? AFAIK, we currently only have a use-case in python, and the python function declaration is a little more flexible than C so we can workaround leaving the optional "name" argument in the function declaration for a bit.

@grondo
Copy link
Contributor

grondo commented Mar 11, 2020

Good point @SteVwonder. I don't think we need to make a C API binding for every single service RPC anyway. Like you said, the C functions are inflexible and it is annoying to have to break ABI when we change or add some parameters in the RPC.
It is also tedious to write tests for the C api wrappers for RPCs, as well as adding all the error checks. For both Python and C we have the very nice flux_rpc() and f.RPC fucntions that we can make use of for utilities and tests of the service interfaces.

Especially while we're focusing on functionality to meet big milestones, we should evaluate if we really need a C wrapper for every service RPC that we add.

@chu11
Copy link
Member Author

chu11 commented Mar 11, 2020

Sounds good! Lets go with just the Python binding. Will update w/ tests.

@chu11 chu11 force-pushed the issue2817 branch 2 times, most recently from c1b7e4b to 945bb3d Compare March 11, 2020 20:23
@chu11 chu11 changed the title [WIP] job-info: filter job list inactive results by name job-info: filter job list inactive results by name Mar 11, 2020
@chu11
Copy link
Member Author

chu11 commented Mar 11, 2020

re-pushed, only adding the python binding, and a simple test. Also cleaned up a few tests along the way.

@chu11 chu11 force-pushed the issue2817 branch 4 times, most recently from 2f2dd7f to 636af61 Compare March 11, 2020 22:25
@chu11
Copy link
Member Author

chu11 commented Mar 12, 2020

Hmmm, asan builder hung ... noticed

FAIL: t0005-exec.t 32 - stdin redirect from /dev/null works without -n

              t0005-exec.t:  FAIL: N=35  PASS=32  FAIL=1 SKIP=2 XPASS=0 XFAIL=0

ERROR: t0005-exec.t - exited with status 1

Dunno if related to the hang or not.

test_expect_success 'stdin redirect from /dev/null works without -n' '
        test_expect_code 0 run_timeout 1 flux exec -r0-3 cat
'

test_expect_success 'stdin redirect from /dev/null works with -n' '
       test_expect_code 0 run_timeout 1 flux exec -n -r0-3 cat
'

Perhaps timeouts of 1 second are too short. I'll increase them a bit.

@chu11
Copy link
Member Author

chu11 commented Mar 12, 2020

re-pushed, increasing all the timeouts in t0005-exec a bit (from 1-3 to 5).

@SteVwonder
Copy link
Member

Just to clarify the semantics of name. Does the job name default to the first element of the command list unless explicitly set by the user (presumably somewhere in the jobspec)? If so, that sounds perfect for the flux-tree-helper use-case. If not, can you elaborate on the semantics?

@chu11
Copy link
Member Author

chu11 commented Mar 12, 2020

Does the job name default to the first element of the command list unless explicitly set by the user (presumably somewhere in the jobspec)?

Yes. If the user doesn't specify a job name, then it is the first element of the command line.

But only the basename of the command. Is that ok?

@chu11
Copy link
Member Author

chu11 commented Mar 12, 2020

rebased and re-pushed, now that #2828 is merged

@SteVwonder
Copy link
Member

Is that ok?

Yep! That's perfect. Just wanted to make sure that if you set the name manually, you could then filter based on that.

@SteVwonder
Copy link
Member

I went ahead and rebased #2804 on top of this and then included the new name parameter in flux-tree-helper's inactive listing function (from flux-framework/flux-sched#621). Works like a charm! Thanks @chu11!

@chu11
Copy link
Member Author

chu11 commented Mar 13, 2020

rebased, fixed conflict with #2805, and re-pushed

Copy link
Member

@SteVwonder SteVwonder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Minor nit below:

@chu11
Copy link
Member Author

chu11 commented Mar 14, 2020

re-pushed, fixing Stephens nit

@chu11
Copy link
Member Author

chu11 commented Mar 14, 2020

re-pushed, fixing python checker's nit

Copy link
Contributor

@grondo grondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

chu11 added 4 commits March 14, 2020 05:44
Remove job submissions that are unnecessary.  Add helper
comment.
Support job name filter on job-info.list-inactive service.

Fixes flux-framework#2817
Add optional name to job_list_inactive() to filter results
based on job name.
@codecov-io
Copy link

Codecov Report

Merging #2818 into master will decrease coverage by 0.01%.
The diff coverage is 70%.

@@            Coverage Diff             @@
##           master    #2818      +/-   ##
==========================================
- Coverage      81%   80.98%   -0.02%     
==========================================
  Files         250      250              
  Lines       39511    39513       +2     
==========================================
- Hits        32004    32000       -4     
- Misses       7507     7513       +6
Impacted Files Coverage Δ
src/modules/job-info/list.c 66.83% <70%> (+0.34%) ⬆️
src/modules/job-info/guest_watch.c 76.14% <0%> (-0.58%) ⬇️
src/common/libutil/veb.c 98.42% <0%> (-0.53%) ⬇️
src/common/libsubprocess/local.c 79.59% <0%> (-0.35%) ⬇️
src/cmd/flux-job.c 85.98% <0%> (-0.27%) ⬇️
src/broker/broker.c 74.08% <0%> (-0.11%) ⬇️
src/common/libflux/message.c 82.24% <0%> (+0.13%) ⬆️
src/common/librouter/usock.c 89.17% <0%> (+0.37%) ⬆️

@SteVwonder
Copy link
Member

Restarted a timedout builder. Unfortunately, the logs weren't that useful, but I saved a copy in case someone wants to take a look: https://gist.github.com/SteVwonder/643ed4b068c6c1017e318bf9a0356986

@chu11
Copy link
Member Author

chu11 commented Mar 14, 2020

restarted hung builder, which was "Ubuntu: py3.8 distcheck" one. Same as the one I restarted on #2838, and I think the same one @SteVwonder restarted. hopefully is not a pattern emerging.

@mergify mergify bot merged commit fa28453 into flux-framework:master Mar 14, 2020
@grondo
Copy link
Contributor

grondo commented Mar 14, 2020

Yeah, I'm noticing a lot more hangs in the past 24 hrs. Trying to see if I can reproduce locally or on @trws timeout branch.

@chu11 chu11 deleted the issue2817 branch June 5, 2021 18:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants