ERROR: t2800-jobs-cmd.t - exited with status 137 (terminated by signal 9?) #2764

Closed
chu11 opened this issue Feb 22, 2020 · 5 comments

Comments

@chu11
Member

chu11 commented Feb 22, 2020

I've been seeing this a lot of late when running make -j16 check.

ERROR: t2800-jobs-cmd.t - exited with status 137 (terminated by signal 9?)

My assumption is after #2744, rc3-job is just taking a bit too long. This was removed in that PR:

test_expect_success 'cleanup job listing jobs ' '
        for jobid in `cat job_ids_pending.out`; do \
            flux job cancel $jobid; \
            flux job wait-event $jobid clean; \
        done &&
        for jobid in `cat job_ids_running.out`; do \
            flux job cancel $jobid; \
            flux job wait-event $jobid clean; \
        done
'

I think the default grace timeout is used, which would be 1 second, so I can totally see flux job cancelall taking a bit too long.

I'll add a little something to test_under_flux to see if that helps.
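For context on the error message itself: a shell reports a process killed by a signal as 128 plus the signal number, so status 137 corresponds to signal 9 (SIGKILL), which matches the "terminated by signal 9?" guess in the test output. A minimal sketch of that decoding follows; `decode_status` is a hypothetical helper written for illustration and is not part of flux-core or its test suite.

```shell
#!/bin/sh
# decode_status is a hypothetical helper, not part of the flux test suite.
# By shell convention, a status above 128 means the process was
# terminated by signal (status - 128).
decode_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        # kill -l <number> prints the signal name without the SIG prefix
        echo "signal $sig ($(kill -l "$sig"))"
    else
        echo "exit code $status"
    fi
}

decode_status 137
```

Run on the status from this issue, the helper reports signal 9, which is consistent with the test being forcibly killed rather than failing an assertion.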

chu11 self-assigned this Feb 22, 2020

@grondo
Contributor

grondo commented Feb 22, 2020

Any reason not to just use flux job cancelall here?

@chu11
Member Author

chu11 commented Feb 22, 2020

You mean re-add the above, but just use flux job cancelall? I was initially thinking of just upping the grace timeout, but adding a flux job cancelall would probably work.

Any preference? Each seems equally simple to add.

chu11 added a commit to chu11/flux-core that referenced this issue Feb 22, 2020
Increase shutdown grace time when running tests under the flux
"job" personality.  Cleanup of jobs can sometimes take longer than
the default grace timeout.

Fixes flux-framework#2764
@chu11
Member Author

chu11 commented Feb 22, 2020

Actually, I'm pausing on a fix given #2733.

@grondo
Contributor

grondo commented Feb 22, 2020

Ah, sorry, my comment above was out of context. I had forgotten that the code had already been removed.

chu11 removed their assignment Feb 22, 2020

@chu11
Member Author

chu11 commented Feb 22, 2020

With #2733 merged, I didn't see this issue after 10 runs, so I'm going to assume it's fixed.
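
For anyone wanting to repeat this kind of check on an intermittent failure, here is a rough sketch of rerunning a command a fixed number of times and counting the failures. `count_failures` is a hypothetical helper, and `true` and `false` are stand-ins for the real test invocation (for example the `make -j16 check` run mentioned at the top of this issue); none of this comes from the flux-core tree.

```shell
#!/bin/sh
# count_failures is a hypothetical helper, not part of the flux test suite.
# Usage: count_failures <runs> <command> [args...]
# Runs the command <runs> times and prints how many runs exited non-zero.
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard output; only the exit status matters here
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# Stand-in commands; substitute the real test invocation.
count_failures 10 true
count_failures 3 false
```

A count of zero over a handful of runs only suggests the failure is gone; for a timing-dependent failure like this one, more runs under load give more confidence.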


2 participants