cancel does not work on jobs in "runrequest" state #1438
Comments
`-f` doesn't kill it?
…________________________________
From: Tom Scogland <[email protected]>
Sent: Sunday, April 8, 2018 9:24:12 PM
To: flux-framework/flux-core
Cc: Subscribed
Subject: [flux-framework/flux-core] cancel does not work on jobs in "runrequest" state (#1438)
Normally I don't think this would be a big deal, but somehow jobs every once in a while get into runrequest then never make progress again. Having a way to clear out such jobs from sched and wreck would be useful for such cases.
|
I’ll try that next time it comes up, but I don’t think so.
…On 9 Apr 2018, at 8:15, Dong H. Ahn wrote:
`-f` doesn't kill it?
|
In the current system, "canceling" a job in the `runrequest` state is going to
be difficult, if not impossible. This is arguably a flaw in the way jobs are
launched, and we should design the "starting" process better next time so that
cancellation is possible. One problem with the current design is that between
`runrequest` and `running`, I'm not sure the wrexecd daemons are listening on
their flux handles, so they may not even hear that a starting job should be
canceled. They are likely blocked on a barrier, so it is conceivable to fix
this in the job shell replacement.
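To illustrate the point above, here is a minimal Python sketch. It is not flux or wreck code. Every name in it (`start_blocking`, `start_cancellable`, `demo`) is hypothetical, and it assumes, as the comment above only guesses, that the starting process is blocked on a barrier. It contrasts a launcher that never looks at a cancel flag while waiting with one that checks the flag while waiting to start.

```python
import threading


def start_blocking(barrier, cancel):
    """Hypothetical launcher: blocks on a barrier and never checks `cancel`."""
    try:
        # The timeout exists only so this demo terminates; a real barrier
        # wait without a timeout would block indefinitely.
        barrier.wait(timeout=0.2)
        return "running"
    except threading.BrokenBarrierError:
        # The other party never arrived; the cancel request was never seen.
        return "stuck"


def start_cancellable(ready, cancel, poll=0.01, max_wait=0.2):
    """Hypothetical launcher: checks a cancel flag while waiting to start."""
    waited = 0.0
    while waited < max_wait:
        if cancel.is_set():
            return "cancelled"
        if ready.wait(timeout=poll):
            return "running"
        waited += poll
    return "stuck"


def demo():
    cancel = threading.Event()
    cancel.set()                    # cancel requested before the job starts
    barrier = threading.Barrier(2)  # the second party never arrives
    ready = threading.Event()       # never set: the job never becomes ready
    return start_blocking(barrier, cancel), start_cancellable(ready, cancel)


if __name__ == "__main__":
    print(demo())
```

In this sketch the first launcher stays stuck even though the cancel flag is already set, while the second one notices the flag and exits. Whether a job shell replacement would poll, use a callback, or do something else is not stated in this thread.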
On Mon, Apr 9, 2018 at 8:20 AM, Tom Scogland <[email protected]>
wrote:
… I’ll try that next time it comes up, but I don’t think so.
|
grondo added a commit to grondo/flux-core that referenced this issue on Feb 5, 2019
The wreck exec system is worthless, remove it along with associated commands, tests, and support code. Since libjsc doesn't work without wreck, it is removed as well. Fixes flux-framework#1984 Closes flux-framework#1947 Closes flux-framework#1618 Closes flux-framework#1595 Closes flux-framework#1593 Closes flux-framework#1468 Closes flux-framework#1438 Closes flux-framework#1419 Closes flux-framework#1410 Closes flux-framework#915 Closes flux-framework#894 Closes flux-framework#866 Closes flux-framework#833 Closes flux-framework#774 Closes flux-framework#772 Closes flux-framework#335 Closes flux-framework#249
grondo added a commit to grondo/flux-core that referenced this issue on Feb 5, 2019
The wreck exec system is worthless, remove it along with associated commands, tests, and support code. Since libjsc doesn't work without wreck, it is removed as well. Fixes flux-framework#1984 Closes flux-framework#1947 Closes flux-framework#1618 Closes flux-framework#1595 Closes flux-framework#1593 Closes flux-framework#1534 Closes flux-framework#1468 Closes flux-framework#1443 Closes flux-framework#1438 Closes flux-framework#1419 Closes flux-framework#1410 Closes flux-framework#1407 Closes flux-framework#1393 Closes flux-framework#915 Closes flux-framework#894 Closes flux-framework#866 Closes flux-framework#833 Closes flux-framework#774 Closes flux-framework#772 Closes flux-framework#335 Closes flux-framework#249
grondo added a commit to grondo/flux-core that referenced this issue on Feb 9, 2019
The wreck exec system is worthless, remove it along with associated commands, tests, and support code. Since libjsc doesn't work without wreck, it is removed as well. Fixes flux-framework#1984 Closes flux-framework#1947 Closes flux-framework#1618 Closes flux-framework#1595 Closes flux-framework#1593 Closes flux-framework#1534 Closes flux-framework#1468 Closes flux-framework#1443 Closes flux-framework#1438 Closes flux-framework#1419 Closes flux-framework#1410 Closes flux-framework#1407 Closes flux-framework#1393 Closes flux-framework#915 Closes flux-framework#894 Closes flux-framework#866 Closes flux-framework#833 Closes flux-framework#774 Closes flux-framework#772 Closes flux-framework#335 Closes flux-framework#249
closed by #1988 |
chu11 pushed a commit to chu11/flux-core that referenced this issue on Feb 13, 2019
The wreck exec system is worthless, remove it along with associated commands, tests, and support code. Since libjsc doesn't work without wreck, it is removed as well. Fixes flux-framework#1984 Closes flux-framework#1947 Closes flux-framework#1618 Closes flux-framework#1595 Closes flux-framework#1593 Closes flux-framework#1468 Closes flux-framework#1438 Closes flux-framework#1419 Closes flux-framework#1410 Closes flux-framework#915 Closes flux-framework#894 Closes flux-framework#866 Closes flux-framework#833 Closes flux-framework#774 Closes flux-framework#772 Closes flux-framework#335 Closes flux-framework#249