jobs: clear claim for already-dead paused jobs #92121

Merged
merged 1 commit into from
Nov 22, 2022

@stevendanna stevendanna commented Nov 18, 2022

Previously we only cleared the claim after the state machine returned, and only if the status wasn't pause-requested or cancel-requested. This filter on status, however, was unnecessary.

The job may still be in the cancel-requested or pause-requested state when we go to clear the claim because the transaction that resulted in the canceled context may not have completed. But it is still fine to clear the claim. There are two cases:

  1. The transaction that canceled us fails and we are thus
    still in the cancel-requested or pause-requested state with no
    claim. This is fine. The claim-jobs loop will claim the job and we will then move
    the state to paused or reverting, just with no context to cancel.

  2. The transaction succeeds and we are in paused or reverting without
    a claim set. Just as we wanted.

Here we remove the where clause to always clear the claim when we return from the state machine.

In the case of (1), when processing the cancel-requested or pause-requested state the second time, we may still want the claim cleared. Here, we make sure it gets cleared even when there is no running job that actually needs to be canceled.
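
To illustrate the reasoning above, here is a minimal in-memory sketch. It is not the actual jobs registry code: the `Job` struct, the status constants, and the functions `clearClaimFiltered`, `clearClaimAlways`, and `processRequested` are hypothetical names made up for this example. It only models the difference between the old filtered clear and the new unconditional one, plus the second pass described for case (1).

```go
package main

import "fmt"

// Status is a simplified stand-in for a job's status.
type Status string

const (
	StatusPauseRequested  Status = "pause-requested"
	StatusCancelRequested Status = "cancel-requested"
	StatusPaused          Status = "paused"
	StatusReverting       Status = "reverting"
)

// Job is a hypothetical model of a job row.
// An empty ClaimSessionID means the job is unclaimed.
type Job struct {
	Status         Status
	ClaimSessionID string
}

// clearClaimFiltered models the old behavior: the claim is left in
// place when the status is pause-requested or cancel-requested.
func clearClaimFiltered(j *Job, session string) bool {
	if j.ClaimSessionID != session {
		return false
	}
	if j.Status == StatusPauseRequested || j.Status == StatusCancelRequested {
		return false
	}
	j.ClaimSessionID = ""
	return true
}

// clearClaimAlways models the new behavior: the claim is cleared
// whenever this session holds it, regardless of status.
func clearClaimAlways(j *Job, session string) bool {
	if j.ClaimSessionID != session {
		return false
	}
	j.ClaimSessionID = ""
	return true
}

// processRequested models handling a pause or cancel request: it moves
// the job to its target state and clears the claim, even if there is
// no running job (and so no context) to cancel.
func processRequested(j *Job) {
	switch j.Status {
	case StatusPauseRequested:
		j.Status = StatusPaused
	case StatusCancelRequested:
		j.Status = StatusReverting
	default:
		return
	}
	j.ClaimSessionID = ""
}

func main() {
	// The state machine returned while the pausing transaction is still in flight.
	old := &Job{Status: StatusPauseRequested, ClaimSessionID: "s1"}
	fmt.Println("filtered clear applied:", clearClaimFiltered(old, "s1"), "claim:", old.ClaimSessionID)

	j := &Job{Status: StatusPauseRequested, ClaimSessionID: "s1"}
	fmt.Println("unconditional clear applied:", clearClaimAlways(j, "s1"), "claim:", j.ClaimSessionID)

	// Case (1): the transaction failed, the job is claimed again and reprocessed.
	j.ClaimSessionID = "s2"
	processRequested(j)
	fmt.Println("after second pass:", j.Status, "claim:", j.ClaimSessionID)
}
```

In this model the filtered variant leaves a paused-in-flight job holding its claim, while the unconditional variant always ends with an empty claim, which matches both cases listed above.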

Fixes #92112

Epic: None

Release note: None

@stevendanna stevendanna requested a review from a team as a code owner November 18, 2022 12:51
@ajwerner ajwerner left a comment

LGTM

I could imagine wanting to batch this but I don't anticipate it being common.

@stevendanna

bors r=ajwerner

craig bot commented Nov 22, 2022

Build succeeded:

@craig craig bot merged commit a9080f2 into cockroachdb:master Nov 22, 2022
Development

Successfully merging this pull request may close these issues.

jobs: TestPauseReason failed
3 participants