jobs: fix mixed-version jobs flake
Similar to cockroachdb#107570, this is a short-term fix for when a query
executed with AS OF SYSTEM TIME picks a transaction timestamp before the
job_info migration has run. In that case, parts of the jobs infrastructure
will attempt to query the job_info table even though it doesn't exist at
the transaction's timestamp.

As a short-term fix, when we encounter an UndefinedObject error for the job_info table,
we generate a synthetic retryable error so that the txn is pushed to a higher timestamp
at which the upgrade will have completed and the job_info table will be visible.
The longer-term fix is being tracked in cockroachdb#106764.

On master I can no longer reproduce the failure in cockroachdb#105032, but
on 23.1 with this change I can successfully run 30 iterations of the test
on a seed (-8690666577594439584) that previously saw occurrences
of this flake.

Fixes: cockroachdb#103239
Fixes: cockroachdb#105032

Release note: None
adityamaru committed Aug 8, 2023
1 parent b87852a commit 79cc2ea
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion pkg/jobs/job_info_storage.go
@@ -237,7 +237,10 @@ func (i InfoStorage) Write(ctx context.Context, infoKey string, value []byte) er
 	if value == nil {
 		return errors.AssertionFailedf("missing value (infoKey %q)", infoKey)
 	}
-	return i.write(ctx, infoKey, value)
+	if err := i.write(ctx, infoKey, value); err != nil {
+		return MaybeGenerateForcedRetryableError(ctx, i.txn.KV(), err)
+	}
+	return nil
 }
 
 // Delete removes the info record for the provided infoKey.