-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: make deadline errors retryable #99760
sql: make deadline errors retryable #99760
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pretty straightforward and clean
Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @DrewKimball)
pkg/sql/catalog/descs/txn_external_test.go
line 166 at r1 (raw file):
} type fakeSession struct{ exp hlc.Timestamp }
Would it make sense to lift this into a common package, since we have another fake session implementation inside sql_test
0f2df26
to
9ddccd6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball and @fqazi)
pkg/sql/catalog/descs/txn_external_test.go
line 166 at r1 (raw file):
Previously, fqazi (Faizan Qazi) wrote…
Would it make sense to lift this into a common package, since we have another fake session implementation inside sql_test
Done.
9ddccd6
to
a7ab071
Compare
TFTR! bors r+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 9 of 9 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @DrewKimball, and @fqazi)
pkg/sql/conn_executor.go
line 3110 at r2 (raw file):
errors.Is(err, retriableMinTimestampBoundUnsatisfiableError) || errors.Is(err, descidgen.ErrDescIDSequenceMigrationInProgress) || execinfra.IsDynamicQueryHasNoHomeRegionError(err) ||
nit: add pgerror.ClientVisibleRetryError
interface assertions to the errors that are no longer explicitly mentioned in this method.
bors r- |
Canceled. |
a7ab071
to
3e17136
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball, @fqazi, and @yuzefovich)
pkg/sql/conn_executor.go
line 3110 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: add
pgerror.ClientVisibleRetryError
interface assertions to the errors that are no longer explicitly mentioned in this method.
cockroach/pkg/kv/kvpb/errors.go
Line 701 in 736a67e
var _ ClientVisibleRetryError = &TransactionRetryWithProtoRefreshError{} |
and then otherwise done.
Release note: None
If a transaction gets finds itself with a deadline before its current timestamp, something that can happen when there are schema changes or when the sqlliveness subsystem is unavailable, we ought to retry the transaction. Today, an assertion failure is sent. This is the wrong behavior. In order to achieve the desired goal, we detect the scenario in MaybeUpdateDeadline and we return an error which implements the appropriate interface to be interpreted as a retry error in the pgwire and sql layers. This change does a few other little things: 1) It simplies the logic to check whether an error is retriable so that all errors which pgwire will treat as client-visible retries are treated as retriable internally. In addition, we retain certain additional checks for retriable errors the sql layer handles explicitly. 2) It makes sure to reset the sqlliveness session in the descs.Collection when restarting transactions. Not doing this, on the one hand, was an oversight that meant that if the session turned over internally, then restarts wouldn't see it. However, it's not actually a bug today because only secondary tenants set the session, and they never acquire a new session. The test utilizes sessions rather than descriptors because the testing knob infrastructure is easier to manipulate. Fixes: cockroachdb#96336 Backports will deal with cockroachdb#76727 Release note (bug fix): In rare cases involving overload and schema changes, users could sometimes, transiently, see errors of the form "deadline below read timestamp is nonsensical; txn has would have no chance to commit". These errors carried and internal pgcode and could not be retried. This form of error is now classified as a retriable error and will be retried automatically either by the client or internally.
3e17136
to
d00dace
Compare
bors r+ |
Build failed (retrying...): |
Build failed (retrying...): |
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from ffb16ca to blathers/backport-release-22.1-99760: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 22.1.x failed. See errors above. error creating merge commit from ffb16ca to blathers/backport-release-22.2-99760: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 22.2.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
If a transaction gets finds itself with a deadline before its current timestamp, something that can happen when there are schema changes or when the sqlliveness subsystem is unavailable, we ought to retry the transaction. Today, an assertion failure is sent. This is the wrong behavior.
In order to achieve the desired goal, we detect the scenario in MaybeUpdateDeadline and we return an error which implements the appropriate interface to be interpreted as a retry error in the pgwire and sql layers.
This change does a few other little things:
all errors which pgwire will treat as client-visible retries are treated
as retriable internally. In addition, we retain certain additional checks
for retriable errors the sql layer handles explicitly.
when restarting transactions. Not doing this, on the one hand, was an
oversight that meant that if the session turned over internally, then
restarts wouldn't see it. However, it's not actually a bug today because
only secondary tenants set the session, and they never acquire a new
session.
The test utilizes sessions rather than descriptors because the testing knob infrastructure is easier to manipulate.
Fixes: #96336
Backports will deal with #76727
Release note (bug fix): In rare cases involving overload and schema changes, users could sometimes, transiently, see errors of the form "deadline below read timestamp is nonsensical; txn has would have no chance to commit". These errors carried and internal pgcode and could not be retried. This form of error is now classified as a retriable error and will be retried automatically either by the client or internally.