*: refine transaction retry error messages #10466
domain/domain.go
Outdated
@@ -1051,7 +1051,7 @@ var (
// ErrInfoSchemaExpired returns the error that information schema is out of date.
ErrInfoSchemaExpired = terror.ClassDomain.New(codeInfoSchemaExpired, "Information schema is out of date.")
// ErrInfoSchemaChanged returns the error that information schema is changed.
ErrInfoSchemaChanged = terror.ClassDomain.New(codeInfoSchemaChanged, "Information schema is changed.")
ErrInfoSchemaChanged = terror.ClassDomain.New(codeInfoSchemaChanged, "Information schema is changed."+kv.TxnRetryableMark)
Three types of errors are safe for retrying the transaction:
- ErrInfoSchemaChanged
- ErrWriteConflict
- ErrRetryable
Add the marker to them to make jepsen happy.
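A minimal sketch of the idea in this comment, assuming standard-library errors in place of terror. Only TxnRetryableMark and the three error names come from this PR; the "Write conflict" message text and the clientSeesMark helper are hypothetical and exist only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// TxnRetryableMark mirrors the constant added in this PR.
const TxnRetryableMark = " [try again later]"

// Stand-ins for the three retryable errors. The real ones are built with
// terror; errors.New is used here only to keep the sketch self-contained.
var (
	ErrInfoSchemaChanged = errors.New("Information schema is changed." + TxnRetryableMark)
	ErrWriteConflict     = errors.New("Write conflict" + TxnRetryableMark) // message text is a placeholder
	ErrRetryable         = errors.New("Error: KV error safe to retry" + TxnRetryableMark)
)

// clientSeesMark is a hypothetical helper: it reports whether the message
// that reaches a client (for example jepsen) ends with the marker.
func clientSeesMark(err error) bool {
	return strings.HasSuffix(err.Error(), TxnRetryableMark)
}

func main() {
	for _, err := range []error{ErrInfoSchemaChanged, ErrWriteConflict, ErrRetryable} {
		fmt.Println(clientSeesMark(err), err)
	}
}
```

Because the marker is part of each error's own message, it does not depend on how the error is annotated later.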
var (
// ErrClosed is used when close an already closed txn.
ErrClosed = terror.ClassKV.New(codeClosed, "Error: Transaction already closed")
// ErrNotExist is used when try to get an entry with an unexist key from KV store.
ErrNotExist = terror.ClassKV.New(codeNotExist, "Error: key not exist")
// ErrConditionNotMatch is used when condition is not met.
Some errors and codes are removed because they are not used.
kv/error.go
Outdated
// ErrKeyExists returns when key is already exist.
ErrKeyExists = terror.ClassKV.New(codeKeyExists, "key already exist")
// ErrNotImplemented returns when a function is not implemented yet.
ErrNotImplemented = terror.ClassKV.New(codeNotImplemented, "not implemented")
// ErrWriteConflict is the error when the commit meets an write conflict error.
ErrWriteConflict = terror.ClassKV.New(mysql.ErrWriteConflict, mysql.MySQLErrName[mysql.ErrWriteConflict]+TxnRetryableMark)
Move ErrWriteConflict here to resolve the import cycle.
kv/error.go
Outdated
ErrConditionNotMatch.Equal(err) ||
// TiKV exception message will tell you if you should retry or not
strings.Contains(err.Error(), "try again later") {
if ErrRetryable.Equal(err) || ErrWriteConflict.Equal(err) {
handle ErrRetryable and ErrWriteConflict here
Renaming the function to IsTxnRetryableError would be better.
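The change discussed here replaces a substring search on the message with a comparison against known errors. A rough sketch of that shape, assuming standard-library sentinel errors and errors.Is as a stand-in for terror's Equal; all identifiers below are hypothetical and are not TiDB's actual implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors standing in for kv.ErrRetryable,
// kv.ErrWriteConflict and kv.ErrNotExist.
var (
	errTxnRetryable  = errors.New("KV error safe to retry")
	errWriteConflict = errors.New("write conflict")
	errNotExist      = errors.New("key not exist")
)

// isTxnRetryableError compares error identity instead of scanning the
// message text, so its cost does not grow with the message length.
func isTxnRetryableError(err error) bool {
	return errors.Is(err, errTxnRetryable) || errors.Is(err, errWriteConflict)
}

// wrapCommitErr adds context while keeping the cause reachable for errors.Is.
func wrapCommitErr(err error) error {
	return fmt.Errorf("commit failed: %w", err)
}

func main() {
	fmt.Println(isTxnRetryableError(wrapCommitErr(errWriteConflict)))
	fmt.Println(isTxnRetryableError(wrapCommitErr(errNotExist)))
}
```

An error whose text merely happens to contain "try again later" is no longer treated as retryable under this scheme; only the listed errors are.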
@@ -862,17 +862,6 @@ func (c *twoPhaseCommitter) execute(ctx context.Context) error {
return errors.Trace(err)
}

// mockGetTSErrorInRetry should wait MockCommitErrorOnce first, then will run into retry() logic.
PD timeout should be handled in backoffs.
@@ -311,14 +312,18 @@ func extractLockFromKeyErr(keyErr *pb.KeyError) (*Lock, error) {
}

func extractKeyErr(keyErr *pb.KeyError) error {
failpoint.Inject("ErrMockRetryableOnly", func(val failpoint.Value) {
Add this to mock the case where conflict == nil but retryable != nil, e.g. error: txnLockNotFound.
if keyErr.Conflict != nil {
err := newWriteConflictError(keyErr.Conflict)
return errors.Annotate(err, txnRetryableMark)
return newWriteConflictError(keyErr.Conflict)
txnRetryableMark -> ErrWriteConflict
errors.Annotate puts the txnRetryableMark at the end of the error stack.
store/tikv/snapshot.go
Outdated
err := errors.Errorf("tikv restarts txn: %s", keyErr.GetRetryable())
logutil.Logger(context.Background()).Debug("error", zap.Error(err))
return errors.Annotate(err, txnRetryableMark)
return kv.ErrRetryable.GenWithStackByArgs("tikv restarts txn: " + keyErr.GetRetryable())
txnRetryableMark -> ErrRetryable
store/tikv/txn.go
Outdated
@@ -305,7 +305,7 @@ func (txn *tikvTxn) Commit(ctx context.Context) error {
defer txn.store.txnLatches.UnLock(lock)
if lock.IsStale() {
err = errors.Errorf("txnStartTS %d is stale", txn.startTS)
return errors.Annotate(err, txnRetryableMark)
return kv.ErrWriteConflict.GenWithStackByArgs(txn.startTS, 0, "is stale")
txnRetryableMark -> ErrWriteConflict
Codecov Report
@@ Coverage Diff @@
## master #10466 +/- ##
================================================
+ Coverage 77.3044% 77.3084% +0.0039%
================================================
Files 412 412
Lines 86722 86715 -7
================================================
- Hits 67040 67038 -2
+ Misses 14525 14521 -4
+ Partials 5157 5156 -1
/run-all-tests
/run-integration-common-test
kv/error.go
Outdated
// errors which SQL layer can safely retry.
ErrRetryable = terror.ClassKV.New(codeRetryable, "Error: KV error safe to retry")
// ErrRetryable is used when KV store occurs RPC error or some other errors which SQL layer can safely retry.
ErrRetryable = terror.ClassKV.New(codeRetryable, "Error: KV error safe to retry %s"+TxnRetryableMark)
Renaming it to ErrTxnRetryable would be better.
kv/error.go
Outdated
)

// TxnRetryableMark is used to uniform the commit error messages which could retry the transaction.
const TxnRetryableMark = " [try again later]"
Can we remove the leading space?
kv/error.go
Outdated
// ErrRetryable is used when KV store occurs RPC error or some other
// errors which SQL layer can safely retry.
ErrRetryable = terror.ClassKV.New(codeRetryable, "Error: KV error safe to retry")
// ErrRetryable is used when KV store occurs RPC error or some other errors which SQL layer can safely retry.
Better to add a comment: currently, the error is returned when the lock is not found in Commit; this is subject to change in the future.
kv/error.go
Outdated
// ErrTxnRetryable is used when KV store occurs retryable error which SQL layer can safely retry the transaction.
// When using TiKV as the storage node, the error is returned ONLY when lock not found (txnLockNotFound) in Commit,
// subject to change it in the future.
ErrTxnRetryable = terror.ClassKV.New(codeRetryable, "Error: KV error safe to retry %s "+TxnRetryableMark)
codeRetryable -> codeTxnRetryable
addressed, PTAL
session/session.go
Outdated
@@ -543,9 +543,9 @@ func (s *session) isInternal() bool {

func (s *session) isRetryableError(err error) bool {
isTxnRetryableError
addressed, PTAL
LGTM
prettyWriteKey(&buf, conflict.Key)
buf.WriteString(" primary=")
prettyWriteKey(&buf, conflict.Primary)
return ErrWriteConflict.FastGen(buf.String())
Maybe we should enhance FastGen by using pingcap/errors#13 :D
In high-conflict situations, control-logic errors should never generate a stack.
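To illustrate the trade-off in this comment, here is a small sketch of an error type with two constructors, one that captures the call stack and one that skips it. The codedError type, genWithStack, fastGen, and the error code 1 are all hypothetical; this is not how terror or pingcap/errors are implemented, only the general pattern.

```go
package main

import (
	"fmt"
	"runtime"
)

// codedError is a hypothetical error type with an optional captured stack.
type codedError struct {
	code  int
	msg   string
	stack []uintptr // nil when no stack was captured
}

func (e *codedError) Error() string { return fmt.Sprintf("[%d] %s", e.code, e.msg) }

// genWithStack records the caller's program counters. This helps when
// debugging unexpected errors but costs time and memory on every call.
func genWithStack(code int, msg string) *codedError {
	pcs := make([]uintptr, 32)
	n := runtime.Callers(2, pcs) // skip runtime.Callers and genWithStack itself
	return &codedError{code: code, msg: msg, stack: pcs[:n]}
}

// fastGen skips stack capture. It suits errors used as control flow,
// such as write conflicts that occur often and are simply retried.
func fastGen(code int, msg string) *codedError {
	return &codedError{code: code, msg: msg}
}

func main() {
	slow := genWithStack(1, "write conflict")
	fast := fastGen(1, "write conflict")
	fmt.Println(len(slow.stack) > 0, fast.stack == nil, fast)
}
```

Under heavy conflict the retry path may create such errors at a high rate, which is why skipping the stack there is attractive.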
Signed-off-by: Shuaipeng Yu <[email protected]>
/run-all-tests
LGTM
Signed-off-by: Shuaipeng Yu [email protected]
What problem does this PR solve?
- Using [try again later] as the transaction retry marker is not suitable. Searching error messages with strings.Contains(err.Error(), "[try again later]") costs too many resources when the error stack is very long.
- errors.Annotate sometimes only sets the message in the full stack error, which means the marker cannot be used by jepsen.
What is changed and how it works?
Check List
Tests
Code changes
Side effects