Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: refine transaction retry error messages #10466

Merged
merged 6 commits into from
May 15, 2019

Conversation

jackysp
Copy link
Member

@jackysp jackysp commented May 14, 2019

Signed-off-by: Shuaipeng Yu [email protected]

What problem does this PR solve?

  1. Using [try again later] as the transaction retry marker is not suitable. It costs too many resources to search the error messages strings.Contrains(err.Error(), "[try again later]") when the error stack is very long.
  2. Some tests, e.g. jepsen have to use the error message to determine whether the transaction is aborted as expected.
  3. errors.Annotate sometimes only set the message in the full stack error, that means it could not be used by jepsen.

What is changed and how it works?

  1. Do not use a string to determine whether should retry the transaction.
  2. Add "[try again later]" for ErrWriteConflict, ErrRetryable, and ErrInfoSchemaChanged, then they will be used by some test to know the transaction is safe to retry.

Check List

Tests

  • Unit test

Code changes

  • Has exported variable/fields change

Side effects

  • Increased code complexity

@jackysp jackysp added the type/enhancement The issue or PR belongs to an enhancement. label May 14, 2019
@jackysp jackysp requested review from coocood and lysu May 14, 2019 15:23
domain/domain.go Outdated
@@ -1051,7 +1051,7 @@ var (
// ErrInfoSchemaExpired returns the error that information schema is out of date.
ErrInfoSchemaExpired = terror.ClassDomain.New(codeInfoSchemaExpired, "Information schema is out of date.")
// ErrInfoSchemaChanged returns the error that information schema is changed.
ErrInfoSchemaChanged = terror.ClassDomain.New(codeInfoSchemaChanged, "Information schema is changed.")
ErrInfoSchemaChanged = terror.ClassDomain.New(codeInfoSchemaChanged, "Information schema is changed."+kv.TxnRetryableMark)
Copy link
Member Author

@jackysp jackysp May 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are three types of errors could be safe to retry the transaction,

  1. ErrInfoSchemaChanged
  2. ErrWriteConflict
  3. ErrRetryable

add the marker for them, to make jepsen happy.

var (
// ErrClosed is used when close an already closed txn.
ErrClosed = terror.ClassKV.New(codeClosed, "Error: Transaction already closed")
// ErrNotExist is used when try to get an entry with an unexist key from KV store.
ErrNotExist = terror.ClassKV.New(codeNotExist, "Error: key not exist")
// ErrConditionNotMatch is used when condition is not met.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some errors and code are removed, because they are no used.

kv/error.go Outdated
// ErrKeyExists returns when key is already exist.
ErrKeyExists = terror.ClassKV.New(codeKeyExists, "key already exist")
// ErrNotImplemented returns when a function is not implemented yet.
ErrNotImplemented = terror.ClassKV.New(codeNotImplemented, "not implemented")
// ErrWriteConflict is the error when the commit meets an write conflict error.
ErrWriteConflict = terror.ClassKV.New(mysql.ErrWriteConflict, mysql.MySQLErrName[mysql.ErrWriteConflict]+TxnRetryableMark)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the ErrWriteConflict here to resolve the import circle.

kv/error.go Outdated
ErrConditionNotMatch.Equal(err) ||
// TiKV exception message will tell you if you should retry or not
strings.Contains(err.Error(), "try again later") {
if ErrRetryable.Equal(err) || ErrWriteConflict.Equal(err) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handle ErrRetryable and ErrWriteConflict here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename the function to IsTxnRetryableError is better

@@ -862,17 +862,6 @@ func (c *twoPhaseCommitter) execute(ctx context.Context) error {
return errors.Trace(err)
}

// mockGetTSErrorInRetry should wait MockCommitErrorOnce first, then will run into retry() logic.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PD timeout should be handled in backoffs.

@@ -311,14 +312,18 @@ func extractLockFromKeyErr(keyErr *pb.KeyError) (*Lock, error) {
}

func extractKeyErr(keyErr *pb.KeyError) error {
failpoint.Inject("ErrMockRetryableOnly", func(val failpoint.Value) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add this to mock conflict == nil, but retryable != nil, e.g. error: txnLockNotFound.

if keyErr.Conflict != nil {
err := newWriteConflictError(keyErr.Conflict)
return errors.Annotate(err, txnRetryableMark)
return newWriteConflictError(keyErr.Conflict)
Copy link
Member Author

@jackysp jackysp May 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

txnRetryableMark -> ErrWriteConflict
errors.Annotate will make the txnRetryableMark at the end of the error stack.

err := errors.Errorf("tikv restarts txn: %s", keyErr.GetRetryable())
logutil.Logger(context.Background()).Debug("error", zap.Error(err))
return errors.Annotate(err, txnRetryableMark)
return kv.ErrRetryable.GenWithStackByArgs("tikv restarts txn: " + keyErr.GetRetryable())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

txnRetryableMark -> ErrRetryable

@@ -305,7 +305,7 @@ func (txn *tikvTxn) Commit(ctx context.Context) error {
defer txn.store.txnLatches.UnLock(lock)
if lock.IsStale() {
err = errors.Errorf("txnStartTS %d is stale", txn.startTS)
return errors.Annotate(err, txnRetryableMark)
return kv.ErrWriteConflict.GenWithStackByArgs(txn.startTS, 0, "is stale")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

txnRetryableMark -> ErrWriteConflict

@codecov
Copy link

codecov bot commented May 14, 2019

Codecov Report

Merging #10466 into master will increase coverage by 0.0039%.
The diff coverage is 87.0967%.

@@               Coverage Diff                @@
##             master     #10466        +/-   ##
================================================
+ Coverage   77.3044%   77.3084%   +0.0039%     
================================================
  Files           412        412                
  Lines         86722      86715         -7     
================================================
- Hits          67040      67038         -2     
+ Misses        14525      14521         -4     
+ Partials       5157       5156         -1

@jackysp
Copy link
Member Author

jackysp commented May 14, 2019

/run-all-tests

@jackysp jackysp requested a review from cwen0 May 14, 2019 15:42
@jackysp jackysp added the priority/P1 The issue has P1 priority. label May 14, 2019
@jackysp
Copy link
Member Author

jackysp commented May 14, 2019

/run-integration-common-test

kv/error.go Outdated
// errors which SQL layer can safely retry.
ErrRetryable = terror.ClassKV.New(codeRetryable, "Error: KV error safe to retry")
// ErrRetryable is used when KV store occurs RPC error or some other errors which SQL layer can safely retry.
ErrRetryable = terror.ClassKV.New(codeRetryable, "Error: KV error safe to retry %s"+TxnRetryableMark)
Copy link
Member

@coocood coocood May 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to ErrTxnRetryable is better.

kv/error.go Outdated
)

// TxnRetryableMark is used to uniform the commit error messages which could retry the transaction.
const TxnRetryableMark = " [try again later]"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the leading space?

kv/error.go Outdated
// ErrRetryable is used when KV store occurs RPC error or some other
// errors which SQL layer can safely retry.
ErrRetryable = terror.ClassKV.New(codeRetryable, "Error: KV error safe to retry")
// ErrRetryable is used when KV store occurs RPC error or some other errors which SQL layer can safely retry.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better add comment
currently, the err is returned when lock not found in Commit, subjective to change in the future.

kv/error.go Outdated
// ErrTxnRetryable is used when KV store occurs retryable error which SQL layer can safely retry the transaction.
// When using TiKV as the storage node, the error is returned ONLY when lock not found (txnLockNotFound) in Commit,
// subject to change it in the future.
ErrTxnRetryable = terror.ClassKV.New(codeRetryable, "Error: KV error safe to retry %s "+TxnRetryableMark)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

codeRetryable -> 'codeTxnRetryable`

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed, PTAL

@@ -543,9 +543,9 @@ func (s *session) isInternal() bool {

func (s *session) isRetryableError(err error) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isTxnRetryableError

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed, PTAL

@coocood
Copy link
Member

coocood commented May 15, 2019

LGTM

@coocood coocood added the status/LGT1 Indicates that a PR has LGTM 1. label May 15, 2019
prettyWriteKey(&buf, conflict.Key)
buf.WriteString(" primary=")
prettyWriteKey(&buf, conflict.Primary)
return ErrWriteConflict.FastGen(buf.String())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should enhance FastGen by use pingcap/errors#13 :D

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for high conflict situation, control logic error should never generate stack.

jackysp added 3 commits May 15, 2019 17:36
Signed-off-by: Shuaipeng Yu <[email protected]>
Signed-off-by: Shuaipeng Yu <[email protected]>
jackysp added 3 commits May 15, 2019 17:36
Signed-off-by: Shuaipeng Yu <[email protected]>
Signed-off-by: Shuaipeng Yu <[email protected]>
Signed-off-by: Shuaipeng Yu <[email protected]>
@jackysp
Copy link
Member Author

jackysp commented May 15, 2019

/run-all-tests

Copy link
Contributor

@lysu lysu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lysu lysu added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels May 15, 2019
@jackysp jackysp merged commit d97e7d9 into pingcap:master May 15, 2019
@jackysp jackysp deleted the refine_errmsg branch May 29, 2019 07:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority/P1 The issue has P1 priority. status/LGT2 Indicates that a PR has LGTM 2. type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants