-
Notifications
You must be signed in to change notification settings - Fork 499
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stability: retry truncating sst files upon failure #484
Conversation
sst files may be deleted after compaction
tests/pkg/ops/tikv.go
Outdated
stdout, stderr, err := exec("find", "/var/lib/tikv/db", "-name", "*.sst", "-o", "-name", "*.save") | ||
if err != nil { | ||
glog.Errorf("list sst files: stderr=%s err=%s", stderr, err.Error()) | ||
return errors.Annotate(err, "list sst files") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return errors.Annotate(err, "list sst files") | |
continue |
tests/pkg/ops/tikv.go
Outdated
} | ||
} | ||
if len(sst) == 0 { | ||
return errors.New("cannot find a sst file") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return errors.New("cannot find a sst file") | |
glog.Error("cannot find a sst file") | |
continue |
return errors.Annotate(err, "list sst files") | ||
} | ||
retryCount := 0 | ||
for ; retryCount < retryLimit; retryCount++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for ; retryCount < retryLimit; retryCount++ { | |
for ; retryCount < retryLimit; retryCount++ { | |
time.Sleep(10*time.Second) |
|
||
sstCandidates := make(map[string]bool) | ||
sstCandidates := make(map[string]bool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete the blanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the one more indent because it's in retry loop now.
glog.Errorf("truncate sst file: stderr=%s err=%s", stderr, err.Error()) | ||
return errors.Annotate(err, "truncate sst file") | ||
if retryCount == retryLimit { | ||
return errors.New("failed to truncate sst file after " + strconv.Itoa(retryLimit) + " trials") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the error log needs opts.Namespace
and opts.Cluster
field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added annotations to log methods. however, it's caller who pass the opts
arg, that is, the caller must know these info, so I don't think there is a need for adding these fields to returned error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but the caller has not added these fields to error log too
ref: https://github.com/pingcap/tidb-operator/blob/master/tests/failover.go#L74-L81
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some error logs to the case func. If you think every error should be annotated with additional fields, it would be better to create an another PR to do it, because most of methods of operatorActions
failed to do so.
/run-e2e-tests |
56c7f9a
to
4bd7fba
Compare
/run-e2e-tests |
/run-e2e-tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code needs to be formatted, rest LGTM.
@zyguan PTAL |
/run-e2e-tests |
/run-e2e-tests |
What problem does this PR solve?
sst files may be deleted after compaction, thus the
TruncateSSTFile
may fail unexpectly.This pr closes: #501
What is changed and how it works?
add retry logical.
Check List
Tests
Code changes
Side effects
Related changes
Does this PR introduce a user-facing change?: