region not receiving event from tikv for too long time cause the batch resolvedts event is dropped #1599
Labels
area/ticdc
Issues or PRs related to TiCDC.
severity/critical
subject/replication-interruption
Denotes an issue or pull request is related to replication interruption.
type/bug
The issue is confirmed as a bug.
Comments
index c723262..1631785 100644
--- a/cdc/kv/client.go
+++ b/cdc/kv/client.go
@@ -73,7 +73,7 @@ const (
// hard code switch
// true: use kv client v2, which has a region worker for each stream
// false: use kv client v1, which runs a goroutine for every single region
-var enableKVClientV2 = true
+var enableKVClientV2 = false
type singleRegionInfo struct {
verID tikv.RegionVerID
@@ -1277,6 +1277,7 @@ func (s *eventFeedSession) sendResolvedTs(
pendingRegions *syncRegionFeedStateMap,
addr string,
) error {
+ log.Debug("batch resolved ts", zap.Reflect("regions", resolvedTs.Regions), zap.Uint64("ts", resolvedTs.Ts))
for _, regionID := range resolvedTs.Regions {
state, ok := regionStates[regionID]
if ok {
@@ -1294,6 +1295,8 @@ func (s *eventFeedSession) sendResolvedTs(
case <-ctx.Done():
return ctx.Err()
}
+ } else {
+ log.Warn("region not found", zap.Uint64("regionID", regionID), zap.Uint64("ts", resolvedTs.Ts), zap.String("addr", fmt.Sprintf("%p", s)))
}
}
return nil
@@ -1408,6 +1411,7 @@ func (s *eventFeedSession) singleEventFeed(
log.Warn("region not receiving resolved event from tikv or resolved ts is not pushing for too long time, try to resolve lock",
zap.Uint64("regionID", regionID), zap.Stringer("span", span),
zap.Duration("duration", sinceLastResolvedTs),
+ zap.String("addr", fmt.Sprintf("%p", s)),
zap.Uint64("resolvedTs", lastResolvedTs))
maxVersion := oracle.ComposeTS(oracle.GetPhysical(currentTimeFromPD.Add(-10*time.Second)), 0)
err = s.lockResolver.Resolve(ctx, regionID, maxVersion)
goroutines:
I suspect the block of region span
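To make the symptom in the title easier to picture, the sketch below reproduces the shape of the `sendResolvedTs` loop from the diff in a simplified, standalone form. A batch resolved ts message lists region IDs, and any ID missing from the state map is skipped, so that region's resolved ts never advances. The types `resolvedTsBatch` and `regionState` and the function `dispatchResolvedTs` are hypothetical. The real code forwards events over channels rather than updating a field, and this sketch makes no claim about why a region would be missing from the map.

```go
package main

import "fmt"

// resolvedTsBatch is a hypothetical stand-in for a batch resolved ts message:
// one timestamp that applies to every listed region.
type resolvedTsBatch struct {
	Regions []uint64
	Ts      uint64
}

// regionState is a hypothetical, simplified per-region feed state.
type regionState struct {
	resolvedTs uint64
}

// dispatchResolvedTs advances every region that has a state entry and returns
// the IDs of regions for which the event was dropped because no state exists.
func dispatchResolvedTs(batch resolvedTsBatch, states map[uint64]*regionState) []uint64 {
	var dropped []uint64
	for _, regionID := range batch.Regions {
		state, ok := states[regionID]
		if !ok {
			// This is the branch where the diff adds the "region not found" warning.
			dropped = append(dropped, regionID)
			continue
		}
		if batch.Ts > state.resolvedTs {
			state.resolvedTs = batch.Ts
		}
	}
	return dropped
}

func main() {
	states := map[uint64]*regionState{1: {resolvedTs: 5}}
	batch := resolvedTsBatch{Regions: []uint64{1, 2}, Ts: 10}
	dropped := dispatchResolvedTs(batch, states)
	fmt.Println("dropped regions:", dropped)
	fmt.Println("region 1 resolved ts:", states[1].resolvedTs)
}
```

Returning the dropped IDs instead of only logging them is a choice made for this example so that the skipped regions can be inspected or asserted on.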
This was referenced Apr 6, 2021
Closed
amyangfei added the subject/replication-interruption label Apr 26, 2021
This was referenced May 19, 2021
This is fixed by the PRs referenced above.
@amyangfei it seems the warnings still appear. I also hit this problem, and my TiCDC version is 5.1.0.
Sink write duration reflects how long transactions take to execute on the downstream; the resolve lock warning is unrelated to it. You can check the following metrics
Bug Report