Skip to content

Commit

Permalink
kvcoord: DistSender rangefeed bookkeeping had an off-by-one
Browse files Browse the repository at this point in the history
It turns out that two commits occurred about two months apart to address some
off-by-one errors due to disagreements regarding the inclusivity or exclusivity
of bounds of time intervals. In cockroachdb#79525 we added a next call to compensate for
the catch-up scan occurring at an inclusive time. In cockroachdb#82451 we made the catch-
up scan act exclusively, like the rest of the kvserver code has assumed. The
end result is that we now actually do the catch up scan one tick later than
we had intended.

This resulted in some flakey tests, and in cases where the closed timestamp
pushed a writing transaction, may have resulted in missing rows. This was
uncovered deflaking cockroachdb#90764. With some added logging we see:

```
I221102 01:31:44.444557 1509 kv/kvclient/kvcoord/dist_sender_rangefeed.go:667  [nsql1,rangefeed=lease,dest_n=1,dest_s=1,dest_r=53] 3882  RangeFeedEvent: span:<key:"\376\222\213" end_key:"\376\222\214" > resolved_ts:<wall_time:166735270430458388 >
E221102 01:31:44.445042 1509 kv/kvclient/kvcoord/dist_sender_rangefeed.go:653  [nsql1,rangefeed=lease,dest_n=1,dest_s=1,dest_r=53] 3886  RangeFeedError: retry rangefeed (REASON_RANGE_SPLIT)
I221102 01:31:44.480676 2388 sql/internal.go:1321  [nsql1,job=810294652971450369,scExec,id=106,mutation=1] 3947  txn committed at 1667352704.380458388,1
I221102 01:31:44.485558 1509 kv/kvclient/kvcoord/dist_sender_rangefeed.go:420  [nsql1,rangefeed=lease] 3965  RangeFeed /Tenant/10/Table/{3-4} disconnected with last checkpoint 105.097693ms ago: retry rangefeed (REASON_RANGE_SPLIT)
```

Notice that the commit for the schema change occurred at
`1667352704.380458388,1` and the resolved event was at `1667352704.380458388`.
As the code was before, we'd perform the catch-up scan at
`1667352704.380458388,2` and miss the write we needed to see.

Fixes cockroachdb#90764.

Release note (bug fix): Fixed a bug which, in rare cases, could result in a
changefeed missing rows which occur around the time of a split in writing
transactions which take longer than the closed timestamp target duration
(defaults to 3s).
  • Loading branch information
ajwerner committed Nov 2, 2022
1 parent aa1de0d commit 46bbd61
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go
Original file line number Diff line number Diff line change
Expand Up @@ -646,7 +646,12 @@ func (ds *DistSender) singleRangeFeed(
if !t.ResolvedTS.IsEmpty() && catchupRes != nil {
finishCatchupScan()
}
args.Timestamp.Forward(t.ResolvedTS.Next())
// Note that this timestamp means that all rows in the span with
// writes at or before the timestamp have now been seen. The
// Timestamp field in the request is exclusive, meaning if we send
// the request with exactly the ResolveTS, we'll see only rows after
// that timestamp.
args.Timestamp.Forward(t.ResolvedTS)
}
case *roachpb.RangeFeedSSTable:
case *roachpb.RangeFeedError:
Expand Down

0 comments on commit 46bbd61

Please sign in to comment.