kvserver: rangefeed txn pusher barrier may cause resolved timestamp stalls #119536
Labels
A-kv-rangefeed
Rangefeed infrastructure, server+client
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
T-kv
KV Team
In #117612, we added a barrier command to flush the rangefeed's Raft pipeline when pushing aborted transactions, as a fix for #104309.
The barrier command is marked `isUnsplittable` to prevent it from spanning range boundaries. Unfortunately, the DistSender enforces this constraint based on its range cache, which can be stale. Moreover, it will never attempt to refresh its range cache in response to this.

If a rangefeed runs on a follower replica after a range merge, the local DistSender's range cache may be stale, and contain the pre-merge range descriptors. When the barrier command is submitted spanning the entire merged range, it will be continually rejected by the DistSender, until some other request happens to trigger a cache refresh. This can prevent the rangefeed's resolved timestamp (and thus checkpoints) from advancing, similarly preventing the changefeed's frontier (or watermark) from advancing, logging the following error:
Rangefeed events will still be emitted as usual, and garbage collection will be prevented by CDC protected timestamps, allowing the rangefeed to recover if it is restarted.
Seen in #119333.