gc: use separate latch for range tombstone operations #86551
Labels
A-kv-replication
Relating to Raft, consensus, and coordination.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Range tombstone GC requests must acquire write latches on the range so that no other range tombstone operation can interfere with stats calculations in case of range splits or merges in the underlying Pebble storage.
This is done by acquiring write latches at a future timestamp, which lets all reads go through but blocks all writes within the range.
This is costly for potential writers, for example when a range had a cancelled import and data was then written on top of that operation.
To address this, we could introduce a separate write latch that serializes range tombstone write operations, and keep only read latches on the actual ranges as long as they are not modified. With this approach, GC requests could still perform slower consistency checks without affecting foreground traffic.
All write operations on range tombstones would acquire a read latch on this key, plus any latches they need for consistency, while the GC operation would acquire a write latch on this key only. This lets readers and writers at the current timestamp proceed while preventing any change to range key boundaries.
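The proposal can be illustrated with a small toy model of latch conflicts. This is only a sketch and not the actual latch manager code: the `Latch` type, the `conflicts` rule, the key names, and the timestamps are all hypothetical, span overlap is reduced to key equality, and the read/write rule (a read conflicts only with writes at or below its timestamp) is a simplifying assumption. The choice of the GC threshold as the timestamp for the GC read latch is likewise an assumption made for illustration.

```go
package main

import "fmt"

// Access is the latch mode. All names in this sketch are hypothetical.
type Access int

const (
	Read Access = iota
	Write
)

// Latch is a toy latch: a key, an access mode, and a timestamp.
type Latch struct {
	Key    string
	Access Access
	TS     int64
}

// conflicts applies a simplified rule: latches on different keys never
// conflict, read/read never conflicts, write/write always conflicts, and a
// read conflicts with a write only if the write is at or below the read's
// timestamp.
func conflicts(a, b Latch) bool {
	if a.Key != b.Key {
		return false
	}
	switch {
	case a.Access == Read && b.Access == Read:
		return false
	case a.Access == Write && b.Access == Write:
		return true
	case a.Access == Read:
		return b.TS <= a.TS
	default: // a is a write, b is a read
		return a.TS <= b.TS
	}
}

// blocked reports whether any requested latch conflicts with a held latch.
func blocked(held, req []Latch) bool {
	for _, h := range held {
		for _, r := range req {
			if conflicts(h, r) {
				return true
			}
		}
	}
	return false
}

const (
	dataKey                      = "a"                    // stands in for the range's key span
	rangeTombstoneLatchKey       = "range-tombstone-latch" // hypothetical dedicated latch key
	gcThreshold            int64 = 10
	now                    int64 = 100
	future                 int64 = 1 << 62
)

// currentGC: a write latch on the data at a future timestamp.
func currentGC() []Latch { return []Latch{{dataKey, Write, future}} }

// proposedGC: a write latch on the dedicated key, read latch on the data.
func proposedGC() []Latch {
	return []Latch{{rangeTombstoneLatchKey, Write, 0}, {dataKey, Read, gcThreshold}}
}

func pointWrite() []Latch { return []Latch{{dataKey, Write, now}} }
func pointRead() []Latch  { return []Latch{{dataKey, Read, now}} }

// rangeTombstoneWrite: a read latch on the dedicated key plus its usual latches.
func rangeTombstoneWrite() []Latch {
	return []Latch{{rangeTombstoneLatchKey, Read, now}, {dataKey, Write, now}}
}

func main() {
	fmt.Println("current GC blocks point write:", blocked(currentGC(), pointWrite()))
	fmt.Println("proposed GC blocks point write:", blocked(proposedGC(), pointWrite()))
	fmt.Println("proposed GC blocks point read:", blocked(proposedGC(), pointRead()))
	fmt.Println("proposed GC blocks range tombstone write:", blocked(proposedGC(), rangeTombstoneWrite()))
}
```

In this model, only requests that touch the dedicated latch key are held behind the GC request, so foreground reads and point writes at the current timestamp are not delayed.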
Jira issue: CRDB-18813
Epic CRDB-2624