kv: configure leading closed timestamp target for global_read ranges
Informs #59680.
Informs #52745.

This commit updates `closedts.TargetForPolicy` to calculate a target closed timestamp that leads present time for ranges with the LEAD_FOR_GLOBAL_READS closed timestamp policy. This is needed for non-blocking transactions, which require ranges to close time in the future.

TargetForPolicy's LEAD_FOR_GLOBAL_READS calculation is more complex than its LAG_BY_CLUSTER_SETTING calculation. Instead of the policy defining an offset from the publisher's perspective, the policy defines a goal from the consumer's perspective - the goal being that present-time reads (with a possible uncertainty interval) can be served from all followers. To accomplish this, we must work backwards to establish a lead time at which to publish closed timestamps.

The calculation looks something like the following:

```
// this should be sufficient for any present-time transaction,
// because its global uncertainty limit should be <= this time.
// For more, see (*Transaction).RequiredFrontier.
closed_ts_at_follower = now + max_offset

// the sender must account for the time it takes to propagate a
// closed timestamp update to its followers.
closed_ts_at_sender = closed_ts_at_follower + propagation_time

// closed timestamps propagate in two ways. Both need to make it to
// followers in time.
propagation_time = max(raft_propagation_time, side_propagation_time)

// raft propagation takes 3 network hops to go from a leader proposing
// a write (with a closed timestamp update) to the write being applied.
// 1. leader sends MsgProp with entry
// 2. followers send MsgPropResp with vote
// 3. leader sends MsgProp with higher commit index
//
// we also add on a small bit of overhead for request evaluation, log
// sync, and state machine apply latency.
raft_propagation_time = max_network_rtt * 1.5 + raft_overhead

// side-transport propagation takes 1 network hop, as there is no voting.
// However, it is delayed by the full side_transport_close_interval in
// the worst-case.
side_propagation_time = max_network_rtt * 0.5 + side_transport_close_interval

// put together, we get the following result
closed_ts_at_sender = now + max_offset + max(
    max_network_rtt * 1.5 + raft_overhead,
    max_network_rtt * 0.5 + side_transport_close_interval,
)
```

While writing this, I explored what it would take to use dynamic network latency measurements in this calculation to complete #59680. The code for that wasn't too bad, but it brought up a number of questions, including how far into the tail we care about and whether we place upper and lower bounds on this value. To avoid needing to answer these questions immediately, the commit hardcodes a maximum network RTT of 150ms, which should be an overestimate for almost any cluster we expect to run on.

The commit also adds a new `kv.closed_timestamp.lead_for_global_reads_override` cluster setting, which, if nonzero, overrides the lead time that global_read ranges use to publish closed timestamps. The cluster setting is hidden, but should provide an escape hatch for cases where we get the calculation wrong (especially once it becomes dynamic).

Release justification: needed for new functionality
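For readers who want to plug numbers into the formula above, here is a minimal Go sketch of the same arithmetic. It is not the code from this commit: the function name `leadForGlobalReads`, its parameter list, and the sample values for the clock offset, raft overhead, and side-transport interval are all hypothetical. Only the 150ms maximum RTT comes from the commit message. The sketch also assumes that a nonzero override replaces the whole computed lead, which is one reading of the description above.

```go
package main

import (
	"fmt"
	"time"
)

// maxNetworkRTT is the hardcoded round-trip estimate named in the commit message.
const maxNetworkRTT = 150 * time.Millisecond

// leadForGlobalReads is a hypothetical helper that returns how far ahead of
// present time a sender would close timestamps under the formula above.
// Assumption: a nonzero override replaces the computed lead entirely.
func leadForGlobalReads(
	maxOffset, raftOverhead, sideTransportCloseInterval, override time.Duration,
) time.Duration {
	if override != 0 {
		return override
	}
	// Three network hops (1.5 round trips) plus evaluation/sync/apply overhead.
	raftPropagation := maxNetworkRTT*3/2 + raftOverhead
	// One network hop (0.5 round trips) plus the worst-case wait for the
	// next side-transport tick.
	sidePropagation := maxNetworkRTT/2 + sideTransportCloseInterval
	// Both paths must deliver in time, so take the slower of the two.
	propagation := raftPropagation
	if sidePropagation > propagation {
		propagation = sidePropagation
	}
	return maxOffset + propagation
}

func main() {
	// Illustrative inputs only; none of these values are taken from the commit.
	lead := leadForGlobalReads(
		500*time.Millisecond, // max_offset
		20*time.Millisecond,  // raft_overhead
		200*time.Millisecond, // side_transport_close_interval
		0,                    // no override
	)
	fmt.Println(lead) // 775ms
}
```

With these sample inputs the side-transport path dominates (75ms + 200ms = 275ms versus 225ms + 20ms = 245ms for raft), so the lead is 500ms + 275ms. Shrinking the side-transport interval below 170ms would make the raft path the limiting term instead.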
1 parent 34dc5ae, commit b320c03
Showing 9 changed files with 220 additions and 65 deletions.