-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: compute MVCCStats in AdminSplit instead of splitTrigger #119499
Labels
A-kv
Anything in KV that doesn't belong in a more specific category.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv
KV Team
Comments
miraradeva
added
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
A-kv
Anything in KV that doesn't belong in a more specific category.
T-kv
KV Team
labels
Feb 22, 2024
This was referenced Feb 22, 2024
miraradeva
added a commit
to miraradeva/cockroach
that referenced
this issue
Mar 4, 2024
…-user spans ComputeStatsForRange iterates over an entire range and computes its MVCC stats. This patch adds ComputeStatsForRangeUserOnly and ComputeStatsForRangeExcludingUser, which compute the MVCC stats corresponding to the user-only and non-user spans in the range, respectively. These functions are useful for computing the estimated stats proposed in cockroachdb#119499 by allowing to quickly iterate only over the non-user key spans in the range, and separately iterate over the user-only key spans. Part of: cockroachdb#119499 Release note: None
miraradeva
added a commit
to miraradeva/cockroach
that referenced
this issue
Mar 4, 2024
Previously, we computed MVCC stats for the LHS and RHS of a split in splitTrigger, while holding latches. This resulted in high tail latencies due to foreground work not being able to proceed as long as the split holds these latches. More details in cockroachdb#119499. This change moves some of the stats computation to AdminSplit, where no latches are held. In particular, in AdminSplit, we pre-compute the part of the LHS stats corresponding to user-data spans, and pass that to the splitTrigger. In the splitTrigger, we iterate only over the non-user spans in the range (which are small), and combine the resulting stats with the passed in user stats to arrive at the new LHS stats. The RHS stats are derived by substituting the LHS stats from the total on-disk stats. This computation does not produce 100% correct stats because any writes concurrent with the split may not be attributed correctly to the LHS or RHS. Therefore, both LHS and RHS stats are marked with ContainsEstimates. To prevent the stats from drifting after successive splits, we kick off a stats re-computation in AdminSplit (only if the on-disk stats contain estimates). Thus, each split may introduce some stats estimates, but before doing so it will correct any previous estimates. There are two invariants that the stats computation guarantees: 1. Non-user stats (SysBytes, SysCount, etc.) are always correct. This is because we scan those spans while holding latches. 2. If there are no writes concurrent with the split, the stats are always correct. Fixes: cockroachdb#119499 Release note (performance improvement): Splits no longer hold latches for time proportional to the range size while computing MVCC stats. Instead, MVCC stats are pre-computed before the critical section of the split. As a side effect, the resulting stats are no longer 100% accurate because they may correctly distribute writes concurrent with the split. To mitigate this, and to prevent the stats from drifting after successive splits, the existing stored stats are re-computed and corrected (if needed) as part of the non-critical section of the split.
miraradeva
added a commit
to miraradeva/cockroach
that referenced
this issue
Mar 6, 2024
…-user spans ComputeStatsForRange iterates over an entire range and computes its MVCC stats. This patch adds ComputeStatsForRangeUserOnly and ComputeStatsForRangeExcludingUser, which compute the MVCC stats corresponding to the user-only and non-user spans in the range, respectively. These functions are useful for computing the estimated stats proposed in cockroachdb#119499 by allowing to quickly iterate only over the non-user key spans in the range, and separately iterate over the user-only key spans. Part of: cockroachdb#119499 Release note: None
miraradeva
added a commit
to miraradeva/cockroach
that referenced
this issue
Mar 6, 2024
Previously, we computed MVCC stats for the LHS and RHS of a split in splitTrigger, while holding latches. This resulted in high tail latencies due to foreground work not being able to proceed as long as the split holds these latches. More details in cockroachdb#119499. This change moves some of the stats computation to AdminSplit, where no latches are held. In particular, in AdminSplit, we pre-compute the part of the LHS stats corresponding to user-data spans, and pass that to the splitTrigger. In the splitTrigger, we iterate only over the non-user spans in the range (which are small), and combine the resulting stats with the passed in user stats to arrive at the new LHS stats. The RHS stats are derived by substituting the LHS stats from the total on-disk stats. This computation does not produce 100% correct stats because any writes concurrent with the split may not be attributed correctly to the LHS or RHS. Therefore, both LHS and RHS stats are marked with ContainsEstimates. To prevent the stats from drifting after successive splits, we kick off a stats re-computation in AdminSplit (only if the on-disk stats contain estimates). Thus, each split may introduce some stats estimates, but before doing so it will correct any previous estimates. There are two invariants that the stats computation guarantees: 1. Non-user stats (SysBytes, SysCount, etc.) are always correct. This is because we scan those spans while holding latches. 2. If there are no writes concurrent with the split, the stats are always correct. Fixes: cockroachdb#119499 Release note (performance improvement): Splits no longer hold latches for time proportional to the range size while computing MVCC stats. Instead, MVCC stats are pre-computed before the critical section of the split. As a side effect, the resulting stats are no longer 100% accurate because they may correctly distribute writes concurrent with the split. To mitigate this, and to prevent the stats from drifting after successive splits, the existing stored stats are re-computed and corrected (if needed) as part of the non-critical section of the split.
miraradeva
added a commit
to miraradeva/cockroach
that referenced
this issue
Mar 6, 2024
…ccurate-stats split This patch is a followup to cockroachdb#119499, and allows to add thresholds on the number of keys and bytes of the difference between the pre-split MVCC stats (retrieved in AdminSplit) and the stats when the split holds latches (retrieved in splitTrigger). This difference corresponds to writes concurrent with the split. If the difference is too large, the split falls back to computing LHS and RHS stats accurately. The difference is computed only for stats corresponding to user-data; system stats are always kept accurate. The thresholds are tunable by cluster settings. Fixes: cockroachdb#119503 Release note: None
miraradeva
added a commit
to miraradeva/cockroach
that referenced
this issue
Mar 11, 2024
…-user spans ComputeStatsForRange iterates over an entire range and computes its MVCC stats. This patch adds ComputeStatsForRangeUserOnly and ComputeStatsForRangeExcludingUser, which compute the MVCC stats corresponding to the user-only and non-user spans in the range, respectively. These functions are useful for computing the estimated stats proposed in cockroachdb#119499 by allowing to quickly iterate only over the non-user key spans in the range, and separately iterate over the user-only key spans. Part of: cockroachdb#119499 Release note: None
miraradeva
added a commit
to miraradeva/cockroach
that referenced
this issue
Mar 11, 2024
Previously, we computed MVCC stats for the LHS and RHS of a split in splitTrigger, while holding latches. This resulted in high tail latencies due to foreground work not being able to proceed as long as the split holds these latches. More details in cockroachdb#119499. This change moves some of the stats computation to AdminSplit, where no latches are held. In particular, in AdminSplit, we pre-compute the part of the LHS stats corresponding to user-data spans, and pass that to the splitTrigger. In the splitTrigger, we iterate only over the non-user spans in the range (which are small), and combine the resulting stats with the passed in user stats to arrive at the new LHS stats. The RHS stats are derived by substituting the LHS stats from the total on-disk stats. This computation does not produce 100% correct stats because any writes concurrent with the split may not be attributed correctly to the LHS or RHS. Therefore, both LHS and RHS stats are marked with ContainsEstimates. To prevent the stats from drifting after successive splits, we kick off a stats re-computation in AdminSplit (only if the on-disk stats contain estimates). Thus, each split may introduce some stats estimates, but before doing so it will correct any previous estimates. There are two invariants that the stats computation guarantees: 1. Non-user stats (SysBytes, SysCount, etc.) are always correct. This is because we scan those spans while holding latches. 2. If there are no writes concurrent with the split, the stats are always correct. Fixes: cockroachdb#119499 Release note (performance improvement): Splits no longer hold latches for time proportional to the range size while computing MVCC stats. Instead, MVCC stats are pre-computed before the critical section of the split. As a side effect, the resulting stats are no longer 100% accurate because they may correctly distribute writes concurrent with the split. To mitigate this, and to prevent the stats from drifting after successive splits, the existing stored stats are re-computed and corrected (if needed) as part of the non-critical section of the split.
miraradeva
added a commit
to miraradeva/cockroach
that referenced
this issue
Mar 11, 2024
…ccurate-stats split This patch is a followup to cockroachdb#119499, and allows to add thresholds on the number of keys and bytes of the difference between the pre-split MVCC stats (retrieved in AdminSplit) and the stats when the split holds latches (retrieved in splitTrigger). This difference corresponds to writes concurrent with the split. If the difference is too large, the split falls back to computing LHS and RHS stats accurately. The difference is computed only for stats corresponding to user-data; system stats are always kept accurate. These thresholds are tunable by two new cluster settings, MaxMVCCStatCountDiff and MaxMVCCStatBytesDiff, which denote the maximum number of entities (e.g. keys, intents) and maximum number of bytes (e.g. value bytes, range value bytes), respectively, that are acceptable as the difference between the pre- and post-split values of an individual stat. The former defaults to 1000, and the latter to 512KB (0.1% of the maximum range size). Fixes: cockroachdb#119503 Release note: None
miraradeva
added a commit
to miraradeva/cockroach
that referenced
this issue
Mar 14, 2024
…-user spans ComputeStatsForRange iterates over an entire range and computes its MVCC stats. This patch adds ComputeStatsForRangeUserOnly and ComputeStatsForRangeExcludingUser, which compute the MVCC stats corresponding to the user-only and non-user spans in the range, respectively. These functions are useful for computing the estimated stats proposed in cockroachdb#119499 by allowing to quickly iterate only over the non-user key spans in the range, and separately iterate over the user-only key spans. Part of: cockroachdb#119499 Release note: None
miraradeva
added a commit
to miraradeva/cockroach
that referenced
this issue
Mar 14, 2024
Previously, we computed MVCC stats for the LHS and RHS of a split in splitTrigger, while holding latches. This resulted in high tail latencies due to foreground work not being able to proceed as long as the split holds these latches. More details in cockroachdb#119499. This change moves some of the stats computation to AdminSplit, where no latches are held. In particular, in AdminSplit, we pre-compute the part of the LHS stats corresponding to user-data spans, and pass that to the splitTrigger. In the splitTrigger, we iterate only over the non-user spans in the range (which are small), and combine the resulting stats with the passed in user stats to arrive at the new LHS stats. The RHS stats are derived by substituting the LHS stats from the total on-disk stats. This computation does not produce 100% correct stats because any writes concurrent with the split may not be attributed correctly to the LHS or RHS. Therefore, both LHS and RHS stats are marked with ContainsEstimates. To prevent the stats from drifting after successive splits, we kick off a stats re-computation in AdminSplit (only if the on-disk stats contain estimates). Thus, each split may introduce some stats estimates, but before doing so it will correct any previous estimates. There are two invariants that the stats computation guarantees: 1. Non-user stats (SysBytes, SysCount, etc.) are always correct. This is because we scan those spans while holding latches. 2. If there are no writes concurrent with the split, the stats are always correct. Fixes: cockroachdb#119499 Release note (performance improvement): Splits no longer hold latches for time proportional to the range size while computing MVCC stats. Instead, MVCC stats are pre-computed before the critical section of the split. As a side effect, the resulting stats are no longer 100% accurate because they may correctly distribute writes concurrent with the split. To mitigate this, and to prevent the stats from drifting after successive splits, the existing stored stats are re-computed and corrected (if needed) as part of the non-critical section of the split.
miraradeva
added a commit
to miraradeva/cockroach
that referenced
this issue
Mar 14, 2024
…ccurate-stats split This patch is a followup to cockroachdb#119499, and allows to add thresholds on the number of keys and bytes of the difference between the pre-split MVCC stats (retrieved in AdminSplit) and the stats when the split holds latches (retrieved in splitTrigger). This difference corresponds to writes concurrent with the split. If the difference is too large, the split falls back to computing LHS and RHS stats accurately. The difference is computed only for stats corresponding to user-data; system stats are always kept accurate. These thresholds are tunable by two new cluster settings, MaxMVCCStatCountDiff and MaxMVCCStatBytesDiff, which denote the maximum number of entities (e.g. keys, intents) and maximum number of bytes (e.g. value bytes, range value bytes), respectively, that are acceptable as the difference between the pre- and post-split values of an individual stat. The former defaults to 5000, and the latter to 5.12MB (1% of the maximum range size). Fixes: cockroachdb#119503 Release note: None
miraradeva
added a commit
to miraradeva/cockroach
that referenced
this issue
Mar 19, 2024
…-user spans ComputeStatsForRange iterates over an entire range and computes its MVCC stats. This patch adds ComputeStatsForRangeUserOnly and ComputeStatsForRangeExcludingUser, which compute the MVCC stats corresponding to the user-only and non-user spans in the range, respectively. These functions are useful for computing the estimated stats proposed in cockroachdb#119499 by allowing to quickly iterate only over the non-user key spans in the range, and separately iterate over the user-only key spans. Part of: cockroachdb#119499 Release note: None
miraradeva
added a commit
to miraradeva/cockroach
that referenced
this issue
Mar 19, 2024
…ccurate-stats split This patch is a followup to cockroachdb#119499, and allows to add thresholds on the number of keys and bytes of the difference between the pre-split MVCC stats (retrieved in AdminSplit) and the stats when the split holds latches (retrieved in splitTrigger). This difference corresponds to writes concurrent with the split. If the difference is too large, the split falls back to computing LHS and RHS stats accurately. The difference is computed only for stats corresponding to user-data; system stats are always kept accurate. These thresholds are tunable by two new cluster settings, MaxMVCCStatCountDiff and MaxMVCCStatBytesDiff, which denote the maximum number of entities (e.g. keys, intents) and maximum number of bytes (e.g. value bytes, range value bytes), respectively, that are acceptable as the difference between the pre- and post-split values of an individual stat. The former defaults to 5000, and the latter to 5.12MB (1% of the maximum range size). Fixes: cockroachdb#119503 Release note: None
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-kv
Anything in KV that doesn't belong in a more specific category.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv
KV Team
In order to mitigate the latency impact of computing MVCC stats during a split (#22348), we can move some of the stats computation from the
splitTrigger
(which holds latches while computing the stats) toAdminSplit
(where we just need a pebble snapshot). The resulting stats will be inaccurate wrt writes concurrent with the split but experiments have shown that this discrepancy is fairly small. More details on this approach and alternatives in Split Latency due to MVCC Stats Calculation.The resulting stats will be marked as containing estimates, which will trigger stats re-computation next time the range is processed by the consistency checker queue. In the mean time, the stats are close to accurate, and the estimates do not drift (see below).
For this change, as part of
AdminSplit
compute the following:preSplitLeftStats
.preSplitLeftStats
and the stats of the entire range aspreSplitFullStats
. Since the range contains estimatespreSplitFullStats
will differ from the stored stats for the range. We can either correct them now, inAdminSplit
, or pass in a delta tosplitTrigger
, asantiDriftDelta
. This delta accounts for estimates introduced during a previous split, or AddSST, or any any other operation that introduces estimates. Correcting this delta ensures stats estimates do not compound over time.preSplitLeftStats
and potentiallyantiDriftDelta
tosplitTrigger
.In the split trigger:
preSplitLeftStats
as the post-split LHS stats.antiDriftDelta
to the stored full-range stats (if we didn't already do it inAdminSplit
).preSplitLeftStats
from the full-range stats.Jira issue: CRDB-36253
Epic CRDB-34215
The text was updated successfully, but these errors were encountered: