Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: compute MVCCStats in AdminSplit instead of splitTrigger #119499

Closed
miraradeva opened this issue Feb 22, 2024 · 0 comments
Closed

kvserver: compute MVCCStats in AdminSplit instead of splitTrigger #119499

miraradeva opened this issue Feb 22, 2024 · 0 comments
Assignees
Labels
A-kv Anything in KV that doesn't belong in a more specific category. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team

Comments

@miraradeva
Copy link
Contributor

miraradeva commented Feb 22, 2024

In order to mitigate the latency impact of computing MVCC stats during a split (#22348), we can move some of the stats computation from the splitTrigger (which holds latches while computing the stats) to AdminSplit (where we just need a pebble snapshot). The resulting stats will be inaccurate wrt writes concurrent with the split but experiments have shown that this discrepancy is fairly small. More details on this approach and alternatives in Split Latency due to MVCC Stats Calculation.

The resulting stats will be marked as containing estimates, which will trigger stats re-computation next time the range is processed by the consistency checker queue. In the mean time, the stats are close to accurate, and the estimates do not drift (see below).

For this change, as part of AdminSplit compute the following:

  • If the range doesn't contain estimates, before creating the split txn, iterate over the new LHS and compute its MVCC stats as preSplitLeftStats.
  • Otherwise, iterate over the entire range and compute the stats of the LHS as preSplitLeftStats and the stats of the entire range as preSplitFullStats. Since the range contains estimates preSplitFullStats will differ from the stored stats for the range. We can either correct them now, in AdminSplit, or pass in a delta to splitTrigger, as antiDriftDelta. This delta accounts for estimates introduced during a previous split, or AddSST, or any any other operation that introduces estimates. Correcting this delta ensures stats estimates do not compound over time.
  • Pass in preSplitLeftStats and potentially antiDriftDelta to splitTrigger.

In the split trigger:

  • Use preSplitLeftStats as the post-split LHS stats.
  • Apply antiDriftDelta to the stored full-range stats (if we didn't already do it in AdminSplit).
  • Derive the RHS stats by subtracting preSplitLeftStats from the full-range stats.
  • Mark the stats as containing estimates because these pre-split stats may not include any write concurrent with the split.
  • If any of the new LHS or RHS ranges are empty, don't mark them as containing estimates to avoid extra stats re-computation later.

Jira issue: CRDB-36253

Epic CRDB-34215

@miraradeva miraradeva added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv Anything in KV that doesn't belong in a more specific category. T-kv KV Team labels Feb 22, 2024
@miraradeva miraradeva self-assigned this Feb 22, 2024
miraradeva added a commit to miraradeva/cockroach that referenced this issue Mar 4, 2024
…-user spans

ComputeStatsForRange iterates over an entire range and computes its MVCC
stats.

This patch adds ComputeStatsForRangeUserOnly and
ComputeStatsForRangeExcludingUser, which compute the MVCC stats
corresponding to the user-only and non-user spans in the range,
respectively. These functions are useful for computing the estimated
stats proposed in cockroachdb#119499 by allowing to quickly iterate only over the
non-user key spans in the range, and separately iterate over the
user-only key spans.

Part of: cockroachdb#119499
Release note: None
miraradeva added a commit to miraradeva/cockroach that referenced this issue Mar 4, 2024
Previously, we computed MVCC stats for the LHS and RHS of a split in
splitTrigger, while holding latches. This resulted in high tail
latencies due to foreground work not being able to proceed as long as
the split holds these latches. More details in cockroachdb#119499.

This change moves some of the stats computation to AdminSplit, where no
latches are held. In particular, in AdminSplit, we pre-compute the part
of the LHS stats corresponding to user-data spans, and pass that to the
splitTrigger. In the splitTrigger, we iterate only over the non-user
spans in the range (which are small), and combine the resulting stats
with the passed in user stats to arrive at the new LHS stats. The RHS
stats are derived by substituting the LHS stats from the total on-disk
stats. This computation does not produce 100% correct stats because any
writes concurrent with the split may not be attributed correctly to the
LHS or RHS. Therefore, both LHS and RHS stats are marked with
ContainsEstimates.

To prevent the stats from drifting after successive splits, we kick off
a stats re-computation in AdminSplit (only if the on-disk stats contain
estimates). Thus, each split may introduce some stats estimates, but
before doing so it will correct any previous estimates.

There are two invariants that the stats computation guarantees: 1.
Non-user stats (SysBytes, SysCount, etc.) are always correct. This is
because we scan those spans while holding latches. 2. If there are no
writes concurrent with the split, the stats are always correct.

Fixes: cockroachdb#119499

Release note (performance improvement): Splits no longer
hold latches for time proportional to the range size while computing
MVCC stats. Instead, MVCC stats are pre-computed before the critical
section of the split. As a side effect, the resulting stats are no
longer 100% accurate because they may correctly distribute writes
concurrent with the split. To mitigate this, and to prevent the stats
from drifting after successive splits, the existing stored stats are
re-computed and corrected (if needed) as part of the non-critical
section of the split.
miraradeva added a commit to miraradeva/cockroach that referenced this issue Mar 6, 2024
…-user spans

ComputeStatsForRange iterates over an entire range and computes its MVCC
stats.

This patch adds ComputeStatsForRangeUserOnly and
ComputeStatsForRangeExcludingUser, which compute the MVCC stats
corresponding to the user-only and non-user spans in the range,
respectively. These functions are useful for computing the estimated
stats proposed in cockroachdb#119499 by allowing to quickly iterate only over the
non-user key spans in the range, and separately iterate over the
user-only key spans.

Part of: cockroachdb#119499
Release note: None
miraradeva added a commit to miraradeva/cockroach that referenced this issue Mar 6, 2024
Previously, we computed MVCC stats for the LHS and RHS of a split in
splitTrigger, while holding latches. This resulted in high tail
latencies due to foreground work not being able to proceed as long as
the split holds these latches. More details in cockroachdb#119499.

This change moves some of the stats computation to AdminSplit, where no
latches are held. In particular, in AdminSplit, we pre-compute the part
of the LHS stats corresponding to user-data spans, and pass that to the
splitTrigger. In the splitTrigger, we iterate only over the non-user
spans in the range (which are small), and combine the resulting stats
with the passed in user stats to arrive at the new LHS stats. The RHS
stats are derived by substituting the LHS stats from the total on-disk
stats. This computation does not produce 100% correct stats because any
writes concurrent with the split may not be attributed correctly to the
LHS or RHS. Therefore, both LHS and RHS stats are marked with
ContainsEstimates.

To prevent the stats from drifting after successive splits, we kick off
a stats re-computation in AdminSplit (only if the on-disk stats contain
estimates). Thus, each split may introduce some stats estimates, but
before doing so it will correct any previous estimates.

There are two invariants that the stats computation guarantees: 1.
Non-user stats (SysBytes, SysCount, etc.) are always correct. This is
because we scan those spans while holding latches. 2. If there are no
writes concurrent with the split, the stats are always correct.

Fixes: cockroachdb#119499

Release note (performance improvement): Splits no longer
hold latches for time proportional to the range size while computing
MVCC stats. Instead, MVCC stats are pre-computed before the critical
section of the split. As a side effect, the resulting stats are no
longer 100% accurate because they may correctly distribute writes
concurrent with the split. To mitigate this, and to prevent the stats
from drifting after successive splits, the existing stored stats are
re-computed and corrected (if needed) as part of the non-critical
section of the split.
miraradeva added a commit to miraradeva/cockroach that referenced this issue Mar 6, 2024
…ccurate-stats split

This patch is a followup to cockroachdb#119499, and allows to add thresholds on the
number of keys and bytes of the difference between the pre-split MVCC
stats (retrieved in AdminSplit) and the stats when the split holds
latches (retrieved in splitTrigger). This difference corresponds to
writes concurrent with the split. If the difference is too large, the
split falls back to computing LHS and RHS stats accurately. The
difference is computed only for stats corresponding to user-data;
system stats are always kept accurate. The thresholds are tunable by
cluster settings.

Fixes: cockroachdb#119503

Release note: None
miraradeva added a commit to miraradeva/cockroach that referenced this issue Mar 11, 2024
…-user spans

ComputeStatsForRange iterates over an entire range and computes its MVCC
stats.

This patch adds ComputeStatsForRangeUserOnly and
ComputeStatsForRangeExcludingUser, which compute the MVCC stats
corresponding to the user-only and non-user spans in the range,
respectively. These functions are useful for computing the estimated
stats proposed in cockroachdb#119499 by allowing to quickly iterate only over the
non-user key spans in the range, and separately iterate over the
user-only key spans.

Part of: cockroachdb#119499
Release note: None
miraradeva added a commit to miraradeva/cockroach that referenced this issue Mar 11, 2024
Previously, we computed MVCC stats for the LHS and RHS of a split in
splitTrigger, while holding latches. This resulted in high tail
latencies due to foreground work not being able to proceed as long as
the split holds these latches. More details in cockroachdb#119499.

This change moves some of the stats computation to AdminSplit, where no
latches are held. In particular, in AdminSplit, we pre-compute the part
of the LHS stats corresponding to user-data spans, and pass that to the
splitTrigger. In the splitTrigger, we iterate only over the non-user
spans in the range (which are small), and combine the resulting stats
with the passed in user stats to arrive at the new LHS stats. The RHS
stats are derived by substituting the LHS stats from the total on-disk
stats. This computation does not produce 100% correct stats because any
writes concurrent with the split may not be attributed correctly to the
LHS or RHS. Therefore, both LHS and RHS stats are marked with
ContainsEstimates.

To prevent the stats from drifting after successive splits, we kick off
a stats re-computation in AdminSplit (only if the on-disk stats contain
estimates). Thus, each split may introduce some stats estimates, but
before doing so it will correct any previous estimates.

There are two invariants that the stats computation guarantees: 1.
Non-user stats (SysBytes, SysCount, etc.) are always correct. This is
because we scan those spans while holding latches. 2. If there are no
writes concurrent with the split, the stats are always correct.

Fixes: cockroachdb#119499

Release note (performance improvement): Splits no longer
hold latches for time proportional to the range size while computing
MVCC stats. Instead, MVCC stats are pre-computed before the critical
section of the split. As a side effect, the resulting stats are no
longer 100% accurate because they may correctly distribute writes
concurrent with the split. To mitigate this, and to prevent the stats
from drifting after successive splits, the existing stored stats are
re-computed and corrected (if needed) as part of the non-critical
section of the split.
miraradeva added a commit to miraradeva/cockroach that referenced this issue Mar 11, 2024
…ccurate-stats split

This patch is a followup to cockroachdb#119499, and allows to add thresholds on the
number of keys and bytes of the difference between the pre-split MVCC
stats (retrieved in AdminSplit) and the stats when the split holds
latches (retrieved in splitTrigger). This difference corresponds to
writes concurrent with the split. If the difference is too large, the
split falls back to computing LHS and RHS stats accurately. The
difference is computed only for stats corresponding to user-data;
system stats are always kept accurate.

These thresholds are tunable by two new cluster settings,
MaxMVCCStatCountDiff and MaxMVCCStatBytesDiff, which denote the maximum
number of entities (e.g. keys, intents) and maximum number of bytes
(e.g. value bytes, range value bytes), respectively, that are
acceptable as the difference between the pre- and post-split values of
an individual stat. The former defaults to 1000, and the latter to
512KB (0.1% of the maximum range size).

Fixes: cockroachdb#119503

Release note: None
miraradeva added a commit to miraradeva/cockroach that referenced this issue Mar 14, 2024
…-user spans

ComputeStatsForRange iterates over an entire range and computes its MVCC
stats.

This patch adds ComputeStatsForRangeUserOnly and
ComputeStatsForRangeExcludingUser, which compute the MVCC stats
corresponding to the user-only and non-user spans in the range,
respectively. These functions are useful for computing the estimated
stats proposed in cockroachdb#119499 by allowing to quickly iterate only over the
non-user key spans in the range, and separately iterate over the
user-only key spans.

Part of: cockroachdb#119499
Release note: None
miraradeva added a commit to miraradeva/cockroach that referenced this issue Mar 14, 2024
Previously, we computed MVCC stats for the LHS and RHS of a split in
splitTrigger, while holding latches. This resulted in high tail
latencies due to foreground work not being able to proceed as long as
the split holds these latches. More details in cockroachdb#119499.

This change moves some of the stats computation to AdminSplit, where no
latches are held. In particular, in AdminSplit, we pre-compute the part
of the LHS stats corresponding to user-data spans, and pass that to the
splitTrigger. In the splitTrigger, we iterate only over the non-user
spans in the range (which are small), and combine the resulting stats
with the passed in user stats to arrive at the new LHS stats. The RHS
stats are derived by substituting the LHS stats from the total on-disk
stats. This computation does not produce 100% correct stats because any
writes concurrent with the split may not be attributed correctly to the
LHS or RHS. Therefore, both LHS and RHS stats are marked with
ContainsEstimates.

To prevent the stats from drifting after successive splits, we kick off
a stats re-computation in AdminSplit (only if the on-disk stats contain
estimates). Thus, each split may introduce some stats estimates, but
before doing so it will correct any previous estimates.

There are two invariants that the stats computation guarantees: 1.
Non-user stats (SysBytes, SysCount, etc.) are always correct. This is
because we scan those spans while holding latches. 2. If there are no
writes concurrent with the split, the stats are always correct.

Fixes: cockroachdb#119499

Release note (performance improvement): Splits no longer
hold latches for time proportional to the range size while computing
MVCC stats. Instead, MVCC stats are pre-computed before the critical
section of the split. As a side effect, the resulting stats are no
longer 100% accurate because they may correctly distribute writes
concurrent with the split. To mitigate this, and to prevent the stats
from drifting after successive splits, the existing stored stats are
re-computed and corrected (if needed) as part of the non-critical
section of the split.
miraradeva added a commit to miraradeva/cockroach that referenced this issue Mar 14, 2024
…ccurate-stats split

This patch is a followup to cockroachdb#119499, and allows to add thresholds on the
number of keys and bytes of the difference between the pre-split MVCC
stats (retrieved in AdminSplit) and the stats when the split holds
latches (retrieved in splitTrigger). This difference corresponds to
writes concurrent with the split. If the difference is too large, the
split falls back to computing LHS and RHS stats accurately. The
difference is computed only for stats corresponding to user-data;
system stats are always kept accurate.

These thresholds are tunable by two new cluster settings,
MaxMVCCStatCountDiff and MaxMVCCStatBytesDiff, which denote the maximum
number of entities (e.g. keys, intents) and maximum number of bytes
(e.g. value bytes, range value bytes), respectively, that are
acceptable as the difference between the pre- and post-split values of
an individual stat. The former defaults to 5000, and the latter to
5.12MB (1% of the maximum range size).

Fixes: cockroachdb#119503

Release note: None
miraradeva added a commit to miraradeva/cockroach that referenced this issue Mar 19, 2024
…-user spans

ComputeStatsForRange iterates over an entire range and computes its MVCC
stats.

This patch adds ComputeStatsForRangeUserOnly and
ComputeStatsForRangeExcludingUser, which compute the MVCC stats
corresponding to the user-only and non-user spans in the range,
respectively. These functions are useful for computing the estimated
stats proposed in cockroachdb#119499 by allowing to quickly iterate only over the
non-user key spans in the range, and separately iterate over the
user-only key spans.

Part of: cockroachdb#119499
Release note: None
miraradeva added a commit to miraradeva/cockroach that referenced this issue Mar 19, 2024
…ccurate-stats split

This patch is a followup to cockroachdb#119499, and allows to add thresholds on the
number of keys and bytes of the difference between the pre-split MVCC
stats (retrieved in AdminSplit) and the stats when the split holds
latches (retrieved in splitTrigger). This difference corresponds to
writes concurrent with the split. If the difference is too large, the
split falls back to computing LHS and RHS stats accurately. The
difference is computed only for stats corresponding to user-data;
system stats are always kept accurate.

These thresholds are tunable by two new cluster settings,
MaxMVCCStatCountDiff and MaxMVCCStatBytesDiff, which denote the maximum
number of entities (e.g. keys, intents) and maximum number of bytes
(e.g. value bytes, range value bytes), respectively, that are
acceptable as the difference between the pre- and post-split values of
an individual stat. The former defaults to 5000, and the latter to
5.12MB (1% of the maximum range size).

Fixes: cockroachdb#119503

Release note: None
@craig craig bot closed this as completed in 5ce9052 Mar 21, 2024
@github-project-automation github-project-automation bot moved this to Closed in KV Aug 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv Anything in KV that doesn't belong in a more specific category. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team
Projects
No open projects
Status: Closed
Development

No branches or pull requests

1 participant