-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: benchmark and optimize ComputeStatsForRange
#84544
Labels
A-storage
Relating to our storage engine (Pebble) on-disk storage.
C-performance
Perf of queries or internals. Solution not expected to change functional behavior.
T-storage
Storage Team
Comments
erikgrinaker
added
C-performance
Perf of queries or internals. Solution not expected to change functional behavior.
A-storage
Relating to our storage engine (Pebble) on-disk storage.
T-storage
Storage Team
labels
Jul 17, 2022
craig bot
pushed a commit
that referenced
this issue
Aug 11, 2022
84107: sql, builtins: implement format r=[rafiss] a=HonoreDB Closes #68330. Unfortunately hooking into fmt by implementing the Formatter interface doesn't give us the postgres-style syntax, so instead I've implemented my own very stripped-down version of fmt by copying golang's. Release note (sql change): Added the format builtin function. format interpolates arguments into a string in the style of C's sprintf. For example, format('Hello, %s', 'world') returns 'Hello, world'. 85580: storage: remove key comparison in MVCC stats computations r=jbowens a=erikgrinaker This patch restructures MVCC stats computations to use an MVCC iterator with appropriate bounds. This allows omitting a key comparison in the hot path, yielding a ~10% performance improvement. The new stats functions are: * `ComputeStats`: calculates stats for a `Reader` key span. * `ComputeStatsWithVisitors`: calculates stats for a `Reader` key span, calling the given visitor callbacks for every point/range key. * `ComputeStatsForIter`: calculates stats for a given `MVCCIterator`, by scanning it until exhausted. It also removes the `MVCCIterator.ComputeStats` method, which had no business being part of the interface. ``` name old time/op new time/op delta MVCCComputeStats_Pebble/valueSize=64-24 130ms ± 1% 119ms ± 1% -8.77% (p=0.000 n=10+10) name old speed new speed delta MVCCComputeStats_Pebble/valueSize=64-24 516MB/s ± 1% 565MB/s ± 1% +9.61% (p=0.000 n=10+10) ``` Resolves #41899. Touches #84544. Release note: None 85778: sql/catalog/tabledesc: fixed a bug in `maybeAddConstraintIDs` r=Xiang-Gu a=Xiang-Gu Previously, when we RunPostDeserializationChanges, one of the step is to fill in constraint IDs if they are not already set, which is done in function `maybeAddConstraintIDs`. That function however is buggy in that, if there is constraint mutation on the table descriptor, we directly go assign a constraint ID (from `tbl.nextConstraintID`) to it, without checking whether it already has a non-zero constraint ID or not. This PR fixes that. Release note: None 85943: sql: don't create idx recommendations for internal queries r=maryliag a=maryliag Previously we were generating recommendations for internal queries. This commit adds a check and if is internal, it won't generate a recommendation. Partially addresses #85934 Release note: None 85951: kvserver: replace ad-hoc stats in RaftTransport by metrics r=tbg a=pavelkalinnikov These stats were only printed in text to the logs, and hard to use and match with other existing metrics in monitoring. There is a queue size metric introduced in 5d51ab3 which properly exports the `queue` counter to monitoring. The exported graphs allow observing both the current, and historical queue sizes (including the approx. max size), so this PR safely removes both stats as obsolete. This PR also similarly replaces all the remaining ad-hoc counters with proper metrics, and removes the plumbing and data structures that were previously maintaining those stats. The granularity of the new metrics is per-node/transport. This is due to the metrics infrastructure not being particularly good at handling high cardinality combinations of labels such as O(N^2) node ID pairs. Part of #83917 Release note: None Co-authored-by: Aaron Zinger <[email protected]> Co-authored-by: Erik Grinaker <[email protected]> Co-authored-by: Xiang Gu <[email protected]> Co-authored-by: Marylia Gutierrez <[email protected]> Co-authored-by: Pavel Kalinnikov <[email protected]>
craig bot
pushed a commit
that referenced
this issue
Aug 14, 2022
86091: storage: use `RangeKeyChanged()` in `ComputeStats()` r=nicktrav a=erikgrinaker This patch makes use of `RangeKeyChanged()` in `ComputeStats()`, for more efficient processing of MVCC range tombstones. ``` name old time/op new time/op delta MVCCComputeStats_Pebble/valueSize=8/numRangeKeys=0-24 230ms ± 0% 220ms ± 1% -4.57% (p=0.008 n=5+5) MVCCComputeStats_Pebble/valueSize=8/numRangeKeys=1-24 344ms ± 1% 271ms ± 1% -21.18% (p=0.008 n=5+5) MVCCComputeStats_Pebble/valueSize=8/numRangeKeys=100-24 365ms ± 1% 293ms ± 0% -19.80% (p=0.008 n=5+5) MVCCComputeStats_Pebble/valueSize=32/numRangeKeys=0-24 162ms ± 1% 154ms ± 0% -4.92% (p=0.008 n=5+5) MVCCComputeStats_Pebble/valueSize=32/numRangeKeys=1-24 239ms ± 0% 190ms ± 0% -20.75% (p=0.008 n=5+5) MVCCComputeStats_Pebble/valueSize=32/numRangeKeys=100-24 260ms ± 1% 209ms ± 0% -19.54% (p=0.008 n=5+5) MVCCComputeStats_Pebble/valueSize=256/numRangeKeys=0-24 55.7ms ± 8% 51.0ms ± 7% ~ (p=0.095 n=5+5) MVCCComputeStats_Pebble/valueSize=256/numRangeKeys=1-24 72.8ms ± 2% 58.6ms ± 1% -19.49% (p=0.016 n=5+4) MVCCComputeStats_Pebble/valueSize=256/numRangeKeys=100-24 82.1ms ± 5% 67.9ms ± 3% -17.26% (p=0.008 n=5+5) ``` Touches #84544. Release note: None Co-authored-by: Erik Grinaker <[email protected]>
I think we've done most of what we can here with #86091. As seen in #82559 (comment) we're now down to a 10% regression from 22.1, which matches the regression in scans. I'll leave this open just in case we get time to do some more micro-optimizations here, but it's not likely. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-storage
Relating to our storage engine (Pebble) on-disk storage.
C-performance
Perf of queries or internals. Solution not expected to change functional behavior.
T-storage
Storage Team
As with other reads,
ComputeStatsForRange
now has to take MVCC range tombstones into account as well, which comes with a performance penalty. Many of the optimizations in #82559 and #83049 will also apply here, but we should see if there are additional optimization we can do.One optimization that comes to mind is to construct the iterator internally, with appropriate bounds and options. Setting precise bounds would avoid a key comparison as described in #41899, yielding a 30% improvement. However, we sometimes need to compute stats for SSTs (using e.g.
NewPebbleMemSSTIterator
), which might preclude constructing the iterator internally.We should also get rid of
Engine.ComputeStats()
while we're at it, and migrate all callers to useComputeStatsForRange
. However, one benefit ofEngine.ComputeStats()
is that it respects spanset assertions. We should instead consider forcing MVCC iterators to specify both a lower and upper bound, and do spanset assertions onNewMVCCIterator
instead.This needs to be benchmarked against
release-22.1
.Jira issue: CRDB-17722
Epic CRDB-2624
The text was updated successfully, but these errors were encountered: