-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: add MVCC steps and seeks to EXPLAIN ANALYZE #64503
Conversation
e2f4b04
to
cd7beb4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis)
pkg/storage/pebble_mvcc_scanner.go, line 149 at r2 (raw file):
// numSeeks is the number of times that this scanner had to seek due to // exceeding its itersBeforeSeek configuration value for a given key. numSeeks uint64
Thanks for starting on this! We really need to expose more iteration stats, and this is a good start.
I'm going to nit-pick about a few things here:
numMVCCKeysRead
is being incremented initerNext
and there are many reasons that is being called, not just "read and then skip due to having too-old ...". There are a variable number ofMVCCIterator.Next
calls that happen initerNext
, while this is counting once periterNext
.- This instrumentation for seeks is somewhat fragile since it is happening at the caller and not the callee. And
pebbleMVCCScanner
is not the only place where we incur read costs. - The instrumentation for reverse scan is partial. The SeekLTs are being counted, but I don't see the
p.parent.Prev
calls being counted. I think this relates to the above fragility.
I would prefer a lower-level instrumentation point to which we can add more over time.
- Add
numSeeks
,numNexts
,numPrevs
to the existingstorage.IteratorStats
, - Change
pebbleIterator
to track these and reset these onClose
(even if the iterator is reusable). - The one other real implementation of
MVCCIterator
(other thanpebbleIterator
) isintentInterleavingIter
. It uses anMVCCIterator
and anEngineIterator
both of which are implemented bypebbleIterator
. We can add theStats
method to theEngineIterator
interface, and sum the two stats when returning to the caller.
Thanks for the review! Definitely it makes sense that these basic counters are too simplistic. I think it would make sense for you or one of the storage team members to take on the task of doing the iteration stats correctly - I don't really know what is going on too well at this part of the stack. I'm happy to keep banging away on this on Fridays but realistically it will take a long time that way :) I wonder, is there some kind of low hanging fruit that could start stats collection off on the right foot but still provide some basic metrics to the user of whether the scan was slower than expected due to MVCC garbage? |
The calls to internalIterator are tracked in Iterator since we use both mergingIter and getIter in that role, and we probably don't need these stats for background work (compactions) involving mergingIter. Informs cockroachdb/cockroach#59069 Intended to unblock cockroachdb/cockroach#64503
The calls to internalIterator are tracked in Iterator since we use both mergingIter and getIter in that role, and we probably don't need these stats for background work (compactions) involving mergingIter. Informs cockroachdb/cockroach#59069 Intended to unblock cockroachdb/cockroach#64503
The calls to internalIterator are tracked in Iterator since we use both mergingIter and getIter in that role, and we probably don't need these stats for background work (compactions) involving mergingIter. Informs cockroachdb/cockroach#59069 Intended to unblock cockroachdb/cockroach#64503
I added some stuff to Pebble in cockroachdb/pebble#1129. I'll follow that up with a change to the The way I am currently thinking about this is that if the number of seeks or steps on the |
Very awesome, thank you for turning this around in Pebble so quickly. Will be happy to rebase to use your work soon. The heuristic you describe makes some sense to me. Perhaps we could expose these quantities as multipliers, and if the multiplier is high enough, perhaps we could emit some kind of warning. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully
cockroach/pkg/storage/engine.go
Lines 90 to 105 in f5b21cf
// Iteration stats. We directly expose pebble.IteratorStats. Callers | |
// may want to aggregate and interpret these in the following manner: | |
// - Aggregate {Forward,Reverse}SeekCount, {Forward,Reverse}StepCount. | |
// - Interpret the four aggregated stats as follows: | |
// - {SeekCount,StepCount}[InterfaceCall]: We can refer to these simply as | |
// {SeekCount,StepCount} in logs/metrics/traces. These represents | |
// explicit calls by the implementation of MVCCIterator, in response to | |
// the caller of MVCCIterator. A high count relative to the unique MVCC | |
// keys returned suggests there are many versions for the same key. | |
// - {SeekCount,StepCount}[InternalIterCall]: We can refer to these simply | |
// as {InternalSeekCount,InternalStepCount}. If these are significantly | |
// larger than the ones in the preceding bullet, it suggests that there | |
// are lots of uncompacted deletes or stale Pebble-versions (not to be | |
// confused with MVCC versions) that need to be compacted away. This | |
// should be very rare, but has been observed. | |
Stats pebble.IteratorStats |
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis)
cd7beb4
to
327fef7
Compare
Thanks for the updated stats @sumeerbhola! I redid this commit based on your recommendations. So far, we don't attempt any heuristics to determine when to display steps/seeks - we always display the "interface" steps/seeks, and display the "internal" steps/seeks when the verbose flag is used. We'll probably want to do better here, but for now it's good to get the data out there at least, I think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, with some minor comments, plus why the seek and step stats are all 0.
I didn't look very carefully at the plumbing.
Reviewed 1 of 25 files at r3, 10 of 25 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis)
pkg/roachpb/api.go, line 1542 at r4 (raw file):
// SafeFormat implements redact.SafeFormatter. func (s *ScanStats) SafeFormat(w redact.SafePrinter, _ rune) { w.Printf("scan stats: stepped %d times (%d internal); sought %d times (%d internal)",
seeked?
sought will be confusing I think.
pkg/sql/execinfrapb/component_stats.proto, line 134 at r4 (raw file):
optional util.optional.Duration contention_time = 4 [(gogoproto.nullable) = false]; optional util.optional.Uint num_interface_steps = 5 [(gogoproto.nullable) = false];
can we just embed roachpb.ScanStats
here?
pkg/sql/logictest/testdata/logic_test/dist_vectorize, line 74 at r4 (raw file):
KV rows read: 5 KV bytes read: 40 B MVCC step count: 0
do you know why these counts are 0? We must have done at least 1 seek to position the pebble.Iterator
. Perhaps something wrong in the plumbing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis)
pkg/sql/execinfra/stats.go, line 64 at r4 (raw file):
// PopulateKVStats adds data from the input ScanStats to the input KVStats. func PopulateKVStats(kvStats *execinfrapb.KVStats, ss *ScanStats) {
nit: perhaps 'PopulateKVScanStats' or 'PopulateKVMVCCStats' ?
pkg/sql/execinfrapb/component_stats.go, line 325 at r4 (raw file):
timeVal(&s.KV.KVTime) timeVal(&s.KV.ContentionTime) resetUint(&s.KV.StepCount)
Are these stats something non-deterministic? If so, perhaps we should override it with a useful value like what we do with s.KV.BytesRead
pkg/sql/row/kv_fetcher.go, line 143 at r4 (raw file):
f.newSpan = true nBytes := len(f.batchResponse) for i := range f.kvs {
Hmm, is this change related to tracking MVCC stats? Or it's a bug fix to our accoutning for bytesRead
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @jordanlewis, and @sumeerbhola)
pkg/sql/logictest/testdata/logic_test/dist_vectorize, line 74 at r4 (raw file):
Previously, sumeerbhola wrote…
do you know why these counts are 0? We must have done at least 1 seek to position the
pebble.Iterator
. Perhaps something wrong in the plumbing?
We set values like this to 0 in test output because it's nondeterministic when run in a logic test.
pkg/sql/row/kv_fetcher.go, line 143 at r4 (raw file):
Previously, Azhng (Archer Zhang) wrote…
Hmm, is this change related to tracking MVCC stats? Or it's a bug fix to our accoutning for
bytesRead
?
It was just a small improvement to the way we were tracking bytesRead. I think it's a small bugfix too, but most of the time we don't use the old kvs format so things were right before.
327fef7
to
251c2d4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dismissed @sumeerbhola from a discussion.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @sumeerbhola)
pkg/roachpb/api.go, line 1542 at r4 (raw file):
Previously, sumeerbhola wrote…
seeked?
sought will be confusing I think.
Done. Yeah this was sort of silly.
pkg/sql/execinfra/stats.go, line 64 at r4 (raw file):
Previously, Azhng (Archer Zhang) wrote…
nit: perhaps 'PopulateKVScanStats' or 'PopulateKVMVCCStats' ?
Done.
pkg/sql/execinfrapb/component_stats.go, line 325 at r4 (raw file):
Previously, Azhng (Archer Zhang) wrote…
Are these stats something non-deterministic? If so, perhaps we should override it with a useful value like what we do with
s.KV.BytesRead
Yes, these are non-deterministic in logic tests. Not sure what useful value we'd have here.
pkg/sql/execinfrapb/component_stats.proto, line 134 at r4 (raw file):
Previously, sumeerbhola wrote…
can we just embed
roachpb.ScanStats
here?
Unfortunately we can't, assuming we want to follow the pattern in this file, because the roachpb.ScanStats use ordinary primitive uint32s, whereas things in this protobuf use this util.optional.Uint thing which as I understand gets around the inability to distinguish 0 from not-present in protobuf otherwise.
178391e
to
0197fcb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 26 files at r6.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @jordanlewis, and @sumeerbhola)
pkg/sql/logictest/testdata/logic_test/dist_vectorize, line 74 at r4 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
We set values like this to 0 in test output because it's nondeterministic when run in a logic test.
Is there any test that validates that the end-to-end plumbing is working ok? If not, I think it is worth having one test.
And was there a particular count you were seeing the non-determinism in? I would expect that a single node setup with a simple table with a single version for each key should be deterministic for these counts -- these are not really affected by the shape of the LSM since they are generated by operations in the top-level iterator which does not see the details of the underlying merging iterator.
0197fcb
to
89f4aad
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @sumeerbhola)
pkg/sql/logictest/testdata/logic_test/dist_vectorize, line 74 at r4 (raw file):
Previously, sumeerbhola wrote…
Is there any test that validates that the end-to-end plumbing is working ok? If not, I think it is worth having one test.
And was there a particular count you were seeing the non-determinism in? I would expect that a single node setup with a simple table with a single version for each key should be deterministic for these counts -- these are not really affected by the shape of the LSM since they are generated by operations in the top-level iterator which does not see the details of the underlying merging iterator.
Yeah, I've added a test.
The non-determinism comes into play because these tests are run on several different configurations of distribution. Agreed, on a fixed cluster type there should be no nondeterminism.
Thanks for the reviews! |
62a6ce1
to
14e3e40
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis, @RaduBerinde, and @sumeerbhola)
pkg/sql/explain_test.go, line 345 at r11 (raw file):
foundSteps, foundSeeks := getMVCCStats(t, r) assert.Equal(t, expectedSteps, foundSteps)
Seems like we are having a mix of assert.Equal()
vs require.Equal()
in our testing code, is there a preference for one over another?
14e3e40
to
992dc0a
Compare
992dc0a
to
a318492
Compare
a318492
to
0ec9bb3
Compare
After cockroachdb#61583, we emit GetRequests from SQL when possible. We forgot to update the stats collection code that tracks the number of bytes read from KV when GetRequests were emitted; this is now corrected. Release note: None
0ec9bb3
to
4d28e9c
Compare
This commit adds 4 new fields to EXPLAIN ANALYZE for each operator that reads data from disk: MVCC steps and seeks, as well as MVCC "internal" steps and seeks when verbose mode is toggled. MVCC steps is the number of times that the underlying storage iterator stepped forward during the work to serve the operator's reads, including stepping over MVCC keys that were too old for user in a scan. MVCC seeks is the number of times that the underlying storage iterator jumped (seeked) to a different data location. Seeks are more expensive than steps. Comparing MVCC steps to KVs read helps indicate to a user when a scan might be slower than expected due to MVCC garbage in the middle of the keyspace being scanned. A high number of MVCC seeks might also indicate a lot of skipped MVCC garbage, especially when doing a sequential scan. Release note (sql change): EXPLAIN ANALYZE now contains more information about the MVCC behavior of operators that scan data from disk. See commit message for more details.
4d28e9c
to
77d424a
Compare
bors r+ |
Build succeeded: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/sql/rowexec/joinreader.go, line 886 at r14 (raw file):
func (jr *joinReader) close() { // Make sure to clone any tracing span so that stats can pick it up later.
I don't understand why we're fetching the scan stats in different places in the row-by-row processors - I would expect for all four disk-reading processors we would have exactly the same new logic, why is this not the case?
86056: storage: add `MVCCIterator.RangeKeyChanged()` r=jbowens a=erikgrinaker This patch adds `RangeKeyChanged()` to the MVCC iterator interface, returning `true` if the latest positioning operation caused the overlapping range keys to change. Previously returned `RangeKeys()` and `RangeBounds()` values are valid as long as this returns `false`. Support has been implemented in most iterators, but not yet for `MVCCIncrementalIterator` which is less trivial (in particular due to the mixing of operations that respect and ignore time bounds via e.g. `NextIgnoringTime`). No callers make use of this yet. `intentInterleavingIter` likely takes a slight performance hit from this, but benchmarks are inconclusive. This will possibly be made up for by replacing `HasPointAndRange()` calls in hot paths with `RangeKeyChanged()`, even in the no-range-key case. ``` MVCCScan_Pebble/rows=1/versions=1/valueSize=64/numRangeKeys=0-24 5.55µs ± 1% 5.56µs ± 1% ~ (p=0.921 n=9+10) MVCCScan_Pebble/rows=1/versions=1/valueSize=64/numRangeKeys=1-24 12.1µs ± 1% 12.2µs ± 3% ~ (p=0.196 n=10+10) MVCCScan_Pebble/rows=1/versions=1/valueSize=64/numRangeKeys=100-24 165µs ± 1% 164µs ± 1% -0.83% (p=0.002 n=10+8) MVCCScan_Pebble/rows=100/versions=1/valueSize=64/numRangeKeys=0-24 38.5µs ± 1% 39.0µs ± 2% +1.38% (p=0.002 n=10+10) MVCCScan_Pebble/rows=100/versions=1/valueSize=64/numRangeKeys=1-24 71.8µs ± 1% 72.2µs ± 0% +0.51% (p=0.035 n=10+9) MVCCScan_Pebble/rows=100/versions=1/valueSize=64/numRangeKeys=100-24 179µs ± 1% 177µs ± 1% -0.88% (p=0.002 n=10+8) MVCCScan_Pebble/rows=10000/versions=1/valueSize=64/numRangeKeys=0-24 2.76ms ± 1% 2.79ms ± 0% +1.16% (p=0.000 n=10+10) MVCCScan_Pebble/rows=10000/versions=1/valueSize=64/numRangeKeys=1-24 5.41ms ± 1% 5.48ms ± 1% +1.21% (p=0.000 n=10+10) MVCCScan_Pebble/rows=10000/versions=1/valueSize=64/numRangeKeys=100-24 6.52ms ± 1% 6.53ms ± 2% ~ (p=0.739 n=10+10) MVCCReverseScan_Pebble/rows=1/versions=1/valueSize=64/numRangeKeys=0-24 5.96µs ± 1% 6.08µs ± 1% +1.97% (p=0.000 n=9+9) MVCCReverseScan_Pebble/rows=1/versions=1/valueSize=64/numRangeKeys=1-24 12.1µs ± 1% 12.2µs ± 1% +0.56% (p=0.014 n=9+10) MVCCReverseScan_Pebble/rows=1/versions=1/valueSize=64/numRangeKeys=100-24 92.3µs ± 1% 91.2µs ± 1% -1.18% (p=0.000 n=10+9) MVCCReverseScan_Pebble/rows=100/versions=1/valueSize=64/numRangeKeys=0-24 51.7µs ± 2% 51.7µs ± 1% ~ (p=0.529 n=10+10) MVCCReverseScan_Pebble/rows=100/versions=1/valueSize=64/numRangeKeys=1-24 85.6µs ± 1% 86.0µs ± 1% ~ (p=0.093 n=10+10) MVCCReverseScan_Pebble/rows=100/versions=1/valueSize=64/numRangeKeys=100-24 183µs ± 1% 183µs ± 1% ~ (p=0.481 n=10+10) MVCCReverseScan_Pebble/rows=10000/versions=1/valueSize=64/numRangeKeys=0-24 3.97ms ± 1% 3.98ms ± 1% ~ (p=0.063 n=10+10) MVCCReverseScan_Pebble/rows=10000/versions=1/valueSize=64/numRangeKeys=1-24 6.70ms ± 1% 6.73ms ± 0% +0.39% (p=0.028 n=10+9) MVCCReverseScan_Pebble/rows=10000/versions=1/valueSize=64/numRangeKeys=100-24 7.78ms ± 1% 7.84ms ± 1% +0.77% (p=0.005 n=10+10) MVCCGet_Pebble/batch=false/versions=1/valueSize=8/numRangeKeys=0-24 5.35µs ± 2% 5.34µs ± 2% ~ (p=0.493 n=10+10) MVCCGet_Pebble/batch=false/versions=1/valueSize=8/numRangeKeys=1-24 11.4µs ± 1% 11.6µs ± 1% +1.67% (p=0.000 n=10+8) MVCCGet_Pebble/batch=false/versions=1/valueSize=8/numRangeKeys=100-24 179µs ± 1% 176µs ± 1% -1.66% (p=0.000 n=10+9) MVCCGet_Pebble/batch=false/versions=10/valueSize=8/numRangeKeys=0-24 6.47µs ± 2% 6.43µs ± 1% ~ (p=0.315 n=10+10) MVCCGet_Pebble/batch=false/versions=10/valueSize=8/numRangeKeys=1-24 17.3µs ± 1% 17.6µs ± 2% +1.90% (p=0.000 n=10+10) MVCCGet_Pebble/batch=false/versions=10/valueSize=8/numRangeKeys=100-24 184µs ± 2% 182µs ± 1% -0.91% (p=0.010 n=10+9) MVCCGet_Pebble/batch=false/versions=100/valueSize=8/numRangeKeys=0-24 15.2µs ± 2% 15.2µs ± 4% ~ (p=0.631 n=10+10) MVCCGet_Pebble/batch=false/versions=100/valueSize=8/numRangeKeys=1-24 58.0µs ± 2% 57.6µs ± 3% ~ (p=0.218 n=10+10) MVCCGet_Pebble/batch=false/versions=100/valueSize=8/numRangeKeys=100-24 230µs ± 1% 225µs ± 2% -2.15% (p=0.000 n=10+10) MVCCGet_Pebble/batch=true/versions=1/valueSize=8/numRangeKeys=0-24 3.69µs ± 2% 3.63µs ± 1% -1.60% (p=0.001 n=9+10) MVCCGet_Pebble/batch=true/versions=1/valueSize=8/numRangeKeys=1-24 8.59µs ± 1% 8.68µs ± 1% +1.07% (p=0.003 n=10+10) MVCCGet_Pebble/batch=true/versions=1/valueSize=8/numRangeKeys=100-24 169µs ± 1% 165µs ± 1% -2.17% (p=0.000 n=10+10) MVCCGet_Pebble/batch=true/versions=10/valueSize=8/numRangeKeys=0-24 4.63µs ± 1% 4.61µs ± 2% ~ (p=0.356 n=9+10) MVCCGet_Pebble/batch=true/versions=10/valueSize=8/numRangeKeys=1-24 14.2µs ± 1% 14.3µs ± 0% +0.97% (p=0.000 n=10+8) MVCCGet_Pebble/batch=true/versions=10/valueSize=8/numRangeKeys=100-24 173µs ± 1% 169µs ± 1% -2.26% (p=0.000 n=10+10) MVCCGet_Pebble/batch=true/versions=100/valueSize=8/numRangeKeys=0-24 11.9µs ± 4% 11.8µs ± 3% ~ (p=0.720 n=10+9) MVCCGet_Pebble/batch=true/versions=100/valueSize=8/numRangeKeys=1-24 51.9µs ± 1% 52.2µs ± 1% ~ (p=0.156 n=10+9) MVCCGet_Pebble/batch=true/versions=100/valueSize=8/numRangeKeys=100-24 213µs ± 1% 209µs ± 2% -1.74% (p=0.000 n=10+10) ``` Touches #84379. Release note: None 86084: storage: omit `MVCCIterator.Stats()` call during `MVCCScan`/`MVCCGet` r=nicktrav a=erikgrinaker `MVCCScan` and `MVCCGet` attach iterator statistics to an active trace. However, they would fetch the iterator statistics even if no trace was active, which has a significant cost. This patch avoids fetching the stats when there is no active trace. ``` name old time/op new time/op delta MVCCGet_Pebble/batch=true/versions=1/valueSize=8/numRangeKeys=0-24 3.59µs ± 0% 3.34µs ± 0% -7.02% (p=0.000 n=10+8) MVCCGet_Pebble/batch=true/versions=1/valueSize=8/numRangeKeys=1-24 7.43µs ± 1% 7.01µs ± 0% -5.70% (p=0.000 n=10+9) MVCCGet_Pebble/batch=true/versions=10/valueSize=8/numRangeKeys=0-24 4.62µs ± 1% 4.36µs ± 1% -5.53% (p=0.000 n=10+10) MVCCGet_Pebble/batch=true/versions=10/valueSize=8/numRangeKeys=1-24 10.4µs ± 1% 10.0µs ± 1% -3.79% (p=0.000 n=10+10) ``` Resolves #86083. Touches #64503. Release note: None Co-authored-by: Erik Grinaker <[email protected]>
Updates #57322. It doesn't track number of bytes scanned over, not sure how to efficiently do that.