storage: investigate slow ScanRequest evaluation #96361
Hi @sumeerbhola, I've guessed the C-ategory of your issue and suitably labeled it. Please re-label if inaccurate. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Benchmark with 7 levels (these are L0 sub-levels, but these are similar to normal levels in using LevelIter) each with one file. 1000 v22.2
Note that we are mostly seeking at the interface level, due to the first 10 keys defeating
v22.1 is 2x faster! This is because it does not seek despite this attempt to defeat
Looks like we "temporarily" changed the code in b258e82#diff-8a8fc3cd44bc071821e87442be0871b30214293e94c435c2a21368ad8c3af3bfL486 and forgot to change it back.
This was supposed to be temporarily removed in cockroachdb@b258e82 but was never restored. It accounts for a 2x slowdown in a benchmark where a scan encounters a few keys at the beginning of the scan with > 5 versions, which cause itersBeforeSeek to drop to 0, after which the scan uses SeekGE instead of Next for the remaining 1000s of keys that have only 1 version each.
Informs cockroachdb#96361
Also see cockroachlabs/support#2033
Epic: none
Release note: None
…canner.itersBeforeSeek
This commit restores the lower-bound of 1 on pebbleMVCCScanner.itersBeforeSeek. This has no effect on the added benchmark since on master (unlike v22.2) we use Pebble's Iterator.NextPrefix for the common case of stepping to the next roachpb.Key. itersBeforeSeek continues to be used for seeking to a particular version and for reverse scans.
The added benchmark has 7 levels and resolved intents where both the Set and SingleDelete of the intent are present in Pebble. It tries to trick pebbleMVCCScanner by placing keys with many versions at the beginning of the scan.
Benchmark results:
BenchmarkMVCCScannerWithIntentsAndVersions-10 4000 316177 ns/op 133119 B/op 28 allocs/op
stats: (interface (dir, seek, step): (fwd, 2, 1999), (rev, 0, 0)), (internal (dir, seek, step): (fwd, 2, 3110), (rev, 0, 0)), (internal-stats: (block-bytes: (total 15 K, cached 15 K)), (points: (count 3.1 K, key-bytes 88 K, value-bytes 60 K, tombstoned 0)))
Informs cockroachdb#96361
Epic: none
Release note: None
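The effect of the missing lower bound can be illustrated with a toy model. The sketch below is not the CockroachDB code: the type `toyScanner`, the function `simulate`, the starting value, the cap of 10, and the workload (10 keys with 20 versions followed by 1000 keys with 1 version) are all assumptions chosen for illustration. It only models the idea described in these commit messages: try up to `itersBeforeSeek` steps to reach the next key, raise the counter when stepping succeeds, and lower it and seek when stepping fails.

```go
package main

import "fmt"

// maxItersBeforeSeek is an assumed cap for this toy model.
const maxItersBeforeSeek = 10

// toyScanner is a hypothetical stand-in for the adaptive step-vs-seek logic.
type toyScanner struct {
	itersBeforeSeek    int // how many steps to try before falling back to a seek
	minItersBeforeSeek int // lower bound under test (0 or 1)
	seeks, steps       int // counters for the simulated iterator operations
}

// nextKey advances past a key that has the given number of versions.
// The scanner is assumed to be positioned on the newest version, so it
// takes exactly `versions` steps to land on the next key.
func (s *toyScanner) nextKey(versions int) {
	for i := 0; i < s.itersBeforeSeek; i++ {
		s.steps++
		if i+1 == versions {
			// Stepping reached the next key: be more willing to step next time.
			if s.itersBeforeSeek < maxItersBeforeSeek {
				s.itersBeforeSeek++
			}
			return
		}
	}
	// Stepping did not reach the next key: be less willing to step, then seek.
	s.itersBeforeSeek--
	if s.itersBeforeSeek < s.minItersBeforeSeek {
		s.itersBeforeSeek = s.minItersBeforeSeek
	}
	s.seeks++
}

// simulate runs the assumed workload with the given lower bound.
func simulate(minIters int) (seeks, steps int) {
	s := &toyScanner{
		itersBeforeSeek:    maxItersBeforeSeek / 2,
		minItersBeforeSeek: minIters,
	}
	for i := 0; i < 10; i++ {
		s.nextKey(20) // a few keys with many versions at the start
	}
	for i := 0; i < 1000; i++ {
		s.nextKey(1) // the remaining keys have a single version
	}
	return s.seeks, s.steps
}

func main() {
	for _, lower := range []int{0, 1} {
		seeks, steps := simulate(lower)
		fmt.Printf("lower bound %d: seeks=%d steps=%d\n", lower, seeks, steps)
	}
}
```

In this toy model, a lower bound of 0 is a trap: once the counter reaches 0 the stepping loop never runs, so the counter can never be raised again and every remaining key costs a seek (1010 seeks, 15 steps). With a lower bound of 1, the scanner recovers as soon as it meets single-version keys (10 seeks, 1020 steps).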
96380: storage: add scan benchmark with resolved intents and fix pebbleMVCCS… r=sumeerbhola a=sumeerbhola

…canner.itersBeforeSeek

This commit restores the lower-bound of 1 on pebbleMVCCScanner.itersBeforeSeek. This has no effect on the added benchmark since on master (unlike v22.2) we use Pebble's Iterator.NextPrefix for the common case of stepping to the next roachpb.Key. itersBeforeSeek continues to be used for seeking to a particular version and for reverse scans.

The added benchmark has 7 levels and resolved intents where both the Set and SingleDelete of the intent are present in Pebble. It tries to trick pebbleMVCCScanner by placing keys with many versions at the beginning of the scan.

Benchmark results:
```
BenchmarkMVCCScannerWithIntentsAndVersions-10 4000 316177 ns/op 133119 B/op 28 allocs/op
stats: (interface (dir, seek, step): (fwd, 2, 1999), (rev, 0, 0)), (internal (dir, seek, step): (fwd, 2, 3110), (rev, 0, 0)), (internal-stats: (block-bytes: (total 15 K, cached 15 K)), (points: (count 3.1 K, key-bytes 88 K, value-bytes 60 K, tombstoned 0)))
```

Informs #96361
Epic: none
Release note: None

Co-authored-by: sumeerbhola <[email protected]>
In a production cluster there were some slow ScanRequests that took ~95ms, with trace:
94.349ms event:scan stats: stepped 674 times (22806 internal); seeked 51782 times (25902 internal); block-bytes: (total 2.6 MiB, cached 2.6 MiB); points: (count 48.706 k, key-bytes 2.6 MiB, value-bytes 1.3 MiB, tombstoned: 0 ) ranges: (count 0 ), (contained-points 0 , skipped-points 0 )
This was a single ScanRequest in the BatchRequest and returned `resp=num_keys:26165 num_bytes:2851274` from the KV layer. The range had "val_count": 3233223 and "live_count": 3160356, so the fraction of garbage is low.
(the details of this incident are in https://github.com/cockroachlabs/support/issues/2033)
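As a quick sanity check, the figures quoted above can be related with simple arithmetic. The sketch below only divides numbers taken from the trace and the response. The helper `ratio` and the constant names are made up for this example, and treating `(val_count - live_count) / val_count` as the garbage fraction is an assumption about how those two counters relate.

```go
package main

import "fmt"

// Figures copied from the trace and response quoted in this issue.
const (
	numKeys        = 26165   // resp num_keys
	numBytes       = 2851274 // resp num_bytes
	interfaceSeeks = 51782   // "seeked 51782 times"
	internalSeeks  = 25902   // "(25902 internal)"
	valCount       = 3233223 // range "val_count"
	liveCount      = 3160356 // range "live_count"
)

// ratio is a small hypothetical helper that divides two counts as floats.
func ratio(a, b int) float64 {
	return float64(a) / float64(b)
}

func main() {
	fmt.Printf("interface seeks per key: %.2f\n", ratio(interfaceSeeks, numKeys))
	fmt.Printf("internal seeks per key:  %.2f\n", ratio(internalSeeks, numKeys))
	fmt.Printf("bytes per key:           %.1f\n", ratio(numBytes, numKeys))
	// Assumed definition: values that are not live, as a share of all values.
	fmt.Printf("garbage fraction:        %.4f\n", 1-ratio(liveCount, valCount))
}
```

The ratios come out to roughly 2 interface seeks and roughly 1 internal seek per returned key, with about 2% of values not live under the assumed definition.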
The hypothesis for the excessive seeks is that (in this v22.2 cluster) we are getting unlucky with the `pebbleMVCCScanner.itersBeforeSeek` logic (which no longer exists on master, since it uses the Pebble-implemented `NextPrefix`). More details:
`manifest.LevelIterator` on v22.2, which we have optimized away on master, which can be costly.
More hypothesizing:
stepped 674 times (22806 internal); seeked 51782 times (25902 internal)
The ~2x ratio of seeks to keys retrieved is explained by the `intentInterleavingIter`, which needs to seek both the lock-table iter and the mvcc-key-space iter. The latter will translate into an internal seek since the iterator is positioned at the preceding key. The former does not translate into an internal seek, due to `SeekGEWithLimit`: the first time, the lock-table iter will do a seek, then step over the SingleDelete, Set pair for the key that is within the limit, get to the next SingleDelete, and stop because of the limit. The next `SeekGEWithLimit` is to the userkey of that SingleDelete, so it will not do an internal seek; it will step over the SingleDelete, Set (2 internal next calls) and then stop due to the limit. This behavior would explain the low value of 674 interface steps vs the high value of 22806 internal steps.
Next steps:
Jira issue: CRDB-24086