-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DNM] release-19.2: storage/gc: create gc pkg, reverse iteration in rditer, paginate versions during GC #44257
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
SeekLT is exclusive on the key being seeked. It should be allowed to SeekLT to the end of a span. This commit makes that possible. Release note: None
Prior to this commit, the ReplicaDataIterator only permitted forward iteration. This PR exposes the ability to iterate in reverse by adding a `Prev()` method as well as a constructor option to seek the iterator to the end of the data range. Release note: None
This commit is just code movement. It moves the helper functions of `RunGC()`, `processLocalKeyRange()` and `processAbortSpan()` from above to below. Release note: None
This commit moves the logic of RunGC as well as the engine.GC struct into a separate subpackage. As of this commit that subpackage contains no testing other than what existed for the engine.GC code. The RunGC function is a reasonably well specified interface to separate the logic of scanning a range and collecting the garbage from the gcQueue. I was getting overwhelmed by testing boundaries and unit testing in general. Release note: None
This commit reworks the processing of replicated state underneath the gcQueue for the purpose of determining and sending GC requests. The primary intention of this commit is to remove the need to buffer all of the versions of a key in memory. As we learned in cockroachdb#42531, this bufferring can be extremely unfortunate when using sequence data types which are written to frequently. Prior to this commit, the code forward iterates through the range's data and eagerly reads all versions of the a key into memory. It then binary searches those versions to find the latest timestamp for the key which can be GC'd. It then reverse iterates through all of those versions to determine the latest version of the key which would put the current batch over its limit. This last step works to paginate the process of actually deleting the data for many versions of the same key. I suppose this pagination was added to ensure that write batches due to GC requests don't get too large. Unfortunately this logic was unable to paginate the loading of versions from the storage engine. In this new commit, the entire process of computing data to GC now uses reverse iteration; for each key we examine versions from oldest to newest. The commit adds a `gcIterator` which wraps this reverse iteration with some useful lookahead. During this GC process, at most two additional versions need to examined to determine whether a given version is garbage. While this approach relies on reverse iteration which is known to be less efficient than forward iteration, it offers the opportunity to avoid allocating memory for versions of a key which are not going to end up as a part of a GC request. This reduction in memory usage shows up in benchmarks (see below). The change retains the old implementation as a testing strategy and as a basis for the benchmarks. ``` name old time/op new time/op delta Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000-8 924ns ± 8% 590ns ± 1% -36.13% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#01-8 976ns ± 2% 578ns ± 1% -40.75% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#02-8 944ns ± 0% 570ns ± 9% -39.63% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#03-8 903ns ± 0% 612ns ± 3% -32.29% (p=0.016 n=4+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#04-8 994ns ± 9% 592ns ± 9% -40.47% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000-8 669ns ± 4% 526ns ± 1% -21.34% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#01-8 624ns ± 0% 529ns ± 2% -15.16% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#02-8 636ns ± 4% 534ns ± 2% -16.04% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#03-8 612ns ± 1% 532ns ± 3% -13.08% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#04-8 638ns ± 2% 534ns ±10% -16.35% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000-8 603ns ± 6% 527ns ± 8% -12.51% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#01-8 613ns ± 5% 517ns ± 6% -15.78% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#02-8 619ns ± 6% 534ns ± 4% -13.61% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#03-8 607ns ± 7% 520ns ± 2% -14.39% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#04-8 599ns ± 4% 501ns ± 7% -16.36% (p=0.008 n=5+5) name old speed new speed delta Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000-8 23.9MB/s ± 8% 37.3MB/s ± 1% +56.23% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#01-8 22.6MB/s ± 2% 38.1MB/s ± 1% +68.81% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#02-8 23.3MB/s ± 0% 38.7MB/s ± 9% +66.06% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#03-8 24.4MB/s ± 0% 36.0MB/s ± 3% +47.73% (p=0.016 n=4+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#04-8 22.2MB/s ± 8% 37.3MB/s ± 9% +68.09% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000-8 34.4MB/s ± 4% 43.7MB/s ± 1% +27.08% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#01-8 36.9MB/s ± 0% 43.4MB/s ± 2% +17.84% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#02-8 36.2MB/s ± 4% 43.1MB/s ± 2% +19.02% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#03-8 37.6MB/s ± 1% 43.3MB/s ± 3% +15.02% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#04-8 36.0MB/s ± 2% 43.2MB/s ±10% +19.87% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000-8 36.5MB/s ± 5% 41.8MB/s ± 9% +14.39% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#01-8 35.9MB/s ± 5% 42.7MB/s ± 6% +18.83% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#02-8 35.6MB/s ± 6% 41.2MB/s ± 4% +15.66% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#03-8 36.3MB/s ± 6% 42.3MB/s ± 2% +16.69% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#04-8 36.7MB/s ± 4% 44.0MB/s ± 7% +19.69% (p=0.008 n=5+5) name old alloc/op new alloc/op delta Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000-8 325B ± 0% 76B ± 0% -76.62% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#01-8 358B ± 0% 49B ± 0% ~ (p=0.079 n=4+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#02-8 340B ± 0% 29B ± 0% -91.47% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#03-8 328B ± 0% 18B ± 0% -94.51% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#04-8 325B ± 0% 14B ± 0% -95.69% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000-8 226B ± 0% 2B ± 0% ~ (p=0.079 n=4+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#01-8 228B ± 0% 3B ± 0% -98.69% (p=0.000 n=5+4) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#02-8 228B ± 0% 2B ± 0% -99.12% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#03-8 228B ± 0% 2B ± 0% -99.12% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#04-8 226B ± 0% 0B -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000-8 388B ± 2% 0B -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#01-8 391B ± 2% 0B -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#02-8 389B ± 1% 0B -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#03-8 391B ± 2% 0B -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#04-8 390B ± 1% 0B -100.00% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000-8 4.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#01-8 4.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#02-8 4.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#03-8 4.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#04-8 4.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#01-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#02-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#03-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#04-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#01-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#02-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#03-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#04-8 0.00 0.00 ~ (all equal) ``` Release note (bug fix): The GC process was improved to paginate the key versions of a key to fix OOM crashes which can occur when there are extremely large numbers of versions for a given key.
ajwerner
force-pushed
the
ajwerner/19.2-gc-fix
branch
from
January 23, 2020 02:12
3853b09
to
cad6077
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Backport of #43862 for a hot fix.