OOM occurred while running replica GC old version #42531

heidawei · 2019-11-18T02:45:07Z

Describe the problem

cockroach data node crash due to OOM when a replica running GC.

To Reproduce

What did you do? Describe in your own words.
I load data to a table(load like batch insert a large number of rows ) which has auto-increment, after 25hours later, the replica running GC old version for the table, but there are too many version, and when running GC we copy all version into memory. I known auto-increment is not recommended in cockroach, but I think it is a bug.

And I found another problem, when a key has to many version to GC(more than 256KB), it will Increase normal read and write request latency in that key since the latch.

If possible, provide steps to reproduce the behavior:

Set up CockroachDB cluster ...
Send SQL ... / CLI command ...
Look at UI / log file / client app ...
See error

Expected behavior
A clear and concise description of what you expected to happen.

Additional data / screenshots
If the problem is SQL-related, include a copy of the SQL query and the schema
of the supporting tables.

If a node in your cluster encountered a fatal error, supply the contents of the
log directories (at minimum of the affected node(s), but preferably all nodes).

Note that log files can contain confidential information. Please continue
creating this issue, but contact [email protected] to submit the log
files in private.

If applicable, add screenshots to help explain your problem.

Environment:

CockroachDB version [e.g. 2.0.x]
Server OS: [e.g. Linux/Distrib]
Client app [e.g. cockroach sql, JDBC, ...]

Additional context
What was the impact?

Add any other context about the problem here.

The text was updated successfully, but these errors were encountered:

heidawei · 2019-11-18T02:49:59Z

I try to support a PR to fix this bug, which I had do and test in my environment.

fast GC
when a key has too many old version to GC(> 512KB now) I will use range delete for GC.
GC limit
when the last version + 500ms > time.Now(), we will limit the key versions count to reduce latency.

ajwerner · 2019-11-21T23:26:41Z

And I found another problem, when a key has to many version to GC(more than 256KB), it will Increase normal read and write request latency in that key since the latch.

Yes, this is known. See #17229.

As for the OOM, I think I understand the problem but I'd like to understand how it occurred as I suspect there are easier solutions than your span-oriented GC request. That being said I admire your tenacity in attempting to fix the problem. As I read the code, it seems that we load every version of each individual key into RAM. This seems problematic in the case where there is a very large amount of data underneath a single key. That situation also happens to be exactly the case where a range becomes unsplittable. All that being said, you should receive write backpressure when a range exceeds kv.range.backpressure_range_size_multiplier*range_max_bytes.

Did you mess with the range size settings or the backpressure?
Is it the case that you had gigabytes worth of versions of a single key?
If so, how did you get in to such a situation?
What version of cockroach are you running?
Are there details I'm misunderstanding?

Generally if there are just kilobytes of a single key in terms of versions I don't see how you would run of out RAM during GC. Is your machine extremely memory constrained?

I am happy you filed this issue. I'm amenable to (though not enthusiastic about) the idea of using ClearRange to clear many versions of a key. I find #42640 in its current state to be more invasive than it needs to be. As a first step I'd look to paginate the individual versions of a key during the GC process rather than jumping straight to ClearRange. I'd also like to understand how the backpressure mechanisms we have failed to prevent the OOM you encountered before moving forward with any sort of code change.

heidawei · 2019-11-22T02:11:22Z

We use docker to limit the CPU and memory. In general, we use 4 CPU and 8GB of memory by default. In our usage scenario, if I keep writing a table with self-incrementing columns, or if I keep updating a row, then A large number of historical versions will be generated in the table that maintains the self-added column ID, but the fragment of this table cannot be split because there is only one valid key, and the original mechanism will actually generate a write error due to backpressure, but this kind of The scene is very common on our side. Based on business considerations, we must properly correct the logic of the split-related source code. When we identify whether we need to split, we only find that the fragment has only one valid key (range size > 64MB).
), then ignore it, so that it passes through backpressure.

heidawei · 2019-11-22T02:11:57Z

The version we are using is V19.1. I used to use another solution to solve the problem of OOM. I used the bucket. With the iteration, I will only keep a limited bucket, and each bucket is small. 1MB, up to 10 buckets, so only the last 10MB of data will be processed, but the test shows that the GC is too slow, because it requires multiple GC schedules, iterates through each time, takes a lot of time, and is easy Cause timeout, in the event of a timeout will encounter another tragedy, that is, there will never be a chance for the GC history version, because the future data will be more and more, vicious circle.
We also tried the way of reverse traversal, but considering that the iterator is not deeply understood, because it has not been used in practice. Later we think that you can use range delete to quickly clean up this extreme scenario (in our business) A large number of outdated versions under the scene are very common.
Here we also propose an improvement, cockroach is simply that all versions are loaded into the memory, but in fact we can filter in advance when loading, which will not be satisfied between the versions that are not satisfied, so you can also reduce the memory The amount of data in.

heidawei · 2019-11-22T02:21:13Z

We are also worried that the introduction of range delete in the GC has some unpredictable risks, so we submitted ISSUE and PR, and we hope to get community discussion and guidance.
In our current tests, our PR solves the impact on normal reading and writing in the GC process, and also solves the long time-consuming problem of eliminating large-scale expired data.
My timeout processing in PR is "hijacking", ignoring timeout, and our internal discussion also feels so strange, but for the time being, there is no better way to deal with this problem, so I hope to discuss this in PR. problem.
In short, we hope to have a more elegant way to solve the problems we face.

ajwerner · 2019-11-22T03:30:56Z

A large number of historical versions will be generated in the table that maintains the self-added column ID

Oh! I finally now understand that this is caused by our SQL sequence type!
What sort of write rate are you talking about in terms of sequence increments per second?

The code today simply increments the value on a single key. I suspect there are better implementations. There are at least implementations which make different tradeoffs. One such might me to keep sequences as a single row table where the primary key gets incremented every time, deleting the old row. This will lead to much saner GC behavior. The downside is that it can't be done in a single request and becomes quite exposed to retries. Perhaps we could add lower level support to make incrementing cheaper. A hybrid option might to only store so many underneath a given key but need to do an initial read to figure out the key and optimistically assume it doesn't change. Maybe every 16k or something.

Another trick would be to dramatically turn down the GC TTL for the sequence table. What do you have it set to?

Cause timeout, in the event of a timeout will encounter another tragedy, that is, there will never be a chance for the GC history version

:) I was just looking into timeouts from the queues today... #42686

I can easily extend that change to include a knob to control the timeout of the GC queue processing.

I used the bucket. With the iteration, I will only keep a limited bucket, and each bucket is small. 1MB, up to 10 buckets, so only the last 10MB of data will be processed, but the test shows that the GC is too slow,

I don't think I understand this part. Say you did one 10MB batch of versions. You're saying that if you did that continuously until they were all gone, you wouldn't be able to keep up? Or maybe you would be if you hadn't accumulated so much garbage in the sequence table?

Short term I think we should do the following:

Make a knob to control GC queue process timeout.
Make a patch to paginate the versions of key if the data size is too large.

At a certain size a clear range might be more efficient but I'd generally be pretty stunned if the GC logic couldn't keep up with increments if the GC TTL was set low enough. It's possible that we should default the GC TTL for sequences to a low value though I can see reasons not to.

As I come to better understand you use case and how the above steps might be insufficient then we can talk about more ideas.

heidawei · 2019-11-22T06:20:55Z

The use of buckets means this. For example, if there is a 1GB outdated version that requires GC, I only keep the last 10MB of data processing in the memory. The use of the bucket allows me to discard the data generated by the previous iteration. It is not the key to use the bucket. The key is that in order to avoid OOM, for each key, a GC process will only delete up to 10MB of data, so that a round of GC can not complete all the overdue data cleanup, must wait for the next round, so that the overall need for multiple rounds of scheduling Finished, every RunGC needs to traverse the entire range from scratch, causing duplicates.

heidawei · 2019-11-22T06:23:55Z

we can set the sequence table GC TTL, but this is time consuming and laborious. Every time you create a table, you need extra work. We hope to avoid this kind of manual intervention. It is often easy to make mistakes.

heidawei · 2019-11-22T06:28:19Z

If we can solve the problem of the sequence table GC, many times we don't have to keep so many versions. In this case, the probability of the problem we are discussing is very low, only extreme cases.
I think of a way, can we default the sequence table GC TTL is very small? I understand it is a special table, the impact on the backup is very small, other things should be nothing to pay attention to.

heidawei · 2019-11-22T06:37:29Z

Can we keep a mechanism like Fast GC, when users encounter similar extreme situations, users can quickly deal with it through this special channel, because this deteriorating scene is difficult to improve, we have tested and ignored GC timeout It took a day to use the original GC logic, but the effect is not ideal, the range size is still very large, if the business is still written, then this situation will not get better, because each time the GC costs a lot Iterate over all the keys, but only delete the last 10MB (for a key).

ajwerner · 2019-11-22T12:50:02Z

The use of buckets means this. For example, if there is a 1GB outdated version that requires GC, I only keep the last 10MB of data processing in the memory. The use of the bucket allows me to discard the data generated by the previous iteration. It is not the key to use the bucket. The key is that in order to avoid OOM, for each key, a GC process will only delete up to 10MB of data, so that a round of GC can not complete all the overdue data cleanup, must wait for the next round, so that the overall need for multiple rounds of scheduling Finished, every RunGC needs to traverse the entire range from scratch, causing duplicates.

I think that you misunderstand what I mean by pagination. The idea is not to allow the RunGC method to return until all of the garbage has been removed. The RunGC method already subdivides it’s processing into several requests if the data is due to multiple keys. I’m not suggesting only clearing 10MB per gc queue process cycle. I’m suggesting clearing as much as possible during a gc queue cycle but with 10 MB worth of data scanned at a time. The iterator state can be preserved avoiding a rescan.

I agree that if you could only clear 10MB per gc process it would be a problem. That problem would be made much worse if the gc ttl is 25 hours.

If we can solve the problem of the sequence table GC, many times we don't have to keep so many versions. In this case, the probability of the problem we are discussing is very low, only extreme cases.
I think of a way, can we default the sequence table GC TTL is very small? I understand it is a special table, the impact on the backup is very small, other things should be nothing to pay attention to.

It sounds like providing a mechanism to have a short default gc ttl for sequences is a good idea and seems like it would have prevented your problem in the first place.

ajwerner · 2019-11-22T12:56:21Z

Can we keep a mechanism like Fast GC, when users encounter similar extreme situations, users can quickly deal with it through this special channel, because this deteriorating scene is difficult to improve, we have tested and ignored GC timeout It took a day to use the original GC logic, but the effect is not ideal, the range size is still very large, if the business is still written, then this situation will not get better, because each time the GC costs a lot Iterate over all the keys, but only delete the last 10MB (for a key).

That “special channel” is something I actively would like to avoid. I get your point that ClearRange can be a more efficient way to delete many versions of the same key. Let’s see about making the existing per-key delete logic more robust to multiple versions before we go and do anything extreme.

heidawei · 2019-11-26T07:48:25Z

I will also conduct more tests and attempts to fix this problem. The ideas and principles you provided give a new perspective to solve the problem, thank you very much

This commit reworks the processing of replicated state underneath the gcQueue for the purpose of determining and sending GC requests. The primary intention of this commit is to remove the need to buffer all of the versions of a key in memory. As we learned in cockroachdb#42531, this bufferring can be extremely unfortunate when using sequence data types which are written to frequently. Prior to this commit, the code forward iterates through the range's data and eagerly reads all versions of the a key into memory. It then binary searches those versions to find the latest timestamp for the key which can be GC'd. It then reverse iterates through all of those versions to determine the latest version of the key which would put the current batch over its limit. This last step works to paginate the process of actually deleting the data for many versions of the same key. I suppose this pagination was added to ensure that write batches due to GC requests don't get too large. Unfortunately this logic was unable to paginate the loading of versions from the storage engine. In this new commit, the entire process of computing data to GC now uses reverse iteration; for each key we examine versions from oldest to newest. The commit adds a `gcIterator` which wraps this reverse iteration with some useful lookahead. During this GC process, at most two additional versions need to examined to determine whether a given version is garbage. There is some concern that this reverse iteration is slower and less efficient than the forward iteration which preceded it. I'm not too concerned by this cost as I expect the IO required to load the data to dominate computational overhead of this process. That being said, benchmarks are probably a good idea. TODO(ajwerner): Add more and better testing that exercises the pagination that motivated this change. TODO(ajwerner): Add benchmarks that explore the cost of reverse iteration. Release note (bug fix): The GC process was improved to paginate the key versions of a key to fix OOM crashes which can occur when there are extremely large numbers of versions for a given key.

This commit reworks the processing of replicated state underneath the gcQueue for the purpose of determining and sending GC requests. The primary intention of this commit is to remove the need to buffer all of the versions of a key in memory. As we learned in cockroachdb#42531, this bufferring can be extremely unfortunate when using sequence data types which are written to frequently. Prior to this commit, the code forward iterates through the range's data and eagerly reads all versions of the a key into memory. It then binary searches those versions to find the latest timestamp for the key which can be GC'd. It then reverse iterates through all of those versions to determine the latest version of the key which would put the current batch over its limit. This last step works to paginate the process of actually deleting the data for many versions of the same key. I suppose this pagination was added to ensure that write batches due to GC requests don't get too large. Unfortunately this logic was unable to paginate the loading of versions from the storage engine. In this new commit, the entire process of computing data to GC now uses reverse iteration; for each key we examine versions from oldest to newest. The commit adds a `gcIterator` which wraps this reverse iteration with some useful lookahead. During this GC process, at most two additional versions need to examined to determine whether a given version is garbage. While this approach relies on reverse iteration which is known to be less efficient than forward iteration, it offers the opportunity to avoid allocating memory for versions of a key which are not going to end up as a part of a GC request. This reduction in memory usage shows up in benchmarks (see below). The change retains the old implementation as a testing strategy and as a basis for the benchmarks. ``` name old time/op new time/op delta Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000-8 924ns ± 8% 590ns ± 1% -36.13% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#01-8 976ns ± 2% 578ns ± 1% -40.75% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#02-8 944ns ± 0% 570ns ± 9% -39.63% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#03-8 903ns ± 0% 612ns ± 3% -32.29% (p=0.016 n=4+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#04-8 994ns ± 9% 592ns ± 9% -40.47% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000-8 669ns ± 4% 526ns ± 1% -21.34% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#01-8 624ns ± 0% 529ns ± 2% -15.16% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#02-8 636ns ± 4% 534ns ± 2% -16.04% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#03-8 612ns ± 1% 532ns ± 3% -13.08% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#04-8 638ns ± 2% 534ns ±10% -16.35% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000-8 603ns ± 6% 527ns ± 8% -12.51% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#01-8 613ns ± 5% 517ns ± 6% -15.78% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#02-8 619ns ± 6% 534ns ± 4% -13.61% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#03-8 607ns ± 7% 520ns ± 2% -14.39% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#04-8 599ns ± 4% 501ns ± 7% -16.36% (p=0.008 n=5+5) name old speed new speed delta Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000-8 23.9MB/s ± 8% 37.3MB/s ± 1% +56.23% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#01-8 22.6MB/s ± 2% 38.1MB/s ± 1% +68.81% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#02-8 23.3MB/s ± 0% 38.7MB/s ± 9% +66.06% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#03-8 24.4MB/s ± 0% 36.0MB/s ± 3% +47.73% (p=0.016 n=4+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#04-8 22.2MB/s ± 8% 37.3MB/s ± 9% +68.09% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000-8 34.4MB/s ± 4% 43.7MB/s ± 1% +27.08% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#01-8 36.9MB/s ± 0% 43.4MB/s ± 2% +17.84% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#02-8 36.2MB/s ± 4% 43.1MB/s ± 2% +19.02% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#03-8 37.6MB/s ± 1% 43.3MB/s ± 3% +15.02% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#04-8 36.0MB/s ± 2% 43.2MB/s ±10% +19.87% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000-8 36.5MB/s ± 5% 41.8MB/s ± 9% +14.39% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#01-8 35.9MB/s ± 5% 42.7MB/s ± 6% +18.83% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#02-8 35.6MB/s ± 6% 41.2MB/s ± 4% +15.66% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#03-8 36.3MB/s ± 6% 42.3MB/s ± 2% +16.69% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#04-8 36.7MB/s ± 4% 44.0MB/s ± 7% +19.69% (p=0.008 n=5+5) name old alloc/op new alloc/op delta Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000-8 325B ± 0% 76B ± 0% -76.62% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#01-8 358B ± 0% 49B ± 0% ~ (p=0.079 n=4+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#02-8 340B ± 0% 29B ± 0% -91.47% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#03-8 328B ± 0% 18B ± 0% -94.51% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#04-8 325B ± 0% 14B ± 0% -95.69% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000-8 226B ± 0% 2B ± 0% ~ (p=0.079 n=4+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#01-8 228B ± 0% 3B ± 0% -98.69% (p=0.000 n=5+4) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#02-8 228B ± 0% 2B ± 0% -99.12% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#03-8 228B ± 0% 2B ± 0% -99.12% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#04-8 226B ± 0% 0B -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000-8 388B ± 2% 0B -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#01-8 391B ± 2% 0B -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#02-8 389B ± 1% 0B -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#03-8 391B ± 2% 0B -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#04-8 390B ± 1% 0B -100.00% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000-8 4.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#01-8 4.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#02-8 4.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#03-8 4.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#04-8 4.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#01-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#02-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#03-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#04-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#01-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#02-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#03-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#04-8 0.00 0.00 ~ (all equal) ``` Release note (bug fix): The GC process was improved to paginate the key versions of a key to fix OOM crashes which can occur when there are extremely large numbers of versions for a given key.

ricardocrdb added the O-community Originated from the community label Nov 20, 2019

heidawei mentioned this issue Nov 21, 2019

Fast gc #42640

Closed

ajwerner self-assigned this Nov 21, 2019

ricardocrdb added the A-kv Anything in KV that doesn't belong in a more specific category. label Nov 25, 2019

ajwerner mentioned this issue Jan 10, 2020

storage/gc: create gc pkg, reverse iteration in rditer, paginate versions during GC #43862

Merged

craig bot closed this as completed in 8c6eb45 Jan 21, 2020

ajwerner mentioned this issue Jul 10, 2020

sql: improve sequence performance by implementing cache sequence modifier #51259

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OOM occurred while running replica GC old version #42531

OOM occurred while running replica GC old version #42531

heidawei commented Nov 18, 2019

heidawei commented Nov 18, 2019

ajwerner commented Nov 21, 2019

heidawei commented Nov 22, 2019

heidawei commented Nov 22, 2019

heidawei commented Nov 22, 2019

ajwerner commented Nov 22, 2019

heidawei commented Nov 22, 2019

heidawei commented Nov 22, 2019

heidawei commented Nov 22, 2019

heidawei commented Nov 22, 2019

ajwerner commented Nov 22, 2019

ajwerner commented Nov 22, 2019

heidawei commented Nov 26, 2019

OOM occurred while running replica GC old version #42531

OOM occurred while running replica GC old version #42531

Comments

heidawei commented Nov 18, 2019

heidawei commented Nov 18, 2019

ajwerner commented Nov 21, 2019

heidawei commented Nov 22, 2019

heidawei commented Nov 22, 2019

heidawei commented Nov 22, 2019

ajwerner commented Nov 22, 2019

heidawei commented Nov 22, 2019

heidawei commented Nov 22, 2019

heidawei commented Nov 22, 2019

heidawei commented Nov 22, 2019

ajwerner commented Nov 22, 2019

ajwerner commented Nov 22, 2019

heidawei commented Nov 26, 2019