storage: splits don't seem to take into account range size properly #21689
Comments
I just confirmed for my own sanity that this wasn't caused by #21562.
@tschottdorf You've had your head in the MVCC stats muck recently.
This was introduced by #21078 but I don't understand what's going wrong yet -- it doesn't have anything to do with the new functionality introduced in that change. I'm still figuring out the details, but whatever it is, I think I'll get to this soon since it's so easy to repro. In the meantime, take a look at the logging I added below. Reading these numbers, the LHS thinks it's empty, but it actually holds about half of the data. The RHS thinks it holds everything, but it's really only the other half (to the point where the numbers add up exactly in most columns). I'm very curious how this came to be, and even more so, how it's caused by the PR linked above.
In the split trigger, I added the following:

```go
leftMS, err := rditer.ComputeStatsForRange(&split.LeftDesc, batch, ts.WallTime)
if err != nil {
	return enginepb.MVCCStats{}, result.Result{}, errors.Wrap(err, "unable to compute stats for LHS range after split")
}
log.Event(ctx, "computed stats for left hand side range")
log.Warningf(ctx, "orig ms %s\nleft ms %s", pretty.Sprint(origBothMS), pretty.Sprint(leftMS))

engineLeftMS, err := rditer.ComputeStatsForRange(&split.LeftDesc, rec.Engine(), ts.WallTime)
if err != nil {
	return enginepb.MVCCStats{}, result.Result{}, err
}
log.Warningf(ctx, "recomp left ms %s", pretty.Sprint(engineLeftMS))
```

The problem is in …
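To make the symptom in those numbers concrete, here is a minimal, self-contained sketch of the bookkeeping they suggest (my own stand-in types with made-up numbers, not the actual `enginepb.MVCCStats` or split-trigger code): the LHS stats come from a recomputation over the left-hand span, and the RHS stats are whatever remains of the original combined stats. If the recomputation over the LHS sees no data, the LHS comes out empty and the RHS is credited with everything, which is consistent with the columns adding up exactly.

```go
package main

import "fmt"

// mvccStats is a stand-in for enginepb.MVCCStats, reduced to a few fields
// that matter for the size accounting discussed in this issue.
type mvccStats struct {
	KeyBytes, ValBytes, KeyCount, ValCount int64
}

// Total mirrors the idea of MVCCStats.Total(): the byte size the split queue
// compares against the maximum range size.
func (s mvccStats) Total() int64 { return s.KeyBytes + s.ValBytes }

// Subtract returns s minus o, field by field.
func (s mvccStats) Subtract(o mvccStats) mvccStats {
	return mvccStats{
		KeyBytes: s.KeyBytes - o.KeyBytes,
		ValBytes: s.ValBytes - o.ValBytes,
		KeyCount: s.KeyCount - o.KeyCount,
		ValCount: s.ValCount - o.ValCount,
	}
}

// splitStats mimics the bookkeeping suggested by the logging above: the LHS
// stats come from a recomputation over the left-hand span, and the RHS stats
// are the original (combined) stats minus the LHS.
func splitStats(origBothMS, recomputedLeftMS mvccStats) (leftMS, rightMS mvccStats) {
	return recomputedLeftMS, origBothMS.Subtract(recomputedLeftMS)
}

func main() {
	orig := mvccStats{KeyBytes: 4 << 20, ValBytes: 60 << 20, KeyCount: 200000, ValCount: 200000}

	// Expected case: the recomputation over the LHS sees roughly half the data,
	// so both sides end up around half of the original size.
	left, right := splitStats(orig, mvccStats{KeyBytes: 2 << 20, ValBytes: 30 << 20, KeyCount: 100000, ValCount: 100000})
	fmt.Println("expected: left", left.Total(), "right", right.Total())

	// Symptom in the logs above: the LHS recomputation sees nothing, so the LHS
	// looks empty and the RHS is credited with the entire original size. The
	// columns still add up exactly, matching the observation.
	left, right = splitStats(orig, mvccStats{})
	fmt.Println("buggy:    left", left.Total(), "right", right.Total())
}
```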
From a commit referencing this issue:

> Without cockroachdb#21721, this fails with "saw 0 values in regular iterator, but expected 4". Touches cockroachdb#21721. Touches cockroachdb#21689. Release note: None
I noticed this while playing with the `tpcc` 1000 warehouse dataset, but it's easy to reproduce with just 10 warehouses. Under the `tpcc` data generation load, ranges get queued for splitting due to their size despite not actually being over the range size limit.

To observe this, simply run a 1-node cockroach cluster locally and run `./tpcc -load -warehouses=10`. After a short while, you should notice that the cluster is performing a ton of splits on many different tables. For example, after loading 10 warehouses, the `stock` table has 232 ranges. The first one is pretty large, containing more than a single 100,000 row warehouse. The second one is of similar size. The third is fairly small, containing just below 18,000 rows, and the rest are very small, containing only a few thousand rows each. Here's a snippet of the ranges:

As a total split queue novice, I poked around and added some debug output:
This log line fires many times for a particular range when a split happens, claiming that the result of `Total()` is in fact greater than 64 megabytes. This is empirically false: the rows in these small ranges are no larger, bytes-wise, than those in the large ranges.
So, my hypothesis is that something's overcounting range size. Since this behavior takes a while to kick in (a couple warehouses), I would guess that there's an issue during range splits themselves that causes the size of the new range to be overcounted.
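For context on why inflated stats would cause repeated splits, here is a rough standalone sketch of the size-based check that debug output is probing (my own simplification with assumed numbers, not the actual split queue code): a range gets queued for a size-based split when the total reported by its MVCC stats exceeds the configured maximum range size, 64 MiB by default at the time. If a freshly split range's stats are overcounted, this check keeps passing no matter how little data the range actually holds.

```go
package main

import "fmt"

// defaultMaxRangeBytes is the maximum range size assumed in this sketch
// (64 MiB, matching the threshold mentioned above).
const defaultMaxRangeBytes int64 = 64 << 20

// shouldSplitBySize is the size-based split decision in its simplest form:
// compare the size reported by the range's MVCC stats against the maximum
// allowed range size.
func shouldSplitBySize(statsTotalBytes, maxRangeBytes int64) bool {
	return statsTotalBytes > maxRangeBytes
}

func main() {
	// A range whose stats claim ~66 MiB is queued for a split even if the data
	// it actually holds is tiny -- the overcounting hypothesized above.
	fmt.Println(shouldSplitBySize(66<<20, defaultMaxRangeBytes)) // true
	fmt.Println(shouldSplitBySize(2<<20, defaultMaxRangeBytes))  // false
}
```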
cc @petermattis @tschottdorf as likely candidates for people who know about MVCC stats.