
storage: splits don't seem to take into account range size properly #21689

Closed · jordanlewis opened this issue Jan 23, 2018 · 4 comments

@jordanlewis

I noticed this while playing with the tpcc 1000 warehouse dataset, but it's easy to reproduce with just 10 warehouses.

Under the tpcc data generation load, ranges get queued for splitting due to their size despite not actually being over the range size limit.

To observe this, simply run a 1-node cockroach cluster locally and run ./tpcc -load -warehouses=10. After a short while, you should notice that the cluster is performing a ton of splits on many different tables.

For example, after loading 10 warehouses, the stock table has 232 ranges. The first one is pretty large, containing more than a single 100,000 row warehouse. The second one is of similar size. The third is fairly small, containing just below 18,000 rows, and the rest are very small, containing only a few thousand rows each. Here's a snippet of the ranges:

root@:26257/tpcc> show testing_ranges from table stock;
+-----------+-----------+----------+----------+--------------+
| Start Key |  End Key  | Range ID | Replicas | Lease Holder |
+-----------+-----------+----------+----------+--------------+
| NULL      | /1/699    |       59 | {1}      |            1 |
| /1/699    | /2/1384   |       61 | {1}      |            1 |
| /2/1384   | /2/18000  |       62 | {1}      |            1 |
| /2/18000  | /2/20000  |       63 | {1}      |            1 |
| /2/20000  | /2/23000  |       64 | {1}      |            1 |
| /2/23000  | /2/27000  |       65 | {1}      |            1 |
| /2/27000  | /2/30000  |       66 | {1}      |            1 |
| /2/30000  | /2/34000  |       67 | {1}      |            1 |
| /2/34000  | /2/36000  |       68 | {1}      |            1 |
| /2/36000  | /2/41000  |       69 | {1}      |            1 |
| /2/41000  | /2/43000  |       70 | {1}      |            1 |
<snip>

As a total split queue novice, I poked around and added some debug output:

diff --git a/pkg/storage/split_queue.go b/pkg/storage/split_queue.go
index 8d9e5f9be..e26eea045 100644
--- a/pkg/storage/split_queue.go
+++ b/pkg/storage/split_queue.go
@@ -20,6 +20,8 @@ import (

        "github.com/pkg/errors"

+       "fmt"
+
        "github.com/cockroachdb/cockroach/pkg/config"
        "github.com/cockroachdb/cockroach/pkg/gossip"
        "github.com/cockroachdb/cockroach/pkg/internal/client"
@@ -83,6 +85,7 @@ func (sq *splitQueue) shouldQueue(
        // Add priority based on the size of range compared to the max
        // size for the zone it's in.
        if ratio := float64(repl.GetMVCCStats().Total()) / float64(repl.GetMaxBytes()); ratio > 1 {
+               fmt.Println("Ratio: ", ratio, desc.RangeID, desc.StartKey, desc.EndKey, repl.GetMVCCStats().Total(), repl.GetMaxBytes())
                priority += ratio
                shouldQ = true
        }

This log line fires many times for a particular range when a split happens, claiming that the result of Total() is in fact greater than 64 megabytes. That is empirically false: the rows in these small ranges are no larger, byte-wise, than those in the large ranges.
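For reference, the condition being logged boils down to a ratio test: queue the range for a split whenever its reported total size exceeds the zone's maximum range size (64 MiB here). Below is a stand-alone restatement with placeholder values (the constants and variable names are stand-ins; only the ratio logic mirrors the check in the diff above):

package main

import "fmt"

func main() {
	const maxBytes = 64 << 20        // zone's max range size (64 MiB)
	reportedTotal := int64(70 << 20) // what the replica's MVCCStats.Total() claims

	// Mirror of the shouldQueue check: ratio > 1 means "too big, split it",
	// and the ratio itself is added to the queue priority.
	if ratio := float64(reportedTotal) / float64(maxBytes); ratio > 1 {
		fmt.Printf("queue for split, priority += %.2f\n", ratio)
	}
}

So a range that merely reports a too-large total keeps getting queued, even if the data actually in it is tiny, which matches the symptom described here.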

So, my hypothesis is that something's overcounting range size. Since this behavior takes a while to kick in (a couple warehouses), I would guess that there's an issue during range splits themselves that causes the size of the new range to be overcounted.

cc @petermattis @tschottdorf as likely candidates for people who know about MVCC stats.

petermattis added this to the 2.0 milestone Jan 23, 2018
@nvanbenschoten

I just confirmed for my own sanity that this wasn't caused by #21562.

@petermattis

@tschottdorf You've had your head in the MVCC stats muck recently.

@tbg commented Jan 23, 2018

This was introduced by #21078, but I don't understand what's going wrong yet -- it doesn't seem to have anything to do with the new functionality introduced in that change. I'm still figuring out the details, but whatever it is, I think I'll get to this soon since it's so easy to repro. In the meantime, take a look at the logging I added below. stats LHS is what the left side of the split (post split) thinks it has; computed LHS is what it actually has. Ditto for RHS.

Reading these numbers, the LHS thinks it's empty, but it's actually about half of the data. The RHS thinks it's everything, but it's really only the other half (to the point where the numbers add up exactly in most columns). I'm very curious how this came to be, and even more so, how it's caused by the PR linked above.

stats LHS    enginepb.MVCCStats{ContainsEstimates:false, LastUpdateNanos:1516735821067912000, IntentAge:0, GCBytesAge:0, LiveBytes:0, LiveCount:0, KeyBytes:0, KeyCount:0, ValBytes:0, ValCount:0, IntentBytes:0, IntentCount:0, SysBytes:637, SysCount:6}
computed LHS enginepb.MVCCStats{ContainsEstimates:false, LastUpdateNanos:1516735821067912000, IntentAge:0, GCBytesAge:0, LiveBytes:33554320, LiveCount:100694, KeyBytes:2047617, KeyCount:100694, ValBytes:31506703, ValCount:100694, IntentBytes:0, IntentCount:0, SysBytes:1017, SysCount:12}
stats RHS    enginepb.MVCCStats{ContainsEstimates:false, LastUpdateNanos:1516735821067912000, IntentAge:0, GCBytesAge:0, LiveBytes:69638828, LiveCount:209000, KeyBytes:4247838, KeyCount:209000, ValBytes:65390990, ValCount:209000, IntentBytes:0, IntentCount:0, SysBytes:915, SysCount:16}
computed RHS enginepb.MVCCStats{ContainsEstimates:false, LastUpdateNanos:1516735821067912000, IntentAge:0, GCBytesAge:0, LiveBytes:36084508, LiveCount:108306, KeyBytes:2200221, KeyCount:108306, ValBytes:33884287, ValCount:108306, IntentBytes:0, IntentCount:0, SysBytes:535, SysCount:10}
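To make the "numbers add up" observation concrete: computed-LHS LiveBytes 33554320 + computed-RHS LiveBytes 36084508 = 69638828, which is exactly the LiveBytes the in-memory stats credit to the RHS, and LiveCount (100694 + 108306 = 209000), KeyBytes, and ValBytes line up the same way. That is the pattern you would expect if the LHS stats are recomputed by scanning its span, the RHS gets the pre-split total minus that, and the scan comes back empty. A toy model of that accounting with stand-in types (not the actual enginepb/storage code):

package main

import "fmt"

type stats struct{ LiveBytes, LiveCount int64 }

func (s stats) sub(o stats) stats {
	return stats{s.LiveBytes - o.LiveBytes, s.LiveCount - o.LiveCount}
}

func main() {
	preSplit := stats{LiveBytes: 69638828, LiveCount: 209000}

	// What a scan of the LHS span should have found...
	correctLHS := stats{LiveBytes: 33554320, LiveCount: 100694}
	// ...and what a broken (empty) scan reports instead.
	brokenLHS := stats{}

	fmt.Println("expected:", correctLHS, preSplit.sub(correctLHS)) // each side gets roughly half
	fmt.Println("observed:", brokenLHS, preSplit.sub(brokenLHS))   // LHS empty, RHS credited with everything
}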

@tbg commented Jan 23, 2018

With the logging below added to the split trigger (I added engineLeftMS), I see that this is the origin of the problem: leftMS has most fields zero (engineLeftMS does not).

	leftMS, err := rditer.ComputeStatsForRange(&split.LeftDesc, batch, ts.WallTime)
	if err != nil {
		return enginepb.MVCCStats{}, result.Result{}, errors.Wrap(err, "unable to compute stats for LHS range after split")
	}
	log.Event(ctx, "computed stats for left hand side range")

	log.Warningf(ctx, "orig ms %s\nleft ms %s", pretty.Sprint(origBothMS), pretty.Sprint(leftMS))

	engineLeftMS, err := rditer.ComputeStatsForRange(&split.LeftDesc, rec.Engine(), ts.WallTime)
	if err != nil {
		return enginepb.MVCCStats{}, result.Result{}, err
	}
	log.Warningf(ctx, "recomp left ms %s", pretty.Sprint(engineLeftMS))

The problem is in (*rocksDBBatch).NewTimeBoundIterator, which makes all future calls to NewIterator() return the time-bounded iterator. Sending a fix.
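For illustration only (hypothetical type and method names, not the actual RocksDB wrapper): the failure mode described above amounts to a batch caching its last-created iterator and handing that same iterator back from plain NewIterator() calls, so a later full scan silently skips every key that falls outside the old time bounds.

package main

import "fmt"

// iterator is a stand-in; a time-bound iterator only surfaces keys written
// within its time bounds.
type iterator struct {
	timeBound bool
}

// batch is a stand-in for a write batch that lazily caches its iterator.
type batch struct {
	cached *iterator
}

// NewTimeBoundIterator creates a time-bounded iterator and caches it.
func (b *batch) NewTimeBoundIterator() *iterator {
	b.cached = &iterator{timeBound: true}
	return b.cached
}

// NewIterator should hand back an ordinary iterator, but the buggy caching
// returns whatever was created last, time bounds and all.
func (b *batch) NewIterator() *iterator {
	if b.cached == nil {
		b.cached = &iterator{}
	}
	return b.cached // bug: may still be the time-bounded iterator
}

func main() {
	b := &batch{}
	_ = b.NewTimeBoundIterator()             // used earlier while evaluating the command
	it := b.NewIterator()                    // later: the split's LHS stats scan
	fmt.Println("time bound:", it.timeBound) // true, so the scan misses most keys
}

That would explain why leftMS comes out nearly empty while a recomputation against the engine (engineLeftMS) sees the real data.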

tbg closed this as completed in 61dd154 Jan 23, 2018
tbg added a commit to tbg/cockroach that referenced this issue Jan 23, 2018
Release note: None

Without cockroachdb#21721, this fails with

> saw 0 values in regular iterator, but expected 4.

Touches cockroachdb#21721.
Touches cockroachdb#21689.