
storage: stats-based rebalancing can't handle large numbers of splits and scatters #17671

Closed
benesch opened this issue Aug 15, 2017 · 12 comments

benesch commented Aug 15, 2017

Discovered in #17644. Here's a RESTORE with stats-based rebalancing enabled:

[screenshot 2017-08-15 13 20 53]

And without:

[screenshot 2017-08-15 12 50 10]

benesch added a commit to benesch/cockroach that referenced this issue Aug 15, 2017
Stats-based rebalancing interacts badly with restore at the moment, so
disable it for now. Tracked in cockroachdb#17671.

a-robinson commented Aug 16, 2017

It'd be nice if the screenshots also included the "Keys Written per Second per Store" graph, but without that data to go on I expect that what's going on here is a mismatch of expectations between scatter and stats-based rebalancing.

We balance on three dimensions -- replica count, fraction of disk used, and writes per second. We require a rebalance to improve two of those dimensions before we decide to do it, so if a cluster is well balanced in the latter two dimensions then an imbalanced range count won't motivate a rebalance.

Doing a lot of splits only raises the replica count without affecting the fraction of disk used or writes per second, so it makes sense that the allocator doesn't see the need to rebalance afterwards. Stats-based rebalancing has its issues (such as #17691), but this is more of a case of the general-purpose stats-based scoring logic not working well for the specific use case of wanting to spread out a particular subset of ranges.
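To make the "improve two of three dimensions" rule above concrete, here is a minimal sketch in Go. All names here are illustrative; the real scorer in pkg/storage/allocator_scorer.go is considerably more involved.

```go
package main

import "fmt"

// dimensionDelta records whether a candidate rebalance makes one
// balance dimension worse, leaves it unchanged, or improves it.
type dimensionDelta int

const (
	worse dimensionDelta = iota - 1
	unchanged
	better
)

// shouldRebalance sketches the gating rule described above: a
// rebalance must improve at least two of the three dimensions
// (replica count, fraction of disk used, writes per second).
func shouldRebalance(replicaCount, diskFraction, writesPerSec dimensionDelta) bool {
	improved := 0
	for _, d := range []dimensionDelta{replicaCount, diskFraction, writesPerSec} {
		if d == better {
			improved++
		}
	}
	return improved >= 2
}

func main() {
	// A pile of empty ranges from splits: replica count is imbalanced,
	// but disk usage and write load are already even, so no rebalance
	// fires -- the situation this issue describes.
	fmt.Println(shouldRebalance(better, unchanged, unchanged)) // false
	fmt.Println(shouldRebalance(better, better, unchanged))    // true
}
```

This shows why splitting alone cannot trigger movement: only one of the three dimensions changes.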

@petermattis

@a-robinson This is easily reproducible on sky. I believe I mentioned this elsewhere, but perhaps SCATTER should use different thresholds/settings from normal rebalancing. For example, perhaps the expectation for SCATTER is even range balance. (Longer term) we might even add an option to SCATTER to specify what we want to scatter on.

@a-robinson

Weird, GitHub never emailed me about your response here.

Yes, I agree that it would be best if SCATTER used different thresholds/settings than normal rebalancing. Its goals are notably different from general rebalancing - it wants the ranges from the provided key space as spread out as possible in preparation for some incoming load on them.

What I need to think about more, though, is whether there's something simple to do for 1.1, because I don't think an entirely different rebalancing strategy is in the cards.

@petermattis

> What I need to think about more, though, is whether there's something simple to do for 1.1, because I don't think an entirely different rebalancing strategy is in the cards.

Right now the thresholds for stats-based rebalancing (and whether it is enabled) are controlled via cluster settings. Could we instead pass in an AllocatorOptions struct and use a different set of options for scatter? For 1.1, we could have AllocatorOptions { statsBasedRebalancingEnabled bool }.
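The AllocatorOptions idea above could look roughly like the following sketch. The field and function names are assumptions for illustration, not CockroachDB's actual API.

```go
package main

import "fmt"

// AllocatorOptions carries per-call knobs for the allocator, so that
// scatter can use different behavior than background rebalancing
// without touching any cluster-wide setting.
type AllocatorOptions struct {
	StatsBasedRebalancingEnabled bool
}

// describeScoring is a stand-in for the allocator entry point; a real
// implementation would score candidate stores for a rebalance target.
func describeScoring(opts AllocatorOptions) string {
	if opts.StatsBasedRebalancingEnabled {
		return "score on replica count, disk usage, and writes per second"
	}
	return "score on replica count only"
}

func main() {
	// Background rebalancing passes whatever the cluster setting says;
	// scatter passes its own options to force count-based scoring.
	fmt.Println(describeScoring(AllocatorOptions{StatsBasedRebalancingEnabled: true}))
	fmt.Println(describeScoring(AllocatorOptions{StatsBasedRebalancingEnabled: false}))
}
```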


benesch commented Aug 17, 2017

What if scatter simply plumbed "don't use stats" into the allocator?

benesch added a commit to benesch/cockroach that referenced this issue Aug 21, 2017
Stats-based rebalancing interacts badly with restore at the moment, so
disable it for now. Tracked in cockroachdb#17671.
benesch added a commit to benesch/cockroach that referenced this issue Aug 22, 2017
Stats-based rebalancing interacts badly with restore at the moment, so
disable it for now. Tracked in cockroachdb#17671.
@a-robinson a-robinson added this to the 1.2 milestone Aug 28, 2017
@a-robinson

That would be better than the current behavior, but there will still be times when that won't do anything. For example, if node x has the lease for the range being split and scattered, but it has many fewer ranges on it than other nodes in the cluster (due to its ranges having more data and higher write throughput), then even switching to a non-stats-based mode might not move anything.

Given that stats-based rebalancing will be disabled by default in 1.1 (#17968), I think we can push off a real fix to 1.2.


a6802739 commented Sep 7, 2017

@a-robinson Sorry, I couldn't quite understand what scatter means here.

I can follow your example: if node x has the lease for the range being split and scattered, but it has many fewer ranges on it than other nodes in the cluster, node x may have fewer ranges due to its ranges having more data and higher write throughput. But if we pre-split a table to generate many empty ranges, that may not lead to many rebalances under statsBasedRebalancing.

@a-robinson

Scatter is a method used by RESTORE to spread around the replicas and leases for a bunch of new, empty ranges that were created by splits. It's also exposed via the ALTER TABLE ... SCATTER ... SQL statement.

The intent of the person calling Scatter is to scatter the replicas and leases for the provided key range all around the cluster so that they're very spread out.


a6802739 commented Sep 8, 2017

So what we should do here is decrease the thresholds for SCATTER or disable stats-based rebalancing, and after the RESTORE has finished, put the thresholds back?

@a-robinson

I think there are quite a few possible solutions here, and I'm not settled on any particular one. Disabling stats-based rebalancing when scattering would be one such solution :)


a6802739 commented Sep 13, 2017

@a-robinson I just tested it manually.
Here is the diff, as I proposed in PR #18426:

--- a/pkg/sql/planner.go
+++ b/pkg/sql/planner.go
@@ -19,6 +19,7 @@ import (

        "github.com/cockroachdb/cockroach/pkg/config"
        "github.com/cockroachdb/cockroach/pkg/internal/client"
+       "github.com/cockroachdb/cockroach/pkg/settings/cluster"
        "github.com/cockroachdb/cockroach/pkg/sql/mon"
        "github.com/cockroachdb/cockroach/pkg/sql/parser"
        "github.com/cockroachdb/cockroach/pkg/sql/sqlbase"
@@ -133,6 +134,7 @@ func makeInternalPlanner(
                context:  ctx,
                tables:   TableCollection{databaseCache: newDatabaseCache(config.SystemConfig{})},
        }
+       s.execCfg = &ExecutorConfig{Settings: cluster.MakeClusterSettings(cluster.BinaryMinimumSupportedVersion, cluster.BinaryServerVersion)}

        s.mon = mon.MakeUnlimitedMonitor(ctx,
                "internal-root",
diff --git a/pkg/sql/split_at.go b/pkg/sql/split_at.go
index 304d14cc1..3831fe190 100644
--- a/pkg/sql/split_at.go
+++ b/pkg/sql/split_at.go
@@ -429,6 +429,14 @@ type scatterNode struct {
 }

 func (n *scatterNode) Start(params runParams) error {
+       // we should disable stats-based rebalancing first.
+       ie := InternalExecutor{LeaseManager: params.p.LeaseMgr()}
+       if _, err := ie.ExecuteStatementInTransaction(
+               params.ctx, "disable-setting", params.p.txn,
+               "SET CLUSTER SETTING kv.allocator.stat_based_rebalancing.enabled = $1", false,
+       ); err != nil {
+               return err
+       }
        db := params.p.ExecCfg().DB
        req := &roachpb.AdminScatterRequest{
                Span: roachpb.Span{Key: n.span.Key, EndKey: n.span.EndKey},
diff --git a/pkg/storage/allocator_scorer.go b/pkg/storage/allocator_scorer.go
index 8d50b92f7..e41319555 100644
--- a/pkg/storage/allocator_scorer.go
+++ b/pkg/storage/allocator_scorer.go
@@ -48,7 +48,7 @@ const (
 var EnableStatsBasedRebalancing = settings.RegisterBoolSetting(
        "kv.allocator.stat_based_rebalancing.enabled",
        "set to enable rebalancing of range replicas based on write load and disk usage",
-       false,
+       true,
 )

I started four nodes with these commands:

./cockroach start --store=/Users/baidu/data1 --http-port=8080 --port=26660 --insecure --background
./cockroach start --store=/Users/baidu/data2 --http-port=8081 --port=26661 --join=127.0.0.1:26660 --join=127.0.0.1:26661 --insecure --background
./cockroach start --store=/Users/baidu/data3 --http-port=8082 --port=26662 --join=127.0.0.1:26660 --join=127.0.0.1:26661 --insecure --background
./cockroach start --store=/Users/baidu/data4 --http-port=8083 --port=26663 --join=127.0.0.1:26660 --join=127.0.0.1:26661 --insecure --background

and then I ran the split command to split my test table into 20000 ranges:

[screenshot]

Then I ran:

ALTER TABLE sh_test SCATTER;

I found the replicas started to spread across the cluster.

[screenshot]

Then I looked at the system.settings table:

root@:26661/> SELECT * FROM system.settings;
+---------------------------------------------+--------------------------------------------------------------------+----------------------------------+-----------+
|                    name                     |                               value                                |           lastUpdated            | valueType |
+---------------------------------------------+--------------------------------------------------------------------+----------------------------------+-----------+
| diagnostics.reporting.enabled               | true                                                               | 2017-09-13 09:57:43.49175+00:00  | b         |
| kv.allocator.stat_based_rebalancing.enabled | false                                                              | 2017-09-13 10:49:02.731652+00:00 | b         |
| trace.debug.enable                          | false                                                              | 2017-09-13 09:57:43.517504+00:00 | b         |
| version                                     | "\n\b\b\x01\x10\x00\x18\x00 \x03\x12\b\b\x01\x10\x00\x18\x00 \x03" | 2017-09-13 09:57:43.507639+00:00 | m         |
+---------------------------------------------+--------------------------------------------------------------------+----------------------------------+-----------+
(4 rows)

Time: 7.113467ms

So it seems PR #18426 works.

So I wonder: how do settings work in CockroachDB?

@a-robinson

@a6802739 the reason that worked is that you actually changed the cluster-wide setting kv.allocator.stat_based_rebalancing.enabled from true to false. Your last excerpt of SELECT * FROM system.settings confirms this: it shows the value is false even though it was initially set to true. We don't want to change the actual cluster-wide setting when Scatter is run, since that will affect much more than just the range being scattered.
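The distinction above can be sketched as follows: instead of mutating the cluster-wide setting (which every range observes), the scatter request itself could carry an override scoped to the spans being scattered. All names here are hypothetical, not CockroachDB's actual request fields.

```go
package main

import "fmt"

// clusterSettings models the cluster-wide default that all ranges see.
type clusterSettings struct {
	statsBasedRebalancing bool
}

// scatterRequest carries a hypothetical per-request override that only
// affects scoring for this scatter, leaving the setting untouched.
type scatterRequest struct {
	span             string
	ignoreRangeStats bool
}

// useStats decides whether stats-based scoring applies: a scatter can
// opt out without changing the cluster-wide setting other ranges use.
func useStats(s clusterSettings, req *scatterRequest) bool {
	if req != nil && req.ignoreRangeStats {
		return false
	}
	return s.statsBasedRebalancing
}

func main() {
	settings := clusterSettings{statsBasedRebalancing: true}
	// Background rebalancing still uses stats...
	fmt.Println(useStats(settings, nil)) // true
	// ...while this scatter opts out for its own spans only.
	fmt.Println(useStats(settings, &scatterRequest{span: "/Table/51", ignoreRangeStats: true})) // false
	// The cluster-wide setting itself is never modified.
	fmt.Println(settings.statsBasedRebalancing) // true
}
```

Contrast this with the diff in the earlier comment, which issues SET CLUSTER SETTING from inside scatterNode.Start and therefore changes behavior for the whole cluster.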
