
storage: stats-based rebalancing can't handle large numbers of splits and scatters #17671

Closed
benesch opened this issue Aug 15, 2017 · 12 comments

benesch commented Aug 15, 2017

Discovered in #17644. Here's a RESTORE with stats-based rebalancing enabled:

[screenshot 2017-08-15 13 20 53]

And without:

[screenshot 2017-08-15 12 50 10]

benesch added a commit to benesch/cockroach that referenced this issue Aug 15, 2017
Stats-based rebalancing interacts badly with restore at the moment, so
disable it for now. Tracked in cockroachdb#17671.

a-robinson commented Aug 16, 2017

It'd be nice if the screenshots also included the "Keys Written per Second per Store" graph, but without that data to go on I expect that what's going on here is a mismatch of expectations between scatter and stats-based rebalancing.

We balance on three dimensions -- replica count, fraction of disk used, and writes per second. We require a rebalance to improve two of those dimensions before we decide to do it, so if a cluster is well balanced in the latter two dimensions then an imbalanced range count won't motivate a rebalance.

Doing a lot of splits only raises the replica count without affecting the fraction of disk used or writes per second, so it makes sense that the allocator doesn't see the need to rebalance afterwards. Stats-based rebalancing has its issues (such as #17691), but this is more of a case of the general-purpose stats-based scoring logic not working well for the specific use case of wanting to spread out a particular subset of ranges.
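To make the "improve two of three dimensions" rule above concrete, here is a minimal sketch in Go. All names here are illustrative; the real scorer in pkg/storage/allocator_scorer.go is considerably more involved.

```go
package main

import "fmt"

// dimensionDelta records whether a candidate rebalance makes one
// balance dimension worse, leaves it unchanged, or improves it.
type dimensionDelta int

const (
	worse dimensionDelta = iota - 1
	unchanged
	better
)

// shouldRebalance sketches the gating rule described above: a
// rebalance must improve at least two of the three dimensions
// (replica count, fraction of disk used, writes per second).
func shouldRebalance(replicaCount, diskFraction, writesPerSec dimensionDelta) bool {
	improved := 0
	for _, d := range []dimensionDelta{replicaCount, diskFraction, writesPerSec} {
		if d == better {
			improved++
		}
	}
	return improved >= 2
}

func main() {
	// A pile of empty ranges from splits: replica count is imbalanced,
	// but disk usage and write load are already even, so no rebalance
	// fires -- the situation this issue describes.
	fmt.Println(shouldRebalance(better, unchanged, unchanged)) // false
	fmt.Println(shouldRebalance(better, better, unchanged))    // true
}
```

This shows why splitting alone cannot trigger movement: only one of the three dimensions changes.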

@petermattis

@a-robinson This is easily reproducible on sky. I believe I mentioned this elsewhere, but perhaps SCATTER should use different thresholds/settings from normal rebalancing. For example, perhaps the expectation for SCATTER is even range balance. (Longer term) we might even add an option to SCATTER to specify what we want to scatter on.

@a-robinson

Weird, GitHub never emailed me about your response here.

Yes, I agree that it would be best if SCATTER used different thresholds/settings than normal rebalancing. Its goals are notably different from general rebalancing - it wants the ranges from the provided key space as spread out as possible in preparation for some incoming load on them.

What I need to think about more, though, is whether there's something simple to do for 1.1, because I don't think an entirely different rebalancing strategy is in the cards.

@petermattis

> What I need to think about more, though, is whether there's something simple to do for 1.1, because I don't think an entirely different rebalancing strategy is in the cards.

Right now the thresholds for stats-based rebalancing (and whether it is enabled) are controlled via cluster settings. Could we instead pass in an AllocatorOptions struct and use a different set of options for scatter? For 1.1, we could have AllocatorOptions { statsBasedRebalancingEnabled bool }.
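The AllocatorOptions idea above could look roughly like the following sketch. The field and function names are assumptions for illustration, not CockroachDB's actual API.

```go
package main

import "fmt"

// AllocatorOptions carries per-call knobs for the allocator, so that
// scatter can use different behavior than background rebalancing
// without touching any cluster-wide setting.
type AllocatorOptions struct {
	StatsBasedRebalancingEnabled bool
}

// describeScoring is a stand-in for the allocator entry point; a real
// implementation would score candidate stores for a rebalance target.
func describeScoring(opts AllocatorOptions) string {
	if opts.StatsBasedRebalancingEnabled {
		return "score on replica count, disk usage, and writes per second"
	}
	return "score on replica count only"
}

func main() {
	// Background rebalancing passes whatever the cluster setting says;
	// scatter passes its own options to force count-based scoring.
	fmt.Println(describeScoring(AllocatorOptions{StatsBasedRebalancingEnabled: true}))
	fmt.Println(describeScoring(AllocatorOptions{StatsBasedRebalancingEnabled: false}))
}
```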


benesch commented Aug 17, 2017

What if scatter simply plumbed "don't use stats" into the allocator?

benesch added a commit to benesch/cockroach that referenced this issue Aug 21, 2017
Stats-based rebalancing interacts badly with restore at the moment, so
disable it for now. Tracked in cockroachdb#17671.
benesch added a commit to benesch/cockroach that referenced this issue Aug 22, 2017
Stats-based rebalancing interacts badly with restore at the moment, so
disable it for now. Tracked in cockroachdb#17671.
@a-robinson a-robinson added this to the 1.2 milestone Aug 28, 2017
@a-robinson

That would be better than the current behavior, but there will still be times when that won't do anything. For example, if node x has the lease for the range being split and scattered, but it has many fewer ranges on it than other nodes in the cluster (due to its ranges having more data and higher write throughput), then even switching to a non-stats-based mode might not move anything.

Given that stats-based rebalancing will be disabled by default in 1.1 (#17968), I think we can push off a real fix to 1.2.


a6802739 commented Sep 7, 2017

@a-robinson Sorry, I couldn't quite understand what scatter means here.

I can follow your example: if node x has the lease for the range being split and scattered, but it has many fewer ranges on it than other nodes in the cluster, node x may have fewer ranges due to its ranges having more data and higher write throughput. But if we pre-split a table to generate many empty ranges, that may not lead to many rebalances under statsBasedRebalancing.

@a-robinson

Scatter is a method used by RESTORE to spread around the replicas and leases for a bunch of new, empty ranges that were created by splits. It's also exposed via the ALTER TABLE ... SCATTER ... SQL statement.

The intent of the person calling Scatter is to scatter the replicas and leases for the provided key range all around the cluster so that they're very spread out.


a6802739 commented Sep 8, 2017

So what we should do here is decrease the thresholds for SCATTER or disable stats-based rebalancing, and after the RESTORE has finished, put the thresholds back?

@a-robinson

I think there are quite a few possible solutions here, and I'm not settled on any particular one. Disabling stats-based rebalancing when scattering would be one such solution :)


a6802739 commented Sep 13, 2017

@a-robinson I just tested it manually.
Here is the diff, as I proposed in PR #18426:

--- a/pkg/sql/planner.go
+++ b/pkg/sql/planner.go
@@ -19,6 +19,7 @@ import (

        "github.com/cockroachdb/cockroach/pkg/config"
        "github.com/cockroachdb/cockroach/pkg/internal/client"
+       "github.com/cockroachdb/cockroach/pkg/settings/cluster"
        "github.com/cockroachdb/cockroach/pkg/sql/mon"
        "github.com/cockroachdb/cockroach/pkg/sql/parser"
        "github.com/cockroachdb/cockroach/pkg/sql/sqlbase"
@@ -133,6 +134,7 @@ func makeInternalPlanner(
                context:  ctx,
                tables:   TableCollection{databaseCache: newDatabaseCache(config.SystemConfig{})},
        }
+       s.execCfg = &ExecutorConfig{Settings: cluster.MakeClusterSettings(cluster.BinaryMinimumSupportedVersion, cluster.BinaryServerVersion)}

        s.mon = mon.MakeUnlimitedMonitor(ctx,
                "internal-root",
diff --git a/pkg/sql/split_at.go b/pkg/sql/split_at.go
index 304d14cc1..3831fe190 100644
--- a/pkg/sql/split_at.go
+++ b/pkg/sql/split_at.go
@@ -429,6 +429,14 @@ type scatterNode struct {
 }

 func (n *scatterNode) Start(params runParams) error {
+       // we should disable stats-based rebalancing first.
+       ie := InternalExecutor{LeaseManager: params.p.LeaseMgr()}
+       if _, err := ie.ExecuteStatementInTransaction(
+               params.ctx, "disable-setting", params.p.txn,
+               "SET CLUSTER SETTING kv.allocator.stat_based_rebalancing.enabled = $1", false,
+       ); err != nil {
+               return err
+       }
        db := params.p.ExecCfg().DB
        req := &roachpb.AdminScatterRequest{
                Span: roachpb.Span{Key: n.span.Key, EndKey: n.span.EndKey},
diff --git a/pkg/storage/allocator_scorer.go b/pkg/storage/allocator_scorer.go
index 8d50b92f7..e41319555 100644
--- a/pkg/storage/allocator_scorer.go
+++ b/pkg/storage/allocator_scorer.go
@@ -48,7 +48,7 @@ const (
 var EnableStatsBasedRebalancing = settings.RegisterBoolSetting(
        "kv.allocator.stat_based_rebalancing.enabled",
        "set to enable rebalancing of range replicas based on write load and disk usage",
-       false,
+       true,
 )

I started four nodes with these commands:

./cockroach start --store=/Users/baidu/data1 --http-port=8080 --port=26660 --insecure --background
./cockroach start --store=/Users/baidu/data2 --http-port=8081 --port=26661 --join=127.0.0.1:26660 --join=127.0.0.1:26661 --insecure --background
./cockroach start --store=/Users/baidu/data3 --http-port=8082 --port=26662 --join=127.0.0.1:26660 --join=127.0.0.1:26661 --insecure --background
./cockroach start --store=/Users/baidu/data4 --http-port=8083 --port=26663 --join=127.0.0.1:26660 --join=127.0.0.1:26661 --insecure --background

and then I ran the split command to split my test table into 20000 ranges:

[screenshot]

Then I ran:

ALTER TABLE sh_test SCATTER;

I found the replicas started to spread across the cluster.

[screenshot]

Then I looked at the system.settings table:

root@:26661/> SELECT * FROM system.settings;
+---------------------------------------------+--------------------------------------------------------------------+----------------------------------+-----------+
|                    name                     |                               value                                |           lastUpdated            | valueType |
+---------------------------------------------+--------------------------------------------------------------------+----------------------------------+-----------+
| diagnostics.reporting.enabled               | true                                                               | 2017-09-13 09:57:43.49175+00:00  | b         |
| kv.allocator.stat_based_rebalancing.enabled | false                                                              | 2017-09-13 10:49:02.731652+00:00 | b         |
| trace.debug.enable                          | false                                                              | 2017-09-13 09:57:43.517504+00:00 | b         |
| version                                     | "\n\b\b\x01\x10\x00\x18\x00 \x03\x12\b\b\x01\x10\x00\x18\x00 \x03" | 2017-09-13 09:57:43.507639+00:00 | m         |
+---------------------------------------------+--------------------------------------------------------------------+----------------------------------+-----------+
(4 rows)

Time: 7.113467ms

So it seems PR #18426 works.

So I wonder: how do settings work in CockroachDB?

@a-robinson

@a6802739 the reason that worked is that you actually changed the cluster-wide setting kv.allocator.stat_based_rebalancing.enabled from true to false. Your last excerpt of SELECT * FROM system.settings confirms this: it shows the value is false even though it was initially set to true. We don't want to change the actual cluster-wide setting when Scatter is run, since that will affect much more than just the range being scattered.
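The distinction above can be sketched as follows: instead of mutating the cluster-wide setting (which every range observes), the scatter request itself could carry an override scoped to the spans being scattered. All names here are hypothetical, not CockroachDB's actual request fields.

```go
package main

import "fmt"

// clusterSettings models the cluster-wide default that all ranges see.
type clusterSettings struct {
	statsBasedRebalancing bool
}

// scatterRequest carries a hypothetical per-request override that only
// affects scoring for this scatter, leaving the setting untouched.
type scatterRequest struct {
	span             string
	ignoreRangeStats bool
}

// useStats decides whether stats-based scoring applies: a scatter can
// opt out without changing the cluster-wide setting other ranges use.
func useStats(s clusterSettings, req *scatterRequest) bool {
	if req != nil && req.ignoreRangeStats {
		return false
	}
	return s.statsBasedRebalancing
}

func main() {
	settings := clusterSettings{statsBasedRebalancing: true}
	// Background rebalancing still uses stats...
	fmt.Println(useStats(settings, nil)) // true
	// ...while this scatter opts out for its own spans only.
	fmt.Println(useStats(settings, &scatterRequest{span: "/Table/51", ignoreRangeStats: true})) // false
	// The cluster-wide setting itself is never modified.
	fmt.Println(settings.statsBasedRebalancing) // true
}
```

Contrast this with the diff in the earlier comment, which issues SET CLUSTER SETTING from inside scatterNode.Start and therefore changes behavior for the whole cluster.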
