
storage: make lease rebalancing decisions at the store level #28340

Merged — 3 commits merged into cockroachdb:master on Aug 20, 2018

Conversation

@a-robinson (Contributor)

In order to better balance the QPS being served by each store to avoid
overloaded nodes.

Fixes #21419

Release note (performance improvement): Range leases will be
automatically rebalanced throughout the cluster to even out the amount
of QPS being handled by each node.

@a-robinson a-robinson requested review from BramGruneir and a team August 7, 2018 17:28
@cockroach-teamcity (Member)

This change is Reviewable

@a-robinson (Contributor, Author)

Tested primarily via tpc-c on roachprod and the new roachtest. I should give allocsim a look too, but I'm going to start integrating replica-rebalancing decisions as a follow-up.

@a-robinson a-robinson force-pushed the store-rebalance branch 2 times, most recently from 05241f9 to b3cbe23 on August 13, 2018 20:58
@BramGruneir (Member) left a comment

:lgtm: with some nits.

But I worry about whether the test is appropriate with kv as the workload.

Reviewed 3 of 3 files at r1, 8 of 8 files at r2, 3 of 3 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/cmd/roachtest/rebalance_load.go, line 16 at r3 (raw file):

// for names of contributors.

package main

So this is still checking with kv to make sure the leases are evenly distributed.

But the workload itself is evenly distributed.

Perhaps kv is not the right load for this; instead, use something that stresses some parts more than others, and check whether QPS is evenly distributed. (This might be tough without load-based rebalancing.)


pkg/storage/allocator.go, line 1037 at r2 (raw file):

	// rebalancing is enabled? In happy cases it's nice to keep this working
	// to even out the number of leases in addition to the number of replicas,
	// but it's certainly a blunt instrument that could undo what we want.

I think this might be a good idea. But of course, requires a lot of testing.


pkg/storage/replicate_queue.go, line 563 at r2 (raw file):

			return false, nil
		}
		if err := rq.transferLease(ctx, repl, target); err != nil {

I think you can simplify this.

err := rq.transferLease(ctx, repl, target)
return err == nil, err

And reverse the if case above from target != (roachpb.ReplicaDescriptor{}) to target == (roachpb.ReplicaDescriptor{}) and early-exit, to avoid nesting.
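
For concreteness, a self-contained sketch of the restructuring being suggested; the toy types below stand in for roachpb.ReplicaDescriptor and the replicate queue's transferLease method, and none of this is the actual replicate_queue.go code:

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// replicaDescriptor is a toy stand-in for roachpb.ReplicaDescriptor.
type replicaDescriptor struct{ storeID int }

// transferLease is a stand-in for the replicate queue's transferLease method.
func transferLease(ctx context.Context, target replicaDescriptor) error {
	if target.storeID == 2 {
		return errors.New("injected transfer failure")
	}
	return nil
}

// considerTransfer shows the suggested shape: early-return on the empty
// descriptor to avoid nesting, then collapse the transfer into a single
// "return err == nil, err".
func considerTransfer(ctx context.Context, target replicaDescriptor) (bool, error) {
	if target == (replicaDescriptor{}) {
		return false, nil
	}
	err := transferLease(ctx, target)
	return err == nil, err
}

func main() {
	fmt.Println(considerTransfer(context.Background(), replicaDescriptor{}))           // false <nil>
	fmt.Println(considerTransfer(context.Background(), replicaDescriptor{storeID: 1})) // true <nil>
	fmt.Println(considerTransfer(context.Background(), replicaDescriptor{storeID: 2})) // false + error
}
```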


pkg/storage/store_rebalancer.go, line 129 at r2 (raw file):

	for localDesc.Capacity.QueriesPerSecond > qpsMaxThreshold {
		log.Infof(ctx, "considering load-based lease transfers for s%d with %.2f qps (mean=%.2f, upperThreshold=%2.f)",

Are these debug messages? They should be at verbosity level 3, I think.


pkg/storage/store_rebalancer.go, line 152 at r2 (raw file):

// TODO(a-robinson): Should we take the number of leases on each store into
// account here or just continue to let that happen in allocator.go?

How much do lease counts matter if we're balancing on qps anyway?


pkg/storage/store_rebalancer.go, line 187 at r2 (raw file):

		}

		// TODO: Don't bother moving leases whose QPS is below some fraction of the

nit: format the todo with your name. If we still rebalance leases based on counts, this is needed. If we turn that off, then this is fine.

@a-robinson (Contributor, Author) left a comment

TFTR!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/cmd/roachtest/rebalance_load.go, line 16 at r3 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

So this is still checking with kv to make sure the leases are evenly distributed.

But the workload itself is evenly distributed.

Perhaps kv is not the right load for this; instead, use something that stresses some parts more than others, and check whether QPS is evenly distributed. (This might be tough without load-based rebalancing.)

Not quite -- this is checking that the 3 leases for the table are evenly distributed, not that all leases are evenly distributed. This was a very simple test that passes with this change and fails most of the time without it because the total number of ranges in a new cluster greatly exceeds the number of ranges (3) in the table used by kv. Without load-based rebalancing, the leases for the kv table will typically all just stay on the node that had the initial range lease before the ALTER TABLE ... SPLIT command, since the other nodes have enough leases for other various ranges that the cluster is still considered balanced.

I agree that you could get slightly different test coverage with a couple other workloads, but in general all the tests I can think of would be testing that a minority of hot ranges get balanced properly, which this one does as well, just in a very simple way.


pkg/storage/allocator.go, line 1037 at r2 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

I think this might be a good idea. But of course, requires a lot of testing.

Yeah, I don't know that we have the time to fully remove this right now. We'll have to try out a couple antagonistic workloads before sticking with this, though.


pkg/storage/replicate_queue.go, line 563 at r2 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

I think you can simplify this.

err := rq.transferLease(ctx, repl, target)
return err == nil, err

And reverse the if case above from target != (roachpb.ReplicaDescriptor{}) to target == (roachpb.ReplicaDescriptor{}) and early-exit, to avoid nesting.

Done.


pkg/storage/store_rebalancer.go, line 129 at r2 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

Are these debug messages? They should be at verbosity level 3, I think.

Sorry, I meant to have this outside the loop. Fixed.

I do think it'll be nice to actually log this at info level, since it'll only happen a max of once a minute and even then only on heavily loaded clusters with a significant qps imbalance.
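
A toy sketch of the shape being described, with names assumed purely for illustration (the real code uses log.Infof with a context): the summary line is emitted once, before the transfer loop, rather than on every iteration.

```go
package main

import "log"

// rebalanceLeases is a stand-in for the real store_rebalancer.go loop; it
// only illustrates logging the summary once, outside the loop, at info level.
func rebalanceLeases(storeID int, qps, meanQPS, qpsMaxThreshold float64) {
	log.Printf("considering load-based lease transfers for s%d with %.2f qps (mean=%.2f, upperThreshold=%.2f)",
		storeID, qps, meanQPS, qpsMaxThreshold)
	for qps > qpsMaxThreshold {
		// ... pick the hottest lease and transfer it away ...
		qps -= 50 // stand-in for the QPS shed by one transfer
	}
}

func main() {
	rebalanceLeases(1, 1200, 900, 1000)
}
```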


pkg/storage/store_rebalancer.go, line 152 at r2 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

How much do lease counts matter if we're balancing on qps anyway?

There's still a (small) fixed cost per lease for things like running the replica through all the various queues. It makes the time it takes to drain a node or recover from a node crash more predictable. Also, it may seem silly, but it looks a lot better to new users if the number of leases on each node is roughly even.

It seems desirable to keep lease count roughly even. I'd be happy to hear counterarguments, though, since that would make things simpler. Maybe simplicity is reason enough.


pkg/storage/store_rebalancer.go, line 187 at r2 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

nit: format the todo with your name. If we still rebalance leases based on counts, this is needed. If we turn that off, then this is fine.

Thanks for calling this out. I create TODOs without a name when I want to make sure they're resolved before merging the code.

@BramGruneir (Member) left a comment

:lgtm:

This is great, and a big step forward.

Reviewed 11 of 11 files at r4, 9 of 9 files at r5, 3 of 3 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/cmd/roachtest/rebalance_load.go, line 16 at r3 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

Not quite -- this is checking that the 3 leases for the table are evenly distributed, not that all leases are evenly distributed. This was a very simple test that passes with this change and fails most of the time without it because the total number of ranges in a new cluster greatly exceeds the number of ranges (3) in the table used by kv. Without load-based rebalancing, the leases for the kv table will typically all just stay on the node that had the initial range lease before the ALTER TABLE ... SPLIT command, since the other nodes have enough leases for other various ranges that the cluster is still considered balanced.

I agree that you could get slightly different test coverage with a couple other workloads, but in general all the tests I can think of would be testing that a minority of hot ranges get balanced properly, which this one does as well, just in a very simple way.

SGTM!


pkg/storage/store_rebalancer.go, line 129 at r2 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

Sorry, I meant to have this outside the loop. Fixed.

I do think it'll be nice to actually log this at info level, since it'll only happen a max of once a minute and even then only on heavily loaded clusters with a significant qps imbalance.

SGTM


pkg/storage/store_rebalancer.go, line 152 at r2 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

There's still a (small) fixed cost per lease for things like running the replica through all the various queues. It makes the time it takes to drain a node or recover from a node crash more predictable. Also, it may seem silly, but it looks a lot better to new users if the number of leases on each node is roughly even.

It seems desirable to keep lease count roughly even. I'd be happy to hear counterarguments, though, since that would make things simpler. Maybe simplicity is reason enough.

Less movement of leases is better overall, which I think is a strong argument on its own.

Anyway, not blocking for this PR. But good for a follow up.


pkg/storage/store_rebalancer.go, line 198 at r5 (raw file):

		// just unnecessary churn with no benefit to move leases responsible for,
		// for example, 1 qps on a store with 5000 qps.
		const minQPSFraction = .001

Is it worth pulling this const up to highlight it outside of the function?

@a-robinson (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/storage/store_rebalancer.go, line 198 at r5 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

Is it worth pulling this const up to highlight it outside of the function?

It'd have to be given a more descriptive name if it was pulled outside, so I'd go with no.
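
For illustration, a tiny self-contained sketch of the idea behind minQPSFraction; the function name and surrounding logic are assumptions, not the actual store_rebalancer.go implementation:

```go
package main

import "fmt"

// shouldConsiderLease illustrates the minQPSFraction idea: skip leases whose
// QPS is a negligible fraction of the store's total QPS, since moving them is
// churn with no real benefit.
func shouldConsiderLease(replicaQPS, storeQPS float64) bool {
	const minQPSFraction = .001
	return replicaQPS >= storeQPS*minQPSFraction
}

func main() {
	fmt.Println(shouldConsiderLease(1, 5000))   // false: 1 qps on a 5000 qps store
	fmt.Println(shouldConsiderLease(100, 5000)) // true
}
```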

@a-robinson a-robinson dismissed BramGruneir’s stale review August 15, 2018 21:07

resolved/acked all comments

@a-robinson (Contributor, Author)

cc @nvanbenschoten in case you wanted to take a look at this before submitting. Feel free to pass on doing so, but I wanted to give you a chance if you're interested.

@nvanbenschoten (Member) left a comment

Nice work @a-robinson! I'm really excited that we're getting this in for 2.1.

Reviewed 2 of 3 files at r1, 11 of 11 files at r4, 9 of 9 files at r5, 3 of 3 files at r6.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


docs/generated/settings/settings.html, line 21 at r5 (raw file):

<tr><td><code>kv.allocator.load_based_lease_rebalancing.enabled</code></td><td>boolean</td><td><code>true</code></td><td>set to enable rebalancing of range leases based on load and latency</td></tr>
<tr><td><code>kv.allocator.range_rebalance_threshold</code></td><td>float</td><td><code>0.05</code></td><td>minimum fraction away from the mean a store's range count can be before it is considered overfull or underfull</td></tr>
<tr><td><code>kv.allocator.stat_based_rebalancing.enabled</code></td><td>boolean</td><td><code>true</code></td><td>set to enable rebalancing of range replicas based on write load and disk usage</td></tr>

Did this get out of sync with the code?


pkg/cmd/roachtest/rebalance_load.go, line 16 at r3 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

SGTM!

Had the same question as @BramGruneir. That reasoning checks out with me.

Probably worth describing what happens without kv.allocator.stat_based_rebalancing.enabled=true somewhere in the test.


pkg/storage/allocator.go, line 44 at r5 (raw file):

	// baseLoadBasedLeaseRebalanceThreshold is the equivalent of
	// leaseRebalanceThreshold for load-based lease rebalance decisions (i.e.
	// "follow-the-workload"). Its the base threshold for decisions that gets

decisions that get adjusted


pkg/storage/allocator.go, line 1011 at r5 (raw file):

// TODO(a-robinson): Should this be changed to avoid even thinking about lease
// counts now that we try to spread leases and replicas based on QPS? As is it
// may fight back a little bit against store-level QPS--based rebalancing.

double "-"


pkg/storage/replica_rankings.go, line 62 at r4 (raw file):

	rr.mu.Lock()
	defer rr.mu.Unlock()
	if len(rr.mu.byQPS) == 0 && rr.mu.accumulator.qps.Len() > 0 {

The interaction between byQPS and rrAccumulator in this function is subtle. It looked very strange to me until I realized why you gave it these semantics (so that all 128 top replicas are considered and we don't end up staring at the top few every time the StoreRebalancer runs, right?). I'd call that out somewhere. Also call out that because of this behavior we have to be careful about increasing numTopReplicasToTrack or we could end up adding large delays between when the hottest few replicas are considered for rebalancing.


pkg/storage/replica_rankings.go, line 75 at r4 (raw file):

func (a *rrAccumulator) addReplica(repl replicaWithStats) {
	heap.Push(&a.qps, repl)
	if a.qps.Len() > numTopReplicasToTrack {

This is almost always going to be the case during steady-state operations, right? In that case, it seems like there's a natural fast-path that falls out where we compare against the head of the priority queue and only pop and push if the new replica will result in the head being evicted.
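
A self-contained sketch of that fast path, using toy types in place of replicaWithStats/rrPriorityQueue (this is not the actual replica_rankings.go code): once the min-heap holding the top-N QPS values is full, a new replica that wouldn't evict the current minimum can be skipped without any push/pop.

```go
package main

import (
	"container/heap"
	"fmt"
)

const numTopReplicasToTrack = 3

// minHeap is a min-heap of QPS values; the root is the smallest tracked QPS.
type minHeap []float64

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i] < h[j] }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(float64)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

func addReplicaQPS(h *minHeap, qps float64) {
	// Fast path: the heap is full and the new replica would not displace the
	// current minimum, so there is nothing to do.
	if h.Len() >= numTopReplicasToTrack && qps <= (*h)[0] {
		return
	}
	heap.Push(h, qps)
	if h.Len() > numTopReplicasToTrack {
		heap.Pop(h)
	}
}

func main() {
	var h minHeap
	heap.Init(&h)
	for _, qps := range []float64{10, 500, 30, 2, 700, 45} {
		addReplicaQPS(&h, qps)
	}
	fmt.Println(h) // the three highest QPS values, in heap order
}
```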


pkg/storage/replica_rankings.go, line 80 at r4 (raw file):

}

type rrAccumulator struct {

I'd add a comment to this as well because it is used externally of the replicaRankings abstraction. That will help justify why we have this extra layer of structure instead of having replicaRankings hold an rrPriorityQueue directly.


pkg/storage/store_rebalancer.go, line 45 at r5 (raw file):

// compares to the load on other stores in the cluster and transferring leases
// or replicas away if the local store is overloaded.
type StoreRebalancer struct {

There's an interesting reason why this can't be a queue itself. Might be worth adding that to this comment.


pkg/storage/store_rebalancer.go, line 208 at r5 (raw file):

		desc := replWithStats.repl.Desc()
		log.VEventf(ctx, 3, "considering lease transfer for r%d with %.2f qps", desc.RangeID, replWithStats.qps)
		// TODO(a-robinson): Should we sort these to first examine the stores with lower QPS?

Sorting or even just randomizing sounds like a good idea to me. Otherwise, won't there be a system-wide tendency to transfer leases to replicas at the front of their range descriptors? The comment on RangeDescriptor.Replicas says the order is arbitrary, but I'm not sure that that's true in practice.
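
As a sketch of what the randomization could look like (not the actual store_rebalancer.go code; the type below is a stand-in for roachpb.ReplicaDescriptor): shuffle a copy of the candidate replicas before picking a transfer target, so there is no system-wide bias toward replicas at the front of the descriptor.

```go
package main

import (
	"fmt"
	"math/rand"
)

// replicaDescriptor is a toy stand-in for roachpb.ReplicaDescriptor.
type replicaDescriptor struct{ storeID int }

// shuffledCandidates copies the replicas (so the descriptor itself is not
// reordered) and shuffles the copy before it is used to pick a target.
func shuffledCandidates(replicas []replicaDescriptor) []replicaDescriptor {
	candidates := append([]replicaDescriptor(nil), replicas...)
	rand.Shuffle(len(candidates), func(i, j int) {
		candidates[i], candidates[j] = candidates[j], candidates[i]
	})
	return candidates
}

func main() {
	replicas := []replicaDescriptor{{1}, {2}, {3}}
	fmt.Println(shuffledCandidates(replicas))
}
```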

@a-robinson (Contributor, Author) left a comment

TFTR! I think the only question that may deserve more thought is the naming of the cluster settings. Using the inherited setting names is convenient, but we could do better if we don't mind throwing them out.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


docs/generated/settings/settings.html, line 21 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Did this get out of sync with the code?

Good catch, looks like I messed that up. Fixed, although I couldn't figure out how to get the file regenerated automatically.


pkg/cmd/roachtest/rebalance_load.go, line 16 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Had the same question as @BramGruneir. That reasoning checks out with me.

Probably worth describing what happens without kv.allocator.stat_based_rebalancing.enabled=true somewhere in the test.

Done.


pkg/storage/allocator.go, line 44 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

decisions that get adjusted

Done. After this settles down I'd like to rename all the "load-based lease rebalance" naming to reference follow-the-workload instead.


pkg/storage/allocator.go, line 1011 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

double "-"

Done.


pkg/storage/replica_rankings.go, line 62 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

The interaction between byQPS and rrAccumulator in this function is subtle. It looked very strange to me until I realized why you gave it these semantics (so that all 128 top replicas are considered and we don't end up staring at the top few every time the StoreRebalancer runs, right?). I'd call that out somewhere. Also call out that because of this behavior we have to be careful about increasing numTopReplicasToTrack or we could end up adding large delays between when the hottest few replicas are considered for rebalancing.

That's one reason, although given that the storeRebalancer continues until it either runs out of replicas or determines the store is balanced, I don't think it's a very important one.

The issue that was more on my mind was that we could be (and often will be, given that sufficient rebalancing activity triggers a re-gossiping of the store descriptor) loading a new accumulator while the storeRebalancer is partway through iterating over all the replicas, causing us to have to re-inspect a bunch of the same replicas again. I guess that's not a huge deal either, though, as long as it doesn't happen repeatedly.

It seems like maybe the consumer of the rankings should just pull out all the rankings at once (similar to how the producer puts all the rankings in at once via the accumulator).


pkg/storage/replica_rankings.go, line 75 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This is almost always going to be the case during steady-state operations, right? In that case, it seems like there's a natural fast-path that falls out where we compare against the head of the priority queue and only pop and push if the new replica will result in the head being evicted.

Good point, done.


pkg/storage/replica_rankings.go, line 80 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'd add a comment to this as well because it is used externally of the replicaRankings abstraction. That will help justify why we have this extra layer of structure instead of having replicaRankings hold an rrPriorityQueue directly.

Done.


pkg/storage/store_rebalancer.go, line 45 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

There's an interesting reason why this can't be a queue itself. Might be worth adding that to this comment.

Done.


pkg/storage/store_rebalancer.go, line 208 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Sorting or even just randomizing sounds like a good idea to me. Otherwise, won't there be a system-wide tendency to transfer leases to replicas at the front of their range descriptors? The comment on RangeDescriptor.Replicas says the order is arbitrary, but I'm not sure that that's true in practice.

Done.

@nvanbenschoten (Member) left a comment

:lgtm:

I think the only question that may deserve more thought is the naming of the cluster settings. Using the inherited setting names is convenient, but we could do better if we don't mind throwing them out.

I don't have any strong opinions about this. @BramGruneir do you?

Reviewed 1 of 9 files at r8, 1 of 3 files at r9.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale)


pkg/storage/allocator.go, line 44 at r5 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

Done. After this settles down I'd like to rename all the "load-based lease rebalance" naming to reference follow-the-workload instead.

"follow-the-workload" SGTM.


pkg/storage/replica_rankings.go, line 62 at r4 (raw file):

The issue that was more on my mind was that we could be (and often will be, given that sufficient rebalancing activity triggers a re-gossiping of the store descriptor) loading a new accumulator while the storeRebalancer is partway through iterating over all the replicas, causing us to have to re-inspect a bunch of the same replicas again. I guess that's not a huge deal either, though, as long as it doesn't happen repeatedly.

It's relatively cheap to inspect replicas that don't need to be rebalanced though, right?

It seems like maybe the consumer of the rankings should just pull out all the rankings at once (similar to how the producer puts all the rankings in at once via the accumulator).

That would be easier to understand. This is kind of prescriptive at the moment, so I think it makes sense to push the specialized complexity to the consumer. Feel free to leave a TODO instead of having that slow down the merge.

This makes it easier to pick replicas whose leases should be transferred
when a store is overloaded.

Release note: None
@a-robinson (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale)


pkg/storage/replica_rankings.go, line 62 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

The issue that was more on my mind was that we could be (and often will be, given that sufficient rebalancing activity triggers a re-gossiping of the store descriptor) loading a new accumulator while the storeRebalancer is partway through iterating over all the replicas, causing us to have to re-inspect a bunch of the same replicas again. I guess that's not a huge deal either, though, as long as it doesn't happen repeatedly.

It's relatively cheap to inspect replicas that don't need to be rebalanced though, right?

It seems like maybe the consumer of the rankings should just pull out all the rankings at once (similar to how the producer puts all the rankings in at once via the accumulator).

That would be easier to understand. This is kind of prescriptive at the moment, so I think it makes sense to push the specialized complexity to the consumer. Feel free to leave a TODO instead of having that slow down the merge.

Yeah, done.
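
A toy sketch of the consumer-side shape being discussed, with names borrowed from the thread (replicaRankings, replicaWithStats, byQPS) but otherwise assumed rather than taken from the real replica_rankings.go: the consumer pulls the whole current snapshot at once, so a new accumulator being swapped in mid-iteration can't cause the same replicas to be handed out twice.

```go
package main

import "fmt"

// replicaWithStats pairs a range with its measured QPS.
type replicaWithStats struct {
	rangeID int
	qps     float64
}

// replicaRankings holds a sorted snapshot of the hottest replicas.
type replicaRankings struct {
	byQPS []replicaWithStats // hottest first
}

// topQPS returns the full current snapshot rather than popping entries one at
// a time; the StoreRebalancer then iterates over the returned slice.
func (rr *replicaRankings) topQPS() []replicaWithStats {
	return rr.byQPS
}

func main() {
	rr := &replicaRankings{byQPS: []replicaWithStats{{7, 900}, {3, 450}, {12, 80}}}
	for _, r := range rr.topQPS() {
		fmt.Printf("r%d: %.0f qps\n", r.rangeID, r.qps)
	}
}
```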

In order to better balance the QPS being served by each store to avoid
overloaded nodes.

Fixes cockroachdb#21419

Release note (performance improvement): Range leases will be
automatically rebalanced throughout the cluster to even out the amount
of QPS being handled by each node.
It consistently passes with store-level load-based lease rebalancing,
but fails more often than not without it.

Release note: None
@BramGruneir (Member)

I think the only question that may deserve more thought is the naming of the cluster settings. Using the inherited setting names is convenient, but we could do better if we don't mind throwing them out.

I don't have any strong opinions about this. @BramGruneir do you?

I have no problem throwing out the old ones. But we have to be sure that we stop a cluster upgrade if someone is using an old one that is not set to the default. They most likely changed it for a reason.

@a-robinson (Contributor, Author)

I have no problem throwing out the old ones. But we have to be sure that we stop a cluster upgrade if someone is using an old one that is not set to the default. They most likely changed it for a reason.

Is there any precedent for that? Using the stats-based rebalancing feature has been pretty strongly discouraged. People may be using it in test clusters, but I'd be pretty surprised if it's being used in any real production environments. Also, I want to remove the old stats-based rebalancing code as part of #28852, so the old setting isn't gonna be very useful.

@BramGruneir (Member)

Is there any precedent for that? Using the stats-based rebalancing feature has been pretty strongly discouraged. People may be using it in test clusters, but I'd be pretty surprised if it's being used in any real production environments. Also, I want to remove the old stats-based rebalancing code as part of #28852, so the old setting isn't gonna be very useful.

How do we know that no one is using it? It's an option, and I'd bet people have toyed with it. I have no idea how we go about deprecating a cluster setting, but we should warn people (at least in the upgrade notes) that this is the case.

@BramGruneir (Member) left a comment

Reviewed 13 of 13 files at r10, 9 of 9 files at r11, 3 of 3 files at r12.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale)

@a-robinson (Contributor, Author)

We've removed cluster settings before (https://github.com/cockroachdb/cockroach/blob/master/pkg/settings/registry.go#L36), we've just never done anything like stopping an upgrade due to them. I'd think doing the same as before (retiring the setting and including a release note) is sufficient.
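
As an illustration only, here's roughly what retiring the old setting might look like, assuming the retired-settings list referenced above is a simple map of names; the exact shape of the real registry.go is an assumption, and the helper below is hypothetical.

```go
package main

import "fmt"

// A sketch, not the actual pkg/settings/registry.go contents: retired setting
// names are recorded so the registry can recognize them, and the removal is
// called out in the release notes.
var retiredSettings = map[string]struct{}{
	// Retired in favor of the new store-level QPS-based rebalancing settings.
	"kv.allocator.stat_based_rebalancing.enabled": {},
}

// isRetired is a hypothetical helper showing how the map would be consulted.
func isRetired(name string) bool {
	_, ok := retiredSettings[name]
	return ok
}

func main() {
	fmt.Println(isRetired("kv.allocator.stat_based_rebalancing.enabled")) // true
}
```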

@BramGruneir (Member)

SGTM

@a-robinson (Contributor, Author)

bors r+

I'll follow up with the setting changes.

craig bot pushed a commit that referenced this pull request Aug 20, 2018
28340: storage: make lease rebalancing decisions at the store level r=a-robinson a=a-robinson

In order to better balance the QPS being served by each store to avoid
overloaded nodes.

Fixes #21419

Release note (performance improvement): Range leases will be
automatically rebalanced throughout the cluster to even out the amount
of QPS being handled by each node.

28848: opt: Fix rule cycle bug r=andy-kimball a=andy-kimball

The PushFilterIntoJoinLeftAndRight and TryDecorrelateSelect rules can cycle
with one another in a rare case:

1. Right side of join has outer column due to being un-decorrelatable.
2. Filter conjunct is pushed down to both left and right side by mapping
   equivalencies in PushFilterIntoJoinLeftAndRight.
3. Left conjunct is pulled back up to join condition by TryDecorrelateSelect.

Steps #2 and #3 will cycle with one another. Cycle detection is not possible
in this case, because the left side keeps changing (a new conjunct is
pushed down to it each time).

The fix is simple: don't let PushFilterIntoJoinLeftAndRight push down filters
if either the left or right side has outer column(s).

This fixes #28818.

Release note: None

Co-authored-by: Alex Robinson <[email protected]>
Co-authored-by: Andrew Kimball <[email protected]>
@craig (Contributor)

craig bot commented Aug 20, 2018

Build succeeded

@craig craig bot merged commit 094bf14 into cockroachdb:master Aug 20, 2018
a-robinson added a commit to a-robinson/cockroach that referenced this pull request Aug 20, 2018
Follow-up to cockroachdb#28340, which did this for just leases.

Fixes cockroachdb#17979

Release note (performance improvement): Range replicas will be
automatically rebalanced throughout the cluster to even out the amount
of QPS being handled by each node.
a-robinson added a commit to a-robinson/cockroach that referenced this pull request Aug 31, 2018
Follow-up to cockroachdb#28340, which did this for just leases.

Fixes cockroachdb#17979

Release note (performance improvement): Range replicas will be
automatically rebalanced throughout the cluster to even out the amount
of QPS being handled by each node.
a-robinson added a commit to a-robinson/cockroach that referenced this pull request Sep 6, 2018
Follow-up to cockroachdb#28340, which did this for just leases.

Fixes cockroachdb#17979

Release note (performance improvement): Range replicas will be
automatically rebalanced throughout the cluster to even out the amount
of QPS being handled by each node.
a-robinson added a commit to a-robinson/cockroach that referenced this pull request Sep 6, 2018
Follow-up to cockroachdb#28340, which did this for just leases.

Fixes cockroachdb#17979

Release note (performance improvement): Range replicas will be
automatically rebalanced throughout the cluster to even out the amount
of QPS being handled by each node.
craig bot pushed a commit that referenced this pull request Sep 6, 2018
28852: storage: make load-based replica rebalancing decisions at the store level r=a-robinson a=a-robinson

Built on top of #28340, which is where the first 3 commits are from. This is still somewhat incomplete, in that it's missing unit tests and I'm only just now running tpc-c 10k on it. Sending out now to start the discussion of whether to include it in 2.1, since we're obviously very late in the intended development cycle.

Co-authored-by: Alex Robinson <[email protected]>