ccl/sqlproxyccl: periodically rebalance active and idle partitions #80527

jaylim-crl · 2022-04-26T05:00:34Z

Previously, we were only transferring connections away from DRAINING pods.
This commit adds support for transferring connections away from overloaded
RUNNING pods. A pod is considered overloaded if the number of assignments to
it has exceeded the average number of assignments across all RUNNING pods by
15%. Note that active and idle partitions will be rebalanced independently.

Release note: None

cockroach-teamcity · 2022-04-26T05:00:45Z

This change is

jaylim-crl

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @jeffswenson)

pkg/ccl/sqlproxyccl/balancer/balancer.go line 448 at r1 (raw file):

	//        minimize the deviation from the mean by collecting from pods
	//        starting with the ones with the largest number of assignments,
	//        but this would require a sort.

@andy-kimball I tweaked the algorithm to only collect up to the average. With that, we can guarantee that we should have enough assignments to transfer.

Code quote:

	//     3. Second pass on distribution: greedily collect as many assignments
	//        up to X without violating the average. We could theoretically
	//        minimize the deviation from the mean by collecting from pods
	//        starting with the ones with the largest number of assignments,
	//        but this would require a sort.

jeffswenson · 2022-04-29T17:06:50Z

This is looking great! I think we would benefit from additional randomness when selecting the connections to migrate. It helps avoid transferring only the youngest connections or preferring a small number of source pods when transferring connections to nearly empty pods.

pkg/ccl/sqlproxyccl/balancer/balancer.go

jaylim-crl

TFTR!

I looked through your suggestions again, I think the randomness to reduce biases makes sense. I've updated the implementation accordingly. To summarize, the implementation currently does the following:

When choosing connections from overloaded pods, we randomly pick the ones that exceed the upper bound. Not doing this may skew selection to older connections.
When we pick connections due to missing ones, we pick a random connection without any biases on pods.
When we apply the rebalancing rate on the final list of connections, we would randomly choose to transfer. Not doing this would skew transfers to only overloaded pods.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @jeffswenson)

pkg/ccl/sqlproxyccl/balancer/balancer.go line 371 at r2 (raw file):

Previously, JeffSwenson (Jeff Swenson) wrote…

Can we pick a random set instead of the first n? As it is written, this does not drain from overloaded pods evenly. E.g. if there are two equally overloaded pods, this would migrate all of the surplus connections from the first sql server and none of the surplus connections from the second sql server.

Done.

pkg/ccl/sqlproxyccl/balancer/balancer.go line 390 at r2 (raw file):

Previously, JeffSwenson (Jeff Swenson) wrote…

nit: What do you think of renaming distribution to podAssignments?

Done.

pkg/ccl/sqlproxyccl/balancer/balancer.go line 407 at r2 (raw file):

Previously, JeffSwenson (Jeff Swenson) wrote…

nit: if the loop filling in empty pods is put before the loop adding assignments, it makes the code slightly simpler.

distribution := map[string][]*ServerAssignmet{}
for _, pod := range pods {
    if pod.State == tenant.RUNNING {
        distribution[pod.Addr] = nil
    }
}

for _, a := range partition {
   if _, ok := distribution[a.Addr()]; ok {
       numAssignments++
       distribution[a.Addr()] = append(distribution[a.Addr()], a)
   }
}

Done.

pkg/ccl/sqlproxyccl/balancer/balancer.go line 463 at r2 (raw file):

Previously, JeffSwenson (Jeff Swenson) wrote…

Selecting the connections that should be transferred from a pod is another place it would be helpful to pick at random. The connections at the start of the slice are likely to be the oldest connections. So there is a bias in which connections we pick.

We can handle both cases with a transferNRandom(source, destination, n) helper.
func transferNRandom(source []*ServerAssignment, destination*[]ServerAssignment, n int) (([]*ServerAssignment, []*ServerAssignment) {
   for i := 0; i < n && 0 < len(source); i++ {
      from := random.Intn(len(source))
      destination = append(destination, source[from])
      source[from] = source[len(source) - 1]
      source = source[:len(source) - 1]
   }
   return source, destination
}
for addr, d := range distribution {
   missingCount += int(math.Max(float64(lowerBound-len(d)), 0.0))
   excess := len(d) - upperBound
   if 0 < excess {
       distribution[addr], toMove = transferNRandom(distribution[addr], toMove, excess)
   }
}

Done.

pkg/ccl/sqlproxyccl/balancer/balancer.go line 470 at r2 (raw file):

Previously, JeffSwenson (Jeff Swenson) wrote…

There is a probabilistic algorithm we could use to select source connections evenly from the slightly overloaded pods:

In loop 2, we can produce two lists:
mustMove - a list of assignments that have to move
eligibleToMove - a list of assignments that can move without drawing the pod's connection count down below average.

That allows us to replace loop 3 with:
_, toMove := transferNRandom(eligibleToMove, mustMove, missingCount)

Done.

jeffswenson

LGTM

pkg/ccl/sqlproxyccl/balancer/balancer.go

Previously, we were only transferring connections away from DRAINING pods. This commit adds support for transferring connections until the partitions are balanced. A partition is considered balanced if the number of assignments to all the pods are at most 15% away from the average number of assignments. Note that active and idle partitions will be rebalanced independently. Release note: None

jaylim-crl

TFTR!

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @jeffswenson)

pkg/ccl/sqlproxyccl/balancer/balancer.go line 375 at r3 (raw file):

Previously, JeffSwenson (Jeff Swenson) wrote…

nit: this code is not in the critical path. I think the extra allocation is worth the improved readability:
for _, assignment := range chooseNRandom(list, toMoveCount) {
   b.queue.enqueue(&rebalanceRequest{
			createdAt: b.timeSource.Now(),
			conn:      list[from].Owner(),
		})
}
It would also be possible to avoid the allocation by changing chooseNRandom so that it returns a split subslice of the argument.
e.g.
// where random and rest are both subslices of list
random, rest := chooseNRandom(list, toMoveCount)

Done. I changed the API of chooseNRandom to return two parts instead. Renamed that to partitionNRandom for clarity. Since the results are backed by the input source, there are no additional allocations, as you pointed out. I think this is a much better API, and simplifies callers as well.

jeffswenson

LGTM

jaylim-crl · 2022-05-02T19:19:40Z

TFTR!

bors r=JeffSwenson

craig · 2022-05-02T20:55:33Z

Build succeeded:

GitHub CI (Cockroach)

jaylim-crl force-pushed the jay/220425-rebalance-connections branch 3 times, most recently from d1492f7 to c355819 Compare April 29, 2022 02:27

jaylim-crl marked this pull request as ready for review April 29, 2022 02:29

jaylim-crl requested review from a team as code owners April 29, 2022 02:29

jaylim-crl requested review from jeffswenson and andy-kimball and removed request for a team April 29, 2022 02:29

jaylim-crl commented Apr 29, 2022

View reviewed changes

jaylim-crl force-pushed the jay/220425-rebalance-connections branch from c355819 to b3233e4 Compare April 29, 2022 03:52

jeffswenson reviewed Apr 29, 2022

View reviewed changes

jaylim-crl force-pushed the jay/220425-rebalance-connections branch from b3233e4 to a255d20 Compare April 30, 2022 01:38

jaylim-crl requested a review from jeffswenson April 30, 2022 01:51

jaylim-crl commented Apr 30, 2022

View reviewed changes

jeffswenson approved these changes May 2, 2022

View reviewed changes

pkg/ccl/sqlproxyccl/balancer/balancer.go Outdated Show resolved Hide resolved

jaylim-crl requested a review from jeffswenson May 2, 2022 17:50

jaylim-crl commented May 2, 2022

View reviewed changes

jaylim-crl force-pushed the jay/220425-rebalance-connections branch from a255d20 to 4142dfc Compare May 2, 2022 18:25

jeffswenson approved these changes May 2, 2022

View reviewed changes

craig bot merged commit f4db230 into cockroachdb:master May 2, 2022

jaylim-crl deleted the jay/220425-rebalance-connections branch May 2, 2022 22:06

jaylim-crl mentioned this pull request May 3, 2022

release-22.1: ccl/sqlproxyccl: backport connection migration changes #80897

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ccl/sqlproxyccl: periodically rebalance active and idle partitions #80527

ccl/sqlproxyccl: periodically rebalance active and idle partitions #80527

jaylim-crl commented Apr 26, 2022

cockroach-teamcity commented Apr 26, 2022

jaylim-crl left a comment

jeffswenson commented Apr 29, 2022

jaylim-crl left a comment

jeffswenson left a comment

jaylim-crl left a comment

jeffswenson left a comment

jaylim-crl commented May 2, 2022

craig bot commented May 2, 2022

ccl/sqlproxyccl: periodically rebalance active and idle partitions #80527

ccl/sqlproxyccl: periodically rebalance active and idle partitions #80527

Conversation

jaylim-crl commented Apr 26, 2022

cockroach-teamcity commented Apr 26, 2022

jaylim-crl left a comment

Choose a reason for hiding this comment

jeffswenson commented Apr 29, 2022

jaylim-crl left a comment

Choose a reason for hiding this comment

jeffswenson left a comment

Choose a reason for hiding this comment

jaylim-crl left a comment

Choose a reason for hiding this comment

jeffswenson left a comment

Choose a reason for hiding this comment

jaylim-crl commented May 2, 2022

craig bot commented May 2, 2022