Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ccl/sqlproxyccl: periodically rebalance active and idle partitions #80527

Merged

Conversation

jaylim-crl
Copy link
Collaborator

Previously, we were only transferring connections away from DRAINING pods.
This commit adds support for transferring connections away from overloaded
RUNNING pods. A pod is considered overloaded if the number of assignments to
it has exceeded the average number of assignments across all RUNNING pods by
15%. Note that active and idle partitions will be rebalanced independently.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@jaylim-crl jaylim-crl force-pushed the jay/220425-rebalance-connections branch 3 times, most recently from d1492f7 to c355819 Compare April 29, 2022 02:27
@jaylim-crl jaylim-crl marked this pull request as ready for review April 29, 2022 02:29
@jaylim-crl jaylim-crl requested review from a team as code owners April 29, 2022 02:29
@jaylim-crl jaylim-crl requested review from jeffswenson and andy-kimball and removed request for a team April 29, 2022 02:29
Copy link
Collaborator Author

@jaylim-crl jaylim-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @jeffswenson)


pkg/ccl/sqlproxyccl/balancer/balancer.go line 448 at r1 (raw file):

	//        minimize the deviation from the mean by collecting from pods
	//        starting with the ones with the largest number of assignments,
	//        but this would require a sort.

@andy-kimball I tweaked the algorithm to only collect up to the average. With that, we can guarantee that we should have enough assignments to transfer.

Code quote:

	//     3. Second pass on distribution: greedily collect as many assignments
	//        up to X without violating the average. We could theoretically
	//        minimize the deviation from the mean by collecting from pods
	//        starting with the ones with the largest number of assignments,
	//        but this would require a sort.

@jaylim-crl jaylim-crl force-pushed the jay/220425-rebalance-connections branch from c355819 to b3233e4 Compare April 29, 2022 03:52
@jeffswenson
Copy link
Collaborator

This is looking great! I think we would benefit from additional randomness when selecting the connections to migrate. It helps avoid transferring only the youngest connections or preferring a small number of source pods when transferring connections to nearly empty pods.

pkg/ccl/sqlproxyccl/balancer/balancer.go Show resolved Hide resolved
pkg/ccl/sqlproxyccl/balancer/balancer.go Show resolved Hide resolved
pkg/ccl/sqlproxyccl/balancer/balancer.go Outdated Show resolved Hide resolved
pkg/ccl/sqlproxyccl/balancer/balancer.go Outdated Show resolved Hide resolved
pkg/ccl/sqlproxyccl/balancer/balancer.go Outdated Show resolved Hide resolved
@jaylim-crl jaylim-crl force-pushed the jay/220425-rebalance-connections branch from b3233e4 to a255d20 Compare April 30, 2022 01:38
@jaylim-crl jaylim-crl requested a review from jeffswenson April 30, 2022 01:51
Copy link
Collaborator Author

@jaylim-crl jaylim-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

I looked through your suggestions again, I think the randomness to reduce biases makes sense. I've updated the implementation accordingly. To summarize, the implementation currently does the following:

  1. When choosing connections from overloaded pods, we randomly pick the ones that exceed the upper bound. Not doing this may skew selection to older connections.
  2. When we pick connections due to missing ones, we pick a random connection without any biases on pods.
  3. When we apply the rebalancing rate on the final list of connections, we would randomly choose to transfer. Not doing this would skew transfers to only overloaded pods.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @jeffswenson)


pkg/ccl/sqlproxyccl/balancer/balancer.go line 371 at r2 (raw file):

Previously, JeffSwenson (Jeff Swenson) wrote…

Can we pick a random set instead of the first n? As it is written, this does not drain from overloaded pods evenly. E.g. if there are two equally overloaded pods, this would migrate all of the surplus connections from the first sql server and none of the surplus connections from the second sql server.

Done.


pkg/ccl/sqlproxyccl/balancer/balancer.go line 390 at r2 (raw file):

Previously, JeffSwenson (Jeff Swenson) wrote…

nit: What do you think of renaming distribution to podAssignments?

Done.


pkg/ccl/sqlproxyccl/balancer/balancer.go line 407 at r2 (raw file):

Previously, JeffSwenson (Jeff Swenson) wrote…

nit: if the loop filling in empty pods is put before the loop adding assignments, it makes the code slightly simpler.

distribution := map[string][]*ServerAssignmet{}
for _, pod := range pods {
    if pod.State == tenant.RUNNING {
        distribution[pod.Addr] = nil
    }
}

for _, a := range partition {
   if _, ok := distribution[a.Addr()]; ok {
       numAssignments++
       distribution[a.Addr()] = append(distribution[a.Addr()], a)
   }
}

Done.


pkg/ccl/sqlproxyccl/balancer/balancer.go line 463 at r2 (raw file):

Previously, JeffSwenson (Jeff Swenson) wrote…

Selecting the connections that should be transferred from a pod is another place it would be helpful to pick at random. The connections at the start of the slice are likely to be the oldest connections. So there is a bias in which connections we pick.

We can handle both cases with a transferNRandom(source, destination, n) helper.

func transferNRandom(source []*ServerAssignment, destination*[]ServerAssignment, n int) (([]*ServerAssignment, []*ServerAssignment) {
   for i := 0; i < n && 0 < len(source); i++ {
      from := random.Intn(len(source))
      destination = append(destination, source[from])
      source[from] = source[len(source) - 1]
      source = source[:len(source) - 1]
   }
   return source, destination
}
for addr, d := range distribution {
   missingCount += int(math.Max(float64(lowerBound-len(d)), 0.0))
   excess := len(d) - upperBound
   if 0 < excess {
       distribution[addr], toMove = transferNRandom(distribution[addr], toMove, excess)
   }
}

Done.


pkg/ccl/sqlproxyccl/balancer/balancer.go line 470 at r2 (raw file):

Previously, JeffSwenson (Jeff Swenson) wrote…

There is a probabilistic algorithm we could use to select source connections evenly from the slightly overloaded pods:

In loop 2, we can produce two lists:
mustMove - a list of assignments that have to move
eligibleToMove - a list of assignments that can move without drawing the pod's connection count down below average.

That allows us to replace loop 3 with:

_, toMove := transferNRandom(eligibleToMove, mustMove, missingCount)

Done.

Copy link
Collaborator

@jeffswenson jeffswenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

pkg/ccl/sqlproxyccl/balancer/balancer.go Outdated Show resolved Hide resolved
Previously, we were only transferring connections away from DRAINING pods.
This commit adds support for transferring connections until the partitions are
balanced. A partition is considered balanced if the number of assignments to
all the pods are at most 15% away from the average number of assignments.
Note that active and idle partitions will be rebalanced independently.

Release note: None
@jaylim-crl jaylim-crl requested a review from jeffswenson May 2, 2022 17:50
Copy link
Collaborator Author

@jaylim-crl jaylim-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @jeffswenson)


pkg/ccl/sqlproxyccl/balancer/balancer.go line 375 at r3 (raw file):

Previously, JeffSwenson (Jeff Swenson) wrote…

nit: this code is not in the critical path. I think the extra allocation is worth the improved readability:

for _, assignment := range chooseNRandom(list, toMoveCount) {
   b.queue.enqueue(&rebalanceRequest{
			createdAt: b.timeSource.Now(),
			conn:      list[from].Owner(),
		})
}

It would also be possible to avoid the allocation by changing chooseNRandom so that it returns a split subslice of the argument.

e.g.
// where random and rest are both subslices of list
random, rest := chooseNRandom(list, toMoveCount)

Done. I changed the API of chooseNRandom to return two parts instead. Renamed that to partitionNRandom for clarity. Since the results are backed by the input source, there are no additional allocations, as you pointed out. I think this is a much better API, and simplifies callers as well.

@jaylim-crl jaylim-crl force-pushed the jay/220425-rebalance-connections branch from a255d20 to 4142dfc Compare May 2, 2022 18:25
Copy link
Collaborator

@jeffswenson jeffswenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jaylim-crl
Copy link
Collaborator Author

TFTR!

bors r=JeffSwenson

@craig
Copy link
Contributor

craig bot commented May 2, 2022

Build succeeded:

@craig craig bot merged commit f4db230 into cockroachdb:master May 2, 2022
@jaylim-crl jaylim-crl deleted the jay/220425-rebalance-connections branch May 2, 2022 22:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants