-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ccl/sqlproxyccl: periodically rebalance active and idle partitions #80527
ccl/sqlproxyccl: periodically rebalance active and idle partitions #80527
Conversation
d1492f7
to
c355819
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @jeffswenson)
pkg/ccl/sqlproxyccl/balancer/balancer.go
line 448 at r1 (raw file):
// minimize the deviation from the mean by collecting from pods // starting with the ones with the largest number of assignments, // but this would require a sort.
@andy-kimball I tweaked the algorithm to only collect up to the average. With that, we can guarantee that we should have enough assignments to transfer.
Code quote:
// 3. Second pass on distribution: greedily collect as many assignments
// up to X without violating the average. We could theoretically
// minimize the deviation from the mean by collecting from pods
// starting with the ones with the largest number of assignments,
// but this would require a sort.
c355819
to
b3233e4
Compare
This is looking great! I think we would benefit from additional randomness when selecting the connections to migrate. It helps avoid transferring only the youngest connections or preferring a small number of source pods when transferring connections to nearly empty pods. |
b3233e4
to
a255d20
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
I looked through your suggestions again, I think the randomness to reduce biases makes sense. I've updated the implementation accordingly. To summarize, the implementation currently does the following:
- When choosing connections from overloaded pods, we randomly pick the ones that exceed the upper bound. Not doing this may skew selection to older connections.
- When we pick connections due to missing ones, we pick a random connection without any biases on pods.
- When we apply the rebalancing rate on the final list of connections, we would randomly choose to transfer. Not doing this would skew transfers to only overloaded pods.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @jeffswenson)
pkg/ccl/sqlproxyccl/balancer/balancer.go
line 371 at r2 (raw file):
Previously, JeffSwenson (Jeff Swenson) wrote…
Can we pick a random set instead of the first n? As it is written, this does not drain from overloaded pods evenly. E.g. if there are two equally overloaded pods, this would migrate all of the surplus connections from the first sql server and none of the surplus connections from the second sql server.
Done.
pkg/ccl/sqlproxyccl/balancer/balancer.go
line 390 at r2 (raw file):
Previously, JeffSwenson (Jeff Swenson) wrote…
nit: What do you think of renaming
distribution
topodAssignments
?
Done.
pkg/ccl/sqlproxyccl/balancer/balancer.go
line 407 at r2 (raw file):
Previously, JeffSwenson (Jeff Swenson) wrote…
nit: if the loop filling in empty pods is put before the loop adding assignments, it makes the code slightly simpler.
distribution := map[string][]*ServerAssignmet{} for _, pod := range pods { if pod.State == tenant.RUNNING { distribution[pod.Addr] = nil } } for _, a := range partition { if _, ok := distribution[a.Addr()]; ok { numAssignments++ distribution[a.Addr()] = append(distribution[a.Addr()], a) } }
Done.
pkg/ccl/sqlproxyccl/balancer/balancer.go
line 463 at r2 (raw file):
Previously, JeffSwenson (Jeff Swenson) wrote…
Selecting the connections that should be transferred from a pod is another place it would be helpful to pick at random. The connections at the start of the slice are likely to be the oldest connections. So there is a bias in which connections we pick.
We can handle both cases with a transferNRandom(source, destination, n) helper.
func transferNRandom(source []*ServerAssignment, destination*[]ServerAssignment, n int) (([]*ServerAssignment, []*ServerAssignment) { for i := 0; i < n && 0 < len(source); i++ { from := random.Intn(len(source)) destination = append(destination, source[from]) source[from] = source[len(source) - 1] source = source[:len(source) - 1] } return source, destination }
for addr, d := range distribution { missingCount += int(math.Max(float64(lowerBound-len(d)), 0.0)) excess := len(d) - upperBound if 0 < excess { distribution[addr], toMove = transferNRandom(distribution[addr], toMove, excess) } }
Done.
pkg/ccl/sqlproxyccl/balancer/balancer.go
line 470 at r2 (raw file):
Previously, JeffSwenson (Jeff Swenson) wrote…
There is a probabilistic algorithm we could use to select source connections evenly from the slightly overloaded pods:
In loop 2, we can produce two lists:
mustMove - a list of assignments that have to move
eligibleToMove - a list of assignments that can move without drawing the pod's connection count down below average.That allows us to replace loop 3 with:
_, toMove := transferNRandom(eligibleToMove, mustMove, missingCount)
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Previously, we were only transferring connections away from DRAINING pods. This commit adds support for transferring connections until the partitions are balanced. A partition is considered balanced if the number of assignments to all the pods are at most 15% away from the average number of assignments. Note that active and idle partitions will be rebalanced independently. Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @jeffswenson)
pkg/ccl/sqlproxyccl/balancer/balancer.go
line 375 at r3 (raw file):
Previously, JeffSwenson (Jeff Swenson) wrote…
nit: this code is not in the critical path. I think the extra allocation is worth the improved readability:
for _, assignment := range chooseNRandom(list, toMoveCount) { b.queue.enqueue(&rebalanceRequest{ createdAt: b.timeSource.Now(), conn: list[from].Owner(), }) }
It would also be possible to avoid the allocation by changing chooseNRandom so that it returns a split subslice of the argument.
e.g. // where random and rest are both subslices of list random, rest := chooseNRandom(list, toMoveCount)
Done. I changed the API of chooseNRandom to return two parts instead. Renamed that to partitionNRandom for clarity. Since the results are backed by the input source, there are no additional allocations, as you pointed out. I think this is a much better API, and simplifies callers as well.
a255d20
to
4142dfc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
TFTR! bors r=JeffSwenson |
Build succeeded: |
Previously, we were only transferring connections away from DRAINING pods.
This commit adds support for transferring connections away from overloaded
RUNNING pods. A pod is considered overloaded if the number of assignments to
it has exceeded the average number of assignments across all RUNNING pods by
15%. Note that active and idle partitions will be rebalanced independently.
Release note: None