balance: some changes to make balance more reliable. #560
Conversation
server/cache.go
Outdated
@@ -225,6 +225,13 @@ func (r *regionsInfo) randFollowerRegion(storeID uint64) *regionInfo {
	return randRegion(r.followers[storeID])
}

func (r *regionsInfo) randomRegion() *regionInfo {
	for _, region := range r.regions {
Why not use a random index to get the random region?
Then we would have to use another []uint64 to record all the region IDs and keep it in sync with the regions map. I think we can rely on the fact that map iteration starts from a random position.
Do you mean that even if the regions are unchanged, every for-range loop will start from a random region?
Maybe we should add a test to verify it.
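For what it's worth, a tiny standalone sketch (hypothetical, not the test added in this PR) of how one could eyeball that behaviour, taking only the first key of each for-range pass the way randomRegion does:

```go
package main

import "fmt"

func main() {
	// Hypothetical stand-in for r.regions: the keys play the role of region IDs.
	m := map[uint64]string{1: "a", 2: "b", 3: "c", 4: "d"}
	counts := make(map[uint64]int)
	for i := 0; i < 1000; i++ {
		for id := range m {
			counts[id]++ // keep only the first key, as randomRegion does
			break
		}
	}
	// Go randomizes the starting point of map iteration, so every key should
	// show up roughly equally often even though the map never changes.
	fmt.Println(counts)
}
```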
You are right. We should not rely on the randomness of map iteration. I'm working on another PR to correct it.
Any test?
@siddontang I've added a test case for 2. 1 is already covered by many cases. 3 basically won't happen, so it's not easy to test.
server/coordinator_test.go
Outdated
		}
		time.Sleep(time.Millisecond * 100)
	}
}
Do we need a panic here?
# Conflicts:
#	server/cache.go
server/scheduler.go
Outdated
	return nil, nil
}

region := cluster.randLeaderRegion(source.GetId())
if region == nil {
leaderStore := cluster.getStore(region.Leader.GetStoreId())
Picking a random region gives a hot leader store less chance to transfer its leaders away. I see maxScheduleRetries is 10; if there are more than 100,000 regions, we only sample about 0.01 percent of them, and most scores are almost the same. Isn't it very likely that we get a nil operator, so the schedule interval quickly increases to its maximum? I think we also need to adjust the schedule retries based on the total region count and on how the schedule interval grows. Or maybe we can directly add a schedule for the store with the least leaders.
It seems we should add a metric to count useless selections.
I forgot that the score is the leader count of that store. Maybe we can test it.
@nolouch Assume the peers are perfectly balanced and only leaders need to be transferred. The rate of successfully finding a region that needs to be balanced is mainly determined by the node count and by how many nodes are not well balanced.
Assume we have 100,000 regions and 10 nodes; then each node holds 10,000 leaders. When a region is randomly selected, the chance that its leader is on a specific node is 10%.
I have checked that it works well on a 10-node cluster; it may be inefficient if there are more nodes. I'll continue to do more tests :)
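To make that estimate concrete, here is a small standalone sketch (not part of the PR; it only encodes the even-leader assumption stated above) computing the chance that at least one of maxScheduleRetries random region picks has its leader on one given store:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	const maxScheduleRetries = 10 // retry count mentioned in the review above

	for _, nodes := range []int{3, 10, 50, 100} {
		perPick := 1.0 / float64(nodes)                             // leader of a random region sits on this node
		hit := 1 - math.Pow(1-perPick, float64(maxScheduleRetries)) // at least one hit within the retries
		fmt.Printf("nodes=%3d  per-pick=%.3f  hit-within-%d-retries=%.3f\n",
			nodes, perPick, maxScheduleRetries, hit)
	}
}
```

With 10 nodes the per-pick chance is 10% and a full round of retries finds a usable region about 65% of the time; with 100 nodes that drops to roughly 10%, which is the inefficiency mentioned above for larger clusters.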
got it
So how about adding a schedule that takes the store with the least leader count as the target? It isn't affected by the node count.
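For reference, a rough sketch of that alternative (this is not the PR's code; getStores, isUp, and a cluster-level randFollowerRegion wrapper are assumed helpers, loosely modeled on names that appear elsewhere in this diff):

```go
// Hypothetical sketch: fix the target first as the store with the fewest
// leaders, then pick a region that already has a follower there and transfer
// its leadership to it. The hit rate of this approach does not depend on the
// total node count.
func scheduleToLeastLeaderStore(cluster *clusterInfo) (*regionInfo, *storeInfo) {
	var target *storeInfo
	for _, s := range cluster.getStores() { // assumed helper
		if !s.isUp() { // assumed helper
			continue
		}
		if target == nil || cluster.getStoreLeaderCount(s.GetId()) < cluster.getStoreLeaderCount(target.GetId()) {
			target = s
		}
	}
	if target == nil {
		return nil, nil
	}
	// Any region with a follower on the target can move its leader there.
	region := cluster.randFollowerRegion(target.GetId()) // assumed wrapper around regionsInfo.randFollowerRegion
	return region, target
}
```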
Any update? @disksing
server/scheduler.go
Outdated
	return nil, nil
var averageLeader float64
for _, s := range stores {
	averageLeader += float64(cluster.getStoreLeaderCount(s.GetId())) / float64(len(stores))
Use s.leaderCount() directly.
In fact I intentionally avoid using s.leaderCount() here, because the leader count in the cache is more up to date.
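As an illustration of that tradeoff, a sketch only (reusing the helper names from the hunk above, not new PR code):

```go
// Sketch: two ways to read a store's leader count when computing the average.
// The cache-based count is preferred here because, as noted above, it is more
// up to date than the count reported by the store itself.
func averageLeaderCount(cluster *clusterInfo, stores []*storeInfo) float64 {
	var avg float64
	for _, s := range stores {
		avg += float64(cluster.getStoreLeaderCount(s.GetId())) / float64(len(stores)) // cache-based count
		// store-reported alternative would be: float64(s.leaderCount())
	}
	return avg
}
```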
got it
server/scheduler.go
Outdated
	return nil, nil
}

return region, region.GetStorePeer(target.GetId())
if leastLeaderStore == nil || math.Abs(mostLeaderStore.leaderScore()-averageLeader) > math.Abs(leastLeaderStore.leaderScore()-averageLeader) {
If mostLeaderStore is nil, this will panic here.
I have already checked whether both leastLeaderStore and mostLeaderStore are nil.
You're right. I'll fix it.
if target == nil {
	var mostLeaderDistance, leastLeaderDistance float64
	if mostLeaderStore != nil {
		mostLeaderDistance = math.Abs(mostLeaderStore.leaderScore() - averageLeader)
Should we also use the cache to get the score here?
You are right, but we need to merge #575 before fixing it.
PTAL @nolouch @siddontang
LGTM
PTAL @andelf
PTAL @siddontang
LGTM
1. Randomly select a region for leader balance. Fixes #545 (select the target store first for leader transfer when some nodes fall behind too much). The old way makes balance slow when several stores fall behind too much, and it also has an over-scheduling problem because scheduling is always concentrated on the store with the most leaders.
2. The balance limit should exclude down/tombstone stores. With this patch, the balance speed will not grow too fast when several stores are down.
3. Add storageThresholdFilter for regionBalancer: do not move regions to a store that is almost full. This normally won't happen, but it is safer to check explicitly (a rough sketch of the idea follows below).
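A hedged, self-contained sketch of that last idea (all names here are hypothetical; this is not the PR's storageThresholdFilter implementation):

```go
package main

import "fmt"

// storeSpace is a hypothetical stand-in for the space stats a store reports.
type storeSpace struct {
	capacity  uint64 // total bytes
	available uint64 // bytes still free
}

// filtered reports whether a store should be rejected as a balance target
// because its used space is already above the allowed ratio of its capacity.
func filtered(s storeSpace, maxUsedRatio float64) bool {
	if s.capacity == 0 {
		return true // no stats yet; play safe and skip this store
	}
	used := float64(s.capacity-s.available) / float64(s.capacity)
	return used > maxUsedRatio
}

func main() {
	almostFull := storeSpace{capacity: 100 << 30, available: 5 << 30}
	roomy := storeSpace{capacity: 100 << 30, available: 60 << 30}
	fmt.Println(filtered(almostFull, 0.8)) // true: 95% used, not a valid target
	fmt.Println(filtered(roomy, 0.8))      // false: 40% used, fine as a target
}
```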
PTAL @siddontang @nolouch