balance: slow down interval increase speed. #585

Merged: 3 commits merged on Mar 28, 2017

disksing (Contributor):

We have observed that the schedule interval increases too fast.

We should not increase the interval after each failed retry (this appears to be a bug). This PR also changes the growth factor from 2 to 1.3.

/cc @nolouch @siddontang @andelf

```go
		if op := s.Scheduler.Schedule(cluster); op != nil {
			s.interval = minScheduleInterval
			return op
		}
	}

	// If we have no schedule, increase the interval exponentially.
```
Contributor:

please update the comment here

Contributor (author):

Multiplying by 1.3 every time is still exponential growth.
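In closed form, assuming each fully failed round multiplies the interval by the factor $f$ and the result is capped (start and cap values are the constants from this PR), the interval after $n$ failed rounds is:

$$
t_n = \min\left(t_{\min} \cdot f^{\,n},\; t_{\max}\right), \qquad t_{\min} = 10\,\text{ms}, \quad t_{\max} = 1\,\text{min}
$$

This is exponential for any $f > 1$; a smaller $f$ only lowers the base. The number of failed rounds before the cap is reached is $n^* = \lceil \log_f (t_{\max}/t_{\min}) \rceil = \lceil \log_f 6000 \rceil$.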

```go
const (
	maxScheduleRetries     = 10
	maxScheduleInterval    = time.Minute
	minScheduleInterval    = time.Millisecond * 10
	scheduleIntervalFactor = 1.3
)
```
Contributor:

any reason to use 1.3 here?

Contributor (author):

It was chosen arbitrarily; we just need slower growth here.
With a factor of 2, the interval reaches its 1-minute maximum after about 13 retries, which takes less than 1.5 minutes in total.
Failing to schedule an operator within 1.5 minutes does not necessarily mean the cluster is balanced; it may instead be caused by slow heartbeats or slow snapshots.

Contributor:

can we construct a test to verify the change is ok?

Contributor (author):

I'll see what I can do.

nolouch (Contributor) commented Mar 27, 2017:

LGTM

siddontang (Contributor) left a comment:

Rest LGTM

disksing (Contributor, author):

PTAL @siddontang

siddontang (Contributor):

LGTM

siddontang merged commit b99b30f into master on Mar 28, 2017.
siddontang deleted the disksing/balance-interval branch on March 28, 2017 at 05:23.