Improved scheduler retry logic under high contention #787

dadgar · 2016-02-10T05:26:11Z

This PR resets the retry count whenever progress is made during scheduling, and once retries are exhausted it fails by creating a blocked eval.
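A minimal Go sketch of the retry policy described above: a counter that resets whenever an attempt makes progress, and that gives up by handing off to a blocked eval once the limit is hit. `retryMax`, `attemptResult`, `maxRetries` and the "placed allocations" notion of progress are illustrative assumptions; this is not the code from the PR.

```go
package main

import "fmt"

// maxRetries is a hypothetical cap on consecutive attempts that make no progress.
const maxRetries = 5

// attemptResult is a hypothetical summary of one scheduling attempt.
type attemptResult struct {
	placed int  // allocations successfully placed by this attempt
	done   bool // true once the plan has been fully applied
}

// retryMax calls attempt until it reports done. The counter resets whenever an
// attempt places at least one allocation, so only consecutive no-progress
// attempts count toward limit. When the limit is reached, onExhausted runs
// (for example, to create a blocked eval) and retryMax returns false.
func retryMax(limit int, attempt func() attemptResult, onExhausted func()) bool {
	failures := 0
	for failures < limit {
		res := attempt()
		if res.done {
			return true
		}
		if res.placed > 0 {
			failures = 0 // progress was made: start counting again
		} else {
			failures++
		}
	}
	onExhausted()
	return false
}

func main() {
	// Simulated placements per attempt: two attempts make progress, then the
	// cluster stays full.
	progress := []int{0, 2, 0, 0, 1}
	next := 0
	ok := retryMax(maxRetries,
		func() attemptResult {
			placed := 0
			if next < len(progress) {
				placed = progress[next]
			}
			next++
			return attemptResult{placed: placed}
		},
		func() { fmt.Println("retries exhausted: creating a blocked eval") },
	)
	fmt.Println("completed:", ok, "attempts made:", next)
}
```

Resetting on progress means a busy but still-progressing eval is not abandoned early, while an eval that makes no progress still gives up after `limit` fruitless attempts.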


c4milo · 2016-02-10T17:40:56Z

I wonder if retry attempts should be randomized, in order to avoid overwhelming the server when too many blocked evaluations are queued. Or does the max retries achieve the same effect?

armon · 2016-02-11T01:51:35Z

scheduler/generic_sched.go

+	}
+
+	e := s.ctx.Eligibility()
+	classes := e.GetClasses()


May as well not track this if HasEscaped
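The review suggestion can be sketched as follows. Only `HasEscaped` and `GetClasses` come from the comment and diff above; the `evalEligibility` and `blockedEval` types and `newBlockedEval` are hypothetical stand-ins, not Nomad's real structs. The assumed rationale is that an escaped eval cannot be unblocked on a per-class basis, so recording its class eligibility buys nothing.

```go
package main

import "fmt"

// evalEligibility is a hypothetical stand-in for the scheduler's eligibility
// tracker. escaped is true when the job's constraints cannot be captured by
// computed node classes; classes maps a computed class to whether it was eligible.
type evalEligibility struct {
	escaped bool
	classes map[string]bool
}

func (e *evalEligibility) HasEscaped() bool { return e.escaped }

func (e *evalEligibility) GetClasses() map[string]bool { return e.classes }

// blockedEval is a hypothetical, trimmed-down blocked evaluation.
type blockedEval struct {
	escaped          bool
	classEligibility map[string]bool // left nil when the eval has escaped
}

// newBlockedEval records per-class eligibility only for evals that have not
// escaped, following the review suggestion.
func newBlockedEval(e *evalEligibility) blockedEval {
	b := blockedEval{escaped: e.HasEscaped()}
	if !b.escaped {
		b.classEligibility = e.GetClasses()
	}
	return b
}

func main() {
	tracked := newBlockedEval(&evalEligibility{classes: map[string]bool{"class-a": true}})
	skipped := newBlockedEval(&evalEligibility{escaped: true, classes: map[string]bool{"class-a": true}})
	fmt.Println("tracked:", tracked.classEligibility)
	fmt.Println("escaped eval has no class data:", skipped.classEligibility == nil)
}
```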

armon · 2016-02-11T01:55:06Z

Minor feedback, LGTM


armon · 2016-02-11T18:08:55Z

@c4milo the retry limit is there to prevent overwhelming the servers, exactly as you said!

c4milo · 2016-02-11T18:31:54Z

Nice! Shouldn't retries be randomized, then? That way, after a general failure, the queued allocs wouldn't all be retried at the same time and DoS the servers. Or is that unlikely to happen?

c4milo · 2016-02-11T18:34:31Z

I've seen similar scenarios before in other distributed systems, where a service was unable to recover because all of its clients retried at the same time, DoSing/overwhelming it.

armon · 2016-02-12T00:57:02Z

@c4milo The evaluation broker handles this case. The scheduler limits how many retries it does in a hot loop before yielding the scheduler thread and moving back into the evaluation broker. There is also randomization in the placement order to reduce contention under extremely high load.

c4milo · 2016-02-12T03:25:27Z

Great! Thanks, Armon, for explaining further!

github-actions · 2023-04-29T02:10:19Z

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

Reset retry count if progress is made and fail by creating a blocked eval

cc0ad87


Only set eligibility if the eval hasn't escaped

13d1fd0

dadgar added a commit that referenced this pull request Feb 11, 2016

Merge pull request #787 from hashicorp/f-scheduler-retries

49b4d39

Improved scheduler retry logic under high contention

dadgar merged commit 49b4d39 into master Feb 11, 2016

dadgar deleted the f-scheduler-retries branch February 11, 2016 17:49

github-actions bot locked as resolved and limited conversation to collaborators Apr 29, 2023
