-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP: add non-preempting option to PriorityClasses #901
Merged
Merged
Changes from 4 commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
2efa228
Added KEP for non-preempting pod priorities
vllry 16d2d64
Fact checks, clarifications, cleanup.
vllry d464db7
Changed Preempting field to be duplicated between PriorityClass and P…
vllry e280099
Address clarity and motiviation
vllry ca08319
Added feature gate, shuffled use case info.
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
155 changes: 155 additions & 0 deletions
155
keps/sig-scheduling/20190317-non-preempting-priorityclass.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,155 @@ | ||
--- | ||
title: Add NonPreempting Option For PriorityClasses | ||
authors: | ||
- "@vllry" | ||
owning-sig: sig-scheduling | ||
participating-sigs: | ||
- sig-scheduling | ||
reviewers: | ||
- "k82cn" | ||
- "wgliang" | ||
approvers: | ||
- "bsalamat" | ||
editor: Vallery Lancey | ||
creation-date: 2019-03-17 | ||
last-updated: 2019-03-28 | ||
status: implementable | ||
see-also: | ||
replaces: | ||
superseded-by: | ||
--- | ||
|
||
# Allow PriorityClasses To Be Non-Preempting | ||
|
||
## Table of Contents | ||
|
||
* [Table of Contents](#table-of-contents) | ||
* [Summary](#summary) | ||
* [Motivation](#motivation) | ||
* [Goals](#goals) | ||
* [Non-Goals](#non-goals) | ||
* [Proposal](#proposal) | ||
* [Risks and Mitigations](#risks-and-mitigations) | ||
* [Graduation Criteria](#graduation-criteria) | ||
* [Implementation History](#implementation-history) | ||
|
||
|
||
## Summary | ||
|
||
[PriorityClasses](https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/) are a GA feature as on 1.14, | ||
which impact the scheduling and eviction of pods. | ||
Pods are be scheduled according to descending priority. | ||
If a pod cannot be scheduled due to insufficient resources, | ||
lower-priority pods will be preempted to make room. | ||
|
||
This proposal makes the preempting behavior optional for a PriorityClass, | ||
by adding a new field to PriorityClasses, | ||
which in turn populates PodSpec. | ||
If a pod is waiting to be scheduled, | ||
and it does not have preemption enabled, | ||
it will not trigger preemption of other pods. | ||
|
||
## Motivation | ||
|
||
Allowing PriorityClasses to be non-preempting is important for running batch workloads. | ||
|
||
Batch workloads typically have a backlog of work, | ||
with unscheduled pods. | ||
Higher-priority workloads can be assigned a higher priority via a PriorityClass, | ||
to ensure they go to the front of the scheduling queue. | ||
However, | ||
preempting batch workloads is undesirable, | ||
as all work done by the preempted pod is typically lost. | ||
|
||
Users could create a non-preempting PriorityClasses, | ||
to ensure their most time-sensitive workloads are scheduled before other queued pods, | ||
without risking discarding the work of running pods. | ||
|
||
|
||
### Goals | ||
|
||
Add a boolean to PriorityClasses, | ||
to enable or disable preemption for pods of that PriorityClass. | ||
|
||
### Non-Goals | ||
|
||
* Protecting pods from preemption. PodDisruptionBudget should be used. | ||
|
||
## Proposal | ||
|
||
Add a Preempting field to both PodSpec and PriorityClass. | ||
This field will default to true, | ||
for backwards compatibility. | ||
|
||
If Preempting is true for a pod, | ||
the scheduler will preempt lower priority pods to schedule this pod, | ||
as is current behavior. | ||
|
||
If Preempting is false, | ||
a pod of that priority will not preempt other pods. | ||
|
||
Setting the Preempting field in PriorityClass provides a straightforward interface, | ||
and allows ResourceQuotas to restrict preemption. | ||
|
||
PriorityClass type example: | ||
``` | ||
type PriorityClass struct { | ||
metav1.TypeMeta | ||
metav1.ObjectMeta | ||
Value int32 | ||
GlobalDefault bool | ||
Description string | ||
Preempting *bool // New option | ||
} | ||
``` | ||
|
||
The Preempting field in PodSpec will be populated during pod admission, | ||
similarly to how the PriorityClass Value is populated. | ||
Storing the Preempting field in the pod spec has several benefits: | ||
* The scheduler does not need to be aware of PiorityClasses, | ||
as all relevant information is in the pod. | ||
* Mutating PriorityClass objects does not impact existing pods. | ||
* Kubelets can set Preempting on static pods. | ||
|
||
PodSpec type example: | ||
``` | ||
type PodSpec struct { | ||
... | ||
Preempting *bool | ||
... | ||
} | ||
``` | ||
|
||
Documentation must be updated to reflect the new feature, | ||
and changes to PriorityClass/PodSpec fields. | ||
|
||
### Risks and Mitigations | ||
|
||
The new feature may malfuction, | ||
or preemption may be accidentally impaired. | ||
New tests (covering both nonpreepting workloads and mixed workloads), | ||
and the existing preempting PriorityClass tests should be used to prove stability. | ||
|
||
## Graduation Criteria | ||
|
||
* Users are reporting that this resolves their workload priority use-cases | ||
(if not, additional enhancements would be tightly linked to this one). | ||
vllry marked this conversation as resolved.
Show resolved
Hide resolved
|
||
* The feature has been stable and reliable in at least 2 releases. | ||
* Adequate documentation exists for preemption and the optional field. | ||
* Test coverage includes non-preempting use cases. | ||
* Conformance requirements for non-preempting PriorityClasses are agreed upon. | ||
|
||
## Testing Plan | ||
Add detailed unit and integration tests for nonpreempting workloads. | ||
|
||
Add basic e2e tests, to ensure all components are working together. | ||
|
||
Ensure existing tests (for preempting PriorityClasses) do not break. | ||
|
||
## Implementation History | ||
|
||
[Original Github issue](https://github.com/kubernetes/kubernetes/issues/67671) | ||
|
||
Pod Priority and Preemption are tracked as part of [enhancement#564](https://github.com/kubernetes/enhancements/issues/564). | ||
The proposal for Pod Priority can be [found here](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scheduling/pod-priority-api.md) | ||
and Preemption proposal is [here](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scheduling/pod-preemption.md). |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this feature will be featuregated or since for the most part it will be backward compatible assuming default preempting field value to be true, we'd keep it without a featuregate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we'd need a feature gate. The default behavior is the same as existing behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new fields added to stable APIs get feature gated for a release to ensure HA upgrade can succeed without data loss
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, okay.