KEP: add non-preempting option to PriorityClasses #901

Merged (5 commits), Apr 1, 2019
104 changes: 104 additions & 0 deletions keps/sig-scheduling/20190317-non-preempting-priorityclass
@@ -0,0 +1,104 @@
---
title: Add NonPreempting Option For PriorityClasses
authors:
- "@vllry"
owning-sig: sig-scheduling
participating-sigs:
- sig-scheduling
reviewers:
- "k82cn"
- "wgliang"
approvers:
- "bsalamat"
editor: Vallery Lancey
creation-date: 2019-03-17
last-updated: 2019-03-17
status: implementable
see-also:
replaces:
superseded-by:
---

# Allow PriorityClasses To Be Non-Preempting

## Table of Contents

* [Table of Contents](#table-of-contents)
* [Summary](#summary)
* [Motivation](#motivation)
* [Goals](#goals)
* [Non-Goals](#non-goals)
* [Proposal](#proposal)
* [Risks and Mitigations](#risks-and-mitigations)
* [Graduation Criteria](#graduation-criteria)
* [Testing Plan](#testing-plan)
* [Implementation History](#implementation-history)


## Summary

[PriorityClasses](https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/) are a beta feature
that affects the scheduling and eviction of pods.
Pods will be scheduled according to descending priority.
If a pod cannot be scheduled due to insufficient resources,
lower-priority pods will be descheduled to make room.
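As a rough illustration of "scheduled according to descending priority", the following Go sketch orders a pending queue by priority value. The `pendingPod` type and `byDescendingPriority` function are hypothetical names invented for this example and are not part of the scheduler's code. Keeping arrival order among pods of equal priority is also an assumption of the sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// pendingPod is a hypothetical, minimal stand-in for a pod waiting to be scheduled.
type pendingPod struct {
	Name     string
	Priority int32 // resolved from the pod's PriorityClass
}

// byDescendingPriority returns a copy of pods ordered from highest to lowest
// priority. The stable sort keeps the input order among pods of equal
// priority (an assumption made for this sketch).
func byDescendingPriority(pods []pendingPod) []pendingPod {
	out := make([]pendingPod, len(pods))
	copy(out, pods)
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Priority > out[j].Priority
	})
	return out
}

func main() {
	queue := byDescendingPriority([]pendingPod{
		{Name: "batch-a", Priority: 10},
		{Name: "critical", Priority: 1000},
		{Name: "batch-b", Priority: 10},
	})
	for _, p := range queue {
		fmt.Println(p.Name, p.Priority)
	}
}
```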

This proposal makes the preempting (descheduling) behavior optional,
by adding a new field to PriorityClasses.
If a PriorityClass does not have preemption enabled,
the scheduler will not preempt pods in order to schedule a pod of that priority.

## Motivation

High-priority, non-preempting workloads are a common data science use case.
Preempting batch workloads is wasteful, because the interrupted unit of work must be repeated.

### Goals

Add a boolean to PriorityClasses,
to enable or disable preemption for pods of that PriorityClass.

### Non-Goals

## Proposal

Add a NonPreempting field to PriorityClasses.
This field will default to false,
for backwards compatibility.

If NonPreempting is false,
the scheduler will preempt lower-priority pods to schedule this pod,
as is the current behavior.

If NonPreempting is true,
a pod of that class will not preempt other pods if it cannot be scheduled.

Member: Do we support updating this?

The nice thing about this KEP is that it only affects the scheduling of new pods and not the "evictability" of running pods. Updating this value seems simple to support.

Member: Hmm... if we add a new field in PodTemplate, it's hard to update created pods :) If we hold PriorityClass in the scheduler, how about the kubelet?

Contributor Author: I'll look more into this.

Contributor Author: Looks like we're going with denormalizing the field into PodSpec, so from my understanding, that will present some upgrade challenges.

Update our documentation to reflect this new feature.
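
The rule proposed in this section can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the scheduler's implementation: `PriorityClass` and `Pod` here are simplified, hypothetical types, `mayPreempt` is an invented helper, and only the `NonPreempting` field name comes from this proposal. The sketch also assumes that preemption only ever targets strictly lower-priority pods.

```go
package main

import "fmt"

// PriorityClass is a simplified, hypothetical stand-in for the real API type.
type PriorityClass struct {
	Name  string
	Value int32
	// NonPreempting is the field proposed here. Its zero value (false)
	// preserves the current preempting behavior.
	NonPreempting bool
}

// Pod is a hypothetical, minimal pod carrying its resolved PriorityClass.
type Pod struct {
	Name  string
	Class PriorityClass
}

// mayPreempt reports whether the unschedulable pod `pending` would be allowed
// to preempt the running pod `running` under the proposed rule.
func mayPreempt(pending, running Pod) bool {
	if pending.Class.NonPreempting {
		// A non-preempting pod waits for resources instead of evicting others.
		return false
	}
	// Assumption: only strictly lower-priority pods are preemption candidates.
	return pending.Class.Value > running.Class.Value
}

func main() {
	batch := Pod{Name: "batch", Class: PriorityClass{Name: "low", Value: 100}}
	patient := Pod{Name: "patient", Class: PriorityClass{Name: "high-np", Value: 1000, NonPreempting: true}}
	urgent := Pod{Name: "urgent", Class: PriorityClass{Name: "critical", Value: 2000}}

	fmt.Println(mayPreempt(urgent, batch))   // true: current behavior is unchanged
	fmt.Println(mayPreempt(patient, batch))  // false: non-preempting class never preempts
	fmt.Println(mayPreempt(urgent, patient)) // true: the field does not protect running pods
}
```

The last case reflects the review discussion: the field only changes how a pending pod is scheduled, not whether a running pod of that class can itself be preempted.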

### Risks and Mitigations

The new feature may malfunction,
or preemption may be accidentally impaired.
New tests (covering both non-preempting workloads and mixed workloads)
and the existing preempting PriorityClass tests should be used to demonstrate stability.

## Graduation Criteria

* Users are reporting that this resolves their workload priority use-cases
(if not, additional enhancements would be tightly linked to this one).
* The feature has been stable and reliable in at least 2 releases.
* Adequate documentation exists for preemption and the optional field.
* Test coverage includes non-preempting use cases.
* Conformance requirements for non-preempting PriorityClasses are agreed upon.

## Testing Plan

Add unit and e2e tests for non-preempting PriorityClasses to the existing scheduler tests.

Ensure existing tests (for preempting PriorityClasses) do not break.

## Implementation History

[Original GitHub issue](https://github.com/kubernetes/kubernetes/issues/67671)

Pod Priority and Preemption are tracked as part of [enhancement#564](https://github.com/kubernetes/enhancements/issues/564).
The proposal for Pod Priority can be [found here](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scheduling/pod-priority-api.md)
and the Preemption proposal is [here](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scheduling/pod-preemption.md).