Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP: add non-preempting option to PriorityClasses #901

Merged
merged 5 commits into from
Apr 1, 2019

Conversation

vllry
Copy link
Contributor

@vllry vllry commented Mar 18, 2019

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 18, 2019
@k8s-ci-robot k8s-ci-robot requested review from bsalamat and k82cn March 18, 2019 04:32
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. labels Mar 18, 2019
the scheduler will preempt lower priority pods to schedule this pod,
as is current behavior.

If NonPreempting is true,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we support update this?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nice thing about this KEP is that it only affects the scheduling of new pods and not the "evictability" of running pods. Updating this value seems simple to support.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm.... if we add a new field in PodTemplate, it's hard to update created pods :) If we hold PriorityClass in scheduler, how about kubelet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look more into this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we're going with denormalizing the field into PodSpec... so from my understanding, that will present some upgrade challenges.

keps/sig-scheduling/20190317-non-preempting-priorityclass Outdated Show resolved Hide resolved
keps/sig-scheduling/20190317-non-preempting-priorityclass Outdated Show resolved Hide resolved
keps/sig-scheduling/20190317-non-preempting-priorityclass Outdated Show resolved Hide resolved
the scheduler will preempt lower priority pods to schedule this pod,
as is current behavior.

If NonPreempting is true,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nice thing about this KEP is that it only affects the scheduling of new pods and not the "evictability" of running pods. Updating this value seems simple to support.

@derekwaynecarr
Copy link
Member

If I declare a priority class "foo" and say it is not pre-emptible, is a system-node-critical or system-cluster-critical priority based workload able to pre-empt it? It seems like we would need to always enable those priorities to force pre-emption to maintain a functional cluster.

@misterikkit
Copy link

If I declare a priority class "foo" and say it is not pre-emptible, is a system-node-critical or system-cluster-critical priority based workload able to pre-empt it? It seems like we would need to always enable those priorities to force pre-emption to maintain a functional cluster.

So I had to read the KEP carefully to get this, but I don't think that's how it works. This new setting controls whether a pod will trigger eviction of lower priority pods, not whether the pod can itself be evicted.

If that's the behavior we decide to implement, we'll need to be extra careful to document it clearly.

@derekwaynecarr
Copy link
Member

@misterikkit -- i reread, and i think this is right. the system-* priorities can pre-empt other workloads, but the "data-science" priority would not pre-empt anything else. if that was the case, that makes sense.

@vllry
Copy link
Contributor Author

vllry commented Mar 18, 2019

I'll think about rephrasing the description so that's more apparent.

@bsalamat bsalamat self-assigned this Mar 18, 2019

## Graduation Criteria

* Users are reporting that this resolves their workload priority use-cases
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this feature will be featuregated or since for the most part it will be backward compatible assuming default preempting field value to be true, we'd keep it without a featuregate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we'd need a feature gate. The default behavior is the same as existing behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new fields added to stable APIs get feature gated for a release to ensure HA upgrade can succeed without data loss

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, okay.

@vllry
Copy link
Contributor Author

vllry commented Mar 24, 2019

@denkensk it would help if you could handle some of the technical questions, as the current owner of kubernetes/kubernetes#67671

Copy link
Member

@bsalamat bsalamat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please address the remaining items to get this merged?

@vllry
Copy link
Contributor Author

vllry commented Mar 28, 2019

Will do.

Copy link
Member

@bsalamat bsalamat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @vllry! It looks good. It would be great if you could also add a user story. You can use the info in my comment to write about usage of this feature for batch and/or other workloads that want faster scheduling without interrupting running pods.

@bsalamat
Copy link
Member

To clarify my comment, you may want to move part of the current "motivation" section to a "user story" section.

@vllry
Copy link
Contributor Author

vllry commented Mar 29, 2019

Thanks, @bsalamat, that makes sense.

Still working on the details about feature gating and pod spec upgrades... but it's starting to look a lot more complete.

@vllry
Copy link
Contributor Author

vllry commented Mar 30, 2019

I've addressed all the outstanding comments, except for compatibility and changes w/r/t the PodSpec. I'll do more reading this weekend on the implications and how this is normally handled, but suggestions would be appreciated.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 1, 2019
Copy link
Member

@bsalamat bsalamat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

This change is backward compatible since the default value of the "preempting" field is true.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, vllry

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 1, 2019
@k8s-ci-robot k8s-ci-robot merged commit fedb26d into kubernetes:master Apr 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants