From 85344939d6a3073697aadc661d78ca06c4dd7175 Mon Sep 17 00:00:00 2001
From: ravisantoshgudimetla
Date: Fri, 6 Aug 2021 18:03:10 -0400
Subject: [PATCH] Allow mutating priority classes

---
 .../268-priority-preemption/README.md         | 131 ++++++++++++++++++
 .../268-priority-preemption/kep.yaml          |   4 +-
 2 files changed, 134 insertions(+), 1 deletion(-)

diff --git a/keps/sig-scheduling/268-priority-preemption/README.md b/keps/sig-scheduling/268-priority-preemption/README.md
index 99d422af55b..92ac9c95f2f 100644
--- a/keps/sig-scheduling/268-priority-preemption/README.md
+++ b/keps/sig-scheduling/268-priority-preemption/README.md
@@ -10,6 +10,13 @@
 - [Proposal](#proposal)
   - [Risks and Mitigations](#risks-and-mitigations)
 - [Graduation Criteria](#graduation-criteria)
+- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
+  - [Feature enablement and rollback](#feature-enablement-and-rollback)
+  - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
+  - [Monitoring requirements](#monitoring-requirements)
+  - [Dependencies](#dependencies)
+  - [Scalability](#scalability)
+  - [Troubleshooting](#troubleshooting)
 - [Testing Plan](#testing-plan)
   - [Unit Tests](#unit-tests)
   - [Integration tests](#integration-tests)
@@ -60,6 +67,130 @@ caused by this change.
 * Adequate documentation exists for the features.
 * Test coverage of the features is acceptable.
 
+## Production Readiness Review Questionnaire
+We'd like priority classes to be mutable starting with release 1.23 because we're technically achieving the
+same effect with re-creation. Reasons for making priority classes immutable are mentioned in this
+[proposal](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scheduling/pod-priority-api.md#drawbacks-of-changing-priority-classes),
+but we noticed that people are deleting and re-creating priority classes anyway, despite those reasons.
+
+### Feature enablement and rollback
+
+* **How can this feature be enabled / disabled in a live cluster?**
+
+  We'll introduce a new feature gate called `AllowPriorityClassUpdates`. Enabling this gate enables the feature.
+
+* **Does enabling the feature change any default behavior?**
+
+  It changes the behavior of priority classes. Priority class values and names can now be changed.
+
+* **Can the feature be disabled once it has been enabled (i.e. can we rollback
+  the enablement)?**
+
+  Yes, by disabling the feature gate.
+
+* **What happens if we reenable the feature if it was previously rolled back?**
+
+  Priority classes become mutable again.
+
+* **Are there any tests for feature enablement/disablement?**
+
+  Yes, we will add unit tests at the API validation package level.
+
+### Rollout, Upgrade and Rollback Planning
+
+* **How can a rollout fail? Can it impact already running workloads?**
+
+  If the rollout fails, it won't impact already running workloads.
+
+* **What specific metrics should inform a rollback?**
+
+  N/A.
+
+* **Were upgrade and rollback tested? Was upgrade->downgrade->upgrade path tested?**
+
+  Not yet tested. Upgrade and rollback will be tested once this feature graduates.
+
+* **Is the rollout accompanied by any deprecations and/or removals of features,
+  APIs, fields of API types, flags, etc.?**
+
+  N/A.
+
+### Monitoring requirements
+
+* **How can an operator determine if the feature is in use by workloads?**
+
+  N/A.
+
+* **How can someone using this feature know that it is working for their instance?**
+
+  By updating a priority class. If the update succeeds, the feature is working; if the update is rejected, the
+  feature is not working.
+
+* **What are the SLIs (Service Level Indicators) an operator can use to
+  determine the health of the service?**
+
+  N/A.
+
+* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
+
+  N/A.
+
+* **Are there any missing metrics that would be useful to have to improve
+  observability of this feature?**
+
+  N/A.
+
+### Dependencies
+
+* **Does this feature depend on any specific services running in the cluster?**
+
+  No.
+
+
+### Scalability
+
+* **Will enabling / using this feature result in any new API calls?**
+
+  No.
+
+* **Will enabling / using this feature result in introducing new API types?**
+
+  No REST API changes.
+
+* **Will enabling / using this feature result in any new calls to cloud
+  provider?**
+
+  No.
+
+* **Will enabling / using this feature result in increasing size or count
+  of the existing API objects?**
+
+  No.
+
+* **Will enabling / using this feature result in increasing time taken by any
+  operations covered by [existing SLIs/SLOs][]?**
+
+  No.
+
+* **Will enabling / using this feature result in non-negligible increase of
+  resource usage (CPU, RAM, disk, IO, ...) in any components?**
+
+  No.
+
+### Troubleshooting
+
+* **How does this feature react if the API server and/or etcd is unavailable?**
+
+  N/A.
+
+* **What are other known failure modes?**
+
+  Errors will be logged to stderr.
+
+* **What steps should be taken if SLOs are not being met to determine the problem?**
+
+  N/A.
+
 ## Testing Plan
 Pod priority and preemption have unit, integration, and e2e tests. These tests
 are run regularly as a part of Kubernetes presubmit and CI/CD pipeline.
diff --git a/keps/sig-scheduling/268-priority-preemption/kep.yaml b/keps/sig-scheduling/268-priority-preemption/kep.yaml
index 3935c888ce6..70d6c844793 100644
--- a/keps/sig-scheduling/268-priority-preemption/kep.yaml
+++ b/keps/sig-scheduling/268-priority-preemption/kep.yaml
@@ -7,11 +7,13 @@ participating-sigs:
   - sig-scheduling
 reviewers:
   - "@k82cn"
+  - "@ahg-g"
+  - sig-api
 approvers:
   - "@liggitt"
 editor: Babak Salamat
 creation-date: 2019-01-31
-last-updated: 2019-01-31
+last-updated: 2021-08-06
 status: implementable
 see-also:
 replaces: