Update swap KEP for 1.23 beta
Fill out remaining beta PRR questions, add test plans
ehashman committed Sep 3, 2021
1 parent 59a1941 commit 20a8885
Showing 3 changed files with 61 additions and 4 deletions.
2 changes: 2 additions & 0 deletions keps/prod-readiness/sig-node/2400.yaml
@@ -1,3 +1,5 @@
kep-number: 2400
alpha:
approver: "@deads2k"
beta:
approver: "@deads2k"
59 changes: 57 additions & 2 deletions keps/sig-node/2400-node-swap/README.md
@@ -401,8 +401,12 @@ For alpha:
and further development efforts.
- Focus should be on supported user stories as listed above.

Once this data is available, additional test plans should be added for the next
phase of graduation.
For beta:

- Add e2e tests that exercise all available swap configurations via the CRI.
- Add e2e tests that verify pod-level control of swap utilization.
- Add e2e tests that verify swap performance with pods using a tmpfs.
- Verify new system-reserved settings for swap memory.

### Graduation Criteria

@@ -587,13 +591,29 @@ Try to be as paranoid as possible - e.g., what if some components will restart
mid-rollout?
-->

If a new node with swap memory fails to come online, it will not impact any
running components.

It is possible that if a cluster administrator adds swap memory to an already
running node and then performs an in-place upgrade, the new kubelet could fail
to start unless its configuration was modified to tolerate swap. However, we
expect that a cluster admin who adds swap to a node will also update the
kubelet's configuration so that it does not fail with swap present, as in the
sketch below.
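
A minimal sketch of such a configuration, assuming the beta kubelet fields for
NodeSwap (verify the exact field names and values against the kubelet version
in use):

```yaml
# Sketch of a KubeletConfiguration fragment for a swap-enabled node.
# failSwapOn must be false or the kubelet refuses to start when swap is present.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
failSwapOn: false
featureGates:
  NodeSwap: true
memorySwap:
  swapBehavior: LimitedSwap  # or UnlimitedSwap; see the kubelet config API
```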

Generally, it is considered best practice to add a swap memory partition at
node image/boot time rather than provisioning it dynamically after the kubelet
is already running and reporting Ready on a node.
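
For example, on node images built with cloud-init, swap could be provisioned
as part of the boot configuration rather than at runtime; this is only a
sketch, and the file path and size are illustrative:

```yaml
#cloud-config
# Sketch: create and enable a swap file at first boot, before the kubelet starts.
swap:
  filename: /var/swapfile  # illustrative path
  size: 2G
  maxsize: 2G
```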

###### What specific metrics should inform a rollback?

<!--
What signals should users be paying attention to when the feature is young
that might indicate a serious problem?
-->

Workload churn or performance degradation on nodes. The relevant metrics will
be application- and use-case-specific, but we can provide some suggestions.
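
One illustrative signal, assuming the Prometheus `node_exporter` runs on the
node: alert when a node's swap utilization stays high. The metric names below
are standard `node_exporter` names; the threshold and duration are only
examples.

```yaml
# Sketch of a Prometheus alerting rule for sustained swap pressure on a node.
groups:
- name: node-swap
  rules:
  - alert: NodeSwapPressure
    # Fires when more than 80% of a node's swap has been in use for 15 minutes.
    expr: |
      (node_memory_SwapTotal_bytes - node_memory_SwapFree_bytes)
        / node_memory_SwapTotal_bytes > 0.8
    for: 15m
    labels:
      severity: warning
```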

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

<!--
@@ -602,12 +622,17 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
are missing a bunch of machinery and tooling and can't do that now.
-->

N/A, because swap support has no runtime upgrade/downgrade path; the kubelet
must be restarted to enable or disable swap support.

###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

<!--
Even if applying deprecation policies, they may still surprise some users.
-->

No.

### Monitoring Requirements

<!--
@@ -622,12 +647,21 @@ checking if there are objects with field X set) may be a last resort. Avoid
logs or events for this purpose.
-->

The kubelet's KubeletConfiguration has `failSwapOn: false` set.

The Prometheus `node_exporter` will also export statistics on swap memory
utilization.

###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

<!--
Pick one more of these and delete the rest.
-->

TBD. We will determine a set of metrics as part of beta graduation. We will
need more data; there is not a single metric or set of metrics that can be used
to generally quantify node performance.

- [ ] Metrics
- Metric name:
- [Optional] Aggregation method:
@@ -647,13 +681,17 @@ high level (needs more precise definitions) those may be things like:
- 99.9% of /health requests per day finish with 200 code
-->

N/A

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

<!--
Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
implementation difficulties, etc.).
-->

N/A

### Dependencies

<!--
@@ -784,6 +822,8 @@ details). For now, we leave it here.

###### How does this feature react if the API server and/or etcd is unavailable?

No change. The feature is specific to individual nodes.

###### What are other known failure modes?

<!--
@@ -799,8 +839,23 @@ For each of them, fill in the following information by copying the below template
- Testing: Are there any tests for failure mode? If not, describe why.
-->


Individual nodes with swap memory enabled may experience performance
degradation under load. This could potentially cause a cascading failure on
nodes without swap: if swap-enabled nodes fail Ready checks, their workloads
may be rescheduled en masse onto the remaining nodes.

Thus, cluster administrators should be careful when enabling swap. To minimize
disruption, they may want to taint swap-enabled nodes to protect against this
problem. Taints ensure that only workloads which explicitly tolerate swap are
scheduled onto swap-enabled nodes, limiting how far swap-related disruption
can spill onto nodes without swap under load.
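
A minimal sketch of that approach; the taint key, names, and image below are
hypothetical placeholders chosen by the cluster administrator:

```yaml
# Sketch: taint a swap-enabled node so only swap-tolerant workloads land on it.
apiVersion: v1
kind: Node
metadata:
  name: swap-node-1  # illustrative node name
spec:
  taints:
  - key: example.com/swap-enabled  # hypothetical taint key
    effect: NoSchedule
---
# Pod that explicitly opts in to running on swap-enabled nodes.
apiVersion: v1
kind: Pod
metadata:
  name: swap-tolerant-workload
spec:
  tolerations:
  - key: example.com/swap-enabled
    operator: Exists
    effect: NoSchedule
  containers:
  - name: app
    image: example.com/app:latest  # illustrative image
```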

###### What steps should be taken if SLOs are not being met to determine the problem?

If nodes with swap memory enabled cause performance or stability degradation,
we suggest cordoning and draining those nodes and replacing them with nodes
that do not use swap memory.

## Implementation History

- **2015-04-24:** Discussed in [#7294](https://github.com/kubernetes/kubernetes/issues/7294).
4 changes: 2 additions & 2 deletions keps/sig-node/2400-node-swap/kep.yaml
@@ -20,12 +20,12 @@ prr-approvers:
- "@deads2k"

# The target maturity stage in the current dev cycle for this KEP.
stage: alpha
stage: beta

# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
# worked on.
latest-milestone: "v1.22"
latest-milestone: "v1.23"

# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
