diff --git a/content/en/blog/_posts/2021-08-09-alpha-swap-support.md b/content/en/blog/_posts/2021-08-09-alpha-swap-support.md new file mode 100644 index 0000000000000..2d7f562c2795a --- /dev/null +++ b/content/en/blog/_posts/2021-08-09-alpha-swap-support.md @@ -0,0 +1,142 @@ +--- +layout: blog +title: 'New in Kubernetes v1.22: alpha support for using swap memory' +date: 2021-08-09 +slug: run-nodes-with-swap-alpha +--- + +**Author:** Elana Hashman (Red Hat) + +The 1.22 release introduced alpha support for configuring swap memory usage for +Kubernetes workloads on a per-node basis. + +In prior releases, Kubernetes did not support the use of swap memory on Linux, +as it is difficult to provide guarantees and account for pod memory utilization +when swap is involved. As part of Kubernetes' earlier design, swap support was +considered out of scope, and a kubelet would by default fail to start if swap +was detected on a node. + +However, there are a number of [use cases](https://github.com/kubernetes/enhancements/blob/9d127347773ad19894ca488ee04f1cd3af5774fc/keps/sig-node/2400-node-swap/README.md#user-stories) +that would benefit from Kubernetes nodes supporting swap, including improved +node stability, better support for applications with high memory overhead but +smaller working sets, the use of memory-constrained devices, and memory +flexibility. + +Hence, over the past two releases, [SIG Node](https://github.com/kubernetes/community/tree/master/sig-node#readme) has +been working to gather appropriate use cases and feedback, and propose a design +for adding swap support to nodes in a controlled, predictable manner so that +Kubernetes users can perform testing and provide data to continue building +cluster capabilities on top of swap. The alpha graduation of swap memory +support for nodes is our first milestone towards this goal! + +## How does it work? + +There are a number of possible ways that one could envision swap use on a node. +To keep the scope manageable for this initial implementation, when swap is +already provisioned and available on a node, [we have proposed](https://github.com/kubernetes/enhancements/blob/9d127347773ad19894ca488ee04f1cd3af5774fc/keps/sig-node/2400-node-swap/README.md#proposal) +the kubelet should be able to be configured such that: + +- It can start with swap on. +- It will direct the Container Runtime Interface to allocate zero swap memory + to Kubernetes workloads by default. +- You can configure the kubelet to specify swap utilization for the entire + node. + +Swap configuration on a node is exposed to a cluster admin via the +[`memorySwap` in the KubeletConfiguration](/docs/reference/config-api/kubelet-config.v1beta1/). +As a cluster administrator, you can specify the node's behaviour in the +presence of swap memory by setting `memorySwap.swapBehavior`. + +This is possible through the addition of a `memory_swap_limit_in_bytes` field +to the container runtime interface (CRI). The kubelet's config will control how +much swap memory the kubelet instructs the container runtime to allocate to +each container via the CRI. The container runtime will then write the swap +settings to the container level cgroup. + +## How do I use it? + +On a node where swap memory is already provisioned, Kubernetes use of swap on a +node can be enabled by enabling the `NodeSwap` feature gate on the kubelet, and +disabling the `failSwapOn` [configuration setting](/docs/reference/config-api/kubelet-config.v1beta1/#kubelet-config-k8s-io-v1beta1-KubeletConfiguration) +or the `--fail-swap-on` command line flag. + +You can also optionally configure `memorySwap.swapBehavior` in order to +specify how a node will use swap memory. For example, + +```yaml +memorySwap: + swapBehavior: LimitedSwap +``` + +The available configuration options for `swapBehavior` are: + +- `LimitedSwap` (default): Kubernetes workloads are limited in how much swap + they can use. Workloads on the node not managed by Kubernetes can still swap. +- `UnlimitedSwap`: Kubernetes workloads can use as much swap memory as they + request, up to the system limit. + +If configuration for `memorySwap` is not specified and the feature gate is +enabled, by default the kubelet will apply the same behaviour as the +`LimitedSwap` setting. + +The behaviour of the `LimitedSwap` setting depends if the node is running with +v1 or v2 of control groups (also known as "cgroups"): + +- **cgroups v1:** Kubernetes workloads can use any combination of memory and + swap, up to the pod's memory limit, if set. +- **cgroups v2:** Kubernetes workloads cannot use swap memory. + +### Caveats + +Having swap available on a system reduces predictability. Swap's performance is +worse than regular memory, sometimes by many orders of magnitude, which can +cause unexpected performance regressions. Furthermore, swap changes a system's +behaviour under memory pressure, and applications cannot directly control what +portions of their memory usage are swapped out. Since enabling swap permits +greater memory usage for workloads in Kubernetes that cannot be predictably +accounted for, it also increases the risk of noisy neighbours and unexpected +packing configurations, as the scheduler cannot account for swap memory usage. + +The performance of a node with swap memory enabled depends on the underlying +physical storage. When swap memory is in use, performance will be significantly +worse in an I/O operations per second (IOPS) constrained environment, such as a +cloud VM with I/O throttling, when compared to faster storage mediums like +solid-state drives or NVMe. + +Hence, we do not recommend the use of swap for certain performance-constrained +workloads or environments. Cluster administrators and developers should +benchmark their nodes and applications before using swap in production +scenarios, and [we need your help](#how-do-i-get-involved) with that! + +## Looking ahead + +The Kubernetes 1.22 release introduces alpha support for swap memory on nodes, +and we will continue to work towards beta graduation in the 1.23 release. This +will include: + +* Adding support for controlling swap consumption at the Pod level via cgroups. + * This will include the ability to set a system-reserved quantity of swap + from what kubelet detects on the host. +* Determining a set of metrics for node QoS in order to evaluate the + performance and stability of nodes with and without swap enabled. +* Collecting feedback from test user cases. + * We will consider introducing new configuration modes for swap, such as a + node-wide swap limit for workloads. + +## How can I learn more? + +You can review the current [documentation](https://kubernetes.io/docs/concepts/architecture/nodes/#swap-memory) +on the Kubernetes website. + +For more information, and to assist with testing and provide feedback, please +see [KEP-2400](https://github.com/kubernetes/enhancements/issues/2400) and its +[design proposal](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/2400-node-swap/README.md). + +## How do I get involved? + +Your feedback is always welcome! SIG Node [meets regularly](https://github.com/kubernetes/community/tree/master/sig-node#meetings) +and [can be reached](https://github.com/kubernetes/community/tree/master/sig-node#contact) +via [Slack](https://slack.k8s.io/) (channel **#sig-node**), or the SIG's +[mailing list](https://groups.google.com/forum/#!forum/kubernetes-sig-node). +Feel free to reach out to me, Elana Hashman (**@ehashman** on Slack and GitHub) +if you'd like to help.