KEP-4569: Moving cgroup v1 support into maintenance mode

Release Signoff Checklist

Items marked with (R) are required prior to targeting to a milestone / release.

  • (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
  • (R) KEP approvers have approved the KEP status as implementable
  • (R) Design details are appropriately documented
  • (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
  • (R) Graduation criteria is in place
  • (R) Production readiness review completed
  • (R) Production readiness review approved
  • "Implementation History" section is up-to-date for milestone
  • User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
  • Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

Summary

Move cgroup v1 support in Kubernetes into maintenance mode, aligning with the industry's move towards cgroup v2 as the default for Linux kernel resource management and isolation.

Motivation

The Linux kernel community has made cgroup v2 the focus for new features, offering better functionality, a more consistent interface, and improved scalability. As a result, major Linux distributions and projects like systemd are phasing out support for cgroup v1. This trend puts pressure on Kubernetes to follow suit.

By shifting cgroup v1 support to maintenance mode, Kubernetes can stay in line with these changes, ensuring compatibility and taking advantage of the improvements in cgroup v2. This transition encourages the use of a more secure and efficient technology while acknowledging that the broader ecosystem, including essential components like the Linux kernel and systemd, is moving beyond cgroup v1.

For those needing long-term support for cgroup v1, it's important to note that this KEP reflects a broader shift in the ecosystem. The reality is that many critical dependencies are moving to cgroup v2, making it necessary for Kubernetes to adapt accordingly.

Goals

  1. Feature Freeze: No new features will be added to the cgroup v1 support code. The existing functionality of cgroup v1 will be considered complete and stable.

  2. e2e Testing: Maintain a set of e2e tests to ensure ongoing validation of cgroup v1 for the currently supported features.

  3. Security Maintenance: The Kubernetes community may provide security fixes for Critical and Important CVEs related to cgroup v1 as long as the release has not reached end of life.

  4. Best-Effort Bug Fixes:

    • Address critical security vulnerabilities in cgroup v1 as a priority.
    • Major bugs in cgroup v1 will be evaluated and potentially fixed if a feasible solution exists.
    • Acknowledge that some bugs, particularly those requiring substantial changes, may not be resolvable given the constraints around maintaining cgroup v1 support. Some issues may need fixes in the kernel or other dependencies that may never happen; such issues will not be fixed.
  5. Migration Support: Provide clear and practical migration guidance for users using cgroup v1, facilitating a smoother transition to cgroup v2.

  6. Enhancing cgroup v2 Support: Address all known pending bugs in Kubernetes’ cgroup v2 support to ensure it reaches a level of reliability and functionality that encourages users to transition from cgroup v1.

Non-Goals

Removing cgroup v1 support. Deprecation and removal will be addressed in a future KEP.

Proposal

The proposal outlines a plan to move cgroup v1 support in Kubernetes into maintenance mode, encouraging the community and users to transition to cgroup v2.

Risks and Mitigations

The primary risk involves potential disruptions for users who migrate to cgroup v2 with incompatible workloads.

Users depending on the following technologies will need to ensure they are using the specified versions or later, which support cgroup v2:

  • OpenJDK / HotSpot: jdk8u372, 11.0.16, 15 and later
  • Node.js 20.3.0 or later
  • IBM Semeru Runtimes: jdk8u345-b01, 11.0.16.0, 17.0.4.0, 18.0.2.0 and later
  • IBM SDK Java Technology Edition Version (IBM Java): 8.0.7.15 and later
  • If users run any third-party monitoring and security agents that depend on the cgroup file system, they need to update the agents to a version that supports cgroup v2.
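Checking an installed runtime against the minimums above comes down to a dotted-version comparison. The sketch below uses a hypothetical helper, `atLeast`, and covers only the entries with a single threshold (Node.js and IBM Java). The OpenJDK and IBM Semeru minimums differ per release line and would need a per-line table. The sketch also assumes purely numeric components, so strings such as `jdk8u372` are out of scope.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cgroupV2Minimums holds the single-threshold entries from the list above.
// OpenJDK and IBM Semeru have per-release-line minimums and are not modeled here.
var cgroupV2Minimums = map[string]string{
	"Node.js":  "20.3.0",
	"IBM Java": "8.0.7.15",
}

// component returns the i-th numeric component, treating missing ones as 0.
func component(parts []string, i int) (int, error) {
	if i >= len(parts) {
		return 0, nil
	}
	return strconv.Atoi(parts[i])
}

// atLeast reports whether dotted numeric version v is >= minimum.
func atLeast(v, minimum string) (bool, error) {
	vp, mp := strings.Split(v, "."), strings.Split(minimum, ".")
	for i := 0; i < len(vp) || i < len(mp); i++ {
		a, err := component(vp, i)
		if err != nil {
			return false, fmt.Errorf("version %q: %w", v, err)
		}
		b, err := component(mp, i)
		if err != nil {
			return false, fmt.Errorf("minimum %q: %w", minimum, err)
		}
		if a != b {
			return a > b, nil
		}
	}
	return true, nil // all components are equal
}

func main() {
	// `node --version` prints a leading "v", e.g. "v18.19.0".
	installed := strings.TrimPrefix("v18.19.0", "v")
	ok, err := atLeast(installed, cgroupV2Minimums["Node.js"])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("Node.js %s meets the cgroup v2 minimum: %v\n", installed, ok)
}
```

Missing trailing components count as zero, so "20.3" and "20.3.0" compare as equal.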

Mitigations include providing clear documentation on the migration process, offering community support for common issues encountered during migration, and keeping cgroup v1 support in maintenance mode to give users additional time to switch to cgroup v2 without major disruptions.

Design Details

This enhancement outlines the steps required to transition existing cgroup v1 support into maintenance mode, and as such, no new feature gates are proposed in this document.

Introduction of cgroup version metric

A new metric, kubelet_cgroup_version, is proposed. This metric will report values 1 or 2, indicating whether the host is utilizing cgroup version 1 or 2, respectively. The kubelet will assess the host's cgroup version at startup and emit this metric accordingly.

The introduction of this metric aims to streamline the process for cluster administrators in determining the cgroup version deployed across their hosts. This metric removes the need for manual node inspection, providing clear insight into the cgroup version each node operates on.

Implementing a warning log and an event for cgroup v1 usage

Starting with 1.31, if the host is running cgroup v1, the kubelet will log a warning message during startup such as:

klog.Warning("cgroup v1 detected. cgroup v1 support has been transitioned into maintenance mode, please plan for the migration towards cgroup v2. More information at https://git.k8s.io/enhancements/keps/sig-node/4569-cgroup-v1-maintenance-mode")

and also emit a corresponding event:

// nodeRef refers to this kubelet's Node object: the warning concerns the host, not any individual pod.
eventRecorder.Event(nodeRef, v1.EventTypeWarning, "CgroupV1", "cgroup v1 detected. cgroup v1 support has been transitioned into maintenance mode, please plan for the migration towards cgroup v2. More information at https://git.k8s.io/enhancements/keps/sig-node/4569-cgroup-v1-maintenance-mode")
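The startup check can be sketched as follows, assuming the cgroup version has already been detected. The `eventRecorder` interface is a hypothetical, trimmed-down stand-in for client-go's `record.EventRecorder`. The object passed is a node name string rather than a real object reference, and the standard `log` package stands in for `klog`.

```go
package main

import (
	"fmt"
	"log"
)

const cgroupV1Warning = "cgroup v1 detected. cgroup v1 support has been transitioned into maintenance mode, please plan for the migration towards cgroup v2. More information at https://git.k8s.io/enhancements/keps/sig-node/4569-cgroup-v1-maintenance-mode"

// eventRecorder is a hypothetical stand-in for client-go's record.EventRecorder;
// only the single method used here is modeled.
type eventRecorder interface {
	Event(object any, eventType, reason, message string)
}

// memoryRecorder keeps events in memory, which is handy for tests and this demo.
type memoryRecorder struct{ events []string }

func (r *memoryRecorder) Event(object any, eventType, reason, message string) {
	r.events = append(r.events, fmt.Sprintf("%v %s %s: %s", object, eventType, reason, message))
}

// warnIfCgroupV1 logs and records a warning only when the host runs cgroup v1,
// and reports whether a warning was emitted.
func warnIfCgroupV1(version int, nodeName string, rec eventRecorder) bool {
	if version != 1 {
		return false
	}
	log.Printf("Warning: %s", cgroupV1Warning)
	rec.Event(nodeName, "Warning", "CgroupV1", cgroupV1Warning)
	return true
}

func main() {
	rec := &memoryRecorder{}
	warnIfCgroupV1(1, "node-a", rec)
	fmt.Println(len(rec.events), "event(s) recorded")
}
```

Keeping the message in one constant ensures the log line and the event text cannot drift apart.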

Introduce a kubelet flag to disable cgroup v1 support

A new boolean kubelet flag, --disable-cgroupv1-support, will be introduced. By default, this flag will be set to false to ensure users can continue to use cgroup v1 without any issues. The primary objective of introducing this flag is to set it to true in CI, ensuring that all blocking and new CI jobs use only cgroup v2 by default (unless the job explicitly wants to run on cgroup v1).

Code modifications for default cgroup assumptions

Code segments that default to cgroup v1 logic will be inverted to assume cgroup v2 as the default. This shift underscores the transition to cgroup v2.

Original Code Snippet:

	memLimitFile := "memory.limit_in_bytes"
	if libcontainercgroups.IsCgroup2UnifiedMode() {
		memLimitFile = "memory.max"
	}

Revised Code Snippet:

	memLimitFile := "memory.max"
	if !libcontainercgroups.IsCgroup2UnifiedMode() {
		memLimitFile = "memory.limit_in_bytes"
	}
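The same "cgroup v2 first" ordering can apply when interpreting the file's contents. The sketch below assumes the raw file contents have already been read. On cgroup v2, `memory.max` holds the literal `max` when no limit is set. On cgroup v1, `memory.limit_in_bytes` always holds a number (a very large one when the cgroup is unlimited), and this sketch returns that number unchanged. `parseMemoryLimit` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMemoryLimit interprets a memory limit file, handling the cgroup v2 form
// first and falling back to the numeric form shared with cgroup v1.
// It returns the limit in bytes, or unlimited=true.
func parseMemoryLimit(raw string, isCgroupV2 bool) (limit int64, unlimited bool, err error) {
	s := strings.TrimSpace(raw)
	// cgroup v2 (default): memory.max holds the literal "max" when no limit is set.
	if isCgroupV2 && s == "max" {
		return 0, true, nil
	}
	// Both versions otherwise hold a byte count. On cgroup v1, "no limit" appears
	// as a very large value, which is returned unchanged here.
	limit, err = strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, false, fmt.Errorf("parsing memory limit %q: %w", s, err)
	}
	return limit, false, nil
}

func main() {
	cases := []struct {
		raw string
		v2  bool
	}{{"max\n", true}, {"536870912\n", true}, {"536870912\n", false}}
	for _, c := range cases {
		limit, unlimited, err := parseMemoryLimit(c.raw, c.v2)
		fmt.Println(limit, unlimited, err)
	}
}
```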

Separation of cgroup v1 and cgroup v2 Code Paths

Within the kubelet's cgroup manager, cgroup v1 and v2 code is intertwined. To maintain clear separation and facilitate focused maintenance, the CgroupManager interface will have separate cgroup v1 and cgroup v2 implementations. The existing common implementation of CgroupManager will be split, and the shared code will be moved into helper functions.
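The split could take roughly the following shape. This is a hypothetical, heavily reduced model. The real CgroupManager interface has many more methods, and `cgroupCommon`, `cgroupV1Manager`, `cgroupV2Manager` and `NewCgroupManager` are illustrative names. Shared logic lives in the embedded common type, and the version-specific type is chosen once at construction, with cgroup v2 as the default.

```go
package main

import (
	"fmt"
	"path"
)

// CgroupManager is a hypothetical, heavily reduced stand-in for the kubelet
// interface; only two methods are modeled to illustrate the split.
type CgroupManager interface {
	Version() int
	MemoryLimitPath(cgroup string) string
}

// cgroupCommon holds state and helpers shared by both implementations.
type cgroupCommon struct {
	root string // cgroup mount point, e.g. /sys/fs/cgroup
}

// join builds a path under the cgroup root; cgroup paths always use "/".
func (c cgroupCommon) join(parts ...string) string {
	return path.Join(append([]string{c.root}, parts...)...)
}

// cgroupV2Manager targets the unified hierarchy.
type cgroupV2Manager struct{ cgroupCommon }

func (m cgroupV2Manager) Version() int { return 2 }

func (m cgroupV2Manager) MemoryLimitPath(cgroup string) string {
	return m.join(cgroup, "memory.max")
}

// cgroupV1Manager targets the per-controller v1 hierarchies and is only maintained.
type cgroupV1Manager struct{ cgroupCommon }

func (m cgroupV1Manager) Version() int { return 1 }

func (m cgroupV1Manager) MemoryLimitPath(cgroup string) string {
	// Each v1 controller has its own mount, so the controller name appears in the path.
	return m.join("memory", cgroup, "memory.limit_in_bytes")
}

// NewCgroupManager selects the implementation once, with cgroup v2 as the default.
func NewCgroupManager(root string, isCgroupV2 bool) CgroupManager {
	common := cgroupCommon{root: root}
	if !isCgroupV2 {
		return cgroupV1Manager{common}
	}
	return cgroupV2Manager{common}
}

func main() {
	for _, v2 := range []bool{true, false} {
		m := NewCgroupManager("/sys/fs/cgroup", v2)
		fmt.Println(m.Version(), m.MemoryLimitPath("kubepods.slice"))
	}
}
```

Because callers only see the interface, cgroup v1 code can be frozen in its own type without `if` checks spread through the shared paths.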

API Changes

N/A

Test Plan

[X] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.

Prerequisite testing updates

All existing test jobs that use cgroup v2 should continue to pass without any flakiness. Additionally, test jobs for cgroup v1 must also continue to pass, as we will be modifying a significant amount of kubelet code and could inadvertently break v1 as well.

Unit tests

The respective kubelet subcomponents should have unit test cases that cover both cgroup v1 and cgroup v2.

Integration tests

N/A

e2e tests
  1. Monitor both cgroup v1 and v2 CI jobs.

  2. Ensure coverage of all features by running all tests on cgroup v2 (while some may still run on cgroup v1 to test backward compatibility).

  3. Make cgroup v2 host mandatory for new e2e and node e2e tests.

Graduation Criteria

Alpha

This feature won't follow the normal cycle of alpha->beta->GA, and will instead be implemented entirely in GA.

Beta

This feature won't follow the normal cycle of alpha->beta->GA, and will instead be implemented entirely in GA.

GA

  • When the kubelet detects that the host is using cgroup v1, it will not only log a warning message but also generate an event to highlight that cgroup v1 support is moving to maintenance mode.

  • Introduce a new metric, kubelet_cgroup_version, to provide insights into the cgroup version utilized by the hosts.

  • Introduce a boolean kubelet flag --disable-cgroupv1-support and set it to false by default.

  • Blog post on the advantages of using cgroup v2 with Kubernetes.

  • Code modifications to assume cgroup v2 by default. See the "Code modifications for default cgroup assumptions" section for details.

  • Ensure coverage of all features by running all tests on cgroup v2 (while some may still run on cgroup v1 to test backward compatibility).

  • Make cgroup v2 host mandatory for new e2e and node e2e jobs. Set --disable-cgroupv1-support to true for those jobs.

  • Fix all known pending bugs in cgroup v2 support in Kubernetes.

  • Separation of cgroup v1 and cgroup v2 code paths. See the "Separation of cgroup v1 and cgroup v2 Code Paths" section for details.

  • Mark cgroup v1 support as being in maintenance mode in the documentation.

Upgrade / Downgrade Strategy

  • For clusters upgrading to a version of Kubernetes where cgroup v1 is in maintenance mode, administrators should ensure that all nodes are compatible with cgroup v2 prior to upgrading. This might include operating system upgrades or workload configuration changes.
  • Downgrading and switching to cgroup v1 requires careful consideration. If users rely on features that only work with cgroup v2, such as swap support, they will need to either discontinue using those features or keep their systems on cgroup v2.
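Before upgrading, administrators need a per-node inventory of cgroup versions, and nodes running an older kubelet may not expose the new metric. A common manual check is `stat -fc %T /sys/fs/cgroup/`, which prints `cgroup2fs` on cgroup v2 hosts and `tmpfs` on cgroup v1 hosts. The sketch below uses a hypothetical helper, `nodesNeedingMigration`, that only classifies outputs already collected. How those outputs are gathered (SSH, a DaemonSet, etc.) is left open.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// nodesNeedingMigration returns, sorted, the nodes whose
// `stat -fc %T /sys/fs/cgroup/` output is not "cgroup2fs" (cgroup v2).
// On cgroup v1 or hybrid hosts that command typically prints "tmpfs".
func nodesNeedingMigration(statOutput map[string]string) []string {
	var pending []string
	for node, out := range statOutput {
		if strings.TrimSpace(out) != "cgroup2fs" {
			pending = append(pending, node)
		}
	}
	sort.Strings(pending) // map iteration order is random; sort for stable output
	return pending
}

func main() {
	audit := map[string]string{
		"node-a": "cgroup2fs\n",
		"node-b": "tmpfs\n",
		"node-c": "cgroup2fs\n",
	}
	fmt.Println("migrate before upgrading:", nodesNeedingMigration(audit))
}
```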

Version Skew Strategy

Kubernetes components that interact with node cgroups should be tolerant of both cgroup v1 and cgroup v2. This includes kubelet, the container runtime interface (CRI) implementations, and any cloud-provider-specific agents running on the node.

Production Readiness Review Questionnaire

Feature Enablement and Rollback

N/A

How can this feature be enabled / disabled in a live cluster?

N/A

Does enabling the feature change any default behavior?

N/A

Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

N/A

What happens if we reenable the feature if it was previously rolled back?

N/A

Are there any tests for feature enablement/disablement?

N/A.

Rollout, Upgrade and Rollback Planning

How can a rollout fail? Can it impact already running workloads?

N/A

What specific metrics should inform a rollback?

The kubelet_cgroup_version metric can be used to determine the cgroup version on the cluster nodes.

Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

N/A

Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

No

Monitoring Requirements

How can someone using this feature know that it is working for their instance?

From 1.31 onwards, a warning log as well as an event about cgroup v1 maintenance mode will be emitted when hosts are still using cgroup v1.

Users will also be able to probe the cgroup version on the hosts using the metric kubelet_cgroup_version.

How can an operator determine if the feature is in use by workloads?

Operators can use the kubelet_cgroup_version metric to determine the cgroup version on the cluster hosts. They can also monitor the warning log and event described in the "Implementing a warning log and an event for cgroup v1 usage" section.
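A metric-based check can be sketched as a function that extracts the `kubelet_cgroup_version` sample from a Prometheus text-format metrics body. Such a body could be fetched, for example, with `kubectl get --raw /api/v1/nodes/<node>/proxy/metrics`. The sketch assumes the gauge carries no labels. `cgroupVersionFromMetrics` is a hypothetical helper, and the sample body in `main` is illustrative.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// cgroupVersionFromMetrics scans a Prometheus text-format body and returns
// the value of the kubelet_cgroup_version sample.
func cgroupVersionFromMetrics(body string) (int, error) {
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Skip blank lines and "# HELP" / "# TYPE" comment lines.
		if len(fields) < 2 || strings.HasPrefix(fields[0], "#") {
			continue
		}
		// Assumption: the gauge has no labels, so a sample reads "<name> <value>".
		if fields[0] != "kubelet_cgroup_version" {
			continue
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			return 0, fmt.Errorf("parsing %q: %w", sc.Text(), err)
		}
		return int(v), nil
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, errors.New("kubelet_cgroup_version not found")
}

func main() {
	body := "# TYPE kubelet_cgroup_version gauge\nkubelet_cgroup_version 1\n"
	v, err := cgroupVersionFromMetrics(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if v == 1 {
		fmt.Println("node still runs cgroup v1; plan its migration to cgroup v2")
	} else {
		fmt.Println("node runs cgroup", v)
	}
}
```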

What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
What are the reasonable SLOs (Service Level Objectives) for the above SLIs?

N/A

Are there any missing metrics that would be useful to have to improve observability of this feature?

N/A

Dependencies

Does this feature depend on any specific services running in the cluster?

No.

Scalability

Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

No.

Will enabling / using this feature result in any new API calls?

No.

Will enabling / using this feature result in introducing new API types?

No.

Will enabling / using this feature result in any new calls to the cloud provider?

No.

Will enabling / using this feature result in increasing size or count of the existing API objects?

No.

Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

No.

Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

No.

Troubleshooting

How does this feature react if the API server and/or etcd is unavailable?

No.

What are other known failure modes?

Kubernetes components are compatible with both cgroup v1 and cgroup v2. Failures can occur within a workload if it depends on a specific cgroup version and does not support the version used by the host. Such workload-related failures are outside the scope of Kubernetes.

What steps should be taken if SLOs are not being met to determine the problem?

N/A

Implementation History

  • 2024-04-05: KEP for moving cgroup v1 to maintenance mode.

Drawbacks

Moving cgroup v1 to maintenance mode presents transitional challenges, including:

  1. Operational Overhead: Migrating to cgroup v2 requires updating the underlying hosts, imposing significant operational effort.

  2. Compatibility Concerns: Workloads or tools not yet adapted for cgroup v2 may experience compatibility issues, despite ongoing community efforts to ensure broad compatibility.

Alternatives

An alternative to moving cgroup v1 into maintenance mode would be to continue its full support. However, this approach would prevent users from accessing the improvements and features available in cgroup v2. Additionally, it would expose them to risks as key subsystems such as systemd and major operating systems like RHEL 9 have already deprecated or are planning to deprecate cgroup v1 support. This could lead to compatibility and maintenance issues in the future.

Another option could be deprecating cgroup v1 support with the eventual goal of removing it altogether. While this approach might accelerate the adoption of cgroup v2, it could significantly impact users who rely on legacy versions of operating systems or kernels that remain supported under long-term support plans, potentially creating substantial challenges for maintaining their systems.