
handle kubelet skew with and without dockershim support #2626

Closed
7 of 8 tasks
neolit123 opened this issue Dec 10, 2021 · 25 comments
Assignees
Labels
area/cri kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Milestone

Comments

@neolit123
Member

neolit123 commented Dec 10, 2021

dockershim was removed from the kubelet in early 1.24 development. related flags are also being removed.

history here:
#1412

a few tasks need to be performed to adapt kubeadm for 1.24 and later.

1.24:

1.25:
in kubeadm 1.25, kubelet 1.23 goes out of support, because kubeadm 1.25 only supports kubelet 1.25 and 1.24:

1.26:

1.27:

@neolit123 neolit123 added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. labels Dec 10, 2021
@neolit123 neolit123 added this to the v1.24 milestone Dec 10, 2021
@neolit123
Member Author

neolit123 commented Dec 10, 2021

first 1.24 PR is here:
kubernetes/kubernetes#106973

@pacoxu
Member

pacoxu commented Dec 14, 2021

I may take the task to change autodetection so that it does not mix containerd and docker sockets for docker 18.xx+.
/assign
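
For context, that autodetection roughly amounts to probing well-known CRI socket paths and refusing to guess when more than one is present; the motivation is that docker 18.09+ bundles containerd, so both sockets can exist on the same host. The paths and error handling below are an illustrative sketch, not kubeadm's actual implementation:

```go
// Illustrative sketch only; kubeadm's real detection lives in its runtime utilities.
package main

import (
	"fmt"
	"os"
	"strings"
)

// knownSockets lists well-known CRI socket endpoints (illustrative).
var knownSockets = []string{
	"unix:///run/containerd/containerd.sock",
	"unix:///var/run/crio/crio.sock",
	"unix:///var/run/cri-dockerd.sock",
}

// detectCRISocket returns the single CRI socket found on the host, and errors
// out instead of guessing when zero or multiple sockets are present.
func detectCRISocket() (string, error) {
	var found []string
	for _, s := range knownSockets {
		path := strings.TrimPrefix(s, "unix://")
		if _, err := os.Stat(path); err == nil {
			found = append(found, s)
		}
	}
	switch len(found) {
	case 1:
		return found[0], nil
	case 0:
		return "", fmt.Errorf("no supported CRI socket found")
	default:
		return "", fmt.Errorf("multiple CRI sockets found %v, please specify one explicitly", found)
	}
}

func main() {
	sock, err := detectCRISocket()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("detected:", sock)
}
```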

@neolit123
Member Author

neolit123 commented Dec 14, 2021 via email

@neolit123
Member Author

second PR that changes the kubeadm defaults / auto detection is here:
kubernetes/kubernetes#107317

@neolit123
Member Author

tasks are completed for 1.24, moving to 1.25 milestone.

@pacoxu pacoxu removed their assignment Feb 15, 2022
@neolit123 neolit123 removed the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Feb 26, 2022
@neolit123 neolit123 added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label May 12, 2022
@neolit123
Member Author

1.25 cleanup PR is here:
kubernetes/kubernetes#110022

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@neolit123 neolit123 added this to the v1.29 milestone Jul 21, 2023
@neolit123
Member Author

neolit123 commented Oct 13, 2023

so, one issue is that since we now support kubelet skew of N-3, we can't remove the flag --pod-infra-container-image in kubeadm deployments until 1.29 kubelet goes out of support (if my math is correct). EDIT: or 1.30 if the kubelet removes the flag in 1.30...

  • we can do this - stop applying the flag for new clusters created with kubelet >= 1.29
  • but we cannot do this - remove the flag from systemd drop-ins that kubeadm uses to start the kubelet. kubelet on workers is upgraded out of band and we don't know what version the user will choose.
    https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/upgrading-linux-nodes/
    if they choose 1.29 we could remove the flag from the systemd file, but if the kubelet is older we cannot.

up until now we waited 2 releases before removing a flag, because the skew was N-1. but N-3 changes this.
so we either need to modify kubeadm upgrade to wait for kubelet upgrade, check the version and then perform the flag cleanup (behavior change) or ask the kubelet maintainers to delay the removal of --pod-infra-container-image, until 1.29 goes out of support (if my math is correct).
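
To make that concrete, here is a minimal sketch of the kind of version branching this would force on kubeadm; the helper name, the 1.29 cut-off, and the default values are illustrative assumptions, not kubeadm's actual code:

```go
// Illustrative sketch only: decide which kubelet flags kubeadm would write into
// its systemd drop-in, based on the kubelet version the node will actually run.
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/version"
)

// kubeletFlags drops --pod-infra-container-image once the target kubelet no
// longer needs it (assumed here to be 1.29+, where the flag is a no-op).
func kubeletFlags(kubeletVersion, pauseImage string) ([]string, error) {
	v, err := version.ParseSemantic(kubeletVersion)
	if err != nil {
		return nil, err
	}
	flags := []string{
		"--container-runtime-endpoint=unix:///run/containerd/containerd.sock",
	}
	if v.LessThan(version.MustParseSemantic("1.29.0")) {
		// Older kubelets still rely on the flag to keep the pause image from being GC-ed.
		flags = append(flags, "--pod-infra-container-image="+pauseImage)
	}
	return flags, nil
}

func main() {
	for _, kv := range []string{"1.28.3", "1.29.0"} {
		f, err := kubeletFlags(kv, "registry.k8s.io/pause:3.9")
		if err != nil {
			panic(err)
		}
		fmt.Println(kv, f)
	}
}
```

The hard part, as noted above, is that kubeadm does not currently know the kubelet version at the time it writes the drop-in, because the kubelet binary on workers is upgraded out of band.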

@liggitt this is an example of the complexity at hand for N-3.
cc @SataQiu @pacoxu @afbjorklund

x-posting a note here:
kubernetes/kubernetes#106893 (comment)

@liggitt
Member

liggitt commented Oct 13, 2023

having some way to indicate the version kubeadm should make a config for seems like it could help simplify this

@neolit123
Member Author

> having some way to indicate the version kubeadm should make a config for seems like it could help simplify this

agreed, and it seems like an oversight that kubeadm upgrade is not wrapping this kubelet upgrade, checking what version of kubelet the user installs, and then performing some action.

one quick fix might be to call "kubeadm upgrade ..." another time after the user upgrades a kubelet on a node. for control plane nodes this will be a bit slow.

what i don't like is the deprecation progression of the --pod-infra-container-image flag in the kubelet and i think it is messy.

  • the flag was needed to pin the pause image and prevent GC
  • in 1.29 the flag is no-op-ed
  • in 1.30 it is planned for removal

that doesn't seem right, and it will force all tools on top of the kubelet to have version branching around the k8s support skew - if they care about the GC problem, that is.

https://github.com/kubernetes/kubernetes/pull/118544/files#r1359721938

@liggitt
Member

liggitt commented Oct 15, 2023

> what i don't like is the deprecation progression of the --pod-infra-container-image flag in the kubelet and i think it is messy.
>
>   • the flag was needed to pin the pause image and prevent GC
>   • in 1.29 the flag is no-op-ed
>   • in 1.30 it is planned for removal

Making it easy for tools to keep setting the no-op flag until the release where it was needed hits EOL would be nice, especially if it is ~zero cost for sig-node. I'd ask them about that.

@neolit123 neolit123 modified the milestones: v1.29, v1.30 Nov 3, 2023
@carlory
Member

carlory commented Nov 24, 2023

Hi @liggitt, any update on this?

kubeadm and scripts are still using this flag in 1.29. Can we remove this flag from kubelet in 1.30?

```go
if cleanFlagSet.Changed("pod-infra-container-image") {
	klog.InfoS("--pod-infra-container-image will not be pruned by the image garbage collector in kubelet and should also be set in the remote runtime")
	_ = cmd.Flags().MarkDeprecated("pod-infra-container-image", "--pod-infra-container-image will be removed in 1.30. Image garbage collector will get sandbox image information from CRI.")
}
```

Or should we update the deprecation message to reference a different release?

@liggitt
Member

liggitt commented Nov 27, 2023

In what release can we count on CRI having the info required so --pod-infra-container-image no longer needs to be passed?

@liggitt
Member

liggitt commented Nov 27, 2023

> in 1.29 the flag is no-op-ed

If the flag is no-op and cheap for node to keep around, I'd ask if they can keep it until 1.29 is the oldest release.

Separately, kubeadm knowing what kubelet version it is generating a config for seems like a really important gap to close, and would let kubeadm stop setting this flag for >=1.29 kubelets.

@neolit123
Member Author

> In what release can we count on CRI having the info required so --pod-infra-container-image no longer needs to be passed?

@adisky do you know?
the problem here is the variance in CRI implementor schedules - e.g. if cri-o implements the latest CRI that has the feature, that would mean that users need to update cri-o; updating the kubelet alone will not suffice.

> in 1.29 the flag is no-op-ed
>
> If the flag is no-op and cheap for node to keep around, I'd ask if they can keep it until 1.29 is the oldest release.

+1, but it seems that if the flag is no-op-ed, users would need to upgrade their container runtime as well.
otherwise there will be nothing to handle the GC problem (the container runtime is old, the kubelet flag is a no-op).

> Separately, kubeadm knowing what kubelet version it is generating a config for seems like a really important gap to close, and would let kubeadm stop setting this flag for >=1.29 kubelets.

yes, that is the upgrade problem discussed earlier.
#2626 (comment)
currently the kubelet binary upgrade is done after kubeadm upgrade and is manually done by the user.
it needs discussion - we need to figure out what to do with this.

@pacoxu
Member

pacoxu commented Dec 22, 2023

kubernetes/kubernetes#106893 (comment) has some updates about the pinned image feature on the containerd side:

On the containerd side, containerd v1.7.3 and 1.6.22 include support for pinned images:

However, there are still two bugfix pull requests in progress:

EDIT: added the context below.
The two WIP PRs are not blockers.
containerd/containerd#9381 is not a blocker either (it will pin somewhat more than expected).

To use that feature, we should use at least containerd v1.7.3+ or 1.6.22+.
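
For anyone who wants to verify this on a node, here is a small, hedged sketch that asks the runtime over CRI whether it reports the pause image as pinned; the socket path and image name are assumptions that depend on the node's setup, and the Pinned field is only populated by runtimes that implement the pinned-image support mentioned above.

```go
// Query the CRI image service and print whether the pause image is pinned.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

func main() {
	// Connect to the containerd CRI socket (assumed default path).
	conn, err := grpc.Dial("unix:///run/containerd/containerd.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	client := runtimeapi.NewImageServiceClient(conn)
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Ask the runtime about the sandbox (pause) image; image name is illustrative.
	resp, err := client.ImageStatus(ctx, &runtimeapi.ImageStatusRequest{
		Image: &runtimeapi.ImageSpec{Image: "registry.k8s.io/pause:3.9"},
	})
	if err != nil {
		log.Fatal(err)
	}
	if resp.Image == nil {
		fmt.Println("image not present on the node")
		return
	}
	fmt.Printf("pinned: %v\n", resp.Image.Pinned)
}
```

If Pinned comes back false, then per the discussion above the kubelet-side --pod-infra-container-image flag (on kubelets that still honor it) is what keeps the pause image from being removed by image GC.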

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 24, 2024
@carlory
Member

carlory commented Mar 25, 2024

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 25, 2024
@neolit123 neolit123 modified the milestones: v1.30, v1.31 Apr 5, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 4, 2024
@neolit123 neolit123 removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 4, 2024
@neolit123 neolit123 modified the milestones: v1.31, v1.32 Aug 7, 2024
@neolit123
Member Author

> cleanup remaining flags pod-infra-container-image .. ?
> kubernetes/kubernetes#106893

this item is now scoped in a dedicated ticket:

all other action items here are done.
