adjust the kubeadm / kubelet skew policy #2924

Closed
2 tasks done
NorthFuture opened this issue Sep 4, 2023 · 20 comments
Labels: area/controlplane, area/kubelet, kind/feature, lifecycle/stale, priority/important-longterm


NorthFuture commented Sep 4, 2023

edit by neolit123

action items:

  - [x] adjust all kubelet skew constants / checks for init/join/upgrade.
    kubeadm: adjust kubeadm skew policy for upgrades kubernetes#120825
  - [x] update the "create a cluster with kubeadm" page where our skew is documented. it must be for the release-1.29 k/website branch.
    v1.29: kubeadm skew policy for kubelet is n-3 website#43769
  - [ ] TODO: add the -f flag for upgrade node. this is a nice-to-have; we missed 1.29, so it can be added in 1.30.
  - [ ] TODO: add one new e2e test that upgrades with the new kubelet skew without -f.
    (this is actually difficult because -f is required in our CI; it is used to allow kubeadm to upgrade to a pre-release / CI artifact)
  - [ ] TODO: in 1.32, remove --ignore-preflight-errors=KubeletVersion from the kinder kubelet skew jobs; see this note for details.
    UPDATE: not needed, see #2944 (review)

how the 1.32 target was established (see the UPDATE note above):

kubeadm 1.29 is the first release that supports the new skew
kubeadm 1.29 supports deploying kubelet 1.29, 1.28, 1.27, 1.26
the k8s (kubeadm) support window is 3 releases at a time
the target kubeadm version to drop the preflight check = when 1.28 goes out of support / 1.32 is released
note: the kubeadm/CP skew is ignored in this case, even though the kubelet/CP skew is the actual target of this change

FEATURE REQUEST:

with kubernetes 1.28 the skew policy for control plane components has been updated, and you can now have control plane components that are three versions ahead of the kubelets.

However, kubeadm still has an n-1 skew policy. Such a policy prevents skipping some kubelet upgrades of worker nodes, which could save a lot of time during upgrades in large clusters.

edit(neolit123): KEP LINK:
Support Oldest Node And Newest Control Plane
https://github.com/kubernetes/enhancements/tree/master/keps/sig-architecture/3935-oldest-node-newest-control-plane

Versions

kubeadm version (use kubeadm version): 1.28

Environment:

  • Kubernetes version (use kubectl version): 1.28
  • Cloud provider or hardware configuration: N/A
  • OS (e.g. from /etc/os-release): N/A
  • Kernel (e.g. uname -a): N/A
  • Container runtime (CRI) (e.g. containerd, cri-o): N/A
  • Container networking plugin (CNI) (e.g. Calico, Cilium): N/A
  • Others:

What happened?

during kubeadm upgrade apply, the upgrade process is stopped with the following error:

  • There are kubelets in this cluster that are too old that have these versions [v1.x.yy]

What you expected to happen?

To be able to upgrade the control plane to 1.28 even if the cluster has 1.26 or 1.25 kubelets.

neolit123 added the kind/feature label Sep 4, 2023
neolit123 added this to the v1.29 milestone Sep 4, 2023
neolit123 added the priority/important-longterm, area/kubelet, and area/controlplane labels Sep 4, 2023
neolit123 (Member) commented Sep 4, 2023

@NorthFuture thanks for logging the issue.

Such a policy prevents skipping some kubelet upgrades of worker nodes, which could save a lot of time during upgrades in large clusters.

upgrade apply has -f where it forces the upgrade.
but upgrade node for workers lacks the -f:
https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-upgrade/

one quick workaround is to add the flag.

but another workaround is to just skip the kubelet-config phase with --skip-phases, download the config manually from kube-system/kubelet-config, and restart the kubelet (systemd) with the desired version.
https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-upgrade/#cmd-upgrade-node
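
for illustration, a rough sketch of that manual route on a worker node (assuming a default systemd-managed kubelet, the standard /var/lib/kubelet/config.yaml path, and the "kubelet" data key in the ConfigMap; adjust to your setup):

```bash
# Upgrade the node but skip writing the new kubelet configuration.
kubeadm upgrade node --skip-phases kubelet-config

# Later, once the kubelet binary itself has been upgraded, fetch the config
# that kubeadm stores in kube-system/kubelet-config and install it manually.
kubectl get configmap kubelet-config -n kube-system \
  -o jsonpath='{.data.kubelet}' > /var/lib/kubelet/config.yaml

# Restart the systemd-managed kubelet so it picks up the new configuration.
systemctl restart kubelet
```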

However, kubeadm still has an n-1 skew policy.

as noted on slack:
https://kubernetes.slack.com/archives/C09NXKJKA/p1693814176564259

it's actually two policies:
kubeadm vs kubelet is n-1
kubeadm vs control plane is n-1

you can read more about the current state of the kubeadm skew here:
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#version-skew-policy

so technically if the user deploys kubeadm version == control-plane version and we extend the kubelet skew to n-2 we will align with the new policy.

neolit123 (Member) commented:

@SataQiu @pacoxu @chendave

but upgrade node for workers lacks the -f
...
one quick workaround is to add the flag.

should we add this flag and would it help?

so technically if the user deploys kubeadm version == control-plane version and we extend the kubelet skew to n-2 we will align with the new policy.

this however means we need to support it. if the kubelet starts making some drastic changes we will need to spend time maintaining this n-2 skew

one good aspect is that the kubelet has not been changing much and we already have e2e tests for this unsupported by kubeadm skew:
https://testgrid.k8s.io/sig-cluster-lifecycle-kubeadm

see the kubeadm-kinder-kubelet-x-on-y tests all the way to n-3 (e.g. k8s at 1.28, kubelet at 1.25); these tests were requested by SIG Node at some point.

WDYT?

pacoxu (Member) commented Sep 4, 2023

this seems to be valid for an annual node upgrade. I need to check the KEP by Jordan tomorrow to confirm the details (that this is a valid kubelet skew).

NorthFuture (Author) commented Sep 4, 2023

upgrade apply has -f where it forces the upgrade.

Just a consideration on the usage of this flag: while it's true that the upgrade can be forced, usually I'm not inclined to force an action when there's a blocking error (if someone put a blocking error in place, there might be a reason 😄 )

if the kubeadm skew policy is extended, no -f flag is required, correct?

neolit123 (Member) commented:

upgrade apply has -f where it forces the upgrade.

Just a consideration on the usage of this flag: while it's true that the upgrade can be forced, usually I'm not inclined to force an action when there's a blocking error (if someone put a blocking error in place, there might be a reason 😄 )

if the kubeadm skew policy is extended, no -f flag is required, correct?

agreed that not blocking with a skew error is the preferred action. the flag is just missing for "upgrade node", which is more of a side issue.

SataQiu (Member) commented Sep 5, 2023

We can add the -f flag for upgrade node as a workaround in the short term and make it consistent with upgrade apply.

I found that kubeadm upgrade node can work well without the -f flag because it doesn't check the kubelet version skew.
The preflight will only execute RunRootCheckOnly and RunPullImagesCheck (the latter only when it is a control-plane node).
https://github.com/kubernetes/kubernetes/blob/7e9fbc449ddccab5be18d7d5fa0a4158ec8227f2/cmd/kubeadm/app/cmd/phases/upgrade/node/preflight.go#L45-L75
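
For example (a minimal illustration on a worker node that already has the newer kubeadm binary installed):

```bash
# On a worker node this completes without any skew-related flag, because the
# node preflight linked above only runs the root check (the image-pull check
# applies to control-plane nodes only).
kubeadm upgrade node
```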

But ideally, I think we'd better make kubeadm align with the skew strategy of Kubernetes.
How about adjusting kubeadm upgrade apply to allow lower kubelet versions?

pacoxu (Member) commented Sep 5, 2023

https://github.com/kubernetes/enhancements/tree/master/keps/sig-architecture/3935-oldest-node-newest-control-plane

There is an upgrade proposal to make an annual node upgrade possible (see the sketch after the list):

  1. Begin: control plane and nodes on v1.40
  2. Control plane upgrade: v1.40 → v1.41 → v1.42 → v1.43
  3. Node upgrades: v1.40 → v1.43
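
Expressed as kubeadm operations, that flow would look roughly like the sketch below (the v1.4x versions are the KEP's hypothetical examples; each control-plane step assumes the matching kubeadm binary is installed first):

```bash
# Control plane: one minor version at a time, installing the matching kubeadm
# binary before each step.
kubeadm upgrade apply v1.41.0
kubeadm upgrade apply v1.42.0
kubeadm upgrade apply v1.43.0

# Workers: upgraded once, jumping v1.40 -> v1.43. On each worker, with the
# v1.43 kubeadm binary installed:
kubeadm upgrade node
# ...then upgrade the kubelet package to v1.43 and restart it as usual.
```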

neolit123 changed the title from "kubeadm skew policy for upgrades" to "adjust kubeadm skew policy for upgrades" Sep 8, 2023
liggitt (Member) commented Sep 18, 2023

but upgrade node for workers lacks the -f
...
one quick workaround is to add the flag.

should we add this flag and would it help?
...
but another workaround is to just skip the kubelet-config phase with --skip-phases

In the short-term, having --force, --skip-phases, and --ignore-preflight-errors supported across kubeadm commands would make kubeadm more consistent and would be really helpful, especially so consumers that just want to tolerate kubelet skew (until kubeadm expands to match core components supported skew) could narrowly skip that with --ignore-preflight-errors=KubeletVersion instead of skipping all checks with --force.
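
As a concrete sketch of that distinction (using the flags as they appear elsewhere in this thread; which subcommands accept the narrow flag for this particular check is exactly the consistency gap being described, and the placeholders are illustrative):

```bash
# Broad: force the control-plane upgrade past every blocking check.
kubeadm upgrade apply v1.28.1 --force

# Narrow: tolerate only the kubelet version-skew preflight error, e.g. when
# joining a node that runs an older kubelet.
kubeadm join <control-plane-endpoint> --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --ignore-preflight-errors=KubeletVersion
```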

this however means we need to support it. if the kubelet starts making some drastic changes we will need to spend time maintaining this n-2 skew

Making sure node folks are aware of tools (like kubeadm) wanting to manage nodes with consistent flags/config across supported versions would help inform / temper drastic changes they are considering. Even if new features add optional flags / config, being careful about new required flags / config would make this skewed support easier to maintain.

one good aspect is that the kubelet has not been changing much and we already have e2e tests for this unsupported by kubeadm skew:
https://testgrid.k8s.io/sig-cluster-lifecycle-kubeadm

+100 on already having visibility to how well this works for some kubeadm commands (and it's been working well). Having visibility to whether it actually works to use kubeadm to upgrade a control-plane while nodes are at n-1 or n-2 (using --force or --ignore-preflight-errors, etc) would be a great next step, would help node folks see impact of kubelet changes early, and would help cluster-lifecycle judge the stability of this operation before committing to support it officially.

But ideally, I think we'd better make kubeadm align with the skew strategy of Kubernetes. How about adjusting kubeadm upgrade apply to allow lower kubelet versions?

This would be my ideal as well. For the folks that know kubeadm well:

  • is the primary question whether kubelet command-line or config will require a change (either dropping support for some flag/field or requiring some new flag/field) that will make kubeadm start to have to do multiple version-specific kubelet configurations?
  • Is version-specific node configuration possible today in kubeadm? Is it relatively clean to maintain?
  • Have you talked with node folks about plans for rolling out kubelet flag / config changes in a way that helps cluster admins configuring multiple node versions?

neolit123 (Member) commented:

but another workaround is to just skip the kubelet-config phase with --skip-phases

In the short-term, having --force, --skip-phases, and --ignore-preflight-errors supported across kubeadm commands would make kubeadm more consistent and would be really helpful, especially so consumers that just want to tolerate kubelet skew (until kubeadm expands to match core components supported skew) could narrowly skip that with --ignore-preflight-errors=KubeletVersion instead of skipping all checks with --force.

+1

this however means we need to support it. if the kubelet starts making some drastic changes we will need to spend time maintaining this n-2 skew

Making sure node folks are aware of tools (like kubeadm) wanting to manage nodes with consistent flags/config across supported versions would help inform / temper drastic changes they are considering. Even if new features add optional flags / config, being careful about new required flags / config would make this skewed support easier to maintain.

our tests can hopefully catch such changes.

one good aspect is that the kubelet has not been changing much and we already have e2e tests for this unsupported by kubeadm skew:
https://testgrid.k8s.io/sig-cluster-lifecycle-kubeadm

+100 on already having visibility to how well this works for some kubeadm commands (and it's been working well). Having visibility to whether it actually works to use kubeadm to upgrade a control-plane while nodes are at n-1 or n-2 (using --force or --ignore-preflight-errors, etc) would be a great next step, would help node folks see impact of kubelet changes early, and would help cluster-lifecycle judge the stability of this operation before committing to support it officially.

yes, we are going to have to add an explicit e2e test for the n-2 kubelet upgrade.

  • is the primary question whether kubelet command-line or config will require a change (either dropping support for some flag/field or requiring some new flag/field) that will make kubeadm start to have to do multiple version-specific kubelet configurations?

i was thinking that flag/config changes are what could likely break us.

for example, the --bootstrap-kubeconfig and --kubeconfig flags of the kubelet have been deprecated (?) and are planned for removal; that was discussed in an issue somewhere.

the kubelet v1beta1 config does not have fields for these options yet.

as an example, and if this is still in the plans, the kubelet maintainers must carefully execute the removal of the flags and the addition of the options in config. this will have a wider effect than kubeadm.
for kubeadm we can certainly adapt somehow, but the change may not be so easy, since the flags are hardcoded in systemd files distributed in packages.

  • Is version-specific node configuration possible today in kubeadm? Is it relatively clean to maintain?

in the past we have done different kubelet flags/fields for different kubelet versions, which is just branching in the kubeadm code that manages the kubelet; it is kept for 1-2 releases with a TODO for a later cleanup.

there is no persistent node-specific component configuration support per se on the API server side, but there is (a sketch of both follows):

  • custom kubelet flags for a given node, stored in a systemd environment file
  • kubelet v1beta1 patches that persist on disk
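
for reference, a rough sketch of what those two mechanisms look like on a node (paths and the patch file name follow kubeadm's documented conventions, but treat the exact values here as illustrative):

```bash
# 1) per-node kubelet flags: kubeadm writes the node's extra flags into a
#    systemd environment file that the kubelet unit sources.
cat /var/lib/kubelet/kubeadm-flags.env
# KUBELET_KUBEADM_ARGS="--container-runtime-endpoint=unix:///run/containerd/containerd.sock ..."

# 2) per-node KubeletConfiguration patches kept on disk and applied via the
#    --patches directory, e.g. a strategic merge patch file named
#    kubeletconfiguration+strategic.yaml in /etc/kubernetes/patches.
ls /etc/kubernetes/patches/
kubeadm upgrade node --patches /etc/kubernetes/patches
```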
  • Have you talked with node folks about plans for rolling out kubelet flag / config changes in a way that helps cluster admins configuring multiple node versions?

AFAIK, no.
the latest work in this area was kubernetes/enhancements#3983

IIRC, the rule in the kubelet is to always add a field to the config; the corresponding CLI flag may or may not be added (?).

not having the micro-versions or history in the KubeletConfiguration makes it a bit difficult for the admin to determine what version of the API has the new option Foo. they could still use a single file that has Foo even for kubelet versions that do not support the option, since the kubelet does not warn/error on unknown fields when parsing the API.

kubeadm will throw a warning for the unknown field depending on the kubelet public type it imported.

pacoxu self-assigned this Sep 19, 2023
pacoxu (Member) commented Sep 19, 2023

  • is the primary question whether kubelet command-line or config will require a change (either dropping support for some flag/field or requiring some new flag/field) that will make kubeadm start to have to do multiple version-specific kubelet configurations?

i was thinking that flag/config changes are what could likely break us.

for example, the --bootstrap-kubeconfig and --kubeconfig flags of the kubelet have been deprecated (?) and are planned for removal; that was discussed in an issue somewhere.

the kubelet v1beta1 config does not have fields for these options yet.

as an example, and if this is still in the plans, the kubelet maintainers must carefully execute the removal of the flags and the addition of the options in config. this will have a wider effect than kubeadm. for kubeadm we can certainly adapt somehow, but the change may not be so easy, since the flags are hardcoded in systemd files distributed in packages.

Can we keep the kubelet configuration as is if the kubelet version is n-1 or n-2 which is not the same as the kubeadm version? Then kubelet will have no risk of being corrupted.

When we upgrade the kubelet and run the kubeadm upgrade node again, we can update the kubelet configuration at that time.

We only keep the control-plane version compatible kubelet configurations in the configmap kubelet-config.

neolit123 (Member) commented Sep 19, 2023

Can we keep the kubelet configuration as is if the kubelet version is n-1 or n-2 which is not the same as the kubeadm version? Then kubelet will have no risk of being corrupted.

When we upgrade the kubelet and run the kubeadm upgrade node again, we can update the kubelet configuration at that time.

We only keep the control-plane version compatible kubelet configurations in the configmap kubelet-config.

we are going to continue managing a single KubeletConfiguration ConfigMap for now. but eventually, if KubeletConfiguration v1 is released and v1beta1 is deprecated and removed, we need to plan how to upgrade users and whether we need to temporarily store both v1 and v1beta1 in a ConfigMap.

liggitt (Member) commented Sep 19, 2023

Given how long v1beta1 kubelet config has been around, supporting it in parallel with any eventual v1 config file for several releases (maybe 4 so n-3 would work) would seem reasonable to me.

If kubeadm folks have an idea of how parallel v1beta1 and v1 config blobs could be provided as well, that's another possibility.

Sounds like there are no known issues that make this impossible, more just uncertainty about kubelet config evolution and compatibility across multiple versions. I think sig-node would be amenable to making any config transitions as easy as possible for cluster admins (like kubeadm and others).

pacoxu (Member) commented Sep 22, 2023

Summary of what may need an update later (if we allow the kubelet skew policy to be n-3 of kube-apiserver):

  1. upgrade apply with some nodes on an n-2/n-3 kubelet: detects kubelets that are too old
     https://github.com/kubernetes/kubernetes/blob/4eb6b3907a68514e1b2679b31d95d61f4559c181/cmd/kubeadm/app/phases/upgrade/policy.go#L170-L189

     // newK8sVersion.Minor() > kubeletVersion.Minor()+MaximumAllowedMinorVersionKubeletSkew
     change MaximumAllowedMinorVersionKubeletSkew from 1 to 3.

     // kubeadm upgrade apply v1.28.1
     [upgrade/version] FATAL: the --version argument is invalid due to these errors:

         - There are kubelets in this cluster that are too old that have these versions [v1.26.0]

     Can be bypassed if you pass the --force flag

     workaround: kubeadm upgrade apply -f v1.28.1

  2. join with a node kubelet at n-2:
     https://github.com/kubernetes/kubernetes/blob/4eb6b3907a68514e1b2679b31d95d61f4559c181/cmd/kubeadm/app/preflight/checks.go#L610-L616

     change MinimumKubeletVersion from getSkewedKubernetesVersion(-1) to -3.

         [ERROR KubeletVersion]: Kubelet version "1.26.0" is lower than kubeadm can support. Please upgrade kubelet

     workaround: --ignore-preflight-errors=KubeletVersion

pacoxu (Member) commented Sep 22, 2023

@neolit123 should we make it in v1.29? Are there any other action items before this?

neolit123 (Member) commented Sep 22, 2023

@neolit123 should we make it in v1.29? Are there any other action items before this?

we can try for 1.29.

actions:

  • add the -f flag for upgrade node.
  • adjust all kubelet skew constants / checks for init/join/upgrade.
  • update the "create a cluster with kubeadm" page where our skew is documented. it must be for the release-1.29 k/website branch.
  • add one new e2e test, that upgrades with the new kubelet skew without -f.

lalitc375 commented:

I can see that the skew policy has been updated. @pacoxu Are you also adding the e2e test?

pacoxu (Member) commented Oct 18, 2023

I can see that the skew policy has been updated. @pacoxu Are you also adding the e2e test?

There are already some e2e jobs using ignorePreflightErrors:KubeletVersion as a workaround.
#2944 is to update the CI.

neolit123 (Member) commented Nov 13, 2023

I can see that the skew policy has been updated. @pacoxu Are you also adding the e2e test?

There are already some e2e jobs using ignorePreflightErrors:KubeletVersion as a workaround. #2944 is to update the CI.

updated the OP here, with some notes on what we decided to do with our existing kubelet skew jobs in terms of this preflight error:
#2924 (comment)

neolit123 modified the milestones: Next, v1.30 Nov 13, 2023
neolit123 changed the title from "adjust kubeadm skew policy for upgrades" to "adjust the kubeadm / kubelet skew policy" Nov 30, 2023
k8s-triage-robot commented:

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label Feb 28, 2024
neolit123 (Member) commented:

@neolit123 should we make it in v1.29? Are there any other action items before this?

we can try for 1.29.

actions:

  • add the -f flag for upgrade node.

we realized this is not needed, as ignoring the preflight error (optionally) is enough.

  • adjust all kubelet skew constants / checks for init/join/upgrade.

done

  • update the "create a cluster with kubeadm" page where our skew is documented. it must be for the release-1.29 k/website branch.

done

  • add one new e2e test, that upgrades with the new kubelet skew without -f.

we understood we cannot remove the -f for upgrade apply due to how -f is designed and also used for pre-release versions.

i consider the tasks here done on a best-effort basis.
on a best-effort basis we also maintain the kubelet skew in the kubeadm code and e2e.
