-
Notifications
You must be signed in to change notification settings - Fork 715
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mutable upgrades: upgrade the kube-proxy DS after all control-plane components #2346
Comments
Don't know how hard it would be but could the proxy maybe be converted to a static pod manifest to solve this?
This should also play nicely with immutable nodes. |
having kube-proxy as a static pod would definitely help around the fact kubelet and kube-proxy versions need to be in sync during upgrade. there are pros on cons of both DS vs static pods. in general we are likely to see users preferring to run replicas of kube-proxy on all nodes (ala DS), and for multi-platform support variants of the DS can be deployed, which is what the kubeadm for Windows deployment is doing today. this also opens some questions around instance specific configuration, and how to manage being possible to configure kube-proxy differently on different nodes: at this point however it is quite the breaking change to user expectations - e.g. if they have patches for the podspec of the DS to pass extra flags or modify pod resources. wondering if we are pass the point of changing this.
cluster-api discussion from today that mentions nodeselectors / labels and multiple DSes as one way of solving the problems for immutable upgrades: |
@neolit123 thanks for raising this point. If I got this right v of kube-proxy is
This is something that should be discussed with focus to:
WRT to the latest point, the idea of an operator managing automatically the merge of cluster level and node level setting would be a perfect fit for preserving a nice user experience no matter of how kube-proxy is deployed.... |
Ok given there is interest in the ststic pod approach we should raise it
for discussion in the next sig meeting. Its tuesday next week.
|
Just to return to this, have been reviewing the Draft STIG for Kubernetes, and CNTR-K8-000440 states the following:
FWIW, I have requested that we give feedback for the following:
However, I can understand that they will want to stand by not enabling static pods on nodes, which means we may not want to pursue converting the kube-proxy deployment to a static pod. |
not enabling static pods at all is disagreement with the k8s project maintainers basically.
are these requested changes for them to do?
i don't like kube-proxy as a static pod mainly because it imposes tight coupling with the kubeadm binary. i.e. it get's promoted from a optional addon that may eventually get applied via kubectl and templated manifests to something that is hardcoded or available via an interface that requires root (i.e. an addon interface that deploys static pods). |
It's part of the the US DOD
Yes, I've requested the changes.
Ah, ok. I thought that maybe work had started on converting it to a static pod. |
How about creating a job to do the upgrade of kube-proxy? The job will wait for all apiserver versions to the newer one, and then upgrade kube-proxy. We have used similar logic in some cluster installation for special components like cni initial and storage components. |
but that would still happen before the kubelet upgrade on the node? at this point i'm -1 on moving kube-proxy to a static pod. the k-sigs/cluster-addons had some ideas around a kube-proxy operator and we do like to move kube-proxy management outside of kubeadm eventually. but also it seems the kubelet/kube-proxy skew does not matter for the time being: |
Agree, an external kube-proxy operator would be a better option. |
i'm going to go ahead and close this after the discussion here.
if kube-apiserver / kube-proxy start having more drift in features the skew can become a problem here...but the system should fix itself after some time. what should be a concern is potential downtime while unsupported skew is in effect during upgrade. |
that is not a bad idea. we need to investigate the corner cases around it and how to detect the apiserver versions. to me this feels safer:
|
Is this target of 1.22? Add to my TODO list then. I'd like to write it next week. |
given we haven't seen actual problems due to the proxy / server skew during upgrade it's not urgent. ok, i'm going to re-open this and tag with 1.22 milestone. |
@neolit123 I write an initial proposal on https://docs.google.com/document/d/1tjAp9QRJgwoyheMvgLjUHbD7MYTdDP2jSOlf6bAtuZM/edit?usp=sharing. It can be edited. |
thank you @pacoxu ! i've added some comments and i like the idea. cc @fabriziopandini @SataQiu please take a look. |
i think yes. ideally all users should just move to use the standard code path with the FG on, but for the rest of the users we should have signal that the old code path works. we can clean this test in 1.29. |
@neolit123 When I tried to add the e2e tests, I found that kubeadm disallowed the deprecated feature gate: if featureSpec.PreRelease == featuregate.Deprecated {
return nil, errors.Errorf("feature-gate key is deprecated: %s", k)
} // Check if feature gate flags used in the cluster are consistent with the set of features currently supported by kubeadm
if msg := features.CheckDeprecatedFlags(&features.InitFeatureGates, cfg.FeatureGates); len(msg) > 0 {
for _, m := range msg {
printer.Printf(null, m)
}
return nil, nil, nil, errors.New("[upgrade/config] FATAL. Unable to upgrade a cluster using deprecated feature-gate flags. Please see the release notes")
} I haven't found this limitation in any code other than kubeadm. |
maybe it can klog.Warning in those codepaths. +1 |
how do core components signal to the user an FG is deprecated; is it docs only? if core (e.g. kubelet) prints warnings we probably should too. |
+1 for a warning only. BTW, there is no warning message for the GAed FG in kubeadm, and do we also need to add it like above? |
Even we fix it as a bugfix with a deprecated FG UpgradeAddonsBeforeControlPlane, we may keep this FG for another release cycle before removing it. I prefer to remove it in v1.31 or later. @SataQiu @neolit123
|
I'm also +1 to keep this FG for some release cycles before removing it. |
i think we all agree that 1.29 is too soon. |
With kubernetes/enhancements#4330 WIP, we may have a better solution for this scenario using the compatibility version flag. |
unclear to me how we can leverage the new flags from that KEP yet. |
You are right. This solution is still needed with the kep. If the upgrade path changed to something like upgrading every control plane components to new version with '--compatibility-version=perious-version' and then after every control plane upgraded, then we change compatibility-version to the new one. As the kube-proxy skew policy is not changed, it should run with n or n+1 apiserver, we should upgrade it after all control plane are upgraded. |
yeah, but kubeadm has no way to automate the flag --compatibility-version change during and after upgrade, which means users will have to trigger it manually somehow. and to do that we need to add more kubeadm upgrade commands (a new phase maybe, but also "upgrade apply" doesn't have phases yet). i think we much rather just solve the skew problem with lkube-proxy as proposed in this ticket. |
@SataQiu WDYT, should we remove it in 1.30, or 1.31? it would require updating docs and e2e too: i can handle the docs for 1.30 as i have an open placeholder PR |
this has been completed. |
during mutable upgrades, upgrade kube-proxy only after the kube-apiserver on all nodes have been upgraded.
currently kubeadm does the following:
the problem with that is that it can create issues for a period of time when kube-proxy's version is ahead of a kube-apiserver version in an HA setup.
https://kubernetes.io/docs/setup/release/version-skew-policy/#kube-proxy
there is also another problem where kube-proxy would be updated before the kubelets on nodes and the policy suggests they should match.
this would require:
kubeadm upgrade apply phases #1318
https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
TODO List:
https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/#feature-gates
The text was updated successfully, but these errors were encountered: