diff --git a/keps/prod-readiness/sig-network/3705.yaml b/keps/prod-readiness/sig-network/3705.yaml
new file mode 100644
index 00000000000..6e6e5a5e6b9
--- /dev/null
+++ b/keps/prod-readiness/sig-network/3705.yaml
@@ -0,0 +1,6 @@
+# The KEP must have an approver from the
+# "prod-readiness-approvers" group
+# of http://git.k8s.io/enhancements/OWNERS_ALIASES
+kep-number: 3705
+alpha:
+  approver: "wojtek-t"
diff --git a/keps/sig-network/3705-cloud-node-ips/README.md b/keps/sig-network/3705-cloud-node-ips/README.md
index 70def8561e9..7baba727442 100644
--- a/keps/sig-network/3705-cloud-node-ips/README.md
+++ b/keps/sig-network/3705-cloud-node-ips/README.md
@@ -20,6 +20,10 @@
     - [Integration tests](#integration-tests)
     - [e2e tests](#e2e-tests)
 - [Graduation Criteria](#graduation-criteria)
+  - [Alpha](#alpha)
+  - [Beta](#beta)
+  - [GA](#ga)
+  - [Deprecation](#deprecation)
 - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
 - [Version Skew Strategy](#version-skew-strategy)
 - [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
@@ -94,6 +98,9 @@ using a cloud provider. This KEP proposes to fix that.
 - Update the code in `k8s.io/cloud-provider/node/helpers` to
   implement the needed algorithms for the new behaviors.
 
+- Rename the `alpha.kubernetes.io/provided-node-ip` annotation, which
+  has not been alpha for a long time.
+
 ### Non-Goals
 
 - Changing the behavior of nodes using legacy cloud providers.
@@ -146,7 +153,7 @@ If `--cloud-provider` and `--node-ip` are both specified (and
 annotation to the node, `alpha.kubernetes.io/provided-node-ip`.
 Cloud providers expect this annotation to conform to the current
 expected `--node-ip` syntax (ie, a single value); if it does not, then they
-will log an error and, not remove the
+will log an error and not remove the
 `node.cloudprovider.kubernetes.io/uninitialized` taint from the node,
 causing the node to remain unusable until kubelet is restarted with a
 valid (or absent) `--node-ip`.
@@ -208,25 +215,45 @@ will try to parse it as a single IP address, fail, log an error, and
 leave the node in the tainted state, which is exactly what we wanted
 it to do if it can't interpret the `--node-ip` value correctly.
 
-Therefore, we do not need a new annotation for the new `--node-ip`
+Therefore, we do not _need_ a new annotation for the new `--node-ip`
 values; we can continue to use the existing annotation, assuming
 existing cloud providers will treat unrecognized values as errors.
 
-```
-<<[UNRESOLVED annotation-name ]>>
-
-The annotation name is `alpha.kubernetes.io/provided-node-ip`
-but it hasn't been "alpha" for a long time. Should we rename it? In
-that case, we probably need to keep supporting both versions for a
-while.
-
-<<[/UNRESOLVED]>>
-```
+That said, the existing annotation name is
+`alpha.kubernetes.io/provided-node-ip`, but it hasn't been "alpha"
+for a long time. We should fix this. So:
+
+ 1. When `--node-ip` is unset, kubelet should delete both the legacy
+    `alpha.kubernetes.io/provided-node-ip` annotation and the new
+    `kubernetes.io/provided-node-ip` annotation (regardless of
+    whether the feature gate is enabled or not, to avoid problems
+    with rollback and skew).
+
+ 2. When the `CloudNodeIPs` feature is enabled and `--node-ip` is
+    set, kubelet will set both the legacy annotation and the new
+    annotation. (It always sets them both to the same value, even if
+    that's a value that old cloud providers won't understand.)
+
+ 3. When the `CloudNodeIPs` feature is enabled, the cloud provider
+    will use the new `kubernetes.io/provided-node-ip` annotation _if
+    the legacy alpha annotation is not set_. (But if both annotations
+    are set, it will prefer the legacy annotation, so as to handle
+    rollbacks correctly.)
+
+ 4. A few releases after GA, kubelet can stop setting the legacy
+    annotation, switch to unconditionally deleting it, and set or
+    delete the new annotation depending on whether `--node-ip` was
+    set or not. Cloud providers will also switch to only using the
+    new annotation, and can perhaps log a warning if they see a node
+    with the old annotation but not the new annotation.
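+
+As a purely illustrative sketch of steps 1 and 2, the kubelet-side
+logic might look something like the following. The helper name and
+structure here are invented for the example; this is not the actual
+kubelet code.
+
+```go
+package main
+
+import "fmt"
+
+const (
+	legacyNodeIPAnnotation = "alpha.kubernetes.io/provided-node-ip"
+	newNodeIPAnnotation    = "kubernetes.io/provided-node-ip"
+)
+
+// syncNodeIPAnnotations applies the transition rules to a node's
+// annotation map, given the raw --node-ip value ("" if unset) and
+// whether the CloudNodeIPs feature gate is enabled.
+func syncNodeIPAnnotations(annotations map[string]string, nodeIP string, cloudNodeIPsEnabled bool) {
+	if nodeIP == "" {
+		// Step 1: --node-ip is unset, so delete both annotations,
+		// regardless of the feature gate, for rollback/skew safety.
+		delete(annotations, legacyNodeIPAnnotation)
+		delete(annotations, newNodeIPAnnotation)
+		return
+	}
+	// Pre-existing behavior: the legacy annotation is always set.
+	annotations[legacyNodeIPAnnotation] = nodeIP
+	if cloudNodeIPsEnabled {
+		// Step 2: mirror the same value into the new annotation, even
+		// if it is a "new" value that old cloud providers will reject.
+		annotations[newNodeIPAnnotation] = nodeIP
+	}
+}
+
+func main() {
+	anns := map[string]string{}
+	syncNodeIPAnnotations(anns, "10.0.0.1,fd00::1", true)
+	fmt.Println(anns) // both annotations now hold "10.0.0.1,fd00::1"
+}
+```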
 
 Kubelet will preserve the existing behavior of _not_ passing
-"`0.0.0.0`" or "`::`" to the cloud provider, for backward
-compatibility, but it _will_ pass "`IPv4`" and "`IPv6`" if they are
-passed as the `--node-ip`.
+"`0.0.0.0`" or "`::`" to the cloud provider, even via the new
+annotation. This is needed to preserve backward compatibility with
+current behavior in clusters using those `--node-ip` values. However,
+it _will_ pass "`IPv4`" and/or "`IPv6`" if they are passed as the
+`--node-ip`.
 
 ### Changes to cloud providers
 
@@ -266,9 +293,31 @@ Notes:
     dual-stack IPv4-primary configuration. (In particular, you would
     have IPv4-primary nodes but IPv6-primary pods.)
 
-  - `--node-ip 1.2.3.4,abcd::ef01` was previous valid syntax when
+  - `--node-ip 1.2.3.4,abcd::ef01` was previously valid syntax when
    using no `--cloud-provider`, but was not valid for cloud kubelets.
 
+<<[UNRESOLVED multiple-addresses ]>>
+
+In the examples above, `--node-ip IPv4` means "pick any one IPv4
+address", not "use all IPv4 addresses". This seemed to me to be
+consistent with the behavior when specifying a specific IP address (in
+which case all other IPs of the same NodeAddressType are removed from
+the list), but it is inconsistent with the behavior of `--node-ip
+0.0.0.0` / `--node-ip ::` with legacy cloud providers and bare metal
+(where that affects the _sorting_ of IPs but does not do any
+_filtering_).
+
+In past discussions (e.g., https://kep.k8s.io/1664,
+https://issues.k8s.io/95768), no one has ever been entirely sure what
+the point of having multiple node.status.addresses values is, and
+there has never been a way to get multiple node.status.addresses
+values (of the same IP family) on "bare metal", so that would tend to
+prevent people from designing features that depended on multiple node
+addresses. So it's not clear that there's really any downside to
+filtering out the "unused" node IPs...
+
+<<[/UNRESOLVED]>>
+
 If the cloud only had IPv4 IPs for the node, then the same examples
 would look like:
 
 | `--node-ip` value | New? | Annotation | Resulting node addresses |
@@ -316,26 +365,13 @@ implementing this enhancement to ensure the enhancements have also solid foundat
 
 ##### Unit tests
 
-
-
+Most of the changes will be in `k8s.io/cloud-provider/node/helpers`.
+There will also be small changes in kubelet startup.
 
-- `<package>`: `<date>` - `<test coverage>`
+- `k8s.io/kubernetes/pkg/kubelet`: `2023-01-30` - `66.9`
+- `k8s.io/kubernetes/pkg/kubelet/nodestatus`: `2023-01-30` - `91.2`
+- `k8s.io/kubernetes/vendor/k8s.io/cloud-provider/node/helpers`: `2023-01-30` - `31.7`
+- `k8s.io/kubernetes/vendor/k8s.io/cloud-provider/node/helpers/address.go`: `2023-01-30` - `100`
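+
+For illustration, the sort of unit test we would want for the
+extended `--node-ip` parsing might look like the sketch below. The
+`parseNodeIP` helper here is a simplified stand-in invented for this
+example; the real helper's name, signature, and exact validation
+rules are up to the implementation.
+
+```go
+package helpers
+
+import (
+	"fmt"
+	"net"
+	"strings"
+	"testing"
+)
+
+// parseNodeIP accepts one or two comma-separated values, each either a
+// literal IP address or the keyword "IPv4"/"IPv6", and rejects two
+// values of the same IP family.
+func parseNodeIP(nodeIP string) ([]string, error) {
+	parts := strings.Split(nodeIP, ",")
+	if len(parts) > 2 {
+		return nil, fmt.Errorf("at most two --node-ip values may be given")
+	}
+	seen := map[bool]bool{} // keyed by "is IPv6"
+	var out []string
+	for _, part := range parts {
+		part = strings.TrimSpace(part)
+		var isIPv6 bool
+		switch part {
+		case "IPv4":
+			isIPv6 = false
+		case "IPv6":
+			isIPv6 = true
+		default:
+			ip := net.ParseIP(part)
+			if ip == nil {
+				return nil, fmt.Errorf("invalid --node-ip value %q", part)
+			}
+			isIPv6 = ip.To4() == nil
+		}
+		if seen[isIPv6] {
+			return nil, fmt.Errorf("duplicate IP family in %q", nodeIP)
+		}
+		seen[isIPv6] = true
+		out = append(out, part)
+	}
+	return out, nil
+}
+
+func TestParseNodeIP(t *testing.T) {
+	for _, tc := range []struct {
+		nodeIP    string
+		expectErr bool
+	}{
+		{nodeIP: "1.2.3.4"},                          // old-style single value
+		{nodeIP: "1.2.3.4,abcd::ef01"},               // new dual-stack value
+		{nodeIP: "IPv6"},                             // new keyword value
+		{nodeIP: "1.2.3.4,IPv6"},                     // mixed literal/keyword
+		{nodeIP: "1.2.3.4,5.6.7.8", expectErr: true}, // same family twice
+		{nodeIP: "bogus", expectErr: true},
+	} {
+		if _, err := parseNodeIP(tc.nodeIP); (err != nil) != tc.expectErr {
+			t.Errorf("%q: expected error=%v, got %v", tc.nodeIP, tc.expectErr, err)
+		}
+	}
+}
+```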
 
 ##### Integration tests
 
@@ -347,6 +383,17 @@ For Beta and GA, add links to added tests together with links to k8s-triage for
 https://storage.googleapis.com/k8s-triage/index.html
 -->
 
+```
+<<[UNRESOLVED integration ]>>
+
+I don't know what existing integration testing of kubelet and cloud
+providers there is. Given unit tests for the new `--node-ip` parsing
+and `node.status.addresses`-setting code, and e2e tests of some sort,
+we probably don't need integration tests.
+
+<<[/UNRESOLVED]>>
+```
+
 - <test>: <link to test coverage>
 
 ##### e2e tests
 
@@ -366,7 +413,19 @@ We expect no non-infra related flakes in the last month as a GA graduation crite
 
 I'm not sure how we currently handle cloud-provider e2e. GCP does not
 support IPv6 and `kind` does not use a cloud provider, so we cannot
-test the new code/behavior in any of the "core" e2e tests.
+test the new code/behavior in any of the "core" e2e jobs.
+
+@aojea suggests we _could_ use `kind` with an external cloud provider.
+This would presumably have to be a bespoke dummy cloud provider, but
+since we are only doing this to test the new `k/cloud-provider` code,
+having a mostly-dummied-out cloud provider should not be a problem? It
+could perhaps have a default `NodeAddresses` return value of
+"<actual-node-ipv4>, <fake-node-ipv4>, <actual-node-ipv6>,
+<fake-node-ipv6>", and then we could pass `--node-ip
+<actual-node-ipv4>,IPv6` to kubelet. If the cloud provider did not
+receive and interpret the `--node-ip` value correctly, then we would
+end up with node IPs that didn't work and the test job should fail.
 
 <<[/UNRESOLVED]>>
 ```
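+
+For concreteness, a sketch of what the dummy provider's
+`NodeAddresses` could look like is below. Only the method signature
+comes from the real `k8s.io/cloud-provider` `Instances` interface;
+the type, its fields, and the idea of mixing working and non-working
+addresses are invented for this example.
+
+```go
+package dummycloud
+
+import (
+	"context"
+
+	v1 "k8s.io/api/core/v1"
+	"k8s.io/apimachinery/pkg/types"
+)
+
+// instances would implement the rest of cloudprovider.Instances with
+// similar canned answers; only NodeAddresses matters for this test.
+type instances struct {
+	actualIPv4, fakeIPv4, actualIPv6, fakeIPv6 string
+}
+
+// NodeAddresses returns both working and non-working addresses; only
+// a cloud provider that correctly honors the kubelet's
+// provided-node-ip annotation will end up selecting the working ones.
+func (i *instances) NodeAddresses(ctx context.Context, name types.NodeName) ([]v1.NodeAddress, error) {
+	return []v1.NodeAddress{
+		{Type: v1.NodeInternalIP, Address: i.actualIPv4},
+		{Type: v1.NodeInternalIP, Address: i.fakeIPv4},
+		{Type: v1.NodeInternalIP, Address: i.actualIPv6},
+		{Type: v1.NodeInternalIP, Address: i.fakeIPv6},
+	}, nil
+}
+```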
@@ -375,67 +434,38 @@ test the new code/behavior in any of the "core" e2e tests.
 
 ### Graduation Criteria
 
-
+- Two releases after GA, kubelet will stop setting the legacy
+  `alpha.kubernetes.io/provided-node-ip` annotation (and start
+  deleting it). Cloud providers will stop honoring the legacy
+  annotation.
+
+- (Two releases after that, we can remove the code to delete the
+  legacy annotation when present.)
 
 ### Upgrade / Downgrade Strategy
 
@@ -450,18 +480,53 @@ to start after downgrade.
 
 ### Version Skew Strategy
 
-By design, "old kubelet / new cloud provider" (or "new kubelet with
-old `--node-ip` value / new cloud provider") will work fine, because
-any `--node-ip` values accepted by old kubelet are defined to have the
-same meaning with old and new cloud providers.
-
-OTOH, "new kubelet with new `--node-ip` value / old cloud provider"
-will (intentionally) fail, because the old cloud provider won't be
-able to fulfil the new `--node-ip` request.
-
-For future upgrades/downgrades where both kubelet and the cloud
-provider support the new `--node-ip` behavior, there are no skew
-issues.
+- Old kubelet / old cloud provider: Kubelet will only set the old
+  annotation, and the cloud provider will only read the old
+  annotation. Everything works (even if there's a stale new
+  annotation left over from a cluster rollback).
+
+- Old kubelet / new cloud provider: Kubelet will only set the old
+  annotation. The cloud provider will read the old annotation, and
+  will interpret it in the same way an old cloud provider would
+  (because all `--node-ip` values accepted by an old kubelet are
+  interpreted the same way by both old and new cloud providers).
+  Everything works (even if there's a stale new annotation left over
+  from a kubelet rollback, because the new cloud provider still
+  prefers the old annotation).
+
+- New kubelet, single-stack `--node-ip` value / old cloud provider:
+  Kubelet will set both annotations (to the same value). The cloud
+  provider will read the old annotation. Everything works, because
+  this is an "old" `--node-ip` value, and the old cloud provider
+  knows how to interpret it correctly.
+
+- New kubelet, dual-stack `--node-ip` value / old cloud provider:
+  Kubelet will set both annotations (to the same value). The cloud
+  provider will read the old annotation, but it will _not_ know how
+  to interpret it, because it's a "new" value. So it will log an
+  error and leave the node tainted. (This is the desired result,
+  since the cloud provider is not able to obey the `--node-ip` value
+  the administrator provided.)
+
+- New kubelet / new cloud provider: Kubelet will set both annotations
+  (to the same value). The cloud provider will read the old
+  annotation, and will know how to interpret it regardless of whether
+  it's an "old" or "new" value. Everything works.
+
+- Future (post-GA) kubelet / GA cloud provider: Kubelet will set the
+  new annotation, and delete the old annotation if present. The cloud
+  provider will see that only the new annotation is set, and so will
+  use that. Everything works (even if there had been a stale old
+  annotation left over from the upgrade).
+
+- GA kubelet / Future (post-GA) cloud provider: Kubelet will set both
+  annotations (to the same value). The cloud provider will only look
+  at the new annotation. Everything works.
+
+- Future (post-GA) kubelet / Future (post-GA) cloud provider: Kubelet
+  will set the new annotation, and delete the old annotation if
+  present. The cloud provider will only look at the new annotation.
+  Everything works.
 
 ## Production Readiness Review Questionnaire
 
@@ -477,31 +542,46 @@ issues.
 
 ###### Does enabling the feature change any default behavior?
 
-No
+Kubelet will begin setting two annotations on each node rather than
+one. However, cloud providers will still prefer the old annotation,
+and will only read the new annotation if the old one isn't there.
+
+So, assuming no bugs, the only visible change after (just) enabling
+the feature is the duplicate annotation.
 
 ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
 
 Yes, as long as you also roll back the kubelet configuration to no
 longer use the new feature.
 
+One edge case: if a user who was not previously setting `--node-ip`
+upgrades, enables the feature, starts using `--node-ip`, and then
+rolls back the kubelet and its configuration but _not_ the cloud
+provider, they will end up with a node that has a stale "new"
+annotation (with the `--node-ip` value they had been using while
+upgraded) but no "old" annotation (because the rolled-back kubelet
+configuration no longer specifies `--node-ip`, so kubelet will delete
+the old annotation). The (not-rolled-back) cloud provider will then
+mistake this state for the post-GA future, assume that kubelet
+intentionally set only the new annotation, and obey the (stale) new
+annotation.
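+
+A minimal sketch of the cloud provider's annotation-preference rule
+(invented names; not the actual cloud-provider code) shows why this
+happens: once the old annotation is gone, a stale new annotation is
+indistinguishable from an intentionally set one.
+
+```go
+package main
+
+import "fmt"
+
+func effectiveNodeIP(annotations map[string]string) (string, bool) {
+	if v, ok := annotations["alpha.kubernetes.io/provided-node-ip"]; ok {
+		return v, true // prefer the legacy annotation whenever it is set
+	}
+	if v, ok := annotations["kubernetes.io/provided-node-ip"]; ok {
+		return v, true // fallback: taken at face value, stale or not
+	}
+	return "", false // no --node-ip was provided
+}
+
+func main() {
+	// The post-rollback state from the edge case above: the old
+	// annotation was deleted by the rolled-back kubelet, but a stale
+	// new annotation remains.
+	anns := map[string]string{"kubernetes.io/provided-node-ip": "10.0.0.1,fd00::1"}
+	ip, ok := effectiveNodeIP(anns)
+	fmt.Println(ip, ok) // "10.0.0.1,fd00::1" true: the stale value wins
+}
+```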
+
+That edge case could be prevented by stretching out the feature
+enablement over a few more releases, and not having the cloud
+provider switch from "always use the old annotation" to "prefer the
+old annotation when it's set, but use the new annotation if it's the
+only one" until a few releases later. If we were going to do that,
+then perhaps we should split the annotation renaming into a separate
+KEP from the dual-stack `--node-ip` handling, so that the edge cases
+of the annotation renaming don't have to slow down the dual-stack
+`--node-ip` adoption.
+
 ###### What happens if we reenable the feature if it was previously rolled back?
 
 It works.
 
 ###### Are there any tests for feature enablement/disablement?
 
-
+TBD; we should have a test that the annotation is cleared on rollback.
 
 ### Rollout, Upgrade and Rollback Planning
 
@@ -608,9 +688,8 @@ No
 
 ###### Will enabling / using this feature result in increasing size or count of the existing API objects?
 
-Not really.
-
-The size of the `alpha.kubernetes.io/provided-node-ip` annotation may
+Not really. Kubelet will now create two node-IP-related annotations on
+each Node rather than just one, and the value of each annotation may
 be slightly larger (eg because it now contains two IP addresses rather
 than one), and some users may change their `--node-ip` to take
 advantage of the new functionality in a way that would cause more node
diff --git a/keps/sig-network/3705-cloud-node-ips/kep.yaml b/keps/sig-network/3705-cloud-node-ips/kep.yaml
index e1e8d02a63e..ec689cb80ac 100644
--- a/keps/sig-network/3705-cloud-node-ips/kep.yaml
+++ b/keps/sig-network/3705-cloud-node-ips/kep.yaml
@@ -6,15 +6,17 @@ owning-sig: sig-network
 participating-sigs:
   - sig-node
   - sig-cloud-provider
-status: provisional
+status: implementable
 creation-date: 2022-11-29
 reviewers:
+  - "@aojea"
+  - "@JoelSpeed"
   - "@mdbooth"
   - "@khenidak"
   - "@thockin"
-  - TBD
 approvers:
-  - TBD
+  - "@thockin"
+  - TBD from sig-cloud-provider
 see-also:
   - "https://github.com/kubernetes/enhancements/issues/1664"
@@ -45,4 +47,3 @@ disable-supported: true
 
 # The following PRR answers are required at beta release
 metrics:
-  - my_feature_metric