- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as `implementable`
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
- (R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests
- (R) Production readiness review completed
- (R) Production readiness review approved
- "Implementation History" section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
- Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
Kubelet supports dual-stack `--node-ip` values for clusters with no cloud provider (e.g., "bare metal" clusters), but not for clusters using a cloud provider. This KEP proposes to fix that.
- Allow administrators of clusters using external cloud providers to override both node IPs on a node in a dual-stack cluster.
- Define how kubelet will communicate this new intent to cloud providers.
- Update the code in `k8s.io/cloud-provider/node/helpers` to implement the needed algorithms for the new behaviors.
- Changing the behavior of nodes using legacy cloud providers.
- Changing the node-IP-selection behavior in any currently-existing Kubernetes cluster. This means that the default behavior when `--node-ip` is not specified will remain the same, and the behavior of any currently-allowed `--node-ip` value will remain the same. New behavior will only be triggered by `--node-ip` values that would have been rejected by kubelet in older clusters.
- Adding the ability for nodes in clusters with cloud providers to use node IPs that are not already known to the cloud provider. (In particular, this implies that we will continue to not support dual-stack nodes in clouds that do not themselves support dual-stack.)
- Improving node IP handling in any other ways. The original version of this KEP proposed some other `--node-ip` features to help with single-stack IPv6 and dual-stack clusters, but they turned out to be insufficient and would require a larger redesign of node IP handling to fix.
- Renaming the `alpha.kubernetes.io/provided-node-ip` annotation. This was also proposed in the original version of this KEP, but is no longer planned as part of this feature.
As the intention is to not change the user-visible behavior except in clusters where administrators explicitly make use of the new functionality, there should be no risk of breaking existing clusters, nor of surprising administrators by suddenly exposing node services on unexpected IPs.
Currently, when `--cloud-provider` is passed to kubelet, kubelet expects `--node-ip` to be either unset or a single IP address. (If it is unset, that is equivalent to passing `--node-ip 0.0.0.0`, which means "autodetect an IPv4 address, or if there are no usable IPv4 addresses, autodetect an IPv6 address".)
If `--cloud-provider` and `--node-ip` are both specified (and `--node-ip` is not "`0.0.0.0`" or "`::`"), then kubelet will add an annotation to the node, `alpha.kubernetes.io/provided-node-ip`. Cloud providers expect this annotation to conform to the current expected `--node-ip` syntax (i.e., a single value); if it does not, they will log an error and not remove the `node.cloudprovider.kubernetes.io/uninitialized` taint from the node, causing the node to remain unusable until kubelet is restarted with a valid (or absent) `--node-ip`.
When `--cloud-provider` is not passed, the `--node-ip` value can also be a comma-separated pair of dual-stack IP addresses. However, unlike in the single-stack case, the IPs in the dual-stack case are not currently allowed to be "unspecified" IPs (i.e., `0.0.0.0` or `::`); you can only make a (non-cloud) node dual-stack if you explicitly specify both IPs that you want it to use.
We will allow comma-separated dual-stack `--node-ip` values in clusters using external cloud providers (but not in clusters using legacy cloud providers).

No other changes to `--node-ip` handling are being made as part of this KEP.
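To make the new validation concrete, here is a minimal sketch in Go of the rules described above. The function name and signature are hypothetical, for illustration only; kubelet's actual parsing lives in its option handling:

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// validateNodeIP is a hypothetical sketch of the --node-ip rules described
// above: a single IP (including the unspecified 0.0.0.0 / ::) keeps its
// existing semantics, while comma-separated dual-stack pairs must be two
// specific IPs of different families, and remain rejected for legacy
// cloud providers.
func validateNodeIP(value string, legacyCloudProvider bool) ([]net.IP, error) {
	var ips []net.IP
	for _, s := range strings.Split(value, ",") {
		ip := net.ParseIP(strings.TrimSpace(s))
		if ip == nil {
			return nil, fmt.Errorf("invalid IP in --node-ip %q", value)
		}
		ips = append(ips, ip)
	}
	switch len(ips) {
	case 1:
		return ips, nil // unchanged behavior for all single values
	case 2:
		if legacyCloudProvider {
			return nil, fmt.Errorf("dual-stack --node-ip is not supported with legacy cloud providers")
		}
		if ips[0].IsUnspecified() || ips[1].IsUnspecified() {
			return nil, fmt.Errorf("dual-stack --node-ip values must be specific IPs, not 0.0.0.0 or ::")
		}
		if (ips[0].To4() != nil) == (ips[1].To4() != nil) {
			return nil, fmt.Errorf("dual-stack --node-ip must have one IPv4 and one IPv6 value")
		}
		return ips, nil
	default:
		return nil, fmt.Errorf("--node-ip can contain at most two values")
	}
}

func main() {
	ips, err := validateNodeIP("1.2.3.4,abcd::1234", false)
	fmt.Println(ips, err) // [1.2.3.4 abcd::1234] <nil>
}
```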
Currently, if the user passes an IP address to `--node-ip` which is not recognized by the cloud provider as being a valid IP for that node, kubelet will set that value in the `provided-node-ip` annotation, and the cloud provider will see it, realize that the node IP request can't be fulfilled, log an error, and leave the node in the tainted state.
It makes sense to have the same behavior if the user passes a dual-stack `--node-ip` value to kubelet but the cloud provider does not recognize the new syntax and thus can't fulfill the request. Conveniently, we can do this just by passing the dual-stack `--node-ip` value in the existing annotation; the old cloud provider will try to parse it as a single IP address, fail, log an error, and leave the node in the tainted state, which is exactly what we want it to do when it can't interpret the `--node-ip` value correctly.

Therefore, we do not need a new annotation for the new `--node-ip` values; we can continue to use the existing annotation, assuming existing cloud providers will treat unrecognized values as errors.
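To make the mechanism concrete, here is a sketch of the relevant logic in an *old* cloud provider. The names are hypothetical (not the actual cloud-provider code), but the key point is the single `net.ParseIP` call, which fails on the new comma-separated values:

```go
package main

import (
	"fmt"
	"net"
)

const providedNodeIPAnnotation = "alpha.kubernetes.io/provided-node-ip"

// oldProviderNodeIP sketches how a pre-dual-stack cloud provider reads the
// annotation: one ParseIP call, so a comma-separated value fails to parse.
// The provider then logs the error and leaves the
// node.cloudprovider.kubernetes.io/uninitialized taint in place, which is
// the desired outcome for a request it cannot fulfill.
func oldProviderNodeIP(annotations map[string]string) (net.IP, error) {
	value, ok := annotations[providedNodeIPAnnotation]
	if !ok {
		return nil, nil // no requested node IP; use the cloud's defaults
	}
	ip := net.ParseIP(value)
	if ip == nil {
		// A dual-stack value such as "1.2.3.4,abcd::1234" lands here.
		return nil, fmt.Errorf("failed to parse node IP %q", value)
	}
	return ip, nil
}

func main() {
	_, err := oldProviderNodeIP(map[string]string{
		providedNodeIPAnnotation: "1.2.3.4,abcd::1234",
	})
	fmt.Println(err) // failed to parse node IP "1.2.3.4,abcd::1234"
}
```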
Assuming that all cloud providers use the `k8s.io/cloud-provider` code to handle the node IP annotation and node address management, no cloud-provider-specific changes should be needed; we should be able to make the needed changes in the `cloud-provider` module, and then the individual providers just need to revendor to the new version.
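A rough sketch of the dual-stack matching those helpers would need, using simplified stand-ins for `v1.NodeAddress` and for the real code in `k8s.io/cloud-provider/node/helpers/address.go` (names here are illustrative, not the actual helper API):

```go
package main

import (
	"fmt"
	"net"
)

// NodeAddress is a simplified stand-in for v1.NodeAddress.
type NodeAddress struct {
	Type    string // "InternalIP", "ExternalIP", ...
	Address string
}

// addressesForDualStackNodeIPs sketches the new dual-stack logic: every
// requested IP must already be among the cloud-reported addresses (a node
// cannot use an IP the cloud does not know about), and the result is
// ordered by the requested IPs so that the first one becomes primary.
func addressesForDualStackNodeIPs(cloudAddrs []NodeAddress, nodeIPs []net.IP) ([]NodeAddress, error) {
	var out []NodeAddress
	for _, nodeIP := range nodeIPs {
		found := false
		for _, addr := range cloudAddrs {
			if ip := net.ParseIP(addr.Address); ip != nil && ip.Equal(nodeIP) {
				out = append(out, addr)
				found = true
			}
		}
		if !found {
			// Same as the single-stack case: an unknown IP is an error,
			// and the node stays tainted.
			return nil, fmt.Errorf("no cloud-reported address matches node IP %q", nodeIP)
		}
	}
	return out, nil
}

func main() {
	addrs := []NodeAddress{
		{"InternalIP", "1.2.3.4"}, {"InternalIP", "5.6.7.8"},
		{"InternalIP", "abcd::1234"}, {"InternalIP", "abcd::5678"},
	}
	out, err := addressesForDualStackNodeIPs(addrs,
		[]net.IP{net.ParseIP("1.2.3.4"), net.ParseIP("abcd::1234")})
	fmt.Println(out, err) // [{InternalIP 1.2.3.4} {InternalIP abcd::1234}] <nil>
}
```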
Assume a node where the cloud has assigned the IPs `1.2.3.4`, `5.6.7.8`, `abcd::1234`, and `abcd::5678`, in that order of preference. ("SS" = "Single-Stack", "DS" = "Dual-Stack".)
| `--node-ip` value | New? | Annotation | Resulting node addresses |
|---|---|---|---|
| (none) | no | (unset) | `["1.2.3.4", "5.6.7.8", "abcd::1234", "abcd::5678"]` (DS IPv4-primary) |
| `0.0.0.0` | no | (unset) | `["1.2.3.4", "5.6.7.8", "abcd::1234", "abcd::5678"]` (DS IPv4-primary) |
| `::` | no | (unset) | `["1.2.3.4", "5.6.7.8", "abcd::1234", "abcd::5678"]` (DS IPv4-primary *) |
| `1.2.3.4` | no | `"1.2.3.4"` | `["1.2.3.4"]` (SS IPv4) |
| `9.10.11.12` | no | `"9.10.11.12"` | (error, because the requested IP is not available) |
| `abcd::5678` | no | `"abcd::5678"` | `["abcd::5678"]` (SS IPv6) |
| `1.2.3.4,abcd::1234` | yes * | `"1.2.3.4,abcd::1234"` | `["1.2.3.4", "abcd::1234"]` (DS IPv4-primary) |
Notes:
- In the `--node-ip ::` case, kubelet will be expecting a single-stack IPv6 or dual-stack IPv6-primary setup, and so would get slightly confused in this case, since the cloud gave it a dual-stack IPv4-primary configuration. (In particular, you would have IPv4-primary nodes but IPv6-primary pods.)
- `--node-ip 1.2.3.4,abcd::ef01` was previously valid syntax when using no `--cloud-provider`, but was not valid for cloud kubelets.
If the cloud only had IPv4 IPs for the node, then the same examples would look like:
| `--node-ip` value | New? | Annotation | Resulting node addresses |
|---|---|---|---|
| (none) | no | (unset) | `["1.2.3.4", "5.6.7.8"]` (SS IPv4) |
| `0.0.0.0` | no | (unset) | `["1.2.3.4", "5.6.7.8"]` (SS IPv4) |
| `::` | no | (unset) | `["1.2.3.4", "5.6.7.8"]` (SS IPv4 *) |
| `1.2.3.4` | no | `"1.2.3.4"` | `["1.2.3.4"]` (SS IPv4) |
| `9.10.11.12` | no | `"9.10.11.12"` | (error, because the requested IP is not available) |
| `abcd::5678` | no | `"abcd::5678"` | (error, because the requested IP is not available) |
| `1.2.3.4,abcd::1234` | yes * | `"1.2.3.4,abcd::1234"` | (error, because the requested IPv6 IP is not available) |
In this case, kubelet would be even more confused in the `--node-ip ::` case, and some things would likely not work.
Due to backward-compatibility constraints, it is not possible to end up with a cluster of every type (single-stack/dual-stack, IPv4-primary/IPv6-primary) in all cases. For example, given the initial NodeAddress list:
```yaml
addresses:
- type: InternalIP
  address: 10.0.0.1
- type: InternalIP
  address: 10.0.0.2
- type: InternalIP
  address: fd00::1
- type: InternalIP
  address: fd00::2
- type: ExternalIP
  address: 192.168.0.1
```
You can request to get a single-stack IPv4 cluster with any of the three IPv4 IPs as the node IP (`--node-ip 10.0.0.1`, `--node-ip 10.0.0.2`, `--node-ip 192.168.0.1`); a dual-stack IPv4-primary cluster with any combination of the IPv4 and IPv6 IPs (`--node-ip 10.0.0.2,fd00::2`, etc.); or a dual-stack IPv6-primary cluster with any combination of IPs (`--node-ip fd00::1,192.168.0.1`, etc.).

But there is no way to get a single-stack IPv6 cluster: passing `--node-ip fd00::1` results in a dual-stack cluster, because the current, backward-compatible semantics of single-valued `--node-ip` values mean that the IPv4 `ExternalIP` will be preserved.
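A simplified sketch of those backward-compatible single-value semantics shows why (hypothetical names; the real logic lives in `k8s.io/cloud-provider/node/helpers/address.go`): addresses whose *type* did not match the requested IP are kept, so the IPv4 `ExternalIP` survives a request for an IPv6 `InternalIP`:

```go
package main

import (
	"fmt"
	"net"
)

// NodeAddress is a simplified stand-in for v1.NodeAddress.
type NodeAddress struct {
	Type    string // "InternalIP", "ExternalIP", "Hostname", ...
	Address string
}

// legacySingleNodeIPFilter sketches the backward-compatible single-value
// semantics: addresses matching the requested IP are kept, and addresses of
// *types that had no match* are also kept (except Hostname).
func legacySingleNodeIPFilter(cloudAddrs []NodeAddress, nodeIP net.IP) ([]NodeAddress, error) {
	var out []NodeAddress
	matchedTypes := map[string]bool{}
	for _, addr := range cloudAddrs {
		if ip := net.ParseIP(addr.Address); ip != nil && ip.Equal(nodeIP) {
			out = append(out, addr)
			matchedTypes[addr.Type] = true
		}
	}
	if len(out) == 0 {
		return nil, fmt.Errorf("no cloud-provided address matches node IP %q", nodeIP)
	}
	for _, addr := range cloudAddrs {
		if !matchedTypes[addr.Type] && addr.Type != "Hostname" {
			out = append(out, addr)
		}
	}
	return out, nil
}

func main() {
	addrs := []NodeAddress{
		{"InternalIP", "10.0.0.1"}, {"InternalIP", "10.0.0.2"},
		{"InternalIP", "fd00::1"}, {"InternalIP", "fd00::2"},
		{"ExternalIP", "192.168.0.1"},
	}
	out, _ := legacySingleNodeIPFilter(addrs, net.ParseIP("fd00::1"))
	// [{InternalIP fd00::1} {ExternalIP 192.168.0.1}]: still dual-stack.
	fmt.Println(out)
}
```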
[X] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.
(None.)
Most of the changes will be in `k8s.io/cloud-provider/node/helpers`. There will also be small changes in kubelet startup.
- `k8s.io/kubernetes/pkg/kubelet`: 2023-01-30 - 66.9
- `k8s.io/kubernetes/pkg/kubelet/nodestatus`: 2023-01-30 - 91.2
- `k8s.io/kubernetes/vendor/k8s.io/cloud-provider/node/helpers`: 2023-01-30 - 31.7
- `k8s.io/kubernetes/vendor/k8s.io/cloud-provider/node/helpers/address.go`: 2023-01-30 - 100
Given unit tests for the new `--node-ip` parsing and `node.status.addresses`-setting code, and e2e tests of some sort, we probably don't need integration tests.
There is now a [cloud-provider-kind] that can be used in kind-based clusters to implement cloud-provider-based functionality.

By modifying this provider to allow manually overriding the default node IPs, we should be able to create an e2e test job that brings up a kind cluster with nodes having various IPs, and then tests different kubelet `--node-ip` arguments to ensure that they have the expected effect.
- Dual-stack `--node-ip` handling implemented behind a feature flag
- Unit tests updated

- Positive feedback / no bugs
- At least one release after Alpha
- Implement e2e test using `cloud-provider-kind`

- Upgrade/rollback have been tested manually
- Positive feedback / no bugs
No behavioral changes will happen automatically on upgrade, or automatically on feature enablement; users must opt in to the feature by changing their kubelet configuration after upgrading the cluster to a version that supports the new feature.
On downgrade/disablement, it is necessary to revert the kubelet configuration changes before downgrading kubelet, or kubelet will fail to start after downgrade.
- Old kubelet / new cloud provider: Kubelet will set the annotation. The cloud provider will read it and will interpret it in the same way an old cloud provider would (because all `--node-ip` values accepted by an old kubelet are interpreted the same way by both old and new cloud providers). Everything works.
- New kubelet, single-stack `--node-ip` value / old cloud provider: Kubelet will set the annotation. The cloud provider will read it. Everything works, because this is an "old" `--node-ip` value, and the old cloud provider knows how to interpret it correctly.
- New kubelet, dual-stack `--node-ip` value / old cloud provider: Kubelet will set the annotation. The cloud provider will read it, but it will not know how to interpret it because it's a "new" value. So it will log an error and leave the node tainted. (This is the desired result, since the cloud provider is not able to obey the `--node-ip` value the administrator provided.)
- Feature gate (also fill in values in `kep.yaml`)
  - Feature gate name: `CloudDualStackNodeIPs`
  - Components depending on the feature gate:
    - kubelet
    - cloud-controller-manager
No
Yes, as long as you also roll back the kubelet configuration (to no longer use the new feature) either earlier or together with the feature disablement.
It works.
No; enabling/disabling the feature gate has no immediate effect, and changing between a single-stack and a dual-stack `--node-ip` value is no different, code-wise, than changing from one single-stack `--node-ip` value to a new single-stack value.
Assuming no drastic bugs (e.g., the cloud provider assigns Node X's IP to Node Y for no obvious reason), just restarting the cloud provider or kubelet with the new feature enabled-but-unused cannot fail.
The cloud provider will not "re-taint" an existing working node if its node IP annotation becomes invalid. Thus, if the administrator accidentally rolls out a kubelet config that does something completely wrong (e.g., specifying a new secondary node IP value which is not actually one of that node's IPs), then the only effect would be that the cloud provider will log "Failed to update node addresses for node" for that node.
The most likely failure mode would be that in the process of reconfiguring nodes to use the new feature, the administrator reconfigures them incorrectly. (In particular, if nodes were previously using auto-detected primary node IPs, and the administrator needs to switch them to manually-specified dual-stack node IPs, they might end up manually specifying wrong (but valid) primary IPs.) In that case, the cloud provider would accept the new node IP value and update the node's addresses, possibly resulting in disruption. However, this would only happen as each kubelet was reconfigured and restarted, so as long as the administrator did not roll out the new configurations to every node simultaneously without any sanity-checking, they should only break a single node.
There are no relevant metrics; however, the feature will only affect nodes that have been reconfigured to use the new feature, so it should be obvious to the administrator if the feature is not working correctly.
TODO; do a manual test
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
No.
The operator is the one who would be using the feature (and they can tell by looking at the kubelet configuration to see if a "new" `--node-ip` value was passed).
The Node will have the IPs they expect it to have.
- API `.status`
  - Other field: `node.status.addresses`
N/A. This changes the startup behavior of kubelet (and does not affect startup speed). There is no ongoing "service".
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
N/A. This changes the startup behavior of kubelet (and does not affect startup speed). There is no ongoing "service".
Are there any missing metrics that would be useful to have to improve observability of this feature?
No
The feature depends on kubelet/cloud provider communication, but it is just an update to an existing feature that already depends on kubelet/cloud provider communication. It does not create any additional dependencies, and it does not add any new failure modes if either component fails.
No
No
No
No
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
No
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
No
Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
No
It does not add any new failure modes. (The kubelet and cloud provider use an annotation and an object field to communicate with each other, but they already do that. And the failure mode there is just "updates don't get processed until the API server comes back".)
N/A: there are no SLOs.
- Initial proposal: 2022-12-30
The status quo is slightly simpler, but some people need the additional functionality.
The original version of this KEP proposed further changes to `--node-ip` handling:
> Additionally, the fact that kubelet does not currently pass "`0.0.0.0`" and "`::`" to the cloud provider creates a compatibility problem: we would like for administrators to be able to say "use an IPv6 node IP but I don't care which one" in cloud-provider clusters like they can in non-cloud-provider clusters, but for compatibility reasons, we can't change the existing behavior of "`--cloud-provider external --node-ip ::`" (which doesn't do what it's "supposed to", but does have possibly-useful side effects that have led some users to use it anyway; see kubernetes #111695).
>
> So instead, we will add new syntax, and allow administrators to say "`--node-ip IPv4`" or "`--node-ip IPv6`" if they want to explicitly require that the cloud provider pick a node IP of a specific family. (This also improves on the behavior of the existing "`0.0.0.0`" and "`::`" options, because you almost never actually want the "or fall back to the other family if there are no IPs of this family" behavior that "`0.0.0.0`" and "`::`" imply.)
>
> Additionally, we will update the code to allow including "`IPv4`" and "`IPv6`" in dual-stack `--node-ip` values as well (in both cloud and non-cloud clusters). This code will have to check the status of the feature gate until the feature is GA.
As well as to the Node IP annotation:
> That said, the existing annotation name is `alpha.kubernetes.io/provided-node-ip`, but it hasn't been "alpha" for a long time. We should fix this. So:
>
> - When `--node-ip` is unset, kubelet should delete both the legacy `alpha.kubernetes.io/provided-node-ip` annotation and the new `kubernetes.io/provided-node-ip` annotation (regardless of whether the feature gate is enabled or not, to avoid problems with rollback and skew).
> - When the `CloudDualStackNodeIPs` feature is enabled and `--node-ip` is set, kubelet will set both the legacy annotation and the new annotation. (It always sets them both to the same value, even if that's a value that old cloud providers won't understand.)
> - When the `CloudDualStackNodeIPs` feature is enabled, the cloud provider will use the new `kubernetes.io/provided-node-ip` annotation if the legacy alpha annotation is not set. (But if both annotations are set, it will prefer the legacy annotation, so as to handle rollbacks correctly.)
> - A few releases after GA, kubelet can stop setting the legacy annotation, and switch to unconditionally deleting it, and setting/deleting the new annotation depending on whether `--node-ip` was set or not. Cloud providers will also switch to only using the new annotation, and perhaps logging a warning if they see a node with the old annotation but not the new annotation.
> - Kubelet will preserve the existing behavior of not passing "`0.0.0.0`" or "`::`" to the cloud provider, even via the new annotation. This is needed to preserve backward compatibility with current behavior in clusters using those `--node-ip` values. However, it will pass "`IPv4`" and/or "`IPv6`" if they are passed as the `--node-ip`.
However, trying to implement this behavior turned up problems:
> While implementing the above behavior, it became clear that retaining backward compatibility with old `--node-ip` values means the overall behavior is idiosyncratic and full of loopholes. For example, given the initial NodeAddress list:
>
> ```yaml
> addresses:
> - type: InternalIP
>   address: 10.0.0.1
> - type: InternalIP
>   address: 10.0.0.2
> - type: InternalIP
>   address: fd00::1
> - type: InternalIP
>   address: fd00::2
> - type: ExternalIP
>   address: 192.168.0.1
> ```
>
> You can request to get a single-stack IPv4 cluster with any of the three IPv4 IPs as the node IP (`--node-ip 10.0.0.1`, `--node-ip 10.0.0.2`, `--node-ip 192.168.0.1`); a dual-stack IPv4-primary cluster with any combination of the IPv4 and IPv6 IPs (`--node-ip 10.0.0.2,fd00::2`, etc.); a dual-stack IPv6-primary cluster with any combination of IPs (`--node-ip fd00::1,192.168.0.1`, etc.); or an IPv6 single-stack cluster using the first IPv6 IP (`--node-ip IPv6`).
>
> But there is no way to get a single-stack IPv6 cluster using the second IPv6 IP, because passing `--node-ip fd00::2` results in a dual-stack cluster: the current, backward-compatible semantics of single-valued `--node-ip` values mean that the IPv4 `ExternalIP` will be preserved.
In the discussion around KEP-1664 there was talk of replacing `--node-ip` with a new `--node-ips` (plural) field with new-and-improved semantics, and I think this is what we're going to have to do if we want to make this better. But that will have to wait for another KEP.