Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should pod IP be immutable? #108281

Closed
shyamjvs opened this issue Feb 22, 2022 · 10 comments
Closed

Should pod IP be immutable? #108281

shyamjvs opened this issue Feb 22, 2022 · 10 comments
Labels
kind/support Categorizes issue or PR as a support question. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/network Categorizes an issue or PR as relevant to SIG Network.

Comments

@shyamjvs
Copy link
Member

shyamjvs commented Feb 22, 2022

What happened?

Today in k8s the IP assigned to a pod, which is a part of the pod status, is mutable. Changing the field through kubectl edit doesn't seem to take effect, but the object can be changed through a json patch directly (more under repro steps below). While the IP is again reset back to the original value as part of the next pod status update (done by kubelet periodically), the change itself goes through.

I've cut this issue to understand the reason (if there's one) for letting the pod IP be mutable even after the value is set initially as part of pod creation. For e.g is it allowed to change through the lifetime of a pod?

/sig network
/sig api-machinery

What did you expect to happen?

Pod IP shouldn't have changed on attempting the patch. Maybe the API call itself should've been rejected?

How can we reproduce it (as minimally and precisely as possible)?

  • Create a pod:
$ cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: some-pod
  namespace: default
spec:
  containers:
  - image: nginx
    name: nginx
    ports:
    - containerPort: 80
EOF
  • Wait for it to start running on a node, by then kubelet would've assigned the pod an IP:
$ kubectl get pod some-pod -o yaml
apiVersion: v1
kind: Pod
...
status:
  phase: Running
  podIP: 10.192.21.35
  podIPs:
  - ip: 10.192.21.35
...
  • Then change the pod IP using json patch:
$ curl -k -XPATCH \
> -H "Content-Type: application/json-patch+json" \
> --data '[{"op": "add", "path": "/status/phase", "value": "Running"},{"op": "add", "path": "/status/podIP", "value": "0.0.0.0"}]' \
> http://localhost:8080/api/v1/namespaces/default/pods/some-pod/status
...
  "status": {
    "phase": "Running",
    "podIP": "0.0.0.0",
    "podIPs": [
      {
        "ip": "0.0.0.0"
      }
    ],
...

Typically within few seconds, the IP is again reset back to the original value as part of kubelet's periodic pod status update. But the changed value is present in the interim.

Anything else we need to know?

No response

Kubernetes version

Not version-specific AFAIU.

Cloud provider

N/A

OS version

Install tools

Container runtime (CRI) and and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@shyamjvs shyamjvs added the kind/bug Categorizes issue or PR as related to a bug. label Feb 22, 2022
@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 22, 2022
@jpbetz
Copy link
Contributor

jpbetz commented Feb 22, 2022

/assign @kubernetes/api-reviewers
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 22, 2022
@liggitt
Copy link
Member

liggitt commented Feb 22, 2022

The status is intentionally mutable. I'll defer to @kubernetes/sig-network-api-reviews on whether there are known/expected scenarios where the IP changes during a pod's lifetime.

Write access to pod status is only granted to system components like scheduler, kubelets, and node-controller by default. Cluster superusers can also write to status but are also able to do disruptive things to cluster objects.

@k8s-ci-robot k8s-ci-robot added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Feb 22, 2022
@liggitt liggitt added kind/support Categorizes issue or PR as a support question. and removed kind/bug Categorizes issue or PR as related to a bug. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Feb 22, 2022
@k8s-ci-robot
Copy link
Contributor

@shyamjvs: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Feb 22, 2022
@aojea
Copy link
Member

aojea commented Feb 22, 2022

This conversation is related to this topic, specially interesting this comment from @smarterclayton

#93898 (comment)

We never fully specified the behavior of CNI plugins - whether they had to guarantee the pod IP remains the same when various bits of state were lost.

@knobunc tested this and we confirmed a number of plugins in the ecosystem do not preserve pod IP after reset of the node and loss of the container cache.

At this point, while preservation of IP is very desirable for consumers, without a strong statement in the early days (which we didn’t make or test) the current state is “podIP can change”.

At a minimum, other clients should be aware that any podIP can change at any time and we should test this. We may wish to offer guidance and tests for CNI plugins to encourage certain behaviors. But we can’t assume they are immutable in practice nor enforce that without breaking existing plugins.

@sftim
Copy link
Contributor

sftim commented Feb 22, 2022

I recommend opening a documentation issue about making this behavior / risk clear to readers.

@sftim
Copy link
Contributor

sftim commented Feb 22, 2022

Separately, does anyone want to suggest a feature gate that people can disable if they want to make Pod IP assignments immutable?

@aojea
Copy link
Member

aojea commented Feb 23, 2022

I recommend opening a documentation issue about making this behavior / risk clear to readers.

yeah, I was going through the code and I still don't have clear all the ramifications of this, host-network and network pods behave differently, there are also a lot of things that depend on the podIPs that are only created at start time , like environment variables and /etc/hosts file on the pods.

the IP is again reset back to the original value as part of the next pod status update (done by kubelet periodically)

@thockin @khenidak this was discussed on the latest sig-network meeting, the doubt we had is what happens to the host network pods if the kubelet restarts

// set HostIP and initialize PodIP/PodIPs for host network pods
if kl.kubeClient != nil {
hostIPs, err := kl.getHostIPsAnyWay()
if err != nil {
klog.V(4).InfoS("Cannot get host IPs", "err", err)
} else {
s.HostIP = hostIPs[0].String()
// HostNetwork Pods inherit the node IPs as PodIPs. They are immutable once set,
// other than that if the node becomes dual-stack, we add the secondary IP.
if kubecontainer.IsHostNetworkPod(pod) {
// Primary IP is not set
if s.PodIP == "" {
s.PodIP = hostIPs[0].String()
s.PodIPs = []v1.PodIP{{IP: s.PodIP}}
}
// Secondary IP is not set #105320
if len(hostIPs) == 2 && len(s.PodIPs) == 1 {
s.PodIPs = append(s.PodIPs, v1.PodIP{IP: hostIPs[1].String()})
}
}
}
}

I think kubelet rebuilds the current status, and the host network pods reflect the right IPs ... that I think is the correct behavior, otherwise,

@khenidak
Copy link
Contributor

khenidak commented Mar 1, 2022

This is by design, for the following reason: Pod API object is tied to a sandbox (network namespace) on a node. The sandbox itself can restart/crash/deleted and come back with a different IPs. Node restart will cause recreation of sandboxes and hence new set of IPs. The pod object itself didn't change but its representation on the node (aka status) has changed.

What you are seeing is kubelet figuring api-server pod.status.IPs != curret-IPs so it must be patched. As for risks, I don't think there are any per-se because everything that operates on PodIPs is designed with mutability in mind. Specifically DNS, Endpoint, and EndpointSlice controllers.

The comment made by @aojea above is related to host-net pod and the impact of a node changing its IP (and thus pod IPs much change as well) today we don't do that and we want to.

@thockin
Copy link
Member

thockin commented Mar 3, 2022

Write access to pod status is only granted to system components_

100%

does anyone want to suggest a feature gate that people can disable if they want to make Pod IP assignments immutable?

I don't think we should do that. If you have untrusted actors setting that
value, you have already lost.

there are also a lot of things that depend on the podIPs that are only created at start time_

I don't think we are really saying that the IP can change underneath a running
pod. The Pod object in the API has a longer lifetime than a process or even a
cgroup.

The sandbox itself can restart/crash/deleted and come back with a different IPs

Exactly. Consider:

  • pod is created
  • kubelet starts pod lifecycle
  • CNI assigns IP "X"
  • kubelet published pod IP X in status
  • node crashes
  • node reboots
  • kubelet restarts
  • pod still exists in API!
  • kubelet start the pod again
  • CNI assigns IP "Y"
  • kubelet published pod IP Y in status

@thockin
Copy link
Member

thockin commented Mar 3, 2022

Closing for now - working as intended

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/support Categorizes issue or PR as a support question. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/network Categorizes an issue or PR as relevant to SIG Network.
Projects
None yet
Development

No branches or pull requests

8 participants