
Kubelet reports secondary InternalIP in AWS with multiple ENIs #61921

Closed
micahhausler opened this issue Mar 30, 2018 · 41 comments
Labels
area/provider/aws Issues or PRs related to aws provider kind/bug Categorizes issue or PR as related to a bug. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one.

Comments

@micahhausler
Member

micahhausler commented Mar 30, 2018

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug
/sig aws

What happened:
When a node gets a second ENI (elastic network interface) attached to it, the kubelet PATCHes all IPs for all of the node's network interfaces up to the API server (it gets these from the EC2 metadata). The payload might look like this:

{
  "status": {
    "$setElementOrder/addresses": [
      {
        "type": "InternalIP"
      },
      {
        "type": "InternalIP"
      },
      {
        "type": "InternalIP"
      },
 ... 
      {
        "type": "InternalIP"
      },
      {
        "type": "InternalDNS"
      },
      {
        "type": "ExternalDNS"
      },
      {
        "type": "Hostname"
      }
    ],
    "addresses": [
      {
        "address": "10.0.116.30", # A secondary IP
        "type": "InternalIP"
      },
      {
        "address": "10.0.106.136", # A secondary IP
        "type": "InternalIP"
      },
      {
        "address": "10.0.110.215", # A secondary IP
        "type": "InternalIP"
      },
...
      {
        "address": "10.0.104.85", # Correct primary IP
        "type": "InternalIP"
      },
      {
        "address": "10.0.103.102",  # A secondary IP, the one incorrectly reported for the node
        "type": "InternalIP"
      }
    ]
  }
}

When trying to detect whether the node has changed, the node controller de-duplicates addresses of the same type from each node, picking the last one in the list the kubelet reports.
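The "last one wins" de-duplication can be sketched like this (a minimal Python illustration of the behavior described above, not the actual controller code):

```python
# Illustration only: mimics a controller that keeps one address per type,
# where a later entry of the same type overwrites an earlier one.
def dedupe_addresses(addresses):
    by_type = {}
    for addr in addresses:
        by_type[addr["type"]] = addr["address"]  # last entry of each type wins
    return by_type

patched = [
    {"type": "InternalIP", "address": "10.0.104.85"},   # correct primary ENI IP
    {"type": "InternalIP", "address": "10.0.103.102"},  # secondary IP, listed last
    {"type": "Hostname", "address": "ip-10-0-104-85.us-west-2.compute.internal"},
]
print(dedupe_addresses(patched)["InternalIP"])  # → 10.0.103.102, the secondary IP
```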

What you expected to happen:
The eth0 IP should be reported as the InternalIP address in the node's .status.addresses[] list.

How to reproduce it (as minimally and precisely as possible):
Create a new cluster (with kops), install the AWS CNI provider, and check the node's InternalIP

kubectl get no -o json | jq .items[].status.addresses

You'll get something like

[
  {
    # NOTE: Different than the IP in the InternalDNS name
    "type": "InternalIP",
    "address": "10.0.103.102"
  },
  {
    "type": "ExternalIP",
    "address": "53.11.22.33"
  },
  {
    "type": "InternalDNS",
    "address": "ip-10-0-104-85.us-west-2.compute.internal"
  },
  {
    "type": "ExternalDNS",
    "address": "ec2-52-11-22-33.us-west-2.compute.amazonaws.com"
  },
  {
    "type": "Hostname",
    "address": "ip-10-0-104-85.us-west-2.compute.internal"
  }
]

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.9.x
  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): Amazon Linux
  • Kernel (e.g. uname -a): 4.9.85-47.59.amzn2.x86_64

Other reports:

Maybe related to #42125

Since the node controller only keeps the last InternalIP address in the PATCH, can we have the EC2 metadata lookup query http://169.254.169.254/latest/meta-data/local-ipv4 for the InternalIP? There may be something I'm missing, and I'd love to be pointed in the right direction. If this is unintentional, is this something we could get a fix backported onto the 1.9 branch? I'm happy to get a patch out for this.
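The difference between the two metadata lookups can be pictured against a fake metadata tree (the paths mirror real IMDS paths under /latest/meta-data/; the `fetch` function is a stand-in for an HTTP GET to 169.254.169.254, and the MACs/IPs are made up):

```python
# Fake EC2 instance metadata; keys mirror real IMDS paths.
metadata = {
    "local-ipv4": "10.0.104.85",  # always the primary ENI's primary IP
    "network/interfaces/macs/0a:aa/local-ipv4s": "10.0.104.85\n10.0.116.30",
    "network/interfaces/macs/0b:bb/local-ipv4s": "10.0.103.102\n10.0.106.136",
}

def fetch(path):
    # Stand-in for: GET http://169.254.169.254/latest/meta-data/<path>
    return metadata[path]

def all_eni_ips():
    # Roughly what the post-#50112 cloud provider does: enumerate
    # every IP on every attached ENI.
    ips = []
    for path in sorted(p for p in metadata if p.endswith("local-ipv4s")):
        ips.extend(fetch(path).split("\n"))
    return ips

print(fetch("local-ipv4"))  # one unambiguous primary IP
print(all_eni_ips())        # many IPs; eth0's IP is not guaranteed to be last
```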

ping @justinsb @chrislovecnm

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. sig/aws labels Mar 30, 2018
@micahhausler micahhausler changed the title Kubelet reports wrong InternalIP in AWS with multiple ENIs Kubelet reports secondary InternalIP in AWS with multiple ENIs Apr 2, 2018
@mtaufen
Contributor

mtaufen commented Apr 3, 2018

@cheftako would you know anything about this?

/cc @bowei @cheftako

@micahhausler
Member Author

@cheftako Do you have any insight into why the apiserver is deduplicating node addresses of the same type?

@rtripat
Contributor

rtripat commented Apr 11, 2018

Reading through a couple of places (below) in the API server, it looks like it assumes there is exactly one address per NodeAddressType for a kubelet.

  1. How preferred node address is determined from Node status object
  2. How node address change is detected in node controller

There is an issue and a corresponding PR where the change was made in the kubelet to add IPs from all ENIs attached to an EC2 instance. If my understanding above is right, did that PR conflict with this assumption, leading to this behavior?

@micahhausler
Member Author

@wlan0 Do you know anything about this one?

@wlan0
Member

wlan0 commented Apr 11, 2018

ping @nckturner

@nckturner
Contributor

nckturner commented Apr 11, 2018

@wlan0 @cheftako Our question is: should kubelet be reporting exactly 1 address for each NodeAddressType? When there are multiple interfaces on a node, kubelet (the AWS cloud provider code) sends a list of addresses, which (I think) leads to the node controller picking 1 without knowing which is the correct one to choose. Basically we think #50112 leads to incorrect behavior.

@cheftako
Member

The cloud node controller seems to assume this behavior when it computes whether an IP has changed: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/cloud/node_controller.go#L444. I believe this corresponds to https://github.com/kubernetes/kubernetes/blob/master/pkg/util/node/node.go#L81 and https://github.com/kubernetes/kubernetes/blob/master/pkg/util/node/node.go#L99. So right now we do seem to have some baked-in assumptions in various parts of the system about a node having one IP per type. Beyond that I would suggest talking to sig-network. Maybe @bowei can help?
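The one-address-per-type assumption can be pictured with a toy version of a preferred-address lookup (names and shapes are illustrative, modeled loosely on the util/node helpers linked above, not copied from them): only the first address of the winning type is ever consulted.

```python
# Sketch: return the first address whose type matches the earliest entry
# in a preference list. Any additional addresses of that type are ignored,
# which is where the "one IP per type" assumption shows up.
def preferred_node_address(addresses, preferred_types):
    for t in preferred_types:
        for addr in addresses:
            if addr["type"] == t:
                return addr["address"]
    raise ValueError("no matching address type")

addrs = [
    {"type": "Hostname", "address": "ip-10-0-104-85.us-west-2.compute.internal"},
    {"type": "InternalIP", "address": "10.0.103.102"},
]
print(preferred_node_address(addrs, ["InternalIP", "Hostname"]))  # → 10.0.103.102
```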

@nckturner
Contributor

Yeah, that's what we found as well. I think in the original PR the user was trying to specify a secondary IP with the --node-ip flag which was failing. An ideal fix would allow --node-ip to use any IP on the instance, but in either case kubelet would not send a list of addresses, only the default private IP or the IP specified by --node-ip.

@micahhausler
Member Author

Unless anyone has strong objections, what we'd like to do is go back to using the local-ipv4 from the EC2 metadata, but still let users specify a secondary IP for the --node-ip flag (as #44686 asked for). I'll create a PR next week.

@andrewsykim
Member

Our question is: should kubelet be reporting exactly 1 address for each NodeAddressType? When there are multiple interfaces on a node, kubelet (the AWS cloud provider code) sends a list of addresses, which (I think) leads to the node controller picking 1 without knowing which is the correct one to choose. Basically we think #50112 leads to incorrect behavior.

I still don't have much context on this issue yet, but it seems like setting multiple addresses of the same type on the node status is the wrong direction, as some people have already mentioned. For handling multiple private addresses on the node, we should either create new address types (though I'd prefer we don't) or have whatever IP is provided by --node-ip take precedence. @micahhausler, no objections from me to pushing forward with your PR to just use local-ipv4, but it would be good to get feedback from @jlzhao27 in case we are missing some context here.

@jlzhao27
Contributor

Yeah, this seems fine. I agree with @nckturner's comment above. The exact issue I ran into was that the old code did not allow the user to use a secondary ENI with the --node-ip flag, since it was not being parsed by the kubelet at all. It seems my fix broke some other assumptions; maybe the change here is to fix the code path for --node-ip and make the kubelet report either the user-defined IP or the default internal EC2 instance IP.

@andrewsykim
Member

andrewsykim commented Apr 24, 2018

An ideal fix would allow --node-ip to use any IP on the instance, but in either case kubelet would not send a list of addresses, only the default private IP or the IP specified by --node-ip.

^ It seems like we override all the node address types if the kubelet sets --node-ip, which seems wrong to me. I've opened PR #63101 to address that (still need to fix tests). We may want a follow-up PR that does a better job of finding what the actual --node-ip is (based on type?).

@andrewsykim
Member

andrewsykim commented Apr 24, 2018

@jlzhao27 thanks for the quick reply. It seems like the steps forward are:

  • use the EC2 metadata local-ipv4 by default as the node's InternalIP address (@micahhausler, you said you can take this one?)
  • override the InternalIP address type on the node controller if --node-ip is set on the kubelet
  • maybe modify --node-ip to specify the exact address type we want to override
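The second bullet, overriding only the InternalIP entries while preserving the other types, could look roughly like this (a hedged sketch of the idea, not the actual code from #63101):

```python
def apply_node_ip(addresses, node_ip):
    # Replace all InternalIP entries with the operator-supplied --node-ip,
    # keeping ExternalIP/InternalDNS/ExternalDNS/Hostname entries untouched.
    kept = [a for a in addresses if a["type"] != "InternalIP"]
    return [{"type": "InternalIP", "address": node_ip}] + kept

addrs = [
    {"type": "InternalIP", "address": "10.0.103.102"},
    {"type": "InternalIP", "address": "10.0.116.30"},
    {"type": "ExternalDNS", "address": "ec2-52-11-22-33.us-west-2.compute.amazonaws.com"},
]
result = apply_node_ip(addrs, "10.0.104.85")
print(result)  # one InternalIP (the --node-ip), ExternalDNS preserved
```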

@micahhausler
Member Author

micahhausler commented Apr 24, 2018

Here's a summary of where I'm at with this.

The kubelet informs the API server of its addresses in the setNodeAddress() method. If there is a cloud provider (set by the --cloud-provider flag), the kubelet calls the cloud interface's NodeAddresses() method, which is implemented by the AWS cloud provider here.

PR #50112 changed the AWS implementation of NodeAddresses() to query the EC2 metadata for all IPs on all network interfaces and return the list, rather than just getting the EC2 metadata's local-ipv4. This was done to allow an operator to use a secondary ENI with a single IP as the kubelet address. This is a legitimate case: the operator may attach a secondary ENI with one assigned IP and wish to use that as the kubelet's IP.

When using the AWS CNI plugin, the secondary ENIs and their multiple IPs are not used by the node itself but are assigned to individual pods. In this case, the kubelet doesn't get enough information from the EC2 metadata about which secondary ENIs or IPs are actually used by the node itself, and ends up reporting all IPs (in the case of an m4.large, that is 10 IPs per ENI) to the API server. The API server does a reduction somewhere along the line where it only accepts one NodeAddress per type (the types being InternalIP, ExternalIP, InternalDNS, ExternalDNS) and seems to pick the last one in the list, which is often the wrong IP. (The list comes from the enumeration of IPs for each ENI in the EC2 metadata.) When querying the API server for the node's .status.addresses list, the InternalIP is one of the pods' IPs, which the kubelet is not listening on. This incorrect address causes any calls back to the node (kubectl exec/logs/proxy) to fail.

Our current workaround is to specify the --node-ip flag on the kubelet with the correct IP, but this has two issues. First, it requires anyone using the AWS CNI plugin to use the --node-ip flag in order to ensure the correct node IP. This is inconvenient and requires an out-of-band scripting step. Second, when specifying a node IP, the ExternalDNS, ExternalIP, and InternalDNS node addresses are no longer reported. This is information we don't want to lose, as many workloads and use cases need it.

The kubelet logic around validating --node-ip requires that the operator-provided IP be included in the list that the cloud provider's NodeAddresses() returns. This protects the node from reporting an IP it shouldn't be requesting, but drops the ExternalDNS, ExternalIP, and InternalDNS addresses.
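The validate-then-drop behavior described above can be sketched as follows (function name and data shapes are illustrative, not the kubelet's actual code):

```python
def validate_and_filter(cloud_addresses, node_ip):
    # Sketch of current behavior: --node-ip must appear in the cloud
    # provider's list; if it does, only that one address is reported,
    # so ExternalIP/ExternalDNS/InternalDNS entries are dropped.
    ips = [a["address"] for a in cloud_addresses]
    if node_ip not in ips:
        raise ValueError("--node-ip %s not in cloud provider addresses" % node_ip)
    return [{"type": "InternalIP", "address": node_ip}]

cloud = [
    {"type": "InternalIP", "address": "10.0.104.85"},
    {"type": "InternalIP", "address": "10.0.103.102"},
    {"type": "ExternalDNS", "address": "ec2-52-11-22-33.us-west-2.compute.amazonaws.com"},
]
print(validate_and_filter(cloud, "10.0.104.85"))  # only the InternalIP survives
```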

As far as solutions go, there are three things I've considered.

  1. In the AWS implementation of NodeAddresses(), make sure that the last address of type InternalIP is the primary ENI's IP (the eth0). This feels like a hack and relies on observed behavior in the API.

  2. Make the kubelet return ExternalIP and ExternalDNS addresses even when --node-ip is specified. This breaks the current behavior of --node-ip, but we can make sure the IP specified is the right one.

  3. Not validate that the supplied --node-ip is in the NodeAddresses() list, and use local-ipv4 by default.

Short of those three solutions, I'm not sure how to allow a user to specify a secondary ENI as the node IP while also correctly reporting the primary IP when using the AWS CNI plugin. Am I missing anything big in option 3?

@andrewsykim
Member

@micahhausler thanks for the comprehensive update! Would you be able to join the cloud provider WG meeting today again so we can hash out some final details? I think we're close to a working solution :).

@micahhausler
Member Author

I'll be there!

@nckturner
Contributor

nckturner commented Apr 25, 2018

Summary of the conversation in WG Cloud Provider:

  • Fix the AWS cloud provider to only report one address of each type by default (@micahhausler). Is this possible to do and still allow --node-ip to use a secondary IP?
  • Modify the kubelet to allow reporting ExternalDNS, ExternalIP, and InternalDNS when --node-ip is set.

@micahhausler
Member Author

Created #63158 around the --node-ip issue

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 24, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 23, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@jnicholls

It seems like getting the local-ipv4 metadata was the correct approach for the API server communication use case, with --node-ip as the override, while network/interfaces/macs/<mac>/local-ipv4s is fine for the overall NodeAddresses() use case. Choosing one of the IPs from that list blindly is risky, since the order of interfaces can switch in some circumstances (the MAC assigned to an ENI can change when the ENIs are attached to a replacement node). Are we concluding that folks using cluster orchestrators like kops must automate passing --node-ip explicitly to get deterministic ENI selection?

@jnicholls

jnicholls commented Jan 28, 2020

  1. Not validate that the supplied --node-ip is in the NodeAddresses() list, and use local-ipv4 by default.

I think this option (particularly using local-ipv4 by default) would have been a good approach for avoiding the out-of-order issue when MACs flip around, and it avoids the need to script up a solution for specifying --node-ip that would ultimately (in this case at least) reference local-ipv4.

@jnicholls

/reopen
/remove-lifecycle rotten

@k8s-ci-robot
Contributor

@jnicholls: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen
/remove-lifecycle rotten

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 28, 2020
@bowei
Member

bowei commented Jan 28, 2020

/reopen
/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot reopened this Jan 28, 2020
@k8s-ci-robot
Contributor

@bowei: Reopened this issue.

In response to this:

/reopen
/remove-lifecycle rotten


@k8s-ci-robot
Contributor

@micahhausler: There are no sig labels on this issue. Please add an appropriate label by using one of the following commands:

  • /sig <group-name>
  • /wg <group-name>
  • /committee <group-name>

Please see the group list for a listing of the SIGs, working groups, and committees available.


@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jan 28, 2020
@k8s-ci-robot
Contributor

@bowei: The label(s) sig/aws cannot be applied, because the repository doesn't have them

In response to this:

/sig aws


@k8s-ci-robot
Contributor

@bowei: The label(s) sig/provider-aws cannot be applied, because the repository doesn't have them

In response to this:

/sig provider-aws


@k8s-ci-robot
Contributor

@bowei: The label(s) sig/cloud-provider-aws cannot be applied, because the repository doesn't have them

In response to this:

/sig cloud-provider-aws



@bowei
Member

bowei commented Jan 28, 2020

/area provider/aws

@k8s-ci-robot k8s-ci-robot added the area/provider/aws Issues or PRs related to aws provider label Jan 28, 2020
@k8s-ci-robot
Contributor

@bowei: The label(s) area/ cannot be applied, because the repository doesn't have them

In response to this:

/area provider/aws


@bowei
Member

bowei commented Jan 28, 2020

:sigh:

@wongma7
Contributor

wongma7 commented Feb 14, 2020

@jnicholls

Rather than just choosing one of the IPs from that list blindly, since the order of interfaces can switch in some circumstances (the assigned MAC for the ENI can change when the ENIs are attached to a replacement node).

It doesn't choose blindly; it's supposed to sort by device-number (https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/legacy-cloud-providers/aws/aws.go#L1409), so eth0 should be the default.
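That sort can be pictured like this (a sketch of ordering ENI MACs by the integer from the IMDS path network/interfaces/macs/<mac>/device-number, so eth0's addresses come first; the MACs are made up and this is not the actual aws.go code):

```python
def order_macs_by_device_number(device_numbers):
    # device_numbers maps MAC -> the value from
    # network/interfaces/macs/<mac>/device-number; eth0 is device 0.
    return sorted(device_numbers, key=device_numbers.get)

macs = {"0b:bb": 1, "0a:aa": 0}
print(order_macs_by_device_number(macs))  # → ['0a:aa', '0b:bb'], eth0's MAC first
```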

Feel free to create a new issue to discuss further; IMO it's a bit confusing to continue the discussion here since, technically, the original issue has been fixed.

@wongma7
Contributor

wongma7 commented Feb 14, 2020

/close

@k8s-ci-robot
Contributor

@wongma7: Closing this issue.

In response to this:

/close

