
kubectl logs breaks when using the cni plugin #21

Closed
dezmodue opened this issue Jan 10, 2018 · 8 comments
Comments

@dezmodue

Hi, moving kubernetes/kops#4218 here as suggested

I am getting an error when running kubectl logs POD -n NAMESPACE, specifically:

Error from server: Get https://10.103.20.110:10250/containerLogs/monitoring/grafana-0/grafana: dial tcp 10.103.20.110:10250: getsockopt: connection timed out

This error seems to be related to the kubelet registering the wrong IP: it reports one of the secondary private IPs (on eth0, as far as I can tell).
In the example error reported:

Error from server: Get https://10.103.20.110:10250/containerLogs/monitoring/grafana-0/grafana: dial tcp 10.103.20.110:10250: getsockopt: connection timed out

10.103.20.110 is a secondary private IP on eth0 and it is the IP shown by kubectl describe node:

Addresses:
  InternalIP:   10.103.20.110
  InternalDNS:  ip-10-103-21-40.megamind.internal
  Hostname:     ip-10-103-21-40.megamind.internal

Note that only a single InternalIP is reported, and it is not the one the kubelet is listening on.

Locally, curl works against the primary IPs of both eth0 and eth1.
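
A quick way to compare the address the kubelet registered with what the instance metadata service reports (a sketch; the node name is the one from the example above):

# InternalIP registered in the node's status (what the API server dials for kubectl logs)
kubectl get node ip-10-103-21-40.megamind.internal -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}'

# Primary private IP of eth0 according to the EC2 instance metadata service
curl -s http://169.254.169.254/latest/meta-data/local-ipv4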

The problem has also been occurring on the master nodes, where it manifests as new nodes being unable to join the cluster because the kubelet cannot contact the API (the IPs are wrong).

If I pass the --node-ip=$(curl http://169.254.169.254/latest/meta-data/local-ipv4) argument to the kubelet, everything works as expected.

I could not reproduce this specific issue by simply adding an extra ENI and secondary IPs (trying to emulate the plugin behaviour) on a cluster running the flannel overlay network - the kubelet reports all available IP addresses correctly.

Versions:
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.4", GitCommit:"9befc2b8928a9426501d3bf62f72849d5cbcd5a3", GitTreeState:"clean", BuildDate:"2017-11-20T05:17:43Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

@liwenwu-amazon
Contributor

Hi @dezmodue , thank you for reporting the issue and also providing a solution. The CNI plugin back-end (L-IPAM) allocates IP addresses on the primary ENI right after it is initialized, so by the time the kubelet reports the node addresses, the primary ENI already has multiple secondary IP addresses assigned.

So the recommendation (which you have already verified) is to pass the --node-ip=$(curl http://169.254.169.254/latest/meta-data/local-ipv4) argument to the kubelet.
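
For example, a minimal sketch of wiring that in at boot time (the file path and variable name below follow a common kubeadm/systemd convention and may differ in your setup):

# Run before the kubelet starts (e.g. from instance user data); paths are illustrative
NODE_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
echo "KUBELET_EXTRA_ARGS=--node-ip=${NODE_IP}" >> /etc/default/kubelet
systemctl restart kubelet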

I am closing the issue. Please reopen it if you still think this is a problem.

thanks

@dezmodue
Author

@liwenwu-amazon thanks for your response. I think if this is the official recommendation, it is worth making it explicit in the README for users and other projects.

Did you see the same issue occurring in EKS? Is the kubelet passed --node-ip?

I wasn't able to reproduce the issue when using a different CNI and manually attaching a second ENI plus secondary IPs on the ENIs at boot (the kubelet correctly reports the InternalIP and all the secondary IPs on eth0). Do you have any insight into why the issue is triggered only by this CNI?

@liwenwu-amazon
Contributor

@dezmodue Thank you again. Yes, we need to make it explicit in the README. I am reopening the issue to track this.

This CNI allocates the secondary IP addresses on the instance's primary ENI right after it is initialized. I suspect other CNIs do not perform this action at boot time. Also, did you manually allocate the secondary IP addresses on the instance's primary ENI at boot time (in other words, even before the kubelet finished initializing)?
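
One way to see this allocation on a node is to list the private IPs that the instance metadata service reports for the primary interface (a quick sketch):

# MAC address of the primary interface (eth0)
MAC=$(curl -s http://169.254.169.254/latest/meta-data/mac)
# All private IPs (primary + secondaries) currently assigned to that ENI
curl -s "http://169.254.169.254/latest/meta-data/network/interfaces/macs/${MAC}/local-ipv4s"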

@dezmodue
Author

@liwenwu-amazon in my test I would:

  • create a secondary ENI in the console
  • increase the desired instance count of the nodes' autoscaling group to trigger a new instance launch
  • attach the secondary ENI to the instance as soon as it appeared in the console
  • add secondary IPs to both the primary ENI and the secondary ENI

What I could see is that in that case the kubelet correctly reported all the IPs associated with the primary ENI (primary and secondary) and correctly set the InternalIP to the primary IP on the primary ENI (which also confirms the ENI and IPs were associated before the kubelet was started).
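
Roughly the CLI equivalent of those console steps, for reference (instance and ENI IDs are placeholders):

# Attach the secondary ENI to the new instance
aws ec2 attach-network-interface --instance-id i-0123456789abcdef0 --network-interface-id eni-aaaabbbb --device-index 1
# Add secondary private IPs to both the primary ENI (eni-ccccdddd) and the secondary ENI
aws ec2 assign-private-ip-addresses --network-interface-id eni-ccccdddd --secondary-private-ip-address-count 3
aws ec2 assign-private-ip-addresses --network-interface-id eni-aaaabbbb --secondary-private-ip-address-count 3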

@liwenwu-amazon
Contributor

@dezmodue thank you again for your test. I suspect that in your case the kubelet reported the node addresses before the changes you made through the console were actually realized on the node.

Here is what I am seeing on my setup:

  • at time1 (19:04:59) the CNI allocates secondary IP addresses on the primary interface:
/var/log/aws-routed-eni/ipamd.log.xxx
2018-01-16T19:04:59Z [INFO] Trying to allocate all available ip addresses on eni: eni-6001ee4e
  • at time2 (19:05:07) the kubelet reports "NodeReady", which includes the node addresses:
/var/log/daemon.log
Jan 16 19:05:07 ip-20-0-50-227 kubelet[1206]: I0116 19:05:07.544983    1206 kubelet_node_status.go:443] Recording NodeReady event message for node ip-20-0-50-227.us-west-1.compute.internal
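
To line the two timestamps up on a node, something like this works (the journalctl form assumes the kubelet runs under systemd):

# When L-IPAM started allocating secondary IPs on the primary ENI
grep "Trying to allocate all available ip addresses" /var/log/aws-routed-eni/ipamd.log*
# When the kubelet recorded node status (which includes the addresses)
grep "Recording NodeReady" /var/log/daemon.log   # or: journalctl -u kubelet | grep NodeReady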

Here are snippets of the kubelet code; note that setNodeAddress is among the default node-status funcs that run during each status update:

// defaultNodeStatusFuncs is a factory that generates the default set of
// setNodeStatus funcs
func (kl *Kubelet) defaultNodeStatusFuncs() []func(*v1.Node) error {
        // initial set of node status update handlers, can be modified by Option's
        withoutError := func(f func(*v1.Node)) func(*v1.Node) error {
                return func(n *v1.Node) error {
                        f(n)
                        return nil
                }
        }
        return []func(*v1.Node) error{
                kl.setNodeAddress,
                withoutError(kl.setNodeStatusInfo),
                withoutError(kl.setNodeOODCondition),
                withoutError(kl.setNodeMemoryPressureCondition),
                withoutError(kl.setNodeDiskPressureCondition),
                withoutError(kl.setNodeReadyCondition),
                withoutError(kl.setNodeVolumesInUseStatus),
                withoutError(kl.recordNodeSchedulableEvent),
        }
}  

// Kubelet is the main kubelet implementation.
type Kubelet struct {
...
// handlers called during the tryUpdateNodeStatus cycle
        setNodeStatusFuncs []func(*v1.Node) error
...
}

// NewMainKubelet instantiates a new Kubelet object along with all the required internal modules.
// No initialization of Kubelet and its modules should happen here.
func NewMainKubelet(kubeCfg *kubeletconfiginternal.KubeletConfiguration, /* ...other args elided... */) (*Kubelet, error) {
...
klet.setNodeStatusFuncs = klet.defaultNodeStatusFuncs()
...
}

@dezmodue
Author

@liwenwu-amazon thanks for your reply.
What is the next step, and is there anything I can do to help?

@liwenwu-amazon
Contributor

@dezmodue Thank you. We will update README.md to make this explicit for users and other projects.

liwenwu-amazon added a commit to liwenwu-amazon/amazon-vpc-cni-k8s-1 that referenced this issue Jan 26, 2018
liwenwu-amazon added a commit that referenced this issue Jan 30, 2018
@labria
Contributor

labria commented Mar 8, 2018

For anyone struggling with the issue and using kops, this has been recently fixed in kops: kubernetes/kops@e406dbf.
I've tested this (with a custom-built nodeup; it's not in a release yet, I think) and it works like a charm!
