
kubectl logs breaks when using the cni plugin #21

Closed
dezmodue opened this issue Jan 10, 2018 · 8 comments
Comments

@dezmodue

Hi, moving kubernetes/kops#4218 here as suggested

I am getting an error when running kubectl logs POD -n NAMESPACE, specifically:

Error from server: Get https://10.103.20.110:10250/containerLogs/monitoring/grafana-0/grafana: dial tcp 10.103.20.110:10250: getsockopt: connection timed out

This error seems to be related to the kubelet registering the wrong IP: it reports one of the secondary private IPs (on eth0, as far as I can tell).
In the example error reported:

Error from server: Get https://10.103.20.110:10250/containerLogs/monitoring/grafana-0/grafana: dial tcp 10.103.20.110:10250: getsockopt: connection timed out

10.103.20.110 is a secondary private IP on eth0 and it is the IP shown by kubectl describe node:

Addresses:
  InternalIP:   10.103.20.110
  InternalDNS:  ip-10-103-21-40.megamind.internal
  Hostname:     ip-10-103-21-40.megamind.internal

Note that only a single InternalIP is reported, and it is not the one the kubelet is listening on.

Locally, curl works against the primary IPs of both eth0 and eth1.
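
A quick way to compare the address the kubelet registered with what the instance metadata service reports (a sketch; the node name is the one from the example above):

# InternalIP registered in the node's status (what the API server dials for kubectl logs)
kubectl get node ip-10-103-21-40.megamind.internal -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}'

# Primary private IP of eth0 according to the EC2 instance metadata service
curl -s http://169.254.169.254/latest/meta-data/local-ipv4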

The problem has also been occurring on the master nodes, where it manifests as new nodes being unable to join the cluster because the kubelet cannot contact the API (the IPs are wrong).

If I pass the --node-ip=$(curl http://169.254.169.254/latest/meta-data/local-ipv4) argument to the kubelet, everything works as expected.

I could not reproduce this specific issue by simply adding an extra ENI and secondary IPs (trying to emulate the plugin behaviour) on a cluster running the flannel overlay network - the kubelet reports all available IP addresses correctly.

Versions:
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.4", GitCommit:"9befc2b8928a9426501d3bf62f72849d5cbcd5a3", GitTreeState:"clean", BuildDate:"2017-11-20T05:17:43Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

@liwenwu-amazon
Contributor

Hi @dezmodue , thank you for reporting the issue and also providing a solution. The CNI plugin back-end (L-IPAM) allocates IP addresses on the primary ENI right after it is initialized, so by the time the kubelet reports the node addresses, the primary ENI already has multiple secondary IP addresses assigned.

So the recommendation (which you have already verified) is to pass the --node-ip=$(curl http://169.254.169.254/latest/meta-data/local-ipv4) argument to the kubelet.
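
For example, a minimal sketch of wiring that in at boot time (the file path and variable name below follow a common kubeadm/systemd convention and may differ in your setup):

# Run before the kubelet starts (e.g. from instance user data); paths are illustrative
NODE_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
echo "KUBELET_EXTRA_ARGS=--node-ip=${NODE_IP}" >> /etc/default/kubelet
systemctl restart kubelet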

I am closing the issue. Please reopen it if you still think this is a problem.

thanks

@dezmodue
Author

@liwenwu-amazon thanks for your response. I think if this is the official recommendation, it is worth making it explicit in the README for users and other projects.

Did you see the same issue occurring in EKS? Is the kubelet passed --node-ip?

I wasn't able to reproduce the issue when using a different CNI and manually attaching a second ENI plus secondary IPs on the ENIs at boot (the kubelet correctly reports the InternalIP and all the secondary IPs on eth0). Do you have any insight into why the issue is triggered only by this CNI?

@liwenwu-amazon
Contributor

@dezmodue Thank you again. Yes, we need to make it explicit in the README. I am reopening the issue to track this.

This CNI allocates the secondary IP addresses on the instance's primary ENI right after it is initialized. I suspect other CNIs do not perform this action at boot time. Also, did you manually allocate the secondary IP addresses on the instance's primary ENI at boot time (in other words, even before the kubelet finished initializing)?
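
One way to see this allocation on a node is to list the private IPs that the instance metadata service reports for the primary interface (a quick sketch):

# MAC address of the primary interface (eth0)
MAC=$(curl -s http://169.254.169.254/latest/meta-data/mac)
# All private IPs (primary + secondaries) currently assigned to that ENI
curl -s "http://169.254.169.254/latest/meta-data/network/interfaces/macs/${MAC}/local-ipv4s"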

@dezmodue
Author

@liwenwu-amazon in my test I would:

  • create a secondary ENI in the console
  • increase the desired instance count of the nodes' autoscaling group to trigger a new instance launch
  • attach the secondary ENI to the instance as soon as it appeared in the console
  • add secondary IPs to both the primary ENI and the secondary ENI

What I could see is that in that case the kubelet correctly reported all the IPs associated with the primary ENI (primary and secondary) and correctly set the InternalIP to the primary IP on the primary ENI (which also confirms the ENI and IPs were associated before the kubelet was started).
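
Roughly the CLI equivalent of those console steps, for reference (instance and ENI IDs are placeholders):

# Attach the secondary ENI to the new instance
aws ec2 attach-network-interface --instance-id i-0123456789abcdef0 --network-interface-id eni-aaaabbbb --device-index 1
# Add secondary private IPs to both the primary ENI (eni-ccccdddd) and the secondary ENI
aws ec2 assign-private-ip-addresses --network-interface-id eni-ccccdddd --secondary-private-ip-address-count 3
aws ec2 assign-private-ip-addresses --network-interface-id eni-aaaabbbb --secondary-private-ip-address-count 3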

@liwenwu-amazon
Contributor

@dezmodue thank you again for your test. I suspect that in your case the kubelet reported the node addresses before the changes you made through the console were actually realized on the node.

Here is what I am seeing on my setup:

  • at time1 (19:04:59) the CNI allocates secondary IP addresses on the primary interface:
/var/log/aws-routed-eni/ipamd.log.xxx
2018-01-16T19:04:59Z [INFO] Trying to allocate all available ip addresses on eni: eni-6001ee4e
  • at time2 (19:05:07) the kubelet reports "NodeReady", which includes the node addresses:
/var/log/daemon.log
Jan 16 19:05:07 ip-20-0-50-227 kubelet[1206]: I0116 19:05:07.544983    1206 kubelet_node_status.go:443] Recording NodeReady event message for node ip-20-0-50-227.us-west-1.compute.internal
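
To line the two timestamps up on a node, something like this works (the journalctl form assumes the kubelet runs under systemd):

# When L-IPAM started allocating secondary IPs on the primary ENI
grep "Trying to allocate all available ip addresses" /var/log/aws-routed-eni/ipamd.log*
# When the kubelet recorded node status (which includes the addresses)
grep "Recording NodeReady" /var/log/daemon.log   # or: journalctl -u kubelet | grep NodeReady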

Here are snippets of the kubelet code; note that setNodeAddress is among the default node-status funcs that run during each status update:

// defaultNodeStatusFuncs is a factory that generates the default set of
// setNodeStatus funcs
func (kl *Kubelet) defaultNodeStatusFuncs() []func(*v1.Node) error {
        // initial set of node status update handlers, can be modified by Option's
        withoutError := func(f func(*v1.Node)) func(*v1.Node) error {
                return func(n *v1.Node) error {
                        f(n)
                        return nil
                }
        }
        return []func(*v1.Node) error{
                kl.setNodeAddress,
                withoutError(kl.setNodeStatusInfo),
                withoutError(kl.setNodeOODCondition),
                withoutError(kl.setNodeMemoryPressureCondition),
                withoutError(kl.setNodeDiskPressureCondition),
                withoutError(kl.setNodeReadyCondition),
                withoutError(kl.setNodeVolumesInUseStatus),
                withoutError(kl.recordNodeSchedulableEvent),
        }
}  

// Kubelet is the main kubelet implementation.
type Kubelet struct {
...
// handlers called during the tryUpdateNodeStatus cycle
        setNodeStatusFuncs []func(*v1.Node) error
...
}

// NewMainKubelet instantiates a new Kubelet object along with all the required internal modules.
// No initialization of Kubelet and its modules should happen here.
func NewMainKubelet(kubeCfg *kubeletconfiginternal.KubeletConfiguration, /* ...other args elided... */) (*Kubelet, error) {
...
klet.setNodeStatusFuncs = klet.defaultNodeStatusFuncs()
...
}

@dezmodue
Author

@liwenwu-amazon thanks for your reply.
What is the next step, and is there anything I can do to help?

@liwenwu-amazon
Contributor

@dezmodue Thank you. We will update README.md to make this explicit for users and other projects.

liwenwu-amazon added a commit to liwenwu-amazon/amazon-vpc-cni-k8s-1 that referenced this issue Jan 26, 2018
liwenwu-amazon added a commit that referenced this issue Jan 30, 2018
@labria
Contributor

labria commented Mar 8, 2018

For anyone struggling with the issue and using kops, this has been recently fixed in kops: kubernetes/kops@e406dbf.
I've tested this (with a custom-built nodeup; it's not in a release yet, I think) and it works like a charm!
