
CoreDNS version breaks networking in pods #328

Closed
0verc1ocker opened this issue Feb 21, 2019 · 3 comments

Comments

@0verc1ocker

I am creating two node groups and passing --register-with-taints to the kubelet of one of them. The group with the extra kubelet argument registers the taints properly, but then networking inside the containers on those nodes stops working. There is nothing in the logs for aws-node, the kubelet, or the CNI. Everything is fine when this same node group starts up without the extra --register-with-taints kubelet argument.

The taint is passed to bootstrap.sh as follows:

--kubelet-extra-args '--register-with-taints=\"dedicated=jobs:NoSchedule\" --node-labels=testing/role=shared'

I am using the latest AMI for us-east-1 and the amazon-k8s-cni:v1.3.2 image for the aws-node DaemonSet.
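For context, a minimal sketch of how those flags could be assembled in UserData (the bootstrap.sh invocation is commented out and the cluster name is a placeholder; the flag values are the ones from the report above):

```shell
#!/bin/sh
set -eu
# Keep both kubelet flags inside one quoted string so they reach
# --kubelet-extra-args as a single value.
KUBELET_EXTRA_ARGS='--register-with-taints=dedicated=jobs:NoSchedule --node-labels=testing/role=shared'
# On an EKS-optimized AMI this would be (cluster name is a placeholder):
# /etc/eks/bootstrap.sh my-cluster --kubelet-extra-args "$KUBELET_EXTRA_ARGS"
echo "$KUBELET_EXTRA_ARGS"
```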

@0verc1ocker 0verc1ocker changed the title Adding --register-with-taints breaks networking on nodes Adding --register-with-taints breaks networking in pods Feb 21, 2019
@0verc1ocker
Author

I've tried numerous ways to solve this today, with no luck. I set the bootstrap.sh kubelet-extra-args through the CloudFormation templates, escaping single and double quotes. I reset the Docker network bridge and tested networking on the nodes themselves. All networking from the host network interface is fine: attaching a Docker container to the host network works, and networking inside it works as well. Docker info and settings are identical to those on nodes with working pod networking. All of this leads me to believe it is not a Docker networking issue but specifically the CNI. Whenever the taints and labels are on the nodes, CNI networking from inside the pods fails, and DNS resolution fails too. /etc/resolv.conf is identical across all the nodes. Kubelet starts up fine and shows no errors, which is really confusing.

Eventually, I decided to bypass the bootstrap.sh script altogether and use a sed command in the UserData section of the CloudFormation template to set the taints and labels directly in the kubelet systemd unit file, in case bootstrap.sh itself was causing something to fail:

sed -i 's#/usr/bin/kubelet#/usr/bin/kubelet --register-with-taints=dedicated=jobs:NoSchedule --node-labels=testing/role=shared,testin/tools-any=true,testing/tenants-any=true#g' /etc/systemd/system/kubelet.service

This works: the taints and labels are applied, but CNI networking from inside the pods is still broken...
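That sed edit can be reproduced against a stand-in unit file to confirm the substitution does what is intended (the file here is a temporary mock, not a real kubelet.service, and I've kept a subset of the labels from the command above):

```shell
set -eu
# Stand-in for /etc/systemd/system/kubelet.service with a minimal ExecStart line.
tmp=$(mktemp)
printf 'ExecStart=/usr/bin/kubelet --cloud-provider aws\n' > "$tmp"
# Same substitution pattern as in the comment above: append the taint and
# label flags right after the kubelet binary path.
sed -i 's#/usr/bin/kubelet#/usr/bin/kubelet --register-with-taints=dedicated=jobs:NoSchedule --node-labels=testing/role=shared#g' "$tmp"
result=$(cat "$tmp")
echo "$result"
rm -f "$tmp"
```

After editing the real unit file you would still need a systemctl daemon-reload and a kubelet restart for the flags to take effect.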

This looks like an amazon-vpc-cni-k8s issue, and we are looking for a resolution. I would like to make the case to use EKS for our infrastructure migration from kops, but with issues like these we might have to look into GKE.

@micahhausler
Member

Accidental close! I have confirmed this is an issue. We'll get someone to take a look at it.

@0verc1ocker
Copy link
Author

This issue was eventually resolved and turned out to be a problem with the CoreDNS version that ships on a new EKS 1.11 cluster.

See awslabs/amazon-eks-ami#200
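For anyone hitting this, one way to see which CoreDNS image a cluster is running is to read it off the coredns deployment and take the tag; the image string below is a local placeholder for illustration (the linked issue, not this thread, states the actual fixed version):

```shell
set -eu
# In a live cluster the image string would come from:
#   kubectl -n kube-system get deployment coredns \
#     -o jsonpath='{.spec.template.spec.containers[0].image}'
# Placeholder value standing in for that output:
image='602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.1.3'
# Strip everything up to the last ':' to get the tag.
tag=${image##*:}
echo "$tag"
```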

@0verc1ocker 0verc1ocker changed the title Adding --register-with-taints breaks networking in pods CoreDNS version breaks networking in pods Feb 25, 2019