Coredns stuck on ContainerCreating with FailedCreatePodSandBox warning for CNI versions 1.7.6 with Cilium 1.9.1
#1314
Comments
Hi @mmochan, can you please check whether you are hitting #1265? The root cause is described in #1265 (comment). Thanks. |
Hi @jayanthvn, AWS_VPC_K8S_PLUGIN_LOG_FILE is being set as expected,
but /host/etc/cni/net.d/05-cilium.conflist doesn't match the 05-cilium.conflist shown in issue #1265.
Thanks |
Hi @mmochan Yes you will have to add these 2 lines -
Something like this -
|
Hi @jayanthvn Great that works, all pods now running. Are you able to give an ETA for a permanent fix? Thanks for your help. Mike |
Good to know it works, #1275 is merged and we are planning for the next release, I will provide you the dates in a week or so. |
Thanks again @jayanthvn |
Unfortunately does not work for me. Containers stuck with another error like: And there are errors with stack traces in cilium-agent on the node like: Could you please help? btw, is there docker image with the fix above to check it? Thanks! p.s. this is my issue cilium/cilium#14379 (comment) |
CNI Plugin v1.7 does not work with Cilium 1.9! @jayanthvn @mmochan Could you please re-check? |
Hi @kovalyukm, I was just able to run Cilium 1.9 in chaining mode with CNI v1.7.5. I added the following lines to
Can you make sure:
Please let me know if that works. |
Hi @couralex6 ,
and tried like in:
Maybe there is issue in software versions. I use EKS Kubernetes 1.18 and Cilium 1.9.1. Thanks! |
It was also an EKS 1.18 cluster. Your Did you install Cilium through Helm as described here: https://docs.cilium.io/en/v1.9/gettingstarted/cni-chaining-aws-cni/ ? |
Seems you use Cilium 1.9.0. Yes, I use Cilium doc to manage it. The workaround works with Cilium 1.9.0, but doesn't work with Cilium 1.9.1. (Seems this version is broken - cilium/cilium#14403 (comment)) Thanks, waiting for CNIv1.7.8 with fix. |
CNIv1.7.8 does not work, the same error like "invalid character '{' after top-level value". |
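A note on the error text quoted above: "invalid character '{' after top-level value" is the wording Go's encoding/json uses when more data follows a complete JSON value, for example two objects written back to back in one file. The thread does not say which file failed to parse, so the following is only a sketch of how one might check a config file for that condition. It is written in Python rather than Go, the function name is hypothetical, and the message it builds merely imitates the Go wording.

```python
import json


def check_single_json_document(text):
    """Return (ok, message) telling whether text holds exactly one JSON value.

    Hypothetical helper for illustration; not part of the VPC CNI or Cilium.
    """
    decoder = json.JSONDecoder()
    # raw_decode does not skip leading whitespace, so strip it first.
    stripped = text.lstrip()
    try:
        _, end = decoder.raw_decode(stripped)
    except json.JSONDecodeError as exc:
        return False, f"not valid JSON: {exc.msg}"
    # Anything other than whitespace after the first value is trailing data.
    rest = stripped[end:].strip()
    if rest:
        return False, f"unexpected {rest[0]!r} after top-level value"
    return True, "ok"
```

To try it on a node, read the file you suspect (for instance the conflist under /etc/cni/net.d) and pass its contents to the function; two concatenated objects are reported as trailing data.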
Hi @kovalyukm Sure we will try Cilium 1.9.1 and get back to you. But @mmochan has tried with Cilium 1.9.1 and the recommended work around. |
@jayanthvn what is the ETA to release that fix that doesn't require any manual modification of the nodes? 2 versions already released after this PR was merged, but this fix was ignored in both of them. |
Hi @shaikatz Sorry for the delay. Will take this up in rel 1.7.9 planned for January. |
Hi @jayanthvn, Can you give an ETA on 1.7.9 release date? Thanks Mike |
Any movement on this? This is a blocker for us as well. |
Thanks for your patience and sorry for the delay. We are working on the release, and the fix will be part of 1.8. There are other changes which need to be tested, hence we are still working on the timeline. I will keep you all updated often. |
Hi, We have integrated the fix as part of 1.7.9. Release candidate is out - https://github.com/aws/amazon-vpc-cni-k8s/releases/tag/v1.7.9-rc1. 1.7.9 Release should be out this week if there are no issues with the deployment. Will update if the date changes. Thanks. |
@jayanthvn I have applied 1.7.9 and the same issue exists when new nodes are added.
|
Hi @mmochan, could you give me a little more context around your issue?
I just ran the following test to confirm the fix was working: created a new 1.18 EKS cluster, which was running CNI 1.7.5 (the default version), then installed Cilium 1.9.4, which broke the cluster as expected, then installed CNI 1.7.9, which solved the issue on all nodes. I haven't tried adding new nodes yet. I'll wait for your response before trying to reproduce. Thanks |
Hi @couralex6 It turns out my CNI upgrade was being overwritten. I can confirm that it is now working. Apologies... |
Awesome, glad to hear it's working @mmochan! |
Hello @couralex6, something seems wrong with CNI 1.7.9 and Cilium 1.9.4 on a 1.19 EKS cluster (kube-proxy:v1.19.6-eksbuild.2, AMI v1.19.6-eks-49a6c0). IP assignment to pods works, but there are connectivity issues, and most pods are restarting with failed probes and timeouts. Have you tested it on a 1.19 EKS cluster, and is everything fine there? Thanks! |
Hi @kovalyukm , I just tested again on EKS 1.19 with CNI v1.7.9 chained with both Cilium v1.9.3 and v1.9.4. I deployed a sample Nginx deployment and performed basic connectivity tests (ping between pods on same node and across nodes). Everything looked fine and I am not seeing any failed probes or timeouts. Are you still experiencing the issue? |
Hi @couralex6, thank you for testing. Yes, there is an issue on new nodes in the cluster after replacing nodes while on Cilium 1.9.0 (i/o timeouts in the kube-dns logs when reaching the VPC DNS servers, the EKS API, and so on; failed pod probes and pods in CrashLoopBackOff). Upgrading to 1.9.4 fixes the issue, but if there are old nodes, another issue appears with pod creation (cilium-agents throw exceptions in the logs). Thanks! |
Hi @couralex6, Today our EKS cluster had a node refresh and we are now facing the same issues originally reported in this issue/ticket. We have been running Cilium 1.9.1 and CNI 1.7.9 on EKS v1.18.9 for the last 3 weeks, but as a result of the node refresh CNI is not populating
We upgraded to Cilium 1.9.5 and refreshed all the nodes hoping that might fix it, but we are still facing the same issue. |
What happened:
New cluster with nodes restarted.
coredns stuck on ContainerCreating when using CNI v1.7.6 and Cilium 1.9.1.
Other pods are also experiencing the same behavior ( ContainerCreating )
coredns:v1.6.6-eksbuild.1
Attach logs
What you expected to happen:
I expected coredns and other pods to be in running state
How to reproduce it (as minimally and precisely as possible):
Deploy cni version 1.7.6 and cilium 1.9.1 on EKS 1.17
Anything else we need to know?:
We have Cilium running in chaining mode (v1.9.1)
(https://docs.cilium.io/en/v1.9/gettingstarted/cni-chaining-aws-cni/)
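For readers unfamiliar with chaining mode: a chained CNI setup is described by a single conflist whose plugins array is run in order. The sketch below lists the plugin types in such a file, which can help confirm on a node that the chain is what you expect. The sample conflist is illustrative and assumed, loosely modeled on the Cilium chaining guide linked above; it is not the file from this issue, and the function name is hypothetical.

```python
import json


def chained_plugin_types(conflist_text):
    """Return the plugin "type" values of a CNI conflist, in chain order."""
    conf = json.loads(conflist_text)
    return [plugin.get("type") for plugin in conf.get("plugins", [])]


# Illustrative sample only (an assumption, not copied from this issue).
SAMPLE_CONFLIST = """
{
  "cniVersion": "0.3.1",
  "name": "aws-cni",
  "plugins": [
    {"name": "aws-cni", "type": "aws-cni", "vethPrefix": "eni"},
    {"type": "portmap", "capabilities": {"portMappings": true}, "snat": true},
    {"name": "cilium", "type": "cilium-cni", "enable-debug": false}
  ]
}
"""

if __name__ == "__main__":
    print(chained_plugin_types(SAMPLE_CONFLIST))
```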
Environment:
Kubernetes version (kubectl version):
OS (cat /etc/os-release):
Kernel (uname -a): Linux REDACTED.compute.internal 4.14.203-156.332.amzn2.x86_64 #1 SMP Fri Oct 30 19:19:33 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux