Not releasing old ENIs #123
Deleting aws-node on this node resolved the issue. Something is definitely wonky.
I'm seeing this issue sporadically with some nodes. I agree that deleting aws-node works around it.
@vpm-bradleyhession, the troubleshooting guide provides guidance on how to troubleshoot CNI issues at the cluster level. ipamD should release an old ENI once the number of Pods running on the node drops below the threshold. You can use cni-metrics-helper (`kubectl apply -f cni_metrics_helper.yaml`) to view aggregated ENI and IP information and cluster-level ipamD statistics.
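A minimal sequence, assuming the manifest name above and a placeholder pod name:

```bash
# deploy the helper, then read the aggregated ENI/IP stats from its logs
kubectl apply -f cni_metrics_helper.yaml
kubectl get pods -n kube-system | grep cni-metrics-helper   # find the pod name
kubectl logs cni-metrics-helper-xxxxx -n kube-system        # substitute the real pod name
```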
@vpm-bradleyhession, @tomfotherby, if you see issue #18 or #59 on some nodes, can you run /opt/cni/bin/aws-cni-support.sh and collect node-level debugging information? You can send the node-level debugging info directly to me ([email protected]).
The problem is that 90% of the requests to the AWS API are rate limited. The number of ENIs that are hanging around far outnumbers the ones that are actually deleted.
@vpm-bradleyhession is it possible the throttling of your "DeleteNetworkInterface" API calls is caused by some other tool, or by ENIs being deleted manually at the same time as ipamD deletes them? If ipamD is being throttled, it currently uses the AWS SDK's exponential backoff and you should see ipamdActionInProgress not being 0.
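One way to check on an affected node (a sketch; the metric names are the ones ipamd exports, and the metrics port can differ between CNI versions):

```bash
# ipamd exposes Prometheus metrics on the node; grep for throttling/backoff indicators
curl -s http://localhost:61678/metrics | grep -E 'ipamd_action_inprogress|aws_api_err'
```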
Hm, I don't think so. They were being deleted up until a few days ago (no changes that would affect it). Then one day it hit the ENI limit for our account. I'm going to monitor it to catch when this happens again. Although, currently I'm getting
When using the metrics helper?
@vpm-bradleyhession is it possible you can deploy cni_metrics_helper and share the output of `kubectl logs cni-metrics-helper-xxxxx -n kube-system`?
@vpm-bradleyhession, what version is your CNI? It needs to be 1.0.0.
Ah, this might be it. It's what's under `Images:` right now. I should mention this is a kops cluster, so the CNI was provisioned by kops. Is this safe to upgrade?
We should open an issue w/ kops to get the version bumped upstream also. Good spot on the version.
@vpm-bradleyhession Please open an issue w/ kops. 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:0.1.1 has fixed a few issues and made improvements in how the AWS APIs are used.
So is upgrading from 0.1.1 -> 1.0.0 safe to do?
@vpm-bradleyhession Yes, it is safe to upgrade from 0.1.1 -> 1.0.0. The only thing you need to make sure of is that your worker nodes can reach the K8S apiserver on port 443. In 1.0.0, ipamD needs to communicate with the K8S apiserver. Some deployments have an HTTP proxy enabled that blocks aws-node from communicating with the apiserver (e.g. #104).
You can use telnet to confirm that a worker node can reach its apiserver.
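For example (the endpoint is a placeholder; take the real one from `kubectl cluster-info` or the EKS console):

```bash
# from a worker node, confirm TCP connectivity to the apiserver on 443
telnet <api-server-endpoint> 443
# or, if telnet is not installed:
nc -zv <api-server-endpoint> 443
```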
@liwenwu-amazon thanks so much for your help! Yeah, this seems to work. I will bump all nodes over to the CNI 1.0.0 version and see if this fixes our issue with rate limiting. I will update this when I have more information for you. Thanks!
For reference, I'm seeing the issue using EKS with CNI 1.0.0:
(I'll provide more info when it next happens)
We're experiencing this issue on a cluster as well. We use Helm and the failed upgrade was rolled back, but the pod count on either cluster shouldn't have gone above 20 during the process. Currently the node we saw the issue on is running 17 pods. While investigating, it appears that one ENI on each of the instances was finally released, but they both still have far too many IPs associated for the number of pods running (17 pods across all namespaces on one instance, with 3 ENIs each with 15 addresses).
@jlogsdon, today ipamD will NOT free an ENI if there is any running Pod on top of that ENI. In other words, if any Pod is using a secondary IP address from the ENI, ipamD will not free it, and it will increment that particular error count.
Would the IP addresses still assigned to that ENI be unavailable for new pods? It seems like there should have been plenty of capacity for more pods given our instance size and running pod count.
You can find out the pod allocation with the command below.
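A sketch, assuming the ipamd introspection endpoint on the node (port 61679 on newer releases, 61678 on older ones):

```bash
# run on the worker node: per-pod IP assignments known to ipamd
curl -s http://localhost:61679/v1/pods | python -m json.tool
```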
Thanks. Next time we see this I'll look at that output.
The IP addresses of an ENI that are NOT assigned to other Pods are available for new Pods. In other words, all the IP addresses with "Assigned": false are available for new Pods. Here is one example from my setup:
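For instance, dumping ipamd's view and counting the unassigned addresses (a sketch; the JSON shape and the introspection port vary between CNI versions):

```bash
# on the worker node: ENI/IP pool as seen by ipamd, then count addresses no pod is using
curl -s http://localhost:61679/v1/enis
curl -s http://localhost:61679/v1/enis | grep -o '"Assigned": *false' | wc -l
```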
For us, because we're running pods with small resource requests, the limiting factor for a node is the number of IP addresses. I did some calculations and the best 3 types are
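For reference, the calculation follows the usual rule of thumb maxPods = ENIs x (IPv4 addresses per ENI - 1) + 2, with the per-type limits from the EC2 ENI documentation. A worked example for m5.large (3 ENIs, 10 IPv4 addresses per ENI; double-check current limits):

```bash
# one IP per ENI is the primary address, and the +2 covers host-network pods (aws-node, kube-proxy)
echo $(( 3 * (10 - 1) + 2 ))   # prints 29
```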
I'm seeing a possibly related problem in our cluster: in some cases the worker nodes don't release pod IP addresses, which ends up filling the worker node with unusable slots until it can no longer start new pods. When this happens, Kubernetes thinks the node can accept pods, so they are scheduled but get stuck with this error: "Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "xxxxxx" network: add cmd: failed to assign an IP address to container". Currently the only workaround I know is to ssh into the worker nodes and delete all docker containers in the Exited state, for example with this one-liner: `docker ps -a | grep Exited | awk '{print $1}' | xargs -n 1 docker rm`
Today we were seeing Pods fail to start in EKS:
It turned out our subnet had only 256 IPs and it had run out of spare IPs. For reference, to debug it on Amazon Linux 2 (the official AMI):
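A few things worth checking on the node (a sketch; the introspection port differs between CNI versions):

```bash
# ENIs attached to this instance, from the EC2 metadata service
curl -s http://169.254.169.254/latest/meta-data/network/interfaces/macs/
# what ipamd thinks it has allocated
curl -s http://localhost:61679/v1/enis
# rough count of pod IPs in use: the CNI installs routing rules per pod IP
ip rule list | wc -l
```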
Aha: We run a few
Following up on @tomfotherby's comment, it seems that we had a significant number of ENIs that were not in use but kept IPs attached, so our EKS workers' subnets were starved.
Some more information for context: the Kubernetes scheduler is not aware of a node that has hit its ENI / address limits. For example: an instance with 18 addresses -> 18 pods assigned -> 0 free addresses -> 75% CPU requested. In this scenario we get pods that request < 25% CPU or memory trying to schedule on this node when there are no addresses left. The Kubernetes scheduler will always try to assign a pod to this node (since the scheduler is not aware that IP limits are a thing).
@vpm-bradleyhession Hello, I was wondering if you ever got the cni-metrics-helper to push metrics to CloudWatch? It seems that we are both using kops for our Kubernetes cluster, so we might have a similar setup. I deployed the metrics helper to the cluster, created the IAM policy for CloudWatch, and attached that policy to the master/worker roles for my cluster. However, I am still seeing this error in the cni-metrics-helper pod: 'Unable to publish cloudwatch metrics: NoCredentialProviders: no valid providers in chain. Deprecated.'
Does anyone here have any reliable solutions for these issues? We run into them repeatedly, and our workaround so far is to find nodes with more than X ENIs and cordon them. Eventually the cluster autoscaler takes care of removing the cordoned instances.
This is causing some real issues for us. All of our clusters are affected by this as the scheduler gets busier throughout the day. Can we acknowledge this as a bug? @liwenwu-amazon do you have any suggestions on how to mitigate the problem? It might not affect the majority of use cases, but when you rely 100% on k8s + aws-cni and it happens, the disturbance to the cluster's stability is quite big. For us, it's happening literally every day now.
Our solution is to clean terminated pods; in our case every terminated pod older than one hour is deleted. This seems to prevent this problem from manifesting.
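A rough sketch of that kind of cleanup with plain kubectl (namespace and the one-hour age cut-off are up to you; field selectors on delete need a reasonably recent kubectl):

```bash
# delete pods that have already finished; completed Job pods are the usual culprits
kubectl -n default delete pods --field-selector=status.phase=Succeeded
kubectl -n default delete pods --field-selector=status.phase=Failed
```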
Yeah, we have set successfulJobsHistoryLimit (and the failed one, too) to 1 on most of our jobs and it seems to mitigate the issue for us; however, this still needs to be acknowledged with an upstream fix.
@garo Could you please explain "clean terminated pods"? This could be a good solution for us as well.
To be clear again, our problem is that the node does not have many pods if you do a describe node or even run
But looking at the node from the AWS console, its ENIs still hold a lot of secondary IPs. So how can we clean up these secondary IPs?
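For reference, the same view can be pulled from the EC2 API instead of the console (instance id is a placeholder):

```bash
# list the ENIs and secondary private IPs EC2 holds for a given worker
aws ec2 describe-network-interfaces \
  --filters Name=attachment.instance-id,Values=i-0123456789abcdef0 \
  --query 'NetworkInterfaces[].{ENI:NetworkInterfaceId,IPs:PrivateIpAddresses[].PrivateIpAddress}' \
  --output json
```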
@harshal-shah I run this https://github.com/hjacobs/kube-job-cleaner as a slightly modified version which just does the cleanup again in an infinite loop every five minutes. If you ssh into your affected worker node and type "docker ps -a" you should see a great number of Exited containers, and these seem to be what is holding the CNI plugin back from releasing the IPs. It seems that deleting terminated pods will eventually clean these away and thus provide a workaround.
I checked on the node mentioned above; there were a few dead containers, but even after deleting those containers, the IPs are not released.
Update: We're now setting --max-pods on the kubelet so that it matches the number of ENIs (and addresses) available. We are setting this per node group (using kops); for kops specifically, see the sketch below.
This seems like the only mitigation that is holding strong for us at the moment.
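A sketch of the kops side (field name assumed from the kops instance-group spec; verify against your kops version, and derive maxPods from the instance type's ENI limits):

```bash
kops edit ig nodes                 # under spec, set: kubelet: { maxPods: 29 }  (29 fits an m5.large, for example)
kops update cluster --yes
kops rolling-update cluster --yes  # roll the nodes so the new kubelet flag takes effect
```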
Hello @mogren, could you comment on this issue please? We're getting affected on a daily basis, so it would be nice to have an official recommendation from AWS here. Thank you.
Hi @RTodorov! What version of the plugin are you using? Is this cluster running on EKS? We have a known issue with the CNI that happens if ipamd is killed while it is in the middle of expanding ENIs (e.g. after having created a new ENI but before attaching it); in that case the ENI might not be cleaned up. Could you run
Hello @mogren The cluster is created via I have sent you the output of
Thanks @harshal-shah, I discussed this with Liwen and basically it comes down to an optimization trade-off in the CNI plugin itself.

The way the CNI works is that it allocates IPs in bulk and adds new ENIs when needed. It un-assigns an IP when the pod gets deleted, but keeps it allocated to the ENI as long as there are pods active on that ENI (check the eni.output file for details). When a new pod gets scheduled to the worker, it will randomly pick any existing IP address that is not assigned and allocate it to the new pod. As soon as an ENI has no assigned IPs, meaning all pods using that ENI have been deleted, the ENI will be released and all the IPs with it.

The reason for this behavior is both to make scheduling new pods a lot faster, with no need to call another service, and also to prevent throttling from calling EC2 too much. (Throttling can still happen when scheduling a lot of pods quickly on new workers, or when not configuring

If you run out of IP addresses in your subnet, one solution is to use the CNI custom network config and create separate subnets for the pods. Would that work for you?
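One knob for nodes holding large warm pools is the plugin's warm-pool setting on the daemonset; a sketch (WARM_IP_TARGET is one of the documented plugin variables, the value here is only illustrative):

```bash
# keep at most ~5 spare IPs warm per node instead of whole spare ENIs
kubectl -n kube-system set env daemonset/aws-node WARM_IP_TARGET=5
kubectl -n kube-system rollout status daemonset/aws-node
```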
Hi @mogren One thing that we see in the ipamd logs is:
Hi @mogren we faced this issue again today on a very new node of our cluster. It had fewer than the prescribed maximum of 55 pods. Following is the event trail for a pod that was not getting an IP address for a few minutes and then started working fine.
I am also sending the tarball with logs to you separately. Hope this helps.
@harshal-shah Thanks for reporting this. We have gone through the logs and found some minor issues, but have not managed to determine the root cause yet. The previous logs from the 22nd had a lot more data and I'll keep going through them to see if we can figure out why ipamd gets restarted.
Facing the same issue. In my case I just followed the tutorial and created a VPC, a cluster and 1 worker node. Now I cannot create any pods :(. Here's what I get from kubectl get events:
@paksv - I encountered the same issue. It looks like the pods that get created (including the aws-node CNI pod) aren't able to communicate with the API server via the cross-account ENIs. Upon checking the ingress rules of the security groups associated with the cluster, I found
Running into the same issue here in a deployment. If I delete the pod, the next one created usually starts up fine.
Resolving since v1.5.0 is released. This version contains a lot of changes in how we allocate ENIs, and we stopped force-detaching them. If using too many ENIs is an issue, consider setting the
It looks like old ENIs are not being deleted by IPAM. What happens is it keeps creating ENIs until it hits the account limit.
What then happens is that new pods cannot start, with the error "Failed create pod sandbox." Manually deleting "Available" ENIs allows the controller to create a new ENI and schedule the pod as normal.
Number of nodes is 25. Not the highest we have had.
Potentially related to #18
I am happy to help with debugging/finding the root cause.
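For the manual cleanup described above, leaked ENIs can typically be found like this (a sketch; the description filter assumes the plugin's usual aws-K8S-* naming):

```bash
# ENIs the CNI created that are detached ("available") but never deleted
aws ec2 describe-network-interfaces \
  --filters Name=status,Values=available Name=description,Values='aws-K8S-*' \
  --query 'NetworkInterfaces[].NetworkInterfaceId' \
  --output text
```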