Dangling ENIs without any association with Instances #1447

Closed
Buffer0x7cd opened this issue Apr 29, 2021 · 23 comments
Labels
bug stale Issue or PR is stale


@Buffer0x7cd

What happened:
During an incident in which pods were failing due to IP address exhaustion, we noticed a large number of ENIs that were allocated but not attached to any instance. Our first assumption was that these might be ENIs created to maintain the warm pool on the nodes, but after checking them we discovered that the node.k8s.amazonaws.com/instance_id tag was missing from those ENIs, which does not seem like expected behaviour.

func (cache *EC2InstanceMetadataCache) AllocENI(useCustomCfg bool, sg []*string, subnet string) (string, error) {

As far as I can see, allocation and attachment of ENIs happen together in AllocENI, so there shouldn't be a case where ENIs are allocated but not attached and are missing tags, except here (
awsUtilsErrInc("AllocENIDeleteErr", err)

where both the ENI attach and the subsequent delete failed). To verify this, I checked the Prometheus metrics for the AttachNetworkInterface API for errors, but there is no significant increase that would explain the rise in allocated ENIs.
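For anyone hitting the same symptom, the ENIs described above can be picked out of DescribeNetworkInterfaces results by their tags. The struct and helper below are a hypothetical sketch for illustration (not part of the plugin): an ENI is "dangling" in the sense of this report if it is unattached, carries the CNI's createdAt tag, but has no instance_id tag.

```go
package main

import "fmt"

// eni is a simplified record of the fields returned by EC2
// DescribeNetworkInterfaces; this type and the helper below are
// hypothetical illustrations, not part of the CNI plugin.
type eni struct {
	ID     string
	Status string // "available" means not attached to any instance
	Tags   map[string]string
}

// danglingENIs returns the IDs of unattached ENIs that carry the
// CNI's creation tag but no instance_id tag, matching the pattern
// described in this issue.
func danglingENIs(enis []eni) []string {
	var out []string
	for _, e := range enis {
		_, created := e.Tags["node.k8s.amazonaws.com/createdAt"]
		_, owned := e.Tags["node.k8s.amazonaws.com/instance_id"]
		if e.Status == "available" && created && !owned {
			out = append(out, e.ID)
		}
	}
	return out
}

func main() {
	enis := []eni{
		{ID: "eni-aaa", Status: "available", Tags: map[string]string{"node.k8s.amazonaws.com/createdAt": "2021-04-29"}},
		{ID: "eni-bbb", Status: "in-use", Tags: map[string]string{"node.k8s.amazonaws.com/instance_id": "i-123"}},
	}
	fmt.Println(danglingENIs(enis)) // only eni-aaa matches
}
```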

@jayanthvn
Contributor

Hi @Buffer0x7cd

Do you have short-lived instances/clusters? Also, do you have any node termination policy? There is one known issue (#1223): after an ENI is detached, it takes a few seconds for the ENI to be deleted, and if the node is terminated in the meantime the ENI will be left dangling in the account.

@Buffer0x7cd
Author

Hi @jayanthvn

It doesn't seem like that is the issue.

func (cache *EC2InstanceMetadataCache) freeENI(eniName string, sleepDelayAfterDetach time.Duration, maxBackoffDelay time.Duration) error {

From my understanding, in this case the ENI is first detached and then deleted. Since the ENI was attached at some point, it should have the node.k8s.amazonaws.com/instance_id tag even after being detached (there is no step that deletes tags in the freeENI method).

In our observed case, the dangling ENIs have no node.k8s.amazonaws.com/instance_id tag, which should be present if they were caused by #1223.

@jayanthvn
Contributor

Yeah, that makes sense. I quickly ran a test: I detached an ENI and I still see the instance_id tag even though the ENI is detached. Can you please open a support case?

@jayanthvn
Contributor

Hi @Buffer0x7cd

For the ENI, do you see the "node.k8s.amazonaws.com/createdAt" tag present?

@Buffer0x7cd
Author

@jayanthvn yes, I can see the node.k8s.amazonaws.com/createdAt tag present

@jayanthvn
Contributor

Thanks for checking @Buffer0x7cd. So it looks like createENI is fine, but if attachENI fails we would have deleted the ENI:

attachmentID, err := cache.attachENI(eniID)
if err != nil {
	derr := cache.deleteENI(eniID, maxENIBackoffDelay)

If you can open a support case, we can check EC2 logs to confirm why attachENI failed.
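The quoted path can be sketched as follows; the interface, mock, and error strings below are hypothetical stand-ins for illustration, not the plugin's actual code. It shows why a second failure matters: if the cleanup delete also fails, the result is an ENI that was created (so it has the createdAt tag) but was never attached (so it never received the instance_id tag), which matches the dangling ENIs reported in this issue.

```go
package main

import (
	"errors"
	"fmt"
)

// ec2API is a hypothetical stand-in for the EC2 calls used during
// ENI allocation; it is not the plugin's real client interface.
type ec2API interface {
	AttachENI(eniID string) (string, error)
	DeleteENI(eniID string) error
}

// attachOrCleanup mirrors the quoted logic: on attach failure the
// ENI is deleted, and if that delete also fails the ENI is leaked
// with only its createdAt tag (no instance_id tag was ever added).
func attachOrCleanup(api ec2API, eniID string) error {
	if _, err := api.AttachENI(eniID); err != nil {
		if derr := api.DeleteENI(eniID); derr != nil {
			return fmt.Errorf("attach failed (%v) and cleanup delete failed (%v): ENI %s is left dangling", err, derr, eniID)
		}
		return fmt.Errorf("attach failed, ENI %s deleted: %v", eniID, err)
	}
	return nil
}

// flaky simulates both calls being throttled or otherwise failing.
type flaky struct{}

func (flaky) AttachENI(string) (string, error) { return "", errors.New("AttachNetworkInterface failed") }
func (flaky) DeleteENI(string) error           { return errors.New("DeleteNetworkInterface failed") }

func main() {
	fmt.Println(attachOrCleanup(flaky{}, "eni-0example"))
}
```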

@aclevername

aclevername commented Sep 22, 2021

We've noticed this while working on https://github.com/weaveworks/eksctl/ too. We recently managed to reproduce this issue: eksctl-io/eksctl#4214 (comment)

@hiattp

hiattp commented Dec 10, 2021

We're seeing a similar/related issue, but we have cases where none of the active pods use the ENIs that are attached to the instance (the node has 2 ENIs with 10 and 1 private IP addresses respectively, and there are 13 pods on the node, none of which use those ENIs). Not sure if this is actually the same issue, but we've raised a support ticket (9328577341 9331293811). The original reason we raised the ticket was that pods were getting stuck in Pending with events like:

Warning FailedScheduling 21s (x12 over 13m) default-scheduler 0/10 nodes are available: 4 node(s) didn't match node selector, 6 Insufficient vpc.amazonaws.com/pod-eni.

And further investigation led us to this issue, but it's unclear whether the issues are related.

@GaruGaru

GaruGaru commented Feb 9, 2022

Same issue running v1.7.5-eksbuild.1 on v1.21.5-eks-9017834.
We have many unused ENI interfaces with just the node.k8s.amazonaws.com/createdAt tag set.
This is pretty important since it can lead to available interface exhaustion causing service disruption.

@github-actions

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

@github-actions github-actions bot added the stale Issue or PR is stale label Apr 14, 2022
@bryantbiggs
Member

Not stale

@jayanthvn jayanthvn removed the stale Issue or PR is stale label Apr 14, 2022
@jayanthvn
Contributor

@aclevername - in the issue you mentioned, we do see the node.k8s.amazonaws.com/instance_id tag. Typically this happens when the node is terminated between the detach and delete ENI calls.

@bryantbiggs or @GaruGaru - Can one of you please share IPAMD logs? You can email the log bundle to - [email protected]

@github-actions

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

@github-actions github-actions bot added the stale Issue or PR is stale label Jun 18, 2022
@bryantbiggs
Member

Not stale

@timblaktu

Tagging teammate @vidhyadharm about this "dangling ENI" issue, which @bryantbiggs suggested as the root cause of our VPC deletion issue in eks blueprints and the corresponding VPC deletion issue in the aws vpc module.

@github-actions

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

@github-actions github-actions bot added the stale Issue or PR is stale label Nov 19, 2022
@jayanthvn
Contributor

/not stale

@github-actions github-actions bot removed the stale Issue or PR is stale label Nov 20, 2022
@github-actions

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

@github-actions github-actions bot added the stale Issue or PR is stale label Jan 20, 2023
@github-actions

github-actions bot commented Feb 4, 2023

Issue closed due to inactivity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Feb 4, 2023
@yukccy

yukccy commented Nov 21, 2023

Is there any fix for this issue? Coming from terraform-aws-modules/terraform-aws-vpc#283, where the VPC cannot be deleted due to a DependencyViolation.

@demisx

demisx commented Apr 9, 2024

In my case, there were nginx- and EKS-related security groups left behind after EKS deletion. Once I removed those manually via the AWS console, the VPC was destroyed within a couple of seconds.

@NathanDotTo

This still appears to be an issue. It seems that the only workaround is to manually delete the VPC.

@flaso-giron

This is still an issue.
