@ejholmes thanks for reporting the issue. As you mentioned, EC2 shouldn't block calls just because they are unpaginated. I have found the request ID in our system. We need to check with the EC2 team for the root cause. Thanks.
Just to update this issue for anybody else who runs into this: the problem was that EC2 was blocking us on ec2:DescribeNetworkInterfaces. We discovered that Glue was leaking ENIs and leaving them orphaned. We managed to clean those up with some effort and lowered our ENI limit (we were told we needed to be under 20k to be unblocked).
Once that was done, and EC2 unblocked us from ec2:DescribeNetworkInterfaces, security groups for pods was functional in the account again.
@ejholmes sorry for the late response. Glad to hear it works now. We are exploring whether we can re-architect the workflow that manages interfaces so that it can use paginated API calls. The reason we can't enable pagination right now is that a paginated call gives no guarantee about the number of response pages, so the number of API calls we make would depend on how many pages EC2 returns. If the pagination loop is long enough, this could eat into the user's account API limits and get the account throttled. I will keep this issue open for tracking until we have an approach to share. Thanks.
Describe the Bug:
We've been in the process of rolling out security groups for pods across various EKS clusters in multiple AWS accounts that we own. Recently, we've attempted to do the same in one of our "larger" AWS accounts, and noticed that pods would hang indefinitely waiting for an ENI to be attached by vpc-resource-controller.
We recently raised the ENI limits on this account to a very large value, and were told that unpaginated calls to DescribeNetworkInterfaces would be blocked as a result. This appears to be impacting vpc-resource-controller, as you can see in this CloudTrail event:
For AWS engineers, our case # is 12229078111
Observed Behavior:
Calls to DescribeNetworkInterfaces are blocked by EC2 with a Client.OperationNotPermitted error code.
Expected Behavior:
SGP should work even if unpaginated calls are blocked.
How to reproduce it (as minimally and precisely as possible):
Increase ENI limits in an AWS account high enough that EC2 blocks unpaginated calls to DescribeNetworkInterfaces.
Additional Context:
Environment:
Kubernetes version (use kubectl version): Server Version: version.Info{Major:"1", Minor:"22+", GitVersion:"v1.22.16-eks-ffeb93d", GitCommit:"52e500d139bdef42fbc4540c357f0565c7867a81", GitTreeState:"clean", BuildDate:"2022-11-29T18:41:42Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}