Driver gets stuck at double volume removals #1466
Comments
Hi @FooBarWidget,
If that's not the case, please help us with more details.
Hi, we have the same problem in our cluster. The policy already allows describing the access points, as shown below.
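For illustration, a statement granting that describe permission would look like the following sketch (not the commenter's actual attached policy):

```json
{
  "Effect": "Allow",
  "Action": "elasticfilesystem:DescribeAccessPoints",
  "Resource": "*"
}
```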
Hi @noudAndi,
@FooBarWidget, if I understand it correctly, the second deletion request from the external-provisioner is looking for a tag.
It seems the external-provisioner sidecar is the issue here: the EFS CSI Driver is using v5.0.1-eks-1-30-8, which has a bug or regression that is mitigated in the latest version, v5.1.0-eks-1-31-5.
Hi,
@mskanth972 It would be huge if you could push out a new version with the updated dependencies. This bug is causing havoc on our CI pipeline. 💥
Hi @noudAndi, it's already released on GitHub. The Addons ECD is 11/15.
@mskanth972 Sorry, but what do you mean by "Addons ECD is 11/15"?
We provide the EFS CSI Driver as an add-on for EKS clusters, and an add-on release takes more time than a normal GitHub release, so users of the add-on need to wait until 11/15 for the latest release.
Ok, thanks for the explanation. I'm deploying via Helm, so I'll just bump the image versions. Let's see!
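For anyone doing the same, bumping the provisioner sidecar via the Helm chart might look like the sketch below; the `sidecars.csiProvisioner.image.tag` value path is assumed from the aws-efs-csi-driver chart layout and should be checked against your chart version's values.yaml:

```sh
# Sketch: bump the external-provisioner sidecar image tag via Helm.
# The value path sidecars.csiProvisioner.image.tag is assumed from the
# aws-efs-csi-driver chart layout; verify it in your chart's values.yaml.
helm repo update
helm upgrade aws-efs-csi-driver aws-efs-csi-driver/aws-efs-csi-driver \
  --namespace kube-system \
  --reuse-values \
  --set sidecars.csiProvisioner.image.tag=v5.1.0-eks-1-31-5
```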
Closing the issue for now; feel free to reopen if the issue still persists.
/kind bug
What happened?
When we delete a PVC, the external-provisioner sends two duplicate DeleteVolume calls in quick succession to the EFS driver. I don't know why, but it does so consistently.
At the same time, for security reasons, we have an IAM policy on the EFS driver role that restricts EFS delete calls to only those volumes carrying a "cluster" tag. We don't want the driver to be able to delete any other volumes.
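For illustration, a tag-conditioned statement of this shape is a sketch, not our exact policy; the tag key "cluster" and its value are placeholders, and for dynamically provisioned volumes the driver's delete maps to elasticfilesystem:DeleteAccessPoint:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "elasticfilesystem:DeleteAccessPoint",
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/cluster": "my-cluster"
        }
      }
    }
  ]
}
```

The driver's upstream example policy uses a similar condition, keyed on a tag the driver itself applies to the access points it creates.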
Depending on timing, the second DeleteVolume call may fail with an Access Denied error.
The PersistentVolumeClaim then gets stuck in a deleting state with "VolumeFailedDelete" warning events. The driver keeps retrying and keeps failing. We have to manually remove the finalizer to get the PVC unstuck.
I think this is because a nonexistent access point counts as "not having the tag", so the IAM condition fails and the delete call is denied.
What you expected to happen?
Not getting a Permission Denied. Not getting stuck.
Maybe the driver could first perform a DescribeAccessPoint call to check whether the access point still exists before attempting the delete.
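Roughly, the suggested guard looks like this with the AWS CLI (the access point ID is a placeholder; the real driver would use the ID parsed from the volume handle):

```sh
# Placeholder access point ID; skip the delete if the access point is
# already gone, instead of hitting the tag-conditioned IAM policy
# with a nonexistent resource.
AP_ID=fsap-0123456789abcdef0
if aws efs describe-access-points --access-point-id "$AP_ID" >/dev/null 2>&1; then
  aws efs delete-access-point --access-point-id "$AP_ID"
fi
```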
How to reproduce it (as minimally and precisely as possible)?
Modify the driver role to add a tag condition, as described above.
Create a PVC:
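For example, a minimal claim against an EFS StorageClass; the StorageClass name "efs-sc", the claim name, and the requested size are illustrative:

```yaml
# Minimal example claim; the StorageClass name "efs-sc", the claim
# name, and the requested size are illustrative.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi
```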
Then delete it. In the driver logs you will see two DeleteVolume calls in quick succession.
The deletion may or may not hit the Permission Denied error, depending on timing. You may have to repeat creation and deletion a couple of times to reproduce the error.
Anything else we need to know?:
We are an AWS premium support customer.
Environment
Kubernetes version (use `kubectl version`): v1.29.7-eks-a18cd3a