Vault tries to revoke AWS IAM users that don't exist #5190
Comments
Vault's job is to manage the lease and to keep trying to revoke it. If the credential was not fully created or was manually deleted, you can force-revoke the lease to tell Vault to "forget" the credential. See https://www.vaultproject.io/api/system/leases.html#revoke-force.
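For readers who want to see what the force revoke looks like at the HTTP level, here is a minimal sketch in Go that only builds the request against the `sys/leases/revoke-force` endpoint linked above; it does not send anything. The helper name `newForceRevokeRequest`, the address, the token, and the lease prefix `aws/creds/example-role` are all placeholders chosen for illustration, not values from this issue.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// newForceRevokeRequest builds (but does not send) a request to Vault's
// sys/leases/revoke-force endpoint for the given lease prefix.
// The function name and all arguments are hypothetical.
func newForceRevokeRequest(vaultAddr, token, leasePrefix string) (*http.Request, error) {
	url := strings.TrimRight(vaultAddr, "/") +
		"/v1/sys/leases/revoke-force/" +
		strings.TrimLeft(leasePrefix, "/")

	req, err := http.NewRequest(http.MethodPut, url, nil)
	if err != nil {
		return nil, err
	}
	// Vault reads the client token from the X-Vault-Token header.
	req.Header.Set("X-Vault-Token", token)
	return req, nil
}

func main() {
	req, err := newForceRevokeRequest("http://127.0.0.1:8200", "example-token", "aws/creds/example-role")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.Path)
}
```

The prefix must match the mount path and role under which the lease was issued, so a lease created under a differently named mount would need a different prefix.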
While I find it odd that Vault sees the account as created when the API call performing the create returns an error, I understand the philosophy. I did the revoke as per the documentation:
But this does not seem to have fixed the issue. We still see Vault trying to remove the users. Am I revoking the wrong namespace?
Hey @chrishoffman I was checking the source code trying to figure out why Vault is still trying to revoke the user and noticed that the GCP secrets engine does seem to check the API response when revoking access keys: And the access token revoking also has some logic attached to it: Since GCP does this, maybe it would be a good addition for the AWS engine as well?
@chrishoffman I think the issue is that Vault writes a WAL entry before creating the user, and then the user creation fails, as does the rollback. I think we should, upon user creation failure, try to delete the WAL entry, and further, detect the NoSuchEntity error and report that as a successful rollback, since a failure to mark a user as revoked would cause Vault to keep attempting to revoke it indefinitely. Should be a relatively simple fix.
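The second idea in the comment above, reporting NoSuchEntity as a successful rollback, could look roughly like the sketch below. This is not the code from Vault or from the eventual fix. To keep it self-contained, `codedError` is a local interface that mirrors the `Code()` method of `awserr.Error` in the AWS SDK for Go, and `fakeAWSError` and `rollbackUser` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// codedError mirrors the Code() method of awserr.Error from the AWS SDK
// for Go. It is declared locally so this sketch compiles without the SDK.
type codedError interface {
	error
	Code() string
}

// fakeAWSError is a hypothetical stand-in for an error returned by IAM.
type fakeAWSError struct{ code, msg string }

func (e fakeAWSError) Error() string { return e.code + ": " + e.msg }
func (e fakeAWSError) Code() string  { return e.code }

// errCodeNoSuchEntity is the IAM error code for a missing entity.
const errCodeNoSuchEntity = "NoSuchEntity"

// rollbackUser calls deleteUser and treats "the user does not exist" as a
// successful rollback, so the caller can stop retrying. Any other error
// is returned and would lead to another attempt.
func rollbackUser(deleteUser func(name string) error, name string) error {
	err := deleteUser(name)
	if err == nil {
		return nil
	}
	var ce codedError
	if errors.As(err, &ce) && ce.Code() == errCodeNoSuchEntity {
		return nil // nothing left to delete
	}
	return fmt.Errorf("rolling back IAM user %q: %w", name, err)
}

func main() {
	missing := func(string) error {
		return fakeAWSError{code: errCodeNoSuchEntity, msg: "user not found"}
	}
	fmt.Println(rollbackUser(missing, "example-user") == nil) // prints "true"
}
```

A real implementation would probably want to be more careful than this, for example by only accepting NoSuchEntity on the calls where a missing user is the only plausible cause.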
If AWS IAM user creation failed in any way, the WAL corresponding to the IAM user would get left around and Vault would try to roll it back. However, because the user never existed, the rollback failed. Thus, the WAL would essentially get "stuck" and Vault would continually attempt to roll it back, failing every time.

A similar situation could arise if the IAM user that Vault created got deleted out of band, or if Vault deleted it but was unable to write the lease revocation back to storage (e.g., a storage failure).

This attempts to harden it in two ways. One is by deleting the WAL log entry if the IAM user creation fails. However, the WAL deletion could still fail, and this wouldn't help where the user is deleted out of band, so second, consider the user rolled back if the user just doesn't exist, under certain circumstances.

Fixes hashicorp#5190
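The first hardening step described above, deleting the WAL entry when user creation fails, can be illustrated with a small self-contained sketch. Everything here is an assumption made for illustration: `walStore` is an in-memory stand-in rather than Vault's WAL, and `createUser` and `errAccessDenied` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// walStore is a hypothetical in-memory stand-in for a write-ahead log.
type walStore struct {
	nextID  int
	entries map[int]string // WAL entry ID -> IAM user name
}

func newWALStore() *walStore {
	return &walStore{entries: make(map[int]string)}
}

func (w *walStore) put(user string) int {
	w.nextID++
	w.entries[w.nextID] = user
	return w.nextID
}

func (w *walStore) remove(id int) { delete(w.entries, id) }

// errAccessDenied is a placeholder for the IAM error quoted in the issue.
var errAccessDenied = errors.New("AccessDenied: not authorized to perform iam:CreateUser")

// createUser records the intent in the WAL, then calls create. If create
// fails, the user never existed, so the WAL entry is removed right away
// instead of being left behind for a rollback that can never succeed.
func createUser(w *walStore, create func(user string) error, user string) (int, error) {
	id := w.put(user)
	if err := create(user); err != nil {
		w.remove(id)
		return 0, fmt.Errorf("creating IAM user %q: %w", user, err)
	}
	// On success the entry stays; in this sketch the caller clears it later.
	return id, nil
}

func main() {
	w := newWALStore()
	_, err := createUser(w, func(string) error { return errAccessDenied }, "example-user")
	fmt.Println("error returned:", err != nil)       // prints "error returned: true"
	fmt.Println("WAL entries left:", len(w.entries)) // prints "WAL entries left: 0"
}
```

As the description notes, this step alone is not enough: the removal itself can fail, and it does nothing for users deleted out of band, which is why the missing-user check is proposed alongside it.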
I've proposed what I believe will fix this in #5202
* logical/aws: Harden WAL entry creation

  If AWS IAM user creation failed in any way, the WAL corresponding to the IAM user would get left around and Vault would try to roll it back. However, because the user never existed, the rollback failed. Thus, the WAL would essentially get "stuck" and Vault would continually attempt to roll it back, failing every time.

  A similar situation could arise if the IAM user that Vault created got deleted out of band, or if Vault deleted it but was unable to write the lease revocation back to storage (e.g., a storage failure).

  This attempts to harden it in two ways. One is by deleting the WAL log entry if the IAM user creation fails. However, the WAL deletion could still fail, and this wouldn't help where the user is deleted out of band, so second, consider the user rolled back if the user just doesn't exist, under certain circumstances.

  Fixes #5190

* Fix segfault in expiration unit tests

  TestExpiration_Tidy was passing in a leaseEntry that had a nil Secret, which then caused a segfault as the changes to revokeEntry didn't check whether Secret was nil; this is probably unlikely to occur in real life, but good to be extra cautious.

* Fix potential segfault

  Missed the else...

* Respond to PR feedback
Describe the bug
When we were looking at our CloudTrail logs, we noticed a large number of errors being generated by the vault user. It seems that Vault tries to revoke an IAM user that doesn't exist every minute. Here is the log from CloudTrail:
We think this happens when Vault fails to create the IAM user in the first place (although I suppose it can also happen if someone removes the IAM user manually). Because we only use STS tokens, we never gave Vault the rights to create IAM users. However, when using the interface, clicking on a role immediately tries to create an IAM user. Vault then tries to revoke this user, but fails.
To Reproduce
Steps to reproduce the behavior:
You'll get this error:
Error creating IAM user: AccessDenied: User: arn:aws:iam::our-aws-account-id:user/vault is not authorized to perform: iam:CreateUser on resource: arn:aws:iam::[...]
Expected behavior
Vault acknowledges that the account does not exist (for example, by checking the response from the IAM API) and stops trying to remove the account.
Alternatively, provide a way to stop Vault from trying to revoke the accounts.
Environment:
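Vault Server Version (retrieve with vault status):
Vault CLI Version (retrieve with vault version):
Server Operating System/Architecture: n/a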
Vault server configuration file(s):
n/a