Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vault tries to revoke AWS IAM users that don't exist #5190

Closed
zBart opened this issue Aug 27, 2018 · 7 comments
Closed

Vault tries to revoke AWS IAM users that don't exist #5190

zBart opened this issue Aug 27, 2018 · 7 comments

Comments

@zBart
Copy link

zBart commented Aug 27, 2018

Describe the bug
When we were looking at our CloudTrail logs, we noticed a large amount of errors being generated by the vault user. It seems that vault tries to revoke an IAM user that doesn't exist every minute, here is the log from CloudTrail:

{
    "eventVersion": "1.02",
    "userIdentity": {
      "type": "IAMUser",
      "principalId": "XXXXXXXXXXXXXXXXXXXXX",
      "arn": "arn:aws:iam::our-aws-account-id:user/vault",
      "accountId": "our-aws-account-id",
      "accessKeyId": "XXXXXXXXXXXXXXXXXXXXX",
      "userName": "vault"
    },
    "eventTime": "2018-08-27T10:53:25Z",
    "eventSource": "iam.amazonaws.com",
    "eventName": "ListGroupsForUser",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "xxx.xxx.xxx.xxx",
    "userAgent": "aws-sdk-go/1.12.74 (go1.10.3; linux; amd64)",
    "errorCode": "NoSuchEntityException",
    "errorMessage": "The user with name vault-userpass-aVaultUsername-role-aVaultRoleName-1794 cannot be found.",
    "requestParameters": "{\"userName\": \"vault-userpass-aVaultUsername-role-aVaultRoleName-1794\", \"maxItems\": 1000}",
    "responseElements": "null",
    "requestID": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "eventID": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "eventType": "AwsApiCall",
    "recipientAccountId": "our-aws-account-id"
  }

We think this happens when Vault fails to create the IAM user in the first place (although I suppose it can also happen if someone removes the IAM user manually). Because we only use STS tokens, we never gave Vault the rights to create IAM users. However when using the interface, clicking on a role immediately creates an IAM user. Vault then tries to revoke this, but fails.

To Reproduce
Steps to reproduce the behavior:

  1. Setup Vault with AWS, without giving it IAM permissions to create actual users
  2. Create a role for AWS within vault
  3. Try to generate a account through the web interface (might also work through CLI)
    You'll get this error:
    Error creating IAM user: AccessDenied: User: arn:aws:iam::our-aws-account-id:user/vault is not authorized to perform: iam:CreateUser on resource: arn:aws:iam::[...]
  4. Afterwards, check the CloudTrail logs and see that Vault tries to revoke the user every minute

Expected behavior
Vault acknowledges that the account does not exist (for example by checking the response the IAM API gives) and stops trying to remove the account.

Or alternatively have a way that you can stop Vault from trying to revoke the accounts.

Environment:

  • Vault Server Version (retrieve with vault status):
Key             Value
---             -----
Seal Type       shamir
Sealed          false
Total Shares    5
Threshold       3
Version         0.10.3
Cluster Name    vault-cluster-63d461f0
Cluster ID      xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
HA Enabled      true
HA Cluster      https://our-domain
HA Mode         active
  • Vault CLI Version (retrieve with vault version):
Vault v0.10.3 ('c69ae68faf2bf7fc1d78e3ec62655696a07454c7')
  • Server Operating System/Architecture:
    n/a

Vault server configuration file(s):
n/a

@chrishoffman
Copy link
Contributor

Vault's job is to manage the lease and to keep on trying to revoke it. If the credential was not fully created or manually deleted, you can force revoke the lease to tell vault to "forget" the credential.

See https://www.vaultproject.io/api/system/leases.html#revoke-force.

@zBart
Copy link
Author

zBart commented Aug 27, 2018

While I find it odd that Vault sees the account as created when the API call performing the create returns an error, I understand the philosophy.

I did the revoke as per the documentation:

$ curl -v --header "X-Vault-Token: xxxxxx-xxxx-xxxx-xxxxxxxxxxxx" --request PUT https://our-vault-domain/v1/sys/leases/revoke-force/aws/creds
> PUT /v1/sys/leases/revoke-force/aws/creds HTTP/1.1
> Host: our-vault-domain
> User-Agent: curl/7.47.0
> Accept: */*
> X-Vault-Token: xxxxxx-xxxx-xxxx-xxxxxxxxxxxx
>
< HTTP/1.1 204 No Content
< Date: Mon, 27 Aug 2018 13:41:33 GMT
< Content-Type: application/json
< Connection: keep-alive
< Cache-Control: no-store

But this did not seem to have fixed the issue. We still see Vault trying to remove the users.

Am I revoking the wrong namespace?

@zBart
Copy link
Author

zBart commented Aug 27, 2018

Hey @chrishoffman I was checking the source code trying to figure out why Vault is still trying to revoke the user and noticed that the GCP secrets engine does seem to check the API response when revoking access keys:

https://github.com/hashicorp/vault-plugin-secrets-gcp/blob/master/plugin/secrets_service_account_key.go#L163

And the access token revoking also has some logic attached to it:
https://github.com/hashicorp/vault-plugin-secrets-gcp/blob/master/plugin/secrets_access_token.go#L104

Since GCP does this, maybe it would be a good addition for the AWS engine as well?

@joelthompson
Copy link
Contributor

@chrishoffman I think the issue is that Vault writes a WAL entry before creating the user, and then the user creation fails as does the rollback. I think we should, upon user creation failure, try to delete the WAL entry, and further, detect the NoSuchEntity error and report that as a successful rollback, since a failure to mark a user as revoked would cause Vault to attempt to continue revoking it indefinitely. Should be a relatively simple fix.

joelthompson added a commit to joelthompson/vault that referenced this issue Aug 28, 2018
If AWS IAM user creation failed in any way, the WAL corresponding to the
IAM user would get left around and Vault would try to roll it back.
However, because the user never existed, the rollback failed. Thus, the
WAL would essentially get "stuck" and Vault would continually attempt to
roll it back, failing every time. A similar situation could arise if the
IAM user that Vault created got deleted out of band, or if Vault deleted
it but was unable to write the lease revocation back to storage (e.g., a
storage failure).

This attempts to harden it in two ways. One is by deleting the WAL log
entry if the IAM user creation fails. However, the WAL deletion could
still fail, and this wouldn't help where the user is deleted out of
band, so second, consider the user rolled back if the user just doesn't
exist, under certain circumstances.

Fixes hashicorp#5190
@joelthompson
Copy link
Contributor

I've proposed what I believe will fix this in #5202

chrishoffman pushed a commit that referenced this issue Sep 27, 2018
* logical/aws: Harden WAL entry creation

If AWS IAM user creation failed in any way, the WAL corresponding to the
IAM user would get left around and Vault would try to roll it back.
However, because the user never existed, the rollback failed. Thus, the
WAL would essentially get "stuck" and Vault would continually attempt to
roll it back, failing every time. A similar situation could arise if the
IAM user that Vault created got deleted out of band, or if Vault deleted
it but was unable to write the lease revocation back to storage (e.g., a
storage failure).

This attempts to harden it in two ways. One is by deleting the WAL log
entry if the IAM user creation fails. However, the WAL deletion could
still fail, and this wouldn't help where the user is deleted out of
band, so second, consider the user rolled back if the user just doesn't
exist, under certain circumstances.

Fixes #5190

* Fix segfault in expiration unit tests

TestExpiration_Tidy was passing in a leaseEntry that had a nil Secret,
which then caused a segfault as the changes to revokeEntry didn't check
whether Secret was nil; this is probably unlikely to occur in real life,
but good to be extra cautious.

* Fix potential segfault

Missed the else...

* Respond to PR feedback
@Huper7777
Copy link

[email protected]

@Huper7777
Copy link

build #70700006

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants