Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vault loop - High CPU with token lookup #4179

Closed
rhuddleston opened this issue Mar 21, 2018 · 23 comments
Closed

Vault loop - High CPU with token lookup #4179

rhuddleston opened this issue Mar 21, 2018 · 23 comments
Assignees
Labels
bug Used to indicate a potential bug core/token

Comments

@rhuddleston
Copy link

Environment:

  • Vault Version: Vault v0.9.5
  • Operating System/Architecture: 16.04.4

Vault Config File:

storage "consul" {
address = "127.0.0.1:8500"
path = "vault/"
advertise_addr = "https://node1.prod:8200"
redirect_addr = "https://node1.prod:8200"
cluster_addr = "https://node1.demo.prod:8199"
service_tags = "0.9.5"
token = "xxxxxx"
}

listener "tcp" {
address = "0.0.0.0:8200"
cluster_address = "0.0.0.0:8199"
tls_cert_file = "/keystore/corp.inter.root.bundle.crt.pem"
tls_key_file = "/keystore/corp.key.pem"
tls_min_version = "tls12"
tls_cipher_suites = "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"
tls_prefer_server_cipher_suites = "true"
}

Expected Behavior:

If a vault token doesn't exist anymore but is still listed under a parent don't go into a loop

Actual Behavior:
Since upgrading to 0.9.5 I've had two instances where vault goes into a loop hammering consul requests. For example in this case it constantly making the following requests in a loop:

GET /v1/kv/vault/sys/token/parent/4148eb28e502542be3e32db4e68bece1e65f9868/?keys=&separator=%2F
GET /v1/kv/vault/sys/token/parent/a4773073fd5dede80c0ea08b8bac4bf78144aec0/?keys=&separator=%2F
GET /v1/kv/vault/sys/token/id/4148eb28e502542be3e32db4e68bece1e65f9868

Of these three keys only one currently exists:

/v1/kv/vault/sys/token/parent/a4773073fd5dede80c0ea08b8bac4bf78144aec0/

The above key is actually a directory with a single key "4148eb28e502542be3e32db4e68bece1e65f9868"

So it seems that if vault gets into this situation that the key doesn't exist from the parent dir in the /v1/kv/vault/sys/token/id/ keyspace that it ends up going into an infinite loop.

I'm opening this issue as it's now occurred twice since I upgraded vault recently, but in any case I'm hoping you can take a look at the code and avoid this loop if this condition occurs in the future

@rhuddleston
Copy link
Author

Forgot to mention. I can stop the loop by manually removing /v1/kv/vault/sys/token/parent/a4773073fd5dede80c0ea08b8bac4bf78144aec0/ from consul

@calvn calvn self-assigned this Mar 22, 2018
@calvn
Copy link
Contributor

calvn commented Mar 22, 2018

If a vault token doesn't exist anymore but is still listed under a parent ...

I don't think you can get into this state unless the storage entries were edited manually. As soon as the last child token is revoked and the parent token no longer has any child tokens, there shouldn't be an entry for that parent token under the /sys/token/parent prefix path.

Can you provide reproducible steps for this error?

@SoMuchToGrok
Copy link

SoMuchToGrok commented Mar 26, 2018

@rhuddleston When Vault gets stuck in this loop, can you successfully shutdown Vault by sending a SIGINT? I'm experiencing a [possibly] similar scenario but I'm unable to shutdown without forcefully killing it.

@calvn
Copy link
Contributor

calvn commented Mar 27, 2018

@SoMuchToGrok @rhuddleston can you try out a build off master? Once built and running, you'd have to run the tidy operation to clean up any dangling parent prefixes. Let us know if that fixes your issue!

@SoMuchToGrok
Copy link

SoMuchToGrok commented Mar 27, 2018

Thanks for the assistance @calvn. Just deployed master and ran the tidy, but unfortunately no luck. Still seeing the loop in the consul monitor.

Vault v0.9.6 ('5e3930cc9d38b2ec0412f46da9a4845448979aac')

2018/03/27 19:27:16.894298 [INFO ] token: beginning tidy operation on tokens
2018/03/27 19:27:17.088180 [INFO ] token: checking validity of tokens in secondary index list: progress=500
2018/03/27 19:27:18.540003 [INFO ] token: checking validity of tokens in secondary index list: progress=1000
2018/03/27 19:27:18.600900 [INFO ] token: checking validity of tokens in secondary index list: progress=1500
2018/03/27 19:27:19.170712 [INFO ] token: checking validity of tokens in secondary index list: progress=2000
2018/03/27 19:27:19.799802 [INFO ] token: checking if accessors contain valid tokens: progress=500
2018/03/27 19:27:19.821041 [INFO ] token: checking if accessors contain valid tokens: progress=1000
2018/03/27 19:27:19.842249 [INFO ] token: checking if accessors contain valid tokens: progress=1500
2018/03/27 19:27:19.863051 [INFO ] token: checking if accessors contain valid tokens: progress=2000
2018/03/27 19:27:19.879839 [DEBUG] token: number of entries scanned in parent prefix: count=152
2018/03/27 19:27:19.879861 [DEBUG] token: number of entries deleted in parent prefix: count=0
2018/03/27 19:27:19.879882 [DEBUG] token: number of tokens scanned in parent index list: count=2268
2018/03/27 19:27:19.879892 [DEBUG] token: number of tokens revoked in parent index list: count=0
2018/03/27 19:27:19.879902 [DEBUG] token: number of accessors scanned: count=2407
2018/03/27 19:27:19.879911 [DEBUG] token: number of deleted accessors which had empty tokens: count=0
2018/03/27 19:27:19.879921 [DEBUG] token: number of revoked tokens which were invalid but present in accessors: count=0
2018/03/27 19:27:19.879931 [DEBUG] token: number of deleted accessors which had invalid tokens: count=0
2018/03/27 19:27:19.879941 [INFO ] token: finished tidy operation on tokens

Consul monitor output

2018/03/27 19:28:30 [DEBUG] http: Request GET /v1/kv/vault/sys/token/parent/098e2b0af50e1dd46cdb3ed4ba03a0d869c93f7f/?keys=&separator=%2F (1.434456ms) from=127.0.0.1:38450
2018/03/27 19:28:30 [DEBUG] http: Request GET /v1/kv/vault/sys/token/parent/dc067819860c1dd23f458b418fcd7aed0b1a697e/?keys=&separator=%2F (1.457045ms) from=127.0.0.1:38452
2018/03/27 19:28:30 [DEBUG] http: Request GET /v1/kv/vault/sys/token/parent/3dd19860320685b7b34e99ac8cf8db8b2c83081c/?keys=&separator=%2F (1.47112ms) from=127.0.0.1:38450
2018/03/27 19:28:30 [DEBUG] http: Request GET /v1/kv/vault/sys/token/parent/61da133ade358ec06ae6e6f0152a57235574877f/?keys=&separator=%2F (1.434544ms) from=127.0.0.1:38452
2018/03/27 19:28:30 [DEBUG] http: Request GET /v1/kv/vault/sys/token/parent/421cfdbc380512d04780ff506960cfd1743a3759/?keys=&separator=%2F (1.476818ms) from=127.0.0.1:38450
2018/03/27 19:28:30 [DEBUG] http: Request GET /v1/kv/vault/sys/token/parent/098e2b0af50e1dd46cdb3ed4ba03a0d869c93f7f/?keys=&separator=%2F (1.360794ms) from=127.0.0.1:38452
2018/03/27 19:28:30 [DEBUG] http: Request GET /v1/kv/vault/sys/token/parent/4d54cc468ba8b7089c51dee15dc5693209269c51/?keys=&separator=%2F (1.392901ms) from=127.0.0.1:38450
2018/03/27 19:28:30 [DEBUG] http: Request GET /v1/kv/vault/sys/token/parent/3dd19860320685b7b34e99ac8cf8db8b2c83081c/?keys=&separator=%2F (1.405663ms) from=127.0.0.1:38452
2018/03/27 19:28:30 [DEBUG] http: Request GET /v1/kv/vault/sys/token/parent/508c12f556d19093e9167d5dcacda376d10cc5d9/?keys=&separator=%2F (1.452978ms) from=127.0.0.1:38450
2018/03/27 19:28:30 [DEBUG] http: Request GET /v1/kv/vault/sys/token/parent/421cfdbc380512d04780ff506960cfd1743a3759/?keys=&separator=%2F (1.411853ms) from=127.0.0.1:38452
2018/03/27 19:28:30 [DEBUG] http: Request GET /v1/kv/vault/sys/token/parent/560883d56509e907837d14d9c12f74921c7e6624/?keys=&separator=%2F (1.398985ms) from=127.0.0.1:38450
2018/03/27 19:28:30 [DEBUG] http: Request GET /v1/kv/vault/sys/token/parent/4d54cc468ba8b7089c51dee15dc5693209269c51/?keys=&separator=%2F (1.410343ms) from=127.0.0.1:38452
2018/03/27 19:28:30 [DEBUG] http: Request GET /v1/kv/vault/sys/token/parent/6bacbce05311553b655a29141094fe352afa5427/?keys=&separator=%2F (1.439407ms) from=127.0.0.1:38450

@calvn
Copy link
Contributor

calvn commented Mar 27, 2018

Thanks for the info! I think I might know what's going on, but I'll to need to verify some things on my end. Hang tight in the meantime!

@calvn
Copy link
Contributor

calvn commented Mar 27, 2018

I thought I had something going on, but upon further inspection it seems that the issue is not likely where I'm looking.

@calvn
Copy link
Contributor

calvn commented Mar 28, 2018

@rhuddleston let me know if a build and tidy from the latest master did the trick for you. These could be two different issues that we're bumping into.

@rhuddleston
Copy link
Author

A bit of an update, I had this issue again. Went ahead and upgraded to v0.10.1 after the upgrade it was still stuck in the loop. Went ahead and ran tidy. The first run complained about consistency issues but the second went through and the looping stopped. I'll let you know if this issue comes up again now that were on the new version.

@calvn
Copy link
Contributor

calvn commented May 10, 2018

@rhuddleston we're still finalizing some things on an open PR, but we've got confirmation from some folks on issue #4143 that PR #4512 has resolved looping issues similar to the one you're experiencing.

@rhuddleston
Copy link
Author

This just happened again on 0.10.1 and tidy did not clean it up. I had to manually delete a bunch of keys until the looping stopped. Any chance you can cut 0.10.2 soon?

@calvn
Copy link
Contributor

calvn commented May 11, 2018

Would it be possible for you build a Vault binary based off master and test using that?

@rhuddleston
Copy link
Author

I upgraded to 0.10.2 and put it on the clusters that keep having this issue. After upgrading and running tidy it fixed the immediate issue but the problem reoccured later and started looping over a couple values:

1062 /v1/kv/vault/sys/token/parent/69cb89543ac2aafb16dd31be26f1be561948b977/?keys=&separator=%2F
1061 /v1/kv/vault/sys/token/parent/9b3d89943dfbaa68aaebf02587de19423a7c15a4/?keys=&separator=%2F

so in a few seconds it hit each of the above over 1000 times when I straced it.

The /v1/kv/vault/sys/token/parent/9b3d89943dfbaa68aaebf02587de19423a7c15a4 folder exitsted and had one value under it of 69cb89543ac2aafb16dd31be26f1be561948b977. Both exited under /v1/kv/vault/sys/token/id/ . Deleting the looping keys stops the problem then returns for a diferent key later

@calvn
Copy link
Contributor

calvn commented Jun 12, 2018

Would it be possible for you to enable the sys/raw endpoint and dump the values under sys/token/id/<uuid> for both of those tokens (or the set of tokens that have been called excessively)? I'd be helpful to inspect the actual content under both of those entries.

You can read the entries with the following command: vault read sys/raw/sys/token/id/<uuid>. Please be sure to obfuscate any sensitive values such as the id and accessor or invalidate the token before sharing.

@holtwilkins
Copy link

Hey @calvn this happened for us as well. I grabbed a raw output, but not sure how much there is to get from it. Looks like it was a Nomad server token. It was also last renewed in January it looks like?

 ~ $ vault read sys/raw/sys/token/id/54a8da9834efafe58f92eb6f13edfec6577a4b71
Key      Value
---      -----
value    {"id":"<REDACTED>","accessor":"166e1832-86a2-c310-9884-3311446e37c2","parent":"","policies":["aws-nomad-server","default"],"path":"auth/aws/login","meta":{"account_id":"<REDACTED>","auth_type":"iam","canonical_arn":"arn:aws:iam::<REDACTED>:role/palantirtest-server-role","client_arn":"arn:aws:sts::<REDACTED>:assumed-role/palantirtest-server-role/i-08560a46f8f58cf64","client_user_id":"AROAICTGNJWK4CTY465HY","inferred_aws_region":"us-east-1","inferred_entity_id":"i-08560a46f8f58cf64","inferred_entity_type":"ec2_instance"},"display_name":"aws-palantirtest-server-role","num_uses":0,"creation_time":1513207829,"ttl":259200000000000,"explicit_max_ttl":0,"role":"","period":0,"DisplayName":"","NumUses":0,"CreationTime":0,"ExplicitMaxTTL":0}
 ~ $ vault token lookup -accessor 166e1832-86a2-c310-9884-3311446e37c2
Key                  Value
---                  -----
accessor             166e1832-86a2-c310-9884-3311446e37c2
creation_time        1513207829
creation_ttl         72h
display_name         aws-palantirtest-server-role
entity_id            n/a
expire_time          2018-01-12T23:30:54.148423355Z
explicit_max_ttl     0s
id                   n/a
issue_time           2017-12-13T23:30:29.212998746Z
last_renewal         2018-01-09T23:30:54.148423478Z
last_renewal_time    1515540654
meta                 map[canonical_arn:arn:aws:iam::<REDACTED>:role/palantirtest-server-role client_arn:arn:aws:sts::<REDACTED>:assumed-role/palantirtest-server-role/i-08560a46f8f58cf64 client_user_id:AROAICTGNJWK4CTY465HY inferred_aws_region:us-east-1 inferred_entity_id:i-08560a46f8f58cf64 inferred_entity_type:ec2_instance account_id:<REDACTED> auth_type:iam]
num_uses             0
orphan               true
path                 auth/aws/login
policies             [aws-nomad-server default]
renewable            false
ttl                  -5760h31m

@jefferai
Copy link
Member

@holtwilkins what version?

@holtwilkins
Copy link

Vault v0.10.3 ('c69ae68faf2bf7fc1d78e3ec62655696a07454c7')

@rhuddleston
Copy link
Author

rhuddleston commented Oct 5, 2018

@calvn

# /opt/vault/vault read sys/raw/sys/token/id/0f6a164785eeb3181dcc0ecec27847c12dadef0f
Key      Value
---      -----
value    {"id":"11891d19-4ee1-4658-9dbc-33adea2a4d17","accessor":"247742cf-52b4-4be9-ac4c-d3ada53ddf21","parent":"dc5c634c-c9ea-a8ff-dc17-28d7425db007","policies":["default","secret_reader"],"path":"auth/token/create","meta":null,"display_name":"token","num_uses":-1,"creation_time":1538722862,"ttl":60000000000,"explicit_max_ttl":0,"role":"","period":0,"DisplayName":"","NumUses":0,"CreationTime":0,"ExplicitMaxTTL":0,"entity_id":"","bound_cidrs":null}

# /opt/vault/vault read sys/raw/sys/token/id/477323fe7d01e84d70c263d21fd75069bba628e1
Key      Value
---      -----
value    {"id":"dc5c634c-c9ea-a8ff-dc17-28d7425db007","accessor":"1b868147-0619-d9af-b8ea-9d3536be57a2","parent":"","policies":["default","secret_reader","token_create"],"path":"auth/token/create-orphan","meta":null,"display_name":"token","num_uses":0,"creation_time":1538679858,"ttl":900000000000,"explicit_max_ttl":0,"role":"","period":0,"DisplayName":"","NumUses":0,"CreationTime":0,"ExplicitMaxTTL":0,"entity_id":"","bound_cidrs":null}```

The loop looked like it hit the below over 12k times over a small period when using strace:

  12845 "GET /v1/kv/vault/sys/token/parent/477323fe7d01e84d70c263d21fd75069bba628e1/?keys=&separator=%2F
  12846 "GET /v1/kv/vault/sys/token/parent/0f6a164785eeb3181dcc0ecec27847c12dadef0f/?keys=&separator=%2F

this was on 0.10.3 let me know if you need anything else

@rhuddleston
Copy link
Author

I keep having to manually fix these CPU loops, I can upgrade but don't want to unless we think there was something fixed to stop it

@calvn
Copy link
Contributor

calvn commented Nov 17, 2018

Can you list steps to repro the issue (perhaps the way the tokens got created)? In your log bits above, I see that 477323fe7d01e84d70c263d21fd75069bba628e1 is an orphan token, but the strace shows that a Consul KV GET request on 477323fe7d01e84d70c263d21fd75069bba628e1 is immediately followed by a request on 0f6a164785eeb3181dcc0ecec27847c12dadef0f .

Can you post the result from a vault list sys/raw/sys/token/parent/<token> of the two looping tokens in question (in addition to vault read sys/raw/sys/token/id/<token> if those tokens have changed since your last log post to know which is the orphan token). You can redact the token ID and accessors if those are still valid tokens.

@calvn
Copy link
Contributor

calvn commented Nov 17, 2018

I see that 477323fe7d01e84d70c263d21fd75069bba628e1 is the parent of 0f6a164785eeb3181dcc0ecec27847c12dadef0f, do you know whether it was the revocation of the parent or the child that triggered this loop?

@calvn
Copy link
Contributor

calvn commented Nov 17, 2018

The issue might have been fixed in #5364, would you be willing to give v0.11.2 or above a try?

@catsby catsby added version/0.9.x bug Used to indicate a potential bug core/token labels Nov 8, 2019
@jefferai
Copy link
Member

jefferai commented Nov 9, 2019

No response after a year, closing.

@jefferai jefferai closed this as completed Nov 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Used to indicate a potential bug core/token
Projects
None yet
Development

No branches or pull requests

6 participants