Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consul-template "vault_agent_token_file" should support wrapped format from vault agent #1498

Closed
drawks opened this issue Aug 4, 2021 · 39 comments · Fixed by #1645
Closed
Labels
bug hashicat-update-required Changes that need to be ported to hashicat vault Related to the Vault integration
Milestone

Comments

@drawks
Copy link

drawks commented Aug 4, 2021

Configuration

vault-agent.hcl

vault {
  address = "https://vault.example.com:8200"
  client_key = "client.key"
  client_cert = "client.pem"
}

auto_auth {
  method "cert" {
    wrap_ttl = "5m"
  }
  sink "file" {
    config = {
      path = "/tmp/vault.token.wrapped"
    }
  }
}

consul-template.hcl

log_level = "debug"
vault {
  address = "https://vault.example.com:8200"
  unwrap_token = true
  vault_agent_token_file = "/tmp/vault.token.wrapped"
  renew_token = true
  ssl = {
    key = "client.key"
    cert = "client.pem"
  }
}

Command

 vault agent -exit-after-auth -config=vault-agent.hcl &&  sudo consul-template -config=consul-template.hcl

Debug output

2021-08-04T16:06:08.012Z [INFO]  v0.26.0 (3b7f233a)
2021-08-04T16:06:08.012Z [INFO] (runner) creating new runner (dry: false, once: false)
2021-08-04T16:06:08.012Z [DEBUG] (runner) final config: {"Consul":{"Address":"","Namespace":"","Auth":{"Enabled":false,"Username":"","Password":""},"Retry":{"Attempts":12,"Backoff":250000000,"MaxBackoff":60000000000,"Enabled":true},"SSL":{"CaCert":"","CaPath":"","Cert":"","Enabled":false,"Key":"","ServerName":"","Verify":true},"Token":"","Transport":{"DialKeepAlive":30000000000,"DialTimeout":30000000000,"DisableKeepAlives":false,"IdleConnTimeout":90000000000,"MaxIdleConns":100,"MaxIdleConnsPerHost":81,"TLSHandshakeTimeout":10000000000}},"Dedup":{"Enabled":false,"MaxStale":2000000000,"Prefix":"consul-template/dedup/","TTL":15000000000,"BlockQueryWaitTime":60000000000},"DefaultDelims":{"Left":null,"Right":null},"Exec":{"Command":"","Enabled":false,"Env":{"Denylist":[],"Custom":[],"Pristine":false,"Allowlist":[]},"KillSignal":2,"KillTimeout":30000000000,"ReloadSignal":null,"Splay":0,"Timeout":0},"KillSignal":2,"LogLevel":"trace","MaxStale":2000000000,"PidFile":"","ReloadSignal":1,"Syslog":{"Enabled":false,"Facility":"LOCAL0","Name":""},"Templates":[],"Vault":{"Address":"https://vault.example.com:8200","Enabled":true,"Namespace":"","RenewToken":true,"Retry":{"Attempts":12,"Backoff":250000000,"MaxBackoff":60000000000,"Enabled":true},"SSL":{"CaCert":"","CaPath":"","Cert":"client.pem","Enabled":true,"Key":"client.key","ServerName":"","Verify":true},"Transport":{"DialKeepAlive":30000000000,"DialTimeout":30000000000,"DisableKeepAlives":false,"IdleConnTimeout":90000000000,"MaxIdleConns":100,"MaxIdleConnsPerHost":81,"TLSHandshakeTimeout":10000000000},"UnwrapToken":true,"DefaultLeaseDuration":300000000000},"Wait":{"Enabled":false,"Min":0,"Max":0},"Once":false,"BlockQueryWaitTime":60000000000}
2021-08-04T16:06:08.337Z [ERR] (cli) runner: runner: client set: vault unwrap: Error making API request.

URL: PUT https://vault.example.com:8200/v1/sys/wrapping/unwrap
Code: 400. Errors:

* error validating wrapping token: wrapping token could not be parsed: square/go-jose: missing payload in JWS message

The contents of the sink file from vault agent, showing that wrap_ttl results in a json blob and not just a bare wrapping token:

{"token":"s.fakeQP7gpMIamri3Kjv1Pfsz","accessor":"MlDq0BIcrS3ALyOlORafakeI","ttl":300,"creation_time":"2021-08-04T16:06:07.253104718Z","creation_path":"auth/cert/login","wrapped_accessor":"SncPS99fake6l6dJSAZmNijt"}

Expected behavior

consul-template should recognize the json format of the vault agent's sink file when using wrap_ttl and extract the token field and then proceed as usual.

Actual behavior

consul-template slurps the entire contents of the sink file and sends it to the unwrap endpoint as a wrapping token.

Steps to reproduce

  1. configure vault agent to login via auto-auth and write to a file sink with wrap_ttl set
  2. configure consul-template with vault_agent_token_file set to the same file along with renew_toke = true
  3. observe that consul cannot unwrap the token

References

Are there any other GitHub issues (open or closed) that should
be linked here? For example:

@eikenb eikenb added the bug label Aug 11, 2021
@eikenb eikenb added this to the v0.28.0 milestone Feb 17, 2022
@eikenb eikenb modified the milestones: v0.28.0, v0.29.0 Mar 3, 2022
@eikenb eikenb added the vault Related to the Vault integration label Apr 8, 2022
@eikenb
Copy link
Contributor

eikenb commented Apr 8, 2022

I'm working on reproducing this using vault agent but my vault skills need some work first. I haven't messed with Vault much directly and can't get the cert client auth to work.. so I'm going to pause this until I have a better understanding of Vault as there are other bugs that need my attention.

If anyone knows a way to reproduce this with consul-template itself, I could probably make faster progress. Not saying I won't get back to it, but I'm planning on a release soon and don't want to spin my wheels to long.

@eikenb eikenb removed this from the v0.29.0 milestone Apr 8, 2022
@eikenb eikenb added this to the v0.30.0 milestone Jul 8, 2022
@eikenb
Copy link
Contributor

eikenb commented Aug 31, 2022

Hey @drawks, sorry for the long delay getting to this.

I still haven't had time to get a test setup for this working but I'm going to take a stab at it anyways. Based on the issue filed I should be able to check to see if the token is embedded in a json object as described and pull the token value from it. I'll go strictly off documentation and see how it goes.

@drawks
Copy link
Author

drawks commented Aug 31, 2022

Hey @drawks, sorry for the long delay getting to this.

No apology necessary! Thanks for your time/attention. I've been working on other things, but can shift focus back to this as soon as you have something testable.

@eikenb eikenb added the hashicat-update-required Changes that need to be ported to hashicat label Aug 31, 2022
@eikenb
Copy link
Contributor

eikenb commented Aug 31, 2022

Hey @drawks,

I have the PR up with a potential fix. See #1636. Seems pretty simple and it should just work. If you could test it I'd super appreciate it and if I can do anything to help facilitate that please let me know.

Thanks!

@drawks
Copy link
Author

drawks commented Sep 1, 2022

I'll see if I can rig up a test using docker or something today so that I can confirm/deny that the fix is good AND also provide a re-usable test case

@eikenb
Copy link
Contributor

eikenb commented Sep 1, 2022

I'll see if I can rig up a test using docker or something today so that I can confirm/deny that the fix is good AND also provide a re-usable test case

That would be incredibly awesome @drawks. Thanks!

@drawks
Copy link
Author

drawks commented Sep 1, 2022

I'm running into a few issues while testing, but have run out of time for the week. Will return to this on Tuesday. The tldr so far is that your change does result in the wrapped token being read from disk and unwrapped. However subsequent template actions against vault are not being made with the unwrapped token as far as I can tell.

Using the hmac hashes in vaults audit log I can see the unwrap happen successfully and inserting a longline into the code I can see that the unwrapped token is captured in CreateVaultClient, but once we are attempting to render the template I'm seeing the client_token in the associated audit log doesn't match the hash returned from the unwrap.

@eikenb
Copy link
Contributor

eikenb commented Sep 1, 2022

Thanks for the feedback @drawks!

I'll look over the code again with the issue you found in mind, see if I can find any reason for it.

Thanks again for your help on this. Very much appreciated.

@drawks
Copy link
Author

drawks commented Sep 1, 2022

As far as I can piece together the timeline from audit logs:

  1. unwrap request with token A
  2. unwrap response with token B
  3. vault reports token A expired
  4. renew-self request with token B
  5. renew-self response with token B
  6. read request for /sys/internal/ui/mounts.... (to determine kv engine version) with token C
  7. read response with token C failed
  8. read request for template secret with token C
  9. read response with token C failed

I'd just include the logs, but they are quite verbose and all the values are hmac hashes.
The extra log line I inserted is inside this block and shows that the value of token is the unwrapped vault token and the hmac of this token matches the logline references to "Token B"

@eikenb
Copy link
Contributor

eikenb commented Sep 1, 2022

Sorry to ask.. but where does token C come from? Was there another round of expire/renew in there?

@drawks
Copy link
Author

drawks commented Sep 1, 2022

That's exactly the problem, it seemingly comes from nowhere. I only have the hash of it, but it matches neither the wrapped (A) not unwrapped (B) token hashes

@eikenb
Copy link
Contributor

eikenb commented Sep 1, 2022

I'm thinking it might have to do with the client.SetToken() call I moved from above the i.UnwrapToken block to below it. The unwrap API supports setting the token as the client token and using that with unwrap and maybe not setting it first causes this.

I'm going to re-work the PR to set the token before the unwrap call again. I'll push it up so you can test it when you get a chance.

I'll dig a bit more to see if I can come up with any other possibilities.

@eikenb
Copy link
Contributor

eikenb commented Sep 2, 2022

Was finally able to replicate the original issue using the approle auto-auth method. I've verified my fix works parsing wise. I'm still poking around to see if I can figure out what's up with the extra token.

@eikenb
Copy link
Contributor

eikenb commented Sep 2, 2022

I'm going to re-work the PR to set the token before the unwrap call again. I'll push it up so you can test it when you get a chance.

Just wanted to note that I've done this.

@drawks
Copy link
Author

drawks commented Sep 6, 2022

Ok, picked this back up today and Put together the following log which includes addional loglines where unwrap, set and reads occur in consul-template.

https://gist.github.com/drawks/44dfb9437bd27f5b09d9064e1c7a95b3

You can see at lines 28-30 that the wrapped token file is read from disc correctly, the wrapping token is read and then the unwrap operation occurs followed by a setToken to the correct unwrapped token. I make an additional call to the client's Token() method to confirm that the unwrapped token has been set to the client properly. Then at lines 45 and 66 you can see that the Token() method called on the actual client used in the read operations contains the raw json from the wrapped token file on disk still.

So I think your code changes to read and unwrap the token are correct, but we've got some issue where we're updating the incorrect client object.

p.s. I also changed my test setup to use approle authentication so that it matches your test setup; just in case we ended up in two different corners.

p.p.s. just so I don't get dire warnings from others reading this, the attached log is the result of a completely ephemeral vault instance setup in a vm. There are "secrets" in the logs, but they are of zero value and not used in any production application nor are the token being used for securing anything.

@eikenb
Copy link
Contributor

eikenb commented Sep 6, 2022

Thanks for digging in @drawks, this is a big help.

Just to be sure... were those new tests with the updated version I pushed on Friday? Thanks.

@drawks
Copy link
Author

drawks commented Sep 6, 2022

Yes, I added my loglines ontop of your latest commit in the branch

@drawks
Copy link
Author

drawks commented Sep 6, 2022

Aha! disabling the watcher against the VaultAgentTokenFile here results in everything working as expected. I think what is happening is that we make it through the first read/unwrap/renew loop and then immediately fire off from this watcher which causes the contents to be read into the Token of the vault client again, but without unwrapping.

I /think/ the proper behavior would be.
A. Either compare the hash of the file contents and trigger the watch on file changing OR have consul-template delete the file on read, so that the watcher doesn't trigger until/unless the file is recreated or written to.
B. The watcher on the token file should use the same unwrap logic as when it is read the first time on start up.

edit: to be clear, there should never be an attempt to unwrap the wrapping token more than once. I see both the case with the watcher could be a problem as well as the case where consul-template receives a HUP. I /think/ in both cases it makes sense to skip the unwrap logic IF the wrapping token has been seen before.

@eikenb
Copy link
Contributor

eikenb commented Sep 6, 2022

Great find! I was on my way as I was in the process of tracking down every SetToken call but hadn't made it to that one yet. I'll rework things to have it use the wrap_ttl check as well and push that in a few.

Thank you!!

@drawks
Copy link
Author

drawks commented Sep 6, 2022

I think the easiest way to prevent attempts at re-unwrapping the token would be to save the WrappedAccessor value from the wrapped token file as an attribute of the vaultClient struct when an unwrap has successfully happened. Then subsequent trips into the unwrapping handling code could just short circuit if WrappedAccessor matches the one on vaultClient. The documentation says that a TTL wrapped response containing a auth token will have WrappedAccessor set to the Accessor of the wrapped token. That way even if the wrapped token file were to change, but still contain the same wrapped token we wouldn't bother reading it since we could know beforehand that we already have the token which would be returned.

@eikenb
Copy link
Contributor

eikenb commented Sep 6, 2022

I went simple with my change in CT and just added the unwrapTTL call to the VaultAgentTokenQuery. I'm going to give it more thought when I port it to hashicat as I wanted to rework the watcher using a dependency internally like that anyways as it has always been sort of confusing. I'll think about your suggestion when I handle this with the next round of hashicat-update-required porting.

Thanks again. Pushing the update version now....

@eikenb
Copy link
Contributor

eikenb commented Sep 6, 2022

Wait. That won't work as the token in the file is wrapped and the VaultAgentTokenQuery doesn't unwrap it either.

I'm going to refactor a bit to add common SetToken wrapper that handles the wrap_ttl json as well as unwrapping the token if necessary before calling the SetToken.

@eikenb
Copy link
Contributor

eikenb commented Sep 7, 2022

Another refactoring done.

@drawks
Copy link
Author

drawks commented Sep 7, 2022

Your latest commit isn't /quite/ right, since it will still result in the watcher attempting to unwrap the wrapped token, but the unwrap will fail because we've already unwrapped it once.

2022-09-07T16:48:45.477Z [WARN] (view) vault-agent.token: client set: vault unwrap: Error making API request.

URL: PUT https://localhost:8200/v1/sys/wrapping/unwrap
Code: 400. Errors:

* wrapping token is not valid or does not exist (retry attempt 1 after "250ms")

As suggested earlier in this thread, we need either the wrapped token file to be deleted after read and/or consul-template to remember the WrappedAccessor and short circuit the unwrap logic if we know we've already got the token.

edit: I wouldn't mind taking a whack at getting this sorted, but was trying to avoid stepping on you as it seems you're actively updating your branch.

@eikenb
Copy link
Contributor

eikenb commented Sep 7, 2022

Your latest commit isn't /quite/ right, since it will still result in the watcher attempting to unwrap the wrapped token, but the unwrap will fail because we've already unwrapped it once.

Right. It unwraps it once in the client_set code on startup, then it does it again when starting up that VaultAgentTokenQuery dependency (which then just waits for file updates). I'm going to see if I can differentiate between the api returns for unwrapping already unwrapped tokens (docs imply this is possible). Basically I want to see if there is a way to do this without keeping track of the wrapper's state.

Please take a whack if you'd like. Communicating ideas about code in code works great. And if we like you're solution better we can use it. I appreciate the empathy but I enjoy the art of it and love to collaborate so no worries.

Thanks.

@eikenb
Copy link
Contributor

eikenb commented Sep 7, 2022

FYI... the docs were misleading and there is no differentiation. It always returns the same error whether provided with a already unwrapped token or garbage. Not surprising though.

Deleting the file seems like it could work but also seems like it could break things. Plus you can unwrap tokens from other sources than files.

At this point I'm down to 2 ideas.. keeping the state (as you suggested) or ignoring the unwraping errors in some cases (eg. having VaultAgentTokenQuery log instead of return unwrapping errors).

@eikenb
Copy link
Contributor

eikenb commented Sep 7, 2022

Thinking about this I'm wondering why not just always try to unwrap a secret, use the returned secret if given otherwise just go on with the token as if it is good (doesn't need unwrapping). Eliminate the option and just always have it check. The only downside I see are a few extra unwrap calls.

I'm asking about this internally in case it goes against some Vault recommended setup/use/etc.

@drawks
Copy link
Author

drawks commented Sep 8, 2022

Always sending the token to the unwrap API and then ignoring the error has the potential for causing a lot of audit log noise on the vault server.

@eikenb
Copy link
Contributor

eikenb commented Sep 8, 2022

In my experiences audit logs are pure noise that you need to sift/filter for anything interesting. But.. there is the option already and we could just keep that to eliminate the excess noise but change the unwrap logic to only log issues after that first unwrap which would still error (for early feedback). That should keep the audit noise to a minimum while eliminating the need to track the wrapping state of the token.

@eikenb
Copy link
Contributor

eikenb commented Sep 8, 2022

It'd be pretty easy to eliminate the additional check at startup, so it doesn't always try twice. Only need to have the VaultAgentTokenQuery skip updating the token on first use (as client_set has already handled it).

@eikenb
Copy link
Contributor

eikenb commented Sep 8, 2022

Skipping the first call makes testing a PITA and doesn't really buy much. Not skipping yet.

@eikenb
Copy link
Contributor

eikenb commented Sep 8, 2022

The current PR keeps the unwrap option and only tries to unwrap if set. It will return an error (and exit) if the initial client_set token set call failes but will only logs the failures in the monitoring code (VaultAgentTokenQuery).

This should hit a decent spot where it won't need to keep state and it won't add a lot of audit noise (1 extra log entry at startup and 1 per renew if the token isn't wrapped after the first which I don't think should happen).

IMO the 1 extra at startup shouldn't be an issue and if it becomes one it can be addressed after the fact by skipping that first call and reworking the tests.

@drawks .. will that work for your use case?

@drawks
Copy link
Author

drawks commented Sep 8, 2022

Just tested your branch and I'm still seeing a failure

2022-09-08T20:59:10.689Z [INFO] (runner) creating watcher
2022-09-08T20:59:10.691Z [DEBUG] (watcher) adding vault.token
2022-09-08T20:59:10.691Z [DEBUG] (watcher) adding vault-agent.token
2022-09-08T20:59:10.694Z [INFO] (runner) starting
2022-09-08T20:59:10.694Z [DEBUG] (runner) running initial templates
2022-09-08T20:59:10.695Z [DEBUG] (runner) initiating run
2022-09-08T20:59:10.695Z [DEBUG] (runner) checking template 7f1423e412a5203b3fc7ee4113252fdb
2022-09-08T20:59:10.696Z [DEBUG] (runner) missing data for 1 dependencies
2022-09-08T20:59:10.697Z [DEBUG] (runner) missing dependency: vault.read(kv/foo)
2022-09-08T20:59:10.702Z [DEBUG] (runner) add used dependency vault.read(kv/foo) to missing since isLeader but do not have a watcher
2022-09-08T20:59:10.702Z [DEBUG] (runner) was not watching 1 dependencies
2022-09-08T20:59:10.703Z [DEBUG] (watcher) adding vault.read(kv/foo)
2022-09-08T20:59:10.703Z [DEBUG] (runner) diffing and updating dependencies
2022-09-08T20:59:10.704Z [DEBUG] (runner) watching 3 dependencies
2022-09-08T20:59:10.706Z [INFO] unwrap skipped (vault token not wrapped)
2022-09-08T20:59:10.707Z [WARN] vault.read(kv/foo): failed to check if kv/foo is KVv2, assume not: Error making API request.

I've not re-added any instrumentation, nor really looked over the code changes (an expensive context switch at the moment) but I suspect that we're still somehow setting the token an extra time. unwrap skipped (vault token not wrapped) sounds like we've read the token from the file and decided to apply it without unwrapping.

edit: sure enough, commenting out the watcher on the token file results in this going back to a working state

@eikenb
Copy link
Contributor

eikenb commented Sep 8, 2022

First let me say thanks again @drawks for all the help with this.

Weird. I am able to replicate this error and will look into it.

@eikenb
Copy link
Contributor

eikenb commented Sep 8, 2022

I see the problem. To scatter brained to notice the obvious, that it sets the token before trying to unwrap it so it is wrong the 2nd time through.

Bounced around too much with my thoughts on this. Going to take a step back and rethink a bit.

@eikenb
Copy link
Contributor

eikenb commented Sep 12, 2022

Hey @drawks. Just to keep you up to date I've switched from trying to slap a quick fix on this to thinking how I want this fixed in a more permanent way that makes sense for the hcat library as well (where I have an issue to rework this). I'm working though it now and will post a new PR when done.

@drawks
Copy link
Author

drawks commented Sep 12, 2022

Rock and roll! Thanks for working on it.

@eikenb
Copy link
Contributor

eikenb commented Sep 17, 2022

I pushed up the branch (as a backup) with my refactor if you are curious. It's not done yet as there are issues with the integration tests conflicting and the commit message is an obvious placeholder, but this is the basic idea.

https://github.com/hashicorp/consul-template/tree/vault-token-file-refactor

Only if you are curious... if not I'll finish it up next week and comment here when the PR is up.

Thanks.

@eikenb
Copy link
Contributor

eikenb commented Sep 22, 2022

Hey @drawks, I've finished and pushed up the PR for my vault token handling refactor/fix. If you get a chance to test it please let me know how it goes. Thanks!

#1645

@eikenb eikenb modified the milestones: v0.30.0, v0.29.3 Sep 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug hashicat-update-required Changes that need to be ported to hashicat vault Related to the Vault integration
Projects
None yet
2 participants