-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleaning of stale leases #2452
Cleaning of stale leases #2452
Conversation
I'm not sure what else was earmarked to be done here; one definite comment is there needs to be a guard so that you don't run the tidy operation if it's already running. |
vault/expiration.go
Outdated
|
||
lock := locksutil.LockForKey(m.tokenStore.tokenLocks, le.ClientToken) | ||
lock.RLock() | ||
defer lock.RUnlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be given up after the lookupSalted call or if we are trying to revoke a token it'll deadlock.
vault/expiration.go
Outdated
func (m *ExpirationManager) Tidy() error { | ||
var tidyErrors *multierror.Error | ||
|
||
tidyFunc := func(leaseID string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add some progress logging so people can be aware things are still happening.
vault/expiration.go
Outdated
|
||
tidyFunc := func(leaseID string) { | ||
le, err := m.loadEntry(leaseID) | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth adding a memoizing cache (a map would work, as a set) so that if we already know a token does still exist we don't have to look it up again.
vault/expiration.go
Outdated
return | ||
} | ||
|
||
if le.ClientToken == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case we should probably revoke the lease as well.
vault/expiration.go
Outdated
tokenCache := make(map[string]string) | ||
|
||
tidyFunc := func(leaseID string) { | ||
m.logger.Trace("expiration: checking if lease is valid", "lease_id", leaseID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is likely to be way too verbose, since this will print out for every single lease -- imagine if you have 200k of them. What would be better is a log line before this happens, then a running count, e.g. if the current count is divisible evenly by 500, print out an update line.
vault/expiration.go
Outdated
var tidyErrors *multierror.Error | ||
|
||
// Create a cache to keep track of looked up tokens | ||
tokenCache := make(map[string]string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to store all tokens you have seen; you only need to store a list of valid tokens so that you don't check for validity again. So treat it as a set via map[string]struct{}
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for doing this way is that, if we looked up a token and if that had turned out to be invalid, we need that information too. In comparison to the valid tokens, I assume that the number of invalid ones will be fairly less in number. While cleaning up, it is likely that we hit the same invalid token more than once. Caching information about a looked up invalid token might be of help.
I am okay to do the suggested changes. Just sharing my thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hit for an invalid entry isn't as high, because there won't be any value to decode/de-serialize. But memoizing both statues is fine -- however, you don't need a string, you just need a bool. Presence of the entry in the map means it's been seen; the bool indicates whether it was valid or not.
vault/expiration.go
Outdated
revokeLease = true | ||
} | ||
|
||
switch tokenCache[le.ClientToken] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't do this; simply do an if _, ok := validTokenCache; ok { return
vault/token_store.go
Outdated
@@ -102,14 +107,11 @@ func NewTokenStore(c *Core, config *logical.BackendConfig) (*TokenStore, error) | |||
t := &TokenStore{ | |||
view: view, | |||
cubbyholeDestroyer: destroyCubbyhole, | |||
logger: c.logger, | |||
policyLookupFunc: c.policyStore.GetPolicy, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By doing this here you are skipping the check for nil; IIRC this check is there on purpose for some test reasons.
vault/expiration.go
Outdated
@@ -124,7 +124,7 @@ func (m *ExpirationManager) Tidy() error { | |||
var tidyErrors *multierror.Error | |||
|
|||
// Create a cache to keep track of looked up tokens | |||
tokenCache := make(map[string]string) | |||
validTokenCache := make(map[string]struct{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not make this a bool and cache discovery of both valid and invalid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Had seen your previous comment after making this struct{}
:-)
vault/expiration.go
Outdated
|
||
// If no errors were encountered, return a normal error instead of a | ||
// multierror | ||
if tidyErrors == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than this, just use https://godoc.org/github.com/hashicorp/go-multierror#Error.ErrorOrNil !
vault/expiration.go
Outdated
tidyFunc := func(leaseID string) { | ||
i++ | ||
if i%500 == 0 { | ||
m.logger.Debug("expiration: tidying of leases", "progress", i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/tidying of leases/tidying leases/
vault/token_store.go
Outdated
// with the token's salt ID at the end, remove it | ||
for _, parent := range parentList { | ||
children, err := ts.view.List(parentPrefix + parent + "/") | ||
if atomic.CompareAndSwapInt64(&ts.tidyLock, 0, 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why wrap the entire function in an if as opposed to having a block that checks at the beginning whether it's running and simply returns if so (as it does in the expiration manager tidy function)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. In expiration manager, the whole thing is in fact happening inside in if
condition. Its just that the tidy tasks are all inside a func.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I was going to ask you to move that check in the expiration manager to the top of that function but then decided to leave it alone :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make it consistent with the changes in the token store, I just did that. Also, added debug statements that indicate that the operation is complete.
1) Ensure that if we fail to generate a lease for a secret we attempt to revoke it 2) Ensure that any lease that is registered should never have a blank token In theory, number 2 will let us a) find places where this *is* the case, and b) if errors are encountered when revoking tokens due to a blank client token, it suggests that the client token values are being stripped somewhere along the way, which is also instructive.
vault/expiration.go
Outdated
if retErr != nil { | ||
revResp, err := m.router.Route(logical.RevokeRequest(req.Path, resp.Secret, resp.Data)) | ||
if err != nil { | ||
retErr = multierror.Append(retErr, errwrap.Wrapf("an additional error was encountered revoking the newly-generated secret: {{err}}", err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we distinguish the 4 error messages here so that if we hit them, we can know where exactly the error was caused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
vault/expiration.go
Outdated
}() | ||
|
||
if req.ClientToken == "" { | ||
return "", fmt.Errorf("expiration: cannot register a lease with an empty client token") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move the defer function to be after this check? There is no point in revoking a lease which didn't even get through the input validations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just two optional comments, LGTM!
vault/expiration.go
Outdated
m.logger.Debug("expiration: revoking lease which holds an invalid token", "lease_id", leaseID) | ||
revokeLease = true | ||
deletedCountInvalidToken++ | ||
tokenCache[le.ClientToken] = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessary, but I think you might as well add a goto REVOKE_CHECK
statement here incase more branches are added in the future. Makes it slightly more extendable.
vault/expiration.go
Outdated
} else { | ||
m.logger.Debug("expiration: revoking lease which contains an invalid token", "lease_id", leaseID) | ||
revokeLease = true | ||
deletedCountInvalidToken++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here.
Hi Everyone, |
No description provided.