Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleaning of stale leases #2452

Merged
merged 47 commits into from
May 5, 2017
Merged

Cleaning of stale leases #2452

merged 47 commits into from
May 5, 2017

Conversation

vishalnayak
Copy link
Member

@vishalnayak vishalnayak commented Mar 7, 2017

No description provided.

@jefferai
Copy link
Member

jefferai commented Apr 4, 2017

I'm not sure what else was earmarked to be done here; one definite comment is there needs to be a guard so that you don't run the tidy operation if it's already running.


lock := locksutil.LockForKey(m.tokenStore.tokenLocks, le.ClientToken)
lock.RLock()
defer lock.RUnlock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be given up after the lookupSalted call or if we are trying to revoke a token it'll deadlock.

func (m *ExpirationManager) Tidy() error {
var tidyErrors *multierror.Error

tidyFunc := func(leaseID string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some progress logging so people can be aware things are still happening.


tidyFunc := func(leaseID string) {
le, err := m.loadEntry(leaseID)
if err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth adding a memoizing cache (a map would work, as a set) so that if we already know a token does still exist we don't have to look it up again.

return
}

if le.ClientToken == "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case we should probably revoke the lease as well.

@vishalnayak vishalnayak changed the title [WIP] Cleaning of stale leases Cleaning of stale leases Apr 27, 2017
tokenCache := make(map[string]string)

tidyFunc := func(leaseID string) {
m.logger.Trace("expiration: checking if lease is valid", "lease_id", leaseID)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is likely to be way too verbose, since this will print out for every single lease -- imagine if you have 200k of them. What would be better is a log line before this happens, then a running count, e.g. if the current count is divisible evenly by 500, print out an update line.

var tidyErrors *multierror.Error

// Create a cache to keep track of looked up tokens
tokenCache := make(map[string]string)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to store all tokens you have seen; you only need to store a list of valid tokens so that you don't check for validity again. So treat it as a set via map[string]struct{}.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for doing this way is that, if we looked up a token and if that had turned out to be invalid, we need that information too. In comparison to the valid tokens, I assume that the number of invalid ones will be fairly less in number. While cleaning up, it is likely that we hit the same invalid token more than once. Caching information about a looked up invalid token might be of help.

I am okay to do the suggested changes. Just sharing my thoughts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hit for an invalid entry isn't as high, because there won't be any value to decode/de-serialize. But memoizing both statues is fine -- however, you don't need a string, you just need a bool. Presence of the entry in the map means it's been seen; the bool indicates whether it was valid or not.

revokeLease = true
}

switch tokenCache[le.ClientToken] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't do this; simply do an if _, ok := validTokenCache; ok { return

@@ -102,14 +107,11 @@ func NewTokenStore(c *Core, config *logical.BackendConfig) (*TokenStore, error)
t := &TokenStore{
view: view,
cubbyholeDestroyer: destroyCubbyhole,
logger: c.logger,
policyLookupFunc: c.policyStore.GetPolicy,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By doing this here you are skipping the check for nil; IIRC this check is there on purpose for some test reasons.

@@ -124,7 +124,7 @@ func (m *ExpirationManager) Tidy() error {
var tidyErrors *multierror.Error

// Create a cache to keep track of looked up tokens
tokenCache := make(map[string]string)
validTokenCache := make(map[string]struct{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not make this a bool and cache discovery of both valid and invalid?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Had seen your previous comment after making this struct{} :-)


// If no errors were encountered, return a normal error instead of a
// multierror
if tidyErrors == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tidyFunc := func(leaseID string) {
i++
if i%500 == 0 {
m.logger.Debug("expiration: tidying of leases", "progress", i)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/tidying of leases/tidying leases/

// with the token's salt ID at the end, remove it
for _, parent := range parentList {
children, err := ts.view.List(parentPrefix + parent + "/")
if atomic.CompareAndSwapInt64(&ts.tidyLock, 0, 1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why wrap the entire function in an if as opposed to having a block that checks at the beginning whether it's running and simply returns if so (as it does in the expiration manager tidy function)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. In expiration manager, the whole thing is in fact happening inside in if condition. Its just that the tidy tasks are all inside a func.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I was going to ask you to move that check in the expiration manager to the top of that function but then decided to leave it alone :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make it consistent with the changes in the token store, I just did that. Also, added debug statements that indicate that the operation is complete.

if retErr != nil {
revResp, err := m.router.Route(logical.RevokeRequest(req.Path, resp.Secret, resp.Data))
if err != nil {
retErr = multierror.Append(retErr, errwrap.Wrapf("an additional error was encountered revoking the newly-generated secret: {{err}}", err))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we distinguish the 4 error messages here so that if we hit them, we can know where exactly the error was caused.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}()

if req.ClientToken == "" {
return "", fmt.Errorf("expiration: cannot register a lease with an empty client token")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move the defer function to be after this check? There is no point in revoking a lease which didn't even get through the input validations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member Author

@vishalnayak vishalnayak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@briankassouf briankassouf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just two optional comments, LGTM!

m.logger.Debug("expiration: revoking lease which holds an invalid token", "lease_id", leaseID)
revokeLease = true
deletedCountInvalidToken++
tokenCache[le.ClientToken] = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary, but I think you might as well add a goto REVOKE_CHECK statement here incase more branches are added in the future. Makes it slightly more extendable.

} else {
m.logger.Debug("expiration: revoking lease which contains an invalid token", "lease_id", leaseID)
revokeLease = true
deletedCountInvalidToken++
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here.

@jefferai jefferai merged commit 2f1f4cc into master May 5, 2017
@jefferai jefferai deleted the sys-tidy-leases branch May 5, 2017 14:54
@juliangamble
Copy link

Hi Everyone,
Thanks for all your work on this - where is the best place to talk through ongoing blank token issues even after running the two new tidy operations on 0.7.2?
Cheers
Julian

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants