Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleaning of stale leases #2452

Merged
merged 47 commits into from
May 5, 2017
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
47 commits
Select commit Hold shift + click to select a range
1e5d6e3
Added sys/tidy-leases endpoint
vishalnayak Mar 7, 2017
239bd1c
Add locking where possible while doing auth/token/tidy
vishalnayak Mar 7, 2017
14aaa0a
Merge branch 'oss' into clean-stale-leases
vishalnayak Apr 26, 2017
3477038
Added atomic lock to ensure a single tidy operation is in progress
vishalnayak Apr 26, 2017
e52625d
Revoke lease that has empty token; added logs
vishalnayak Apr 26, 2017
dca0d70
Added logger to token store and logs to tidy function
vishalnayak Apr 26, 2017
de1a2a0
Added caching of looked up tokens
vishalnayak Apr 26, 2017
65c63b4
Fix the log statements
vishalnayak Apr 26, 2017
b036478
Fix logging levels
vishalnayak Apr 26, 2017
711153d
Fix logging suggestions; put the policyStore nil check back in
vishalnayak Apr 27, 2017
0d629ff
Cache only valid tokens
vishalnayak Apr 27, 2017
0c65cd4
Some more logging updates
vishalnayak Apr 27, 2017
785177a
Merge branch 'oss' into sys-tidy-leases
vishalnayak Apr 27, 2017
3fdf38a
Distinguish valid and invalid tokens using bool value in cache
vishalnayak Apr 27, 2017
98cdb68
Use an atomic lock for tidy operation in token store
vishalnayak Apr 27, 2017
2ef62fe
refactor lock handling in token tidy function
vishalnayak Apr 27, 2017
a8ef2c0
Refactor locking code in lease tidy; add ending debug statements
vishalnayak Apr 27, 2017
0892102
Merge branch 'oss' into sys-tidy-leases
vishalnayak May 1, 2017
8c7b175
Skip checking the validity of an empty client token
vishalnayak May 2, 2017
853233a
Added a test for tidying of empty token
vishalnayak May 2, 2017
d07d3cb
Added steps to check if invalid token is properly cleaned up
vishalnayak May 2, 2017
79fc0d8
Check if multiple leases with same invalid token is getting cleaned up
vishalnayak May 2, 2017
497bebe
Do not duplicate log lines for invalid leases
vishalnayak May 2, 2017
aa08e5c
Added test to check the atomicity of the lease tidy operation
vishalnayak May 2, 2017
a3c2a42
Test to check that leases with valid tokens are not being cleaned up
vishalnayak May 2, 2017
b6843ec
Added summary logs to help better understand the consequence
vishalnayak May 3, 2017
0c02540
Less scary debugging
jefferai May 3, 2017
b3c6a56
change some logging output
jefferai May 3, 2017
415b0a2
Two things:
jefferai May 3, 2017
0bda5a7
Adhere to tainted status in salted accessor lookup
jefferai May 3, 2017
8d35f92
consistent logging
vishalnayak May 3, 2017
a2e431b
Added logs when deletion fails so we can rely on server logs
vishalnayak May 3, 2017
2d21bf6
logging updates
vishalnayak May 3, 2017
5bc47b0
Add taint flag for looking up by accessor
jefferai May 3, 2017
b0c4a7e
Add more cleanup if a lease fails to register and revoke tokens if re…
jefferai May 3, 2017
265b4cd
Merge remote-tracking branch 'oss/master' into sys-tidy-leases
jefferai May 3, 2017
2f6e924
Fix substitution of index/child in delete call
jefferai May 3, 2017
f1d2fc3
Merge branch 'master-oss' into sys-tidy-leases
jefferai May 4, 2017
5320da0
Move tidy-leases to leases/tidy
jefferai May 4, 2017
cacf072
Update commenting
jefferai May 4, 2017
4de09fb
Update Tidy function comment
vishalnayak May 4, 2017
106f08a
Fix up the tests
vishalnayak May 4, 2017
7829107
Update comments
jefferai May 4, 2017
1378dd5
Move client token check in exp register to top
jefferai May 4, 2017
5dde45d
Address feedback
jefferai May 5, 2017
e61298e
Update debugging around tidy
jefferai May 5, 2017
0e10477
Merge branch 'master-oss' into sys-tidy-leases
jefferai May 5, 2017
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
154 changes: 151 additions & 3 deletions vault/expiration.go
Original file line number Diff line number Diff line change
Expand Up @@ -6,14 +6,18 @@ import (
"path"
"strings"
"sync"
"sync/atomic"
"time"

"github.com/armon/go-metrics"
log "github.com/mgutz/logxi/v1"

"github.com/hashicorp/errwrap"
multierror "github.com/hashicorp/go-multierror"
"github.com/hashicorp/go-uuid"
"github.com/hashicorp/vault/helper/consts"
"github.com/hashicorp/vault/helper/jsonutil"
"github.com/hashicorp/vault/helper/locksutil"
"github.com/hashicorp/vault/logical"
)

Expand Down Expand Up @@ -57,6 +61,8 @@ type ExpirationManager struct {

pending map[string]*time.Timer
pendingLock sync.Mutex

tidyLock int64
}

// NewExpirationManager creates a new ExpirationManager that is backed
Expand Down Expand Up @@ -114,6 +120,113 @@ func (c *Core) stopExpiration() error {
return nil
}

// Tidy cleans up the dangling storage entries for leases. It scans the storage
// view to find all the available leases, checks if the token embedded in it is
// either empty or invalid and in both the cases, it revokes them. It also uses
// a token cache to avoid multiple lookups of the same token ID. It is normally
// not required to use the API that invokes this. This is only intended to
// clean up the corrupt storage due to bugs.
func (m *ExpirationManager) Tidy() error {
var tidyErrors *multierror.Error

if !atomic.CompareAndSwapInt64(&m.tidyLock, 0, 1) {
m.logger.Warn("expiration: tidy operation on leases is already in progress")
return fmt.Errorf("tidy operation on leases is already in progress")
}

defer atomic.CompareAndSwapInt64(&m.tidyLock, 1, 0)

m.logger.Info("expiration: beginning tidy operation on leases")
defer m.logger.Info("expiration: finished tidy operation on leases")

// Create a cache to keep track of looked up tokens
tokenCache := make(map[string]bool)
var countLease, revokedCount, deletedCountInvalidToken, deletedCountEmptyToken int64

tidyFunc := func(leaseID string) {
countLease++
if countLease%500 == 0 {
m.logger.Info("expiration: tidying leases", "progress", countLease)
}

le, err := m.loadEntry(leaseID)
if err != nil {
tidyErrors = multierror.Append(tidyErrors, fmt.Errorf("failed to load the lease ID %q: %v", leaseID, err))
return
}

if le == nil {
tidyErrors = multierror.Append(tidyErrors, fmt.Errorf("nil entry for lease ID %q: %v", leaseID, err))
return
}

var isValid, ok bool
revokeLease := false
if le.ClientToken == "" {
m.logger.Trace("expiration: revoking lease which has an empty token", "lease_id", leaseID)
revokeLease = true
deletedCountEmptyToken++
goto REVOKE_CHECK
}

isValid, ok = tokenCache[le.ClientToken]
if !ok {
saltedID := m.tokenStore.SaltID(le.ClientToken)
lock := locksutil.LockForKey(m.tokenStore.tokenLocks, le.ClientToken)
lock.RLock()
te, err := m.tokenStore.lookupSalted(saltedID, true)
lock.RUnlock()

if err != nil {
tidyErrors = multierror.Append(tidyErrors, fmt.Errorf("failed to lookup token: %v", err))
return
}

if te == nil {
m.logger.Trace("expiration: revoking lease which holds an invalid token", "lease_id", leaseID)
revokeLease = true
deletedCountInvalidToken++
tokenCache[le.ClientToken] = false
} else {
tokenCache[le.ClientToken] = true
}
goto REVOKE_CHECK
} else {
if isValid {
return
} else {
m.logger.Trace("expiration: revoking lease which contains an invalid token", "lease_id", leaseID)
revokeLease = true
deletedCountInvalidToken++
}
goto REVOKE_CHECK
}

REVOKE_CHECK:
if revokeLease {
// Force the revocation and skip going through the token store
// again
err = m.revokeCommon(leaseID, true, true)
if err != nil {
tidyErrors = multierror.Append(tidyErrors, fmt.Errorf("failed to revoke an invalid lease with ID %q: %v", leaseID, err))
return
}
revokedCount++
}
}

if err := logical.ScanView(m.idView, tidyFunc); err != nil {
return err
}

m.logger.Debug("expiration: number of leases scanned", "count", countLease)
m.logger.Debug("expiration: number of leases which had empty tokens", "count", deletedCountEmptyToken)
m.logger.Debug("expiration: number of leases which had invalid tokens", "count", deletedCountInvalidToken)
m.logger.Debug("expiration: number of leases successfully revoked", "count", revokedCount)

return tidyErrors.ErrorOrNil()
}

// Restore is used to recover the lease states when starting.
// This is used after starting the vault.
func (m *ExpirationManager) Restore() error {
Expand Down Expand Up @@ -498,8 +611,13 @@ func (m *ExpirationManager) RenewToken(req *logical.Request, source string, toke
// Register is used to take a request and response with an associated
// lease. The secret gets assigned a LeaseID and the management of
// of lease is assumed by the expiration manager.
func (m *ExpirationManager) Register(req *logical.Request, resp *logical.Response) (string, error) {
func (m *ExpirationManager) Register(req *logical.Request, resp *logical.Response) (id string, retErr error) {
defer metrics.MeasureSince([]string{"expire", "register"}, time.Now())

if req.ClientToken == "" {
return "", fmt.Errorf("expiration: cannot register a lease with an empty client token")
}

// Ignore if there is no leased secret
if resp == nil || resp.Secret == nil {
return "", nil
Expand All @@ -515,8 +633,34 @@ func (m *ExpirationManager) Register(req *logical.Request, resp *logical.Respons
if err != nil {
return "", err
}

leaseID := path.Join(req.Path, leaseUUID)

defer func() {
// If there is an error we want to rollback as much as possible (note
// that errors here are ignored to do as much cleanup as we can). We
// want to revoke a generated secret (since an error means we may not
// be successfully tracking it), remove indexes, and delete the entry.
if retErr != nil {
revResp, err := m.router.Route(logical.RevokeRequest(req.Path, resp.Secret, resp.Data))
if err != nil {
retErr = multierror.Append(retErr, errwrap.Wrapf("an additional internal error was encountered revoking the newly-generated secret: {{err}}", err))
} else if revResp != nil && revResp.IsError() {
retErr = multierror.Append(retErr, errwrap.Wrapf("an additional error was encountered revoking the newly-generated secret: {{err}}", revResp.Error()))
}

if err := m.deleteEntry(leaseID); err != nil {
retErr = multierror.Append(retErr, errwrap.Wrapf("an additional error was encountered deleting any lease associated with the newly-generated secret: {{err}}", err))
}

if err := m.removeIndexByToken(req.ClientToken, leaseID); err != nil {
retErr = multierror.Append(retErr, errwrap.Wrapf("an additional error was encountered removing lease indexes associated with the newly-generated secret: {{err}}", err))
}
}
}()

le := leaseEntry{
LeaseID: path.Join(req.Path, leaseUUID),
LeaseID: leaseID,
ClientToken: req.ClientToken,
Path: req.Path,
Data: resp.Data,
Expand Down Expand Up @@ -548,6 +692,10 @@ func (m *ExpirationManager) Register(req *logical.Request, resp *logical.Respons
func (m *ExpirationManager) RegisterAuth(source string, auth *logical.Auth) error {
defer metrics.MeasureSince([]string{"expire", "register-auth"}, time.Now())

if auth.ClientToken == "" {
return fmt.Errorf("expiration: cannot register an auth lease with an empty token")
}

// Create a lease entry
le := leaseEntry{
LeaseID: path.Join(source, m.tokenStore.SaltID(auth.ClientToken)),
Expand Down Expand Up @@ -668,7 +816,7 @@ func (m *ExpirationManager) revokeEntry(le *leaseEntry) error {
// Revocation of login tokens is special since we can by-pass the
// backend and directly interact with the token store
if le.Auth != nil {
if err := m.tokenStore.RevokeTree(le.Auth.ClientToken); err != nil {
if err := m.tokenStore.RevokeTree(le.ClientToken); err != nil {
return fmt.Errorf("failed to revoke token: %v", err)
}

Expand Down
Loading