Skip to content

Commit

Permalink
logical/aws: Harden WAL entry creation (#5202)
Browse files Browse the repository at this point in the history
* logical/aws: Harden WAL entry creation

If AWS IAM user creation failed in any way, the WAL corresponding to the
IAM user would get left around and Vault would try to roll it back.
However, because the user never existed, the rollback failed. Thus, the
WAL would essentially get "stuck" and Vault would continually attempt to
roll it back, failing every time. A similar situation could arise if the
IAM user that Vault created got deleted out of band, or if Vault deleted
it but was unable to write the lease revocation back to storage (e.g., a
storage failure).

This attempts to harden it in two ways. One is by deleting the WAL log
entry if the IAM user creation fails. However, the WAL deletion could
still fail, and this wouldn't help where the user is deleted out of
band, so second, consider the user rolled back if the user just doesn't
exist, under certain circumstances.

Fixes #5190

* Fix segfault in expiration unit tests

TestExpiration_Tidy was passing in a leaseEntry that had a nil Secret,
which then caused a segfault as the changes to revokeEntry didn't check
whether Secret was nil; this is probably unlikely to occur in real life,
but good to be extra cautious.

* Fix potential segfault

Missed the else...

* Respond to PR feedback
  • Loading branch information
joelthompson authored and chrishoffman committed Sep 27, 2018
1 parent 8014b8b commit 0510b7e
Show file tree
Hide file tree
Showing 5 changed files with 46 additions and 2 deletions.
4 changes: 3 additions & 1 deletion builtin/logical/aws/backend.go
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,7 @@ func Backend() *backend {
},

WALRollback: b.walRollback,
WALRollbackMinAge: 5 * time.Minute,
WALRollbackMinAge: minAwsUserRollbackAge,
BackendType: logical.TypeLogical,
}

Expand Down Expand Up @@ -135,3 +135,5 @@ func (b *backend) clientSTS(ctx context.Context, s logical.Storage) (stsiface.ST

return b.stsClient, nil
}

const minAwsUserRollbackAge = 5 * time.Minute
31 changes: 31 additions & 0 deletions builtin/logical/aws/path_user.go
Original file line number Diff line number Diff line change
Expand Up @@ -4,8 +4,10 @@ import (
"context"
"fmt"
"strings"
"time"

"github.com/aws/aws-sdk-go/aws"
"github.com/aws/aws-sdk-go/aws/awserr"
"github.com/aws/aws-sdk-go/service/iam"
"github.com/hashicorp/errwrap"
"github.com/hashicorp/vault/helper/strutil"
Expand Down Expand Up @@ -130,6 +132,35 @@ func (b *backend) pathUserRollback(ctx context.Context, req *logical.Request, _k
MaxItems: aws.Int64(1000),
})
if err != nil {
// This isn't guaranteed to be perfect; for example, an IAM user
// might have gotten put into the WAL but then the IAM user creation
// failed (e.g., Vault didn't have permissions) and then the WAL
// deletion failed as well. Then, if Vault doesn't have access to
// call iam:ListGroupsForUser, AWS will return an access denied error
// and the WAL will never get cleaned up. But this is better than
// just having Vault "forget" about a user it actually created.
//
// BEWARE a potential race condition -- where this is called
// immediately after a user is created. AWS eventual consistency
// might say the user doesn't exist when the user does in fact
// exist, and this could cause Vault to forget about the user.
// This won't happen if the user creation fails (because the WAL
// minimum age is 5 minutes, and AWS eventual consistency is, in
// practice, never that long), but it could happen if a lease holder
// asks immediately after getting a user to revoke the lease, causing
// Vault to leak the secret, which would be a Very Bad Thing to allow.
// So we make sure that, if there's an associated lease, it must be at
// least 5 minutes old as well.
if aerr, ok := err.(awserr.Error); ok {
acceptMissingIamUsers := false
if req.Secret == nil || time.Since(req.Secret.IssueTime) > time.Duration(minAwsUserRollbackAge) {
// WAL rollback
acceptMissingIamUsers = true
}
if aerr.Code() == iam.ErrCodeNoSuchEntityException && acceptMissingIamUsers {
return nil
}
}
return err
}
groups := groupsResp.Groups
Expand Down
4 changes: 4 additions & 0 deletions builtin/logical/aws/secret_access_keys.go
Original file line number Diff line number Diff line change
Expand Up @@ -185,6 +185,10 @@ func (b *backend) secretAccessKeysCreate(
UserName: aws.String(username),
})
if err != nil {
if walErr := framework.DeleteWAL(ctx, s, walId); walErr != nil {
iamErr := errwrap.Wrapf("error creating IAM user: {{err}}", err)
return nil, errwrap.Wrap(errwrap.Wrapf("failed to delete WAL entry: {{err}}", walErr), iamErr)
}
return logical.ErrorResponse(fmt.Sprintf(
"Error creating IAM user: %s", err)), awsutil.CheckAWSError(err)
}
Expand Down
2 changes: 1 addition & 1 deletion logical/lease.go
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ type LeaseOptions struct {
Increment time.Duration `json:"-"`

// IssueTime is the time of issue for the original lease. This is
// only available on a Renew operation and has no effect when returning
// only available on Renew and Revoke operations and has no effect when returning
// a response. It can be used to enforce maximum lease periods by
// a logical backend.
IssueTime time.Time `json:"-"`
Expand Down
7 changes: 7 additions & 0 deletions vault/expiration.go
Original file line number Diff line number Diff line change
Expand Up @@ -1277,6 +1277,13 @@ func (m *ExpirationManager) revokeEntry(ctx context.Context, le *leaseEntry) err
return nil
}

if le.Secret != nil {
// not sure if this is really valid to have a leaseEntry with a nil Secret
// (if there's a nil Secret, what are you really leasing?), but the tests
// create one, and good to be defensive
le.Secret.IssueTime = le.IssueTime
}

// Make sure we're operating in the right namespace
nsCtx := namespace.ContextWithNamespace(ctx, le.namespace)

Expand Down

0 comments on commit 0510b7e

Please sign in to comment.