Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VAULT-28677: Fix dangling entity-aliases in MemDB after invalidation #27750

Merged
merged 10 commits into from
Jul 25, 2024
3 changes: 3 additions & 0 deletions changelog/27750.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
```release-note:bug
core/identity: Fixed an issue where deleted/reassigned entity-aliases were not removed from in-memory database.
```
70 changes: 51 additions & 19 deletions vault/identity_store.go
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ import (
"context"
"fmt"
"reflect"
"slices"
"strings"
"time"

Expand Down Expand Up @@ -721,34 +722,65 @@ func (i *IdentityStore) invalidateEntityBucket(ctx context.Context, key string)
}
}

// If the entity is not in MemDB or if it is but differs from the
// state that's in the bucket storage entry, upsert it into MemDB.

// We've considered the use of github.com/google/go-cmp here,
// but opted for sticking with reflect.DeepEqual because go-cmp
// is intended for testing and is able to panic in some
// situations.
if memDBEntity == nil || !reflect.DeepEqual(memDBEntity, bucketEntity) {
// The entity is not in MemDB, it's a new entity. Add it to MemDB.
err = i.upsertEntityInTxn(ctx, txn, bucketEntity, nil, false)
if err != nil {
i.logger.Error("failed to update entity in MemDB", "entity_id", bucketEntity.ID, "error", err)
return
if memDBEntity != nil && reflect.DeepEqual(memDBEntity, bucketEntity) {
// No changes on this entity, move on to the next one.
continue
}

// If the entity exists in MemDB it must differ from the entity in
// the storage bucket because of above test. Go through all of the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the entity exists and it differs, are we assuming that the difference between them is always going to be the aliases? Why don't we simply delete the entity from memdb and add it again like the previous implementation did?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the MemDB entity differs from its corresponding storage bucket entity, it may or may not be the aliases. But this makes me think we could detect if the Aliases slices are the same but that would entail walking the entire set and that's what the current algorithm is doing. I could add a test case where there are no changes to the Aliases to really make sure it works.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this makes me think we could detect if the Aliases slices are the same but that would entail walking the entire set and that's what the current algorithm is doing.

Wouldn't be easy to just detect if there are changes and replace one entity with the other? Deleting from memdb and adding without having to walk a slice and detect changes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to what Bianca is saying. Just chiming in so I can follow along 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a change that follows this suggestion.

// aliases currently set in the memDB entity and check if any of
// those are not in the bucket entity. Those aliases that are only
// found in the memDB entity, need to be deleted from MemDB because
// the upsertEntityInTxn function does not delete those aliases.
if memDBEntity != nil {
biazmoreira marked this conversation as resolved.
Show resolved Hide resolved
aliasesToDelete := make([]*identity.Alias, 0)
bucketEntityAliases := bucketEntity.Aliases

slices.SortFunc(bucketEntityAliases, func(a, b *identity.Alias) int {
return strings.Compare(a.ID, b.ID)
})

for _, a := range memDBEntity.Aliases {
_, found := slices.BinarySearchFunc(bucketEntityAliases, a.ID, func(a *identity.Alias, b string) int {
return strings.Compare(a.ID, b)
})
if !found {
aliasesToDelete = append(aliasesToDelete, a)
}
}

// If this is a performance secondary, the entity created on
// this node would have been cached in a local cache based on
// the result of the CreateEntity RPC call to the primary
// cluster. Since this invalidation is signaling that the
// entity is now in the primary cluster's storage, the locally
// cached entry can be removed.
if i.localNode.ReplicationState().HasState(consts.ReplicationPerformanceSecondary) && i.localNode.HAState() == consts.Active {
if err := i.localAliasPacker.DeleteItem(ctx, bucketEntity.ID+tmpSuffix); err != nil {
i.logger.Error("failed to clear local alias entity cache", "error", err, "entity_id", bucketEntity.ID)
return
if len(aliasesToDelete) > 0 {
err := i.deleteAliasesInEntityInTxn(txn, memDBEntity, aliasesToDelete)
if err != nil {
i.logger.Error("failed to remove entity alias from MemDB", "entity_id", memDBEntity.ID, "error", err)
}
}
}

err = i.upsertEntityInTxn(ctx, txn, bucketEntity, nil, false)
if err != nil {
i.logger.Error("failed to update entity in MemDB", "entity_id", bucketEntity.ID, "error", err)
return
}

// If this is a performance secondary, the entity created on
// this node would have been cached in a local cache based on
// the result of the CreateEntity RPC call to the primary
// cluster. Since this invalidation is signaling that the
// entity is now in the primary cluster's storage, the locally
// cached entry can be removed.
if i.localNode.ReplicationState().HasState(consts.ReplicationPerformanceSecondary) && i.localNode.HAState() == consts.Active {
if err := i.localAliasPacker.DeleteItem(ctx, bucketEntity.ID+tmpSuffix); err != nil {
i.logger.Error("failed to clear local alias entity cache", "error", err, "entity_id", bucketEntity.ID)
return
Comment on lines +758 to +767
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole dance is fascinating. I'm a little curious how you discovered it here - it seems like this being missing is an unrelated bug to the regression right?

@biazmoreira you probably know all about this already, is this another place that breaks the mental model of all global writes go to primary because we updated memdb with a global thing outside of replication? I think we saw places like that with standbys but this was new to me that we have perf secondaries updating their memdb outside of replication for replicated state.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For anyone who reads this later, I realised this wasn't new code - just moved down from further up (see lines 739 and on in the before part of this diff).

}
}

}
}

Expand Down
126 changes: 126 additions & 0 deletions vault/identity_store_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -992,6 +992,132 @@ func TestIdentityStoreInvalidate_Entities(t *testing.T) {
txn.Commit()
}

// TestIdentityStoreInvalidate_EntityAliasDelete verifies that the
// invalidateEntityBucket method properly cleans up aliases from
// MemDB that are no longer associated with the entity in the
// storage bucket.
func TestIdentityStoreInvalidate_EntityAliasDelete(t *testing.T) {
ctx := namespace.ContextWithNamespace(context.Background(), namespace.RootNamespace)
c, _, root := TestCoreUnsealed(t)

// Enable a No-Op auth method
c.credentialBackends["noop"] = func(context.Context, *logical.BackendConfig) (logical.Backend, error) {
return &NoopBackend{
BackendType: logical.TypeCredential,
}, nil
}
mountAccessor1 := "noop-accessor1"
mountAccessor2 := "noop-accessor2"
mountAccessor3 := "noon-accessor3"

createMountEntry := func(path, uuid, mountAccessor string, local bool) *MountEntry {
return &MountEntry{
Table: credentialTableType,
Path: path,
Type: "noop",
UUID: uuid,
Accessor: mountAccessor,
BackendAwareUUID: uuid + "backend",
NamespaceID: namespace.RootNamespaceID,
namespace: namespace.RootNamespace,
Local: local,
}
}

c.auth = &MountTable{
Type: credentialTableType,
Entries: []*MountEntry{
createMountEntry("/noop1", "abcd", mountAccessor1, false),
createMountEntry("/noop2", "ghij", mountAccessor2, false),
createMountEntry("/noop3", "mnop", mountAccessor3, true),
},
}

require.NoError(t, c.setupCredentials(context.Background()))

// Create an entity
req := &logical.Request{
ClientToken: root,
Operation: logical.UpdateOperation,
Path: "entity",
Data: map[string]interface{}{
"name": "alice",
},
}

resp, err := c.identityStore.HandleRequest(ctx, req)
require.NoError(t, err)
require.NotNil(t, resp)
require.Contains(t, resp.Data, "id")

entityID := resp.Data["id"].(string)

// Create two entity-aliases
createEntityAlias := func(name, mountAccessor string) string {
req = &logical.Request{
ClientToken: root,
Operation: logical.UpdateOperation,
Path: "entity-alias",
Data: map[string]interface{}{
"name": name,
"canonical_id": entityID,
"mount_accessor": mountAccessor,
},
}

resp, err = c.identityStore.HandleRequest(ctx, req)
require.NoError(t, err)
require.NotNil(t, resp)
require.Contains(t, resp.Data, "id")

return resp.Data["id"].(string)
}

alias1ID := createEntityAlias("alias1", mountAccessor1)
alias2ID := createEntityAlias("alias2", mountAccessor2)
alias3ID := createEntityAlias("alias3", mountAccessor3)

// Update the entity in storage only to remove alias2 then call invalidate
bucketKey := c.identityStore.entityPacker.BucketKey(entityID)
bucket, err := c.identityStore.entityPacker.GetBucket(context.Background(), bucketKey)
require.NoError(t, err)
require.NotNil(t, bucket)

bucketEntityItem := bucket.Items[0] // since there's only 1 entity
bucketEntity, err := c.identityStore.parseEntityFromBucketItem(context.Background(), bucketEntityItem)
require.NoError(t, err)
require.NotNil(t, bucketEntity)

replacementAliases := make([]*identity.Alias, 1)
for _, a := range bucketEntity.Aliases {
if a.ID != alias2ID {
replacementAliases[0] = a
break
}
}

bucketEntity.Aliases = replacementAliases

bucketEntityItem.Message, err = anypb.New(bucketEntity)
require.NoError(t, err)

require.NoError(t, c.identityStore.entityPacker.PutItem(context.Background(), bucketEntityItem))

c.identityStore.Invalidate(context.Background(), bucketKey)

alias1, err := c.identityStore.MemDBAliasByID(alias1ID, false, false)
assert.NoError(t, err)
assert.NotNil(t, alias1)

alias2, err := c.identityStore.MemDBAliasByID(alias2ID, false, false)
assert.NoError(t, err)
assert.Nil(t, alias2)

alias3, err := c.identityStore.MemDBAliasByID(alias3ID, false, false)
assert.NoError(t, err)
assert.NotNil(t, alias3)
}

// TestIdentityStoreInvalidate_LocalAliasesWithEntity verifies the correct
// handling of local aliases in the Invalidate method.
func TestIdentityStoreInvalidate_LocalAliasesWithEntity(t *testing.T) {
Expand Down
1 change: 1 addition & 0 deletions website/content/docs/release-notes/1.16.1.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ description: |-
| 1.16.1 - 1.16.3 | [New nodes added by autopilot upgrades provisioned with the wrong version](/vault/docs/upgrading/upgrade-to-1.15.x#new-nodes-added-by-autopilot-upgrades-provisioned-with-the-wrong-version) |
| 1.15.8+ | [Autopilot upgrade for Vault Enterprise fails](/vault/docs/upgrading/upgrade-to-1.15.x#autopilot) |
| 1.16.5 | [Listener stops listening on untrusted upstream connection with particular config settings](/vault/docs/upgrading/upgrade-to-1.16.x#listener-proxy-protocol-config) |
| 1.16.3 - 1.16.6 | [Vault standby nodes not deleting removed entity-aliases from in-memory database](/vault/docs/upgrade-to-1.16.x#dangling-entity-alias-in-memory) |

## Vault companion updates

Expand Down
1 change: 1 addition & 0 deletions website/content/docs/release-notes/1.17.0.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ description: |-
| Known issue (1.17.0) | [Vault Agent and Vault Proxy consume excessive amounts of CPU](/vault/docs/upgrading/upgrade-to-1.17.x#agent-proxy-cpu-1-17) |
| Known issue (1.15.8+) | [Autopilot upgrade for Vault Enterprise fails](/vault/docs/upgrading/upgrade-to-1.15.x#autopilot) |
| Known issue (1.17.1) | [Listener stops listening on untrusted upstream connection with particular config settings](/vault/docs/upgrading/upgrade-to-1.17.x#listener-proxy-protocol-config) |
| Known issue (1.17.0 - 1.17.2) | [Vault standby nodes not deleting removed entity-aliases from in-memory database](/vault/docs/upgrade-to-1.17.x#dangling-entity-alias-in-memory)

## Vault companion updates

Expand Down
2 changes: 2 additions & 0 deletions website/content/docs/upgrading/upgrade-to-1.16.x.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -115,3 +115,5 @@ decides to trigger the flag. More information can be found in the
@include 'known-issues/1_16_secrets-sync-chroot-activation.mdx'

@include 'known-issues/config_listener_proxy_protocol_behavior_issue.mdx'

@include 'known-issues/dangling-entity-aliases-in-memory.mdx'
2 changes: 2 additions & 0 deletions website/content/docs/upgrading/upgrade-to-1.17.x.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -90,3 +90,5 @@ incorrectly. For additional details, refer to the
@include 'known-issues/config_listener_proxy_protocol_behavior_issue.mdx'

@include 'known-issues/transit-input-on-cmac-response.mdx'

@include 'known-issues/dangling-entity-aliases-in-memory.mdx'
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
<a id="dangling-entity-alias-in-memory" />

### Deleting an entity-aliases does not remove it from the in-memory database on standby nodes

#### Affected versions

##### Vault Community Edition

- 1.16.3
- 1.17.0 - 1.17.2

##### Enterprise

- 1.16.3+ent - 1.16.6+ent
- 1.17.0+ent - 1.17.2+ent

#### Issue

Entity-aliases deleted from Vault are not removed from the in-memory database on
standby nodes within a cluster. As a result, API requests to create a new
entity-alias with the same mount accessor and name sent to a standby node will
fail.

This bug is fixed in Vault 1.16.7+ent, 1.17.3, 1.17.3+ent and later.
Loading