-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VAULT-28677: Fix dangling entity-aliases in MemDB after invalidation #27750
Changes from 7 commits
b217021
a230738
4ba821e
60210bf
6b5c05d
2669a06
b2a7812
f7e47ff
eb7df5f
6e7aea6
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
```release-note:bug | ||
core/identity: Fixed an issue where deleted/reassigned entity-aliases were not removed from in-memory database. | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -721,34 +721,47 @@ func (i *IdentityStore) invalidateEntityBucket(ctx context.Context, key string) | |
} | ||
} | ||
|
||
// If the entity is not in MemDB or if it is but differs from the | ||
// state that's in the bucket storage entry, upsert it into MemDB. | ||
|
||
// We've considered the use of github.com/google/go-cmp here, | ||
// but opted for sticking with reflect.DeepEqual because go-cmp | ||
// is intended for testing and is able to panic in some | ||
// situations. | ||
if memDBEntity == nil || !reflect.DeepEqual(memDBEntity, bucketEntity) { | ||
// The entity is not in MemDB, it's a new entity. Add it to MemDB. | ||
err = i.upsertEntityInTxn(ctx, txn, bucketEntity, nil, false) | ||
if err != nil { | ||
i.logger.Error("failed to update entity in MemDB", "entity_id", bucketEntity.ID, "error", err) | ||
if memDBEntity != nil && reflect.DeepEqual(memDBEntity, bucketEntity) { | ||
// No changes on this entity, move on to the next one. | ||
continue | ||
} | ||
|
||
// If the entity exists in MemDB it must differ from the entity in | ||
// the storage bucket because of above test. Go through all of the | ||
// aliases currently set in the memDB entity and check if any of | ||
// those are not in the bucket entity. Those aliases that are only | ||
// found in the memDB entity, need to be deleted from MemDB because | ||
// the upsertEntityInTxn function does not delete those aliases. | ||
if memDBEntity != nil { | ||
biazmoreira marked this conversation as resolved.
Show resolved
Hide resolved
|
||
if err := i.deleteAliasesInEntityInTxn(txn, memDBEntity, memDBEntity.Aliases); err != nil { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @marcboudreau @elliesterner brought up a valid point while we were talking about entity merge prevention. Do you think not deleting the entity from memdb might cause an automatic merge to be triggered? if you could write a test for that, that would be awesome. we would like to prevent further merges from happening. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. From looking at the code in
In the And by pre-deleting the aliases, we ensure that circumstance 2 cannot happen either. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For the sake of clarity, pre-deleting the entity from MemDB and then calling |
||
i.logger.Error("failed to remove entity aliases from changed entity", "entity_id", memDBEntity.ID, "error", err) | ||
return | ||
} | ||
} | ||
|
||
// If this is a performance secondary, the entity created on | ||
// this node would have been cached in a local cache based on | ||
// the result of the CreateEntity RPC call to the primary | ||
// cluster. Since this invalidation is signaling that the | ||
// entity is now in the primary cluster's storage, the locally | ||
// cached entry can be removed. | ||
if i.localNode.ReplicationState().HasState(consts.ReplicationPerformanceSecondary) && i.localNode.HAState() == consts.Active { | ||
if err := i.localAliasPacker.DeleteItem(ctx, bucketEntity.ID+tmpSuffix); err != nil { | ||
i.logger.Error("failed to clear local alias entity cache", "error", err, "entity_id", bucketEntity.ID) | ||
return | ||
} | ||
err = i.upsertEntityInTxn(ctx, txn, bucketEntity, nil, false) | ||
if err != nil { | ||
i.logger.Error("failed to update entity in MemDB", "entity_id", bucketEntity.ID, "error", err) | ||
return | ||
} | ||
|
||
// If this is a performance secondary, the entity created on | ||
// this node would have been cached in a local cache based on | ||
// the result of the CreateEntity RPC call to the primary | ||
// cluster. Since this invalidation is signaling that the | ||
// entity is now in the primary cluster's storage, the locally | ||
// cached entry can be removed. | ||
if i.localNode.ReplicationState().HasState(consts.ReplicationPerformanceSecondary) && i.localNode.HAState() == consts.Active { | ||
if err := i.localAliasPacker.DeleteItem(ctx, bucketEntity.ID+tmpSuffix); err != nil { | ||
i.logger.Error("failed to clear local alias entity cache", "error", err, "entity_id", bucketEntity.ID) | ||
return | ||
Comment on lines
+758
to
+767
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This whole dance is fascinating. I'm a little curious how you discovered it here - it seems like this being missing is an unrelated bug to the regression right? @biazmoreira you probably know all about this already, is this another place that breaks the mental model of all global writes go to primary because we updated memdb with a global thing outside of replication? I think we saw places like that with standbys but this was new to me that we have perf secondaries updating their memdb outside of replication for replicated state. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For anyone who reads this later, I realised this wasn't new code - just moved down from further up (see lines 739 and on in the before part of this diff). |
||
} | ||
} | ||
|
||
} | ||
} | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
<a id="dangling-entity-alias-in-memory" /> | ||
|
||
### Deleting an entity-aliases does not remove it from the in-memory database on standby nodes | ||
|
||
#### Affected versions | ||
|
||
##### Vault Community Edition | ||
|
||
- 1.16.3 | ||
- 1.17.0 - 1.17.2 | ||
|
||
##### Enterprise | ||
|
||
- 1.16.3+ent - 1.16.6+ent | ||
- 1.17.0+ent - 1.17.2+ent | ||
|
||
#### Issue | ||
|
||
Entity-aliases deleted from Vault are not removed from the in-memory database on | ||
standby nodes within a cluster. As a result, API requests to create a new | ||
entity-alias with the same mount accessor and name sent to a standby node will | ||
fail. | ||
|
||
This bug is fixed in Vault 1.16.7+ent, 1.17.3, 1.17.3+ent and later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the entity exists and it differs, are we assuming that the difference between them is always going to be the aliases? Why don't we simply delete the entity from memdb and add it again like the previous implementation did?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the MemDB entity differs from its corresponding storage bucket entity, it may or may not be the aliases. But this makes me think we could detect if the Aliases slices are the same but that would entail walking the entire set and that's what the current algorithm is doing. I could add a test case where there are no changes to the Aliases to really make sure it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't be easy to just detect if there are changes and replace one entity with the other? Deleting from memdb and adding without having to walk a slice and detect changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to what Bianca is saying. Just chiming in so I can follow along 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed a change that follows this suggestion.