Describe the bug
Requests to create an entity (POST identity/entity) and to merge two entities (POST identity/entity/merge) both rely on two mutexes: IdentityStore's lock and a write lock for the go-memdb instance used to store identity information in memory.
The two paths acquire the locks in opposite orders, making it possible to deadlock all write calls to the Vault identity backend.
The below examples all use stack traces and links from Vault v1.6.2:
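The entity creation request path locks the IdentityStore lock (1) and then the memdb lock (2) (see vault/vault/identity_store_entities.go, line 194 in c7f674e, and vault/vault/identity_store_util.go, line 494 in 72752ca):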
((2) THIS FUNCTION REQUIRES MEM DB LOCK) github.com/hashicorp/go-memdb.(*MemDB).Txn(memdb.go:57)
github.com/hashicorp/vault/vault.(*IdentityStore).upsertEntity(identity_store_util.go:501)
((1) THIS FUNCTION LOCKS IDENTITY LOCK) github.com/hashicorp/vault/vault.(*IdentityStore).handleEntityUpdateCommon.func1(identity_store_entities.go:270)
github.com/hashicorp/vault/sdk/framework.(*Backend).HandleRequest(backend.go:272)
But the entity merge request path locks the memdb lock (1) and then the IdentityStore lock (2) (see vault/vault/identity_store_entities.go, lines 717 and 167 in c7f674e):
((2) THIS FUNCTION REQUIRES IDENTITY LOCK) github.com/hashicorp/vault/vault.(*IdentityStore).mergeEntity(identity_store_entities.go:716)
((1) THIS FUNCTION LOCKS MEM DB LOCK) github.com/hashicorp/vault/vault.(*IdentityStore).pathEntityMergeID.func1(identity_store_entities.go:175)
github.com/hashicorp/vault/sdk/framework.(*Backend).HandleRequest(backend.go:272)
This means that when one create entity request and one merge entity request arrive concurrently, the create request can acquire the identity lock while the merge request acquires the memdb lock. Each then waits for the lock the other holds, and neither can proceed.
At that point, all requests that involve a modification to the identity store will hang and time out until the Vault process is restarted or terminated and replaced.
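The interleaving described above can be sketched in isolation. This is a minimal illustration, not Vault code: the `store` type, its two fields, and `demoInversion` are hypothetical stand-ins for the IdentityStore lock and the go-memdb write lock. It assumes Go 1.18+ for `sync.Mutex.TryLock`, which is used only so the demo reports the stall instead of hanging.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical stand-in for the two locks discussed in this issue.
// It is NOT Vault's IdentityStore type.
type store struct {
	identityLock sync.Mutex // stands in for IdentityStore's lock
	memdbLock    sync.Mutex // stands in for the go-memdb write lock
}

// demoInversion forces the bad interleaving: each goroutine takes its first
// lock, waits until the other has done the same, then tries its second lock.
// With a blocking Lock() call in place of TryLock(), both would wait forever.
func demoInversion() (createStalled, mergeStalled bool) {
	var s store
	var firstHeld, attempted, done sync.WaitGroup
	firstHeld.Add(2)
	attempted.Add(2)
	done.Add(2)

	run := func(first, second *sync.Mutex, stalled *bool) {
		defer done.Done()
		first.Lock()
		firstHeld.Done()
		firstHeld.Wait() // both goroutines now hold their first lock

		ok := second.TryLock() // cannot succeed: the other goroutine holds it
		*stalled = !ok

		attempted.Done()
		attempted.Wait() // keep the first lock until both attempts are made
		if ok {
			second.Unlock()
		}
		first.Unlock()
	}

	// "Create" order: identity lock, then memdb lock.
	go run(&s.identityLock, &s.memdbLock, &createStalled)
	// "Merge" order: memdb lock, then identity lock.
	go run(&s.memdbLock, &s.identityLock, &mergeStalled)

	done.Wait()
	return createStalled, mergeStalled
}

func main() {
	c, m := demoInversion()
	fmt.Println("create stalled:", c, "merge stalled:", m)
}
```

Both goroutines report a stall because each holds the lock the other needs, which is the same condition the stack traces above show for the two request paths.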
Looking at the rest of the identity code, it appears that the Identity lock is always locked prior to a memdb write transaction being created. Based on that, I believe that the lock ordering in pathEntityMergeID is incorrect, and the IdentityStore lock call should be moved to happen prior to the memdb write transaction being created.
I've created a pull request that I believe resolves this: #10877
To Reproduce
Requires vault, jq, and something like uuidgen for generating IDs (uuidgen can be replaced with anything that gives you unique/random strings for names).
Steps to reproduce the behavior:
Run vault server -dev -log-level=trace -dev-root-token-id=dev-root-token
On my laptop, usually within 300-400 requests, if not sooner, both processes will lock up at the same time, and then begin logging context deadline exceeded exceptions every 60 seconds. Running multiple instances of each loop may accelerate this, but I usually could reproduce it with just one instance of each loop in under a minute.
Expected behavior
Create and merge calls should be able to run in parallel without issue.
Environment:
Vault Server Version (retrieve with vault status): Reproduced on both v1.5.5 and v1.6.2
Vault CLI Version (retrieve with vault version): v1.5.5, v1.6.2
Server Operating System/Architecture: Ubuntu, OS X
We initially observed this on one of our Vault clusters under normal traffic from a variety of clients/services, and I was then able to reproduce it on both Vault v1.5.5 and Vault v1.6.2 locally on my laptop.
Additional Information
I've created a pull request that I believe resolves this: #10877