Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock in Identity Backend's Merge and Create paths #10876

Closed
ianferguson opened this issue Feb 9, 2021 · 0 comments · Fixed by #10877
Closed

Deadlock in Identity Backend's Merge and Create paths #10876

ianferguson opened this issue Feb 9, 2021 · 0 comments · Fixed by #10877

Comments

@ianferguson
Copy link
Contributor

ianferguson commented Feb 9, 2021

Describe the bug

Requests to create an entity (POST identity/entity) and to merge 2 entities (POST identity/entity/merge) both rely on two Mutex's: IdentityStore's lock and a write lock for the go-memdb instance used to store identity information in memory.

The two paths acquire the locks in a different order from each other, making it possible to deadlock all write calls to the vault identity backend.

The below examples all use stack traces and links from Vault v1.6.2:

The entity creation request path locks the IdentityStore lock (1) and then memdb lock (2):

((2) THIS FUNCTION REQUIRES MEM DB LOCK) github.com/hashicorp/go-memdb.(*MemDB).Txn(memdb.go:57) 
github.com/hashicorp/vault/vault.(*IdentityStore).upsertEntity(identity_store_util.go:501)
((1) THIS FUNCTION LOCKS IDENTITY LOCK) github.com/hashicorp/vault/vault.(*IdentityStore).handleEntityUpdateCommon.func1(identity_store_entities.go:270)   
github.com/hashicorp/vault/sdk/framework. (*Backend).HandleRequest(backend.go:272)
  1. txn := i.db.Txn(true)

But the entity merge request path locks the memdb lock (1) and then the Identity store lock (2)

((2) THIS FUNCTION REQUIRES IDENTITY LOCK) github.com/hashicorp/vault/vault.(*IdentityStore).mergeEntity(identity_store_entities.go:716) 
((1) THIS FUNCTION LOCKS MEM DB LOCK) github.com/hashicorp/vault/vault.(*IdentityStore).pathEntityMergeID.func1(identity_store_entities.go:175) 
github.com/hashicorp/vault/sdk/framework.(*Backend).HandleRequest(backend.go:272)
  1. defer i.lock.Unlock()
  2. txn := i.db.Txn(true)

This means that a concurrent pair of one create entity request and one merge request can come in, with the create entity request getting the identity lock and the merge entity request getting the memdb lock, and neither request being able to proceed any further.

At that point, all requests that involve a modification to the identity store will hang and timeout, until the Vault process is restarted or terminated and replaced.

Looking at the rest of the identity code, it appears that the Identity lock is always locked prior to a memdb write transaction being created. Based on that, I believe that the lock ordering in pathEntityMergeID is incorrect, and the IdentityStore lock call should be moved to happen prior to the memdb write transaction being created.

I've created a pull request that I believe resolves this: #10877

To Reproduce
Requires vault, jq and something like uuidgen for generating ids

Steps to reproduce the behavior:

  1. Run vault server -dev -log-level=trace -dev-root-token-id=dev-root-token
  2. In one process run a loop of entity creations:
export VAULT_ADDR=http://127.0.0.1:8200 
export VAULT_TOKEN=dev-root-token 
while true; do
  vault write -format=json identity/entity name=$(uuidgen) |  | jq -r '.data.id'
done;

uuidgen can be replaced with anything that gives you unique/random strings for names

  1. In another process at the same time run a loop that creates 2 entities and then merges them:
export VAULT_ADDR=http://127.0.0.1:8200 
export VAULT_TOKEN=dev-root-token 
while true; do
  a=$(vault write -format=json identity/entity name=$(uuidgen) |  | jq -r '.data.id')
  b=$(vault write -format=json identity/entity name=$(uuidgen) |  | jq -r '.data.id')
  vault write identity/entity/merge from_entity_ids=$a to_entity_id=$b
done;

On my laptop, usually within 300-400 requests, if not sooner, both processes will lock up at the same time, and then begin logging context deadline exceeded exceptions every 60 seconds. Running multiple instances of each loop may accelerate this, but I usually could reproduce it with just one instance of each loop in under a minute.

Expected behavior
Create and Merge calls should be able to work in parallel without issue

Environment:

  • Vault Server Version (retrieve with vault status): Reproduced on both v1.5.5 and v.1.6.2
  • Vault CLI Version (retrieve with vault version): v1.5.5, v1.6.2
  • Server Operating System/Architecture: Ubuntu , OS X

We initially observed this on one of our Vault clusters under normal traffic from a variety of clients/services, and I was then able to reproduce on both Vault v1.5.5 and Vault v1.6.2 locally on my laptop

Additional Information
I've created a pull request that I believe resolves this: #10877

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant