-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: fix deadlock when updating backfill progress #69040
sql: fix deadlock when updating backfill progress #69040
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru and @dt)
56840b5
to
f673951
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru and @dt)
f673951
to
e8899e8
Compare
The root cause here is that we acquired the mutex inside the transaction which also laid down intents. This was not a problem in earlier iterations of this code because of the FOR UPDATE logic which would, generally, in theory, order the transactions such that the first one to acquire the mutex would be the first to lay down an intent, thus avoiding the deadlock by ordering the acquisitions. That was changed in cockroachdb#68244, which removed the FOR UPDATE. What we see now is that you have a transaction doing the progress update which hits a restart but has laid down an intent. Then we have a transaction which is doing a details update that starts and acquires the mutex but blocks on the intent of the other transaction. That other transaction now is blocked on the mutex and we have a deadlock. The solution here is to not acquire the mutex inside these transactions. Instead, the code copies out the relevant state prior to issuing the transaction. The cost here should be pretty minimal and the staleness in the fact of retries is the least of my concerns. No release note because the code in cockroachdb#68244 has never been released. Release note: None
e8899e8
to
ec29064
Compare
bors r+ |
Build succeeded: |
The root cause here is that we acquired the mutex inside the transaction which
also laid down intents. This was not a problem in earlier iterations of this
code because of the FOR UPDATE logic which would, generally, in theory, order
the transactions such that the first one to acquire the mutex would be the
first to lay down an intent, thus avoiding the deadlock by ordering the
acquisitions. That was changed in #68244, which removed the FOR UPDATE.
What we see now is that you have a transaction doing the progress update which
hits a restart but has laid down an intent. Then we have a transaction which
is doing a details update that starts and acquires the mutex but blocks on the
intent of the other transaction. That other transaction now is blocked on the
mutex and we have a deadlock.
The solution here is to not acquire the mutex inside these transactions.
Instead, the code copies out the relevant state prior to issuing the
transaction. The cost here should be pretty minimal and the staleness in
the fact of retries is the least of my concerns.
No release note because the code in #68244 has never been released.
Touches #68951, #68958.
Release note: None