sql: fix deadlock when updating backfill progress #69040

ajwerner · 2021-08-17T15:50:43Z

The root cause here is that we acquired the mutex inside a transaction that
also laid down intents. This was not a problem in earlier iterations of this
code because the FOR UPDATE logic would generally (in theory, at least) order
the transactions such that the first one to acquire the mutex would also be the
first to lay down an intent, avoiding the deadlock by ordering the
acquisitions. That changed in #68244, which removed the FOR UPDATE.

What we see now is a transaction doing the progress update that hits a
restart after it has already laid down an intent. Meanwhile, a transaction
doing a details update starts, acquires the mutex, and then blocks on the
intent of the first transaction. That first transaction is now blocked on the
mutex, and we have a deadlock.

The solution here is to not acquire the mutex inside these transactions.
Instead, the code copies out the relevant state before issuing the
transaction. The cost should be minimal, and the staleness in the face of
retries is the least of my concerns.
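A minimal sketch of the fix under the same toy assumptions: state is copied out under the mutex before the transaction starts, and the retriable closure only reads the copy. `jobState`, `snapshot`, `runTxn`, `errRetry` and `updateProgress` are hypothetical stand-ins, not the real jobs API, and `runTxn` is a simplified retry loop rather than CockroachDB's transaction runner.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errRetry is a hypothetical stand-in for a transaction restart error.
var errRetry = errors.New("restart transaction")

// jobState is a hypothetical stand-in for the in-memory job state that the
// old code read under its mutex from inside the transaction.
type jobState struct {
	mu       sync.Mutex
	progress float32
	details  []string
}

// snapshot copies the relevant state while holding the mutex and releases
// the mutex before returning, i.e. before any transaction starts.
func (s *jobState) snapshot() (float32, []string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	d := make([]string, len(s.details))
	copy(d, s.details)
	return s.progress, d
}

// runTxn is a toy retry loop: it re-invokes fn as long as fn reports a
// restart, mimicking a retriable transaction closure.
func runTxn(fn func(attempt int) error) error {
	for attempt := 1; ; attempt++ {
		if err := fn(attempt); !errors.Is(err, errRetry) {
			return err
		}
	}
}

// updateProgress takes the snapshot outside the transaction. The closure
// never touches the mutex, so every attempt (including retries) reuses the
// possibly stale copy instead of re-acquiring the lock.
func updateProgress(s *jobState, write func(attempt int, p float32, d []string) error) error {
	p, d := s.snapshot()
	return runTxn(func(attempt int) error {
		return write(attempt, p, d)
	})
}

func main() {
	s := &jobState{progress: 0.5, details: []string{"index 1"}}
	err := updateProgress(s, func(attempt int, p float32, d []string) error {
		// The mutex is free while the txn body runs, so a concurrent
		// details update can take it without waiting on this txn.
		if !s.mu.TryLock() {
			return errors.New("mutex held inside txn")
		}
		s.mu.Unlock()
		if attempt == 1 {
			return errRetry // simulate a restart after laying down an intent
		}
		fmt.Printf("attempt %d wrote progress=%.2f details=%v\n", attempt, p, d)
		return nil
	})
	fmt.Println("err:", err)
	// prints:
	// attempt 2 wrote progress=0.50 details=[index 1]
	// err: <nil>
}
```

Because the closure captures the snapshot rather than re-reading under the mutex, a retried attempt may write slightly stale state; that is the trade-off accepted above. `sync.Mutex.TryLock` requires Go 1.18 or later and is used here only to demonstrate that the lock is free during the transaction body.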

No release note because the code in #68244 has never been released.

Touches #68951, #68958.

Release note: None

cockroach-teamcity · 2021-08-17T15:50:49Z

This change is Reviewable.

fqazi

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru and @dt)

fqazi

Reviewed 1 of 1 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru and @dt)

ajwerner · 2021-08-19T02:53:32Z

bors r+

craig · 2021-08-19T03:55:31Z

Build succeeded:

GitHub CI (Cockroach)

ajwerner added the backport-21.1.x label Aug 17, 2021

ajwerner requested review from dt, adityamaru and a team August 17, 2021 15:50

fqazi approved these changes Aug 17, 2021

ajwerner force-pushed the ajwerner/fix-lock-ordering-issues branch from 56840b5 to f673951 August 17, 2021 17:14

fqazi reviewed Aug 17, 2021

ajwerner force-pushed the ajwerner/fix-lock-ordering-issues branch from f673951 to e8899e8 August 18, 2021 16:51

dt approved these changes Aug 18, 2021

ajwerner force-pushed the ajwerner/fix-lock-ordering-issues branch from e8899e8 to ec29064 August 18, 2021 21:19

craig bot merged commit 4d6d79d into cockroachdb:master Aug 19, 2021

blathers-crl bot mentioned this pull request Aug 19, 2021

release-21.1: sql: fix deadlock when updating backfill progress #69130 (merged)
