
kv/concurrency: remove TODO about impossible deadlock scenario #47616

Merged

Conversation

nvanbenschoten
Member

This commit removes a TODO that described a potential scenario in which a request-only dependency cycle could deadlock due to a simultaneous transaction abort. I realized after writing a full fix (9a9182f) that such a deadlock was not achievable in practice due to the semantics of the lockTable.

It may appear that there is a bug here in the handling of request-only
dependency cycles. If such a cycle was broken by simultaneously aborting
the transactions responsible for each of the requests, there would be no
guarantee that an aborted pusher would notice that its own transaction
was aborted before it notices that its pushee's transaction was aborted.
For example, in the simplest case, imagine two requests deadlocked on
each other. If their transactions are both aborted and each push notices
the pushee is aborted first, they will both return here triumphantly and
wait for the other to exit its lock wait-queues, leading to deadlock.
Even if they eventually pushed each other again, there would be no
guarantee that the same thing wouldn't happen.

However, such a situation is not possible in practice because such a
dependency cycle is never constructed by the lockTable. The lockTable
assigns each request a monotonically increasing sequence number upon its
initial entrance to the lockTable. This sequence number is used to
straighten out dependency chains of requests such that a request only
waits on conflicting requests with lower sequence numbers than its own
sequence number. This behavior guarantees that request-only dependency
cycles are never constructed by the lockTable. Put differently, all
dependency cycles must include at least one dependency on a lock and,
therefore, one call to pushLockTxn. Unlike pushRequestTxn, pushLockTxn
actively removes the conflicting lock and removes the dependency when it
determines that its pushee transaction is aborted. This means that the
call to pushLockTxn will continue to make forward progress in the case of
a simultaneous abort of all transactions behind the members of the cycle,
preventing such a hypothesized deadlock from ever materializing.
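
To make the sequence-number argument concrete, here is a minimal, self-contained Go sketch. The `request`, `waitQueue`, and `waitsOn` names are purely illustrative and are not the actual lockTable types; the point is only that if a request may wait solely on requests with strictly lower sequence numbers, the waits-on relation over requests alone can never contain a cycle.

```go
package main

import "fmt"

// request stands in for a lockTable request; in this sketch each request
// receives a monotonically increasing sequence number on its first entry.
type request struct {
	seqNum uint64
}

// waitQueue is a toy per-key wait-queue.
type waitQueue struct {
	queued []*request
}

// waitsOn returns the queued requests that a newly arriving request is allowed
// to wait on: only those with strictly lower sequence numbers. Every waits-on
// edge therefore points from a higher seqNum to a lower one, so a cycle made up
// solely of requests would need seqA < seqB < ... < seqA, which is impossible.
func (q *waitQueue) waitsOn(r *request) []*request {
	var deps []*request
	for _, other := range q.queued {
		if other.seqNum < r.seqNum {
			deps = append(deps, other)
		}
	}
	return deps
}

func main() {
	q := &waitQueue{queued: []*request{{seqNum: 1}, {seqNum: 3}}}
	newcomer := &request{seqNum: 2}
	for _, dep := range q.waitsOn(newcomer) {
		fmt.Printf("request 2 waits on request %d\n", dep.seqNum) // prints only request 1
	}
}
```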

The PR also includes a good amount of other cleanup that I was intending to land with the fix to this deadlock. For instance, it splits off a LockAcquisition type from the LockUpdate type. It also cleans up the internals of lock_table.go.

@nvanbenschoten nvanbenschoten requested a review from a team as a code owner April 17, 2020 17:56
@cockroach-teamcity
Member

This change is Reviewable

@blathers-crl

blathers-crl bot commented Apr 17, 2020

❌ The GitHub CI (Cockroach) build has failed on ae7781b4.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/lockTableWaiterCancel3 branch from ae7781b to 561b783 Compare April 17, 2020 20:48
Collaborator

@sumeerbhola sumeerbhola left a comment


:lgtm:

Reviewed 18 of 18 files at r1, 3 of 3 files at r2, 14 of 15 files at r3, 4 of 4 files at r4, 2 of 2 files at r5, 1 of 1 files at r6.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/concurrency/lock_table.go, line 1122 at r4 (raw file):

	g.key = l.key
	g.mu.startWait = true
	if reservedBySelfTxn {

I like the reduction in local variables in the refactorings.


pkg/kv/kvserver/concurrency/lock_table.go, line 1054 at r5 (raw file):

		// reservation if the reservation has a lower seqNum. For reads, the
		// non-transactional and transactional behavior is equivalent and
		// handled later in this function.

the comment is now inconsistent with the change that removed sa == spanset.SpanReadWrite from the if-condition.


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 454 at r6 (raw file):

	// call to pushLockTxn will continue to make forward progress in the case of
	// a simultaneous abort of all transactions behind the members of the cycle,
	// preventing such a hypothesized deadlock from ever materializing.

An example here would be helpful. And it may help clarify the definition of "request-only dependency cycle", which is quite narrow. Perhaps something like:

req(1, txn1), req(1, txn2) are both waiting on a lock held by txn3, and they respectively hold a reservation on key "a" and key "b".
req(2, txn2) queues up behind the reservation on key "a" and req(2, txn1) queues up behind the reservation on key "b". Now the dependency cycle between txn1 and txn2 only involves requests, but some of the requests here also depend on a lock. So when both txn1, txn2 are aborted, the req(1, txn1), req(1, txn2) will exit the lockTable, allowing req(2, txn1) and req(2, txn2) to get the reservation and now they no longer depend on each other.


pkg/roachpb/data.proto, line 569 at r1 (raw file):

// A LockAcquisition represents the action of a Transaction acquiring a lock
// with a specified durability level over a Span of keys.

is this using a Span to be future-proof?

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/lockTableWaiterCancel3 branch from 561b783 to d4cbe15 Compare April 21, 2020 17:06
Member Author

@nvanbenschoten nvanbenschoten left a comment


TFTR!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola)


pkg/kv/kvserver/concurrency/lock_table.go, line 1054 at r5 (raw file):

Previously, sumeerbhola wrote…

the comment is now inconsistent with the change that removed sa == spanset.SpanReadWrite from the if-condition.

Done.


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 454 at r6 (raw file):

Previously, sumeerbhola wrote…

An example here would be helpful. And it may help clarify the definition of "request-only dependency cycle", which is quite narrow. Perhaps something like:

req(1, txn1), req(1, txn2) are both waiting on a lock held by txn3, and they respectively hold a reservation on key "a" and key "b".
req(2, txn2) queues up behind the reservation on key "a" and req(2, txn1) queues up behind the reservation on key "b". Now the dependency cycle between txn1 and txn2 only involves requests, but some of the requests here also depend on a lock. So when both txn1, txn2 are aborted, the req(1, txn1), req(1, txn2) will exit the lockTable, allowing req(2, txn1) and req(2, txn2) to get the reservation and now they no longer depend on each other.

Done.


pkg/roachpb/data.proto, line 569 at r1 (raw file):

Previously, sumeerbhola wrote…

is this using a Span to be future-proof?

Yes, we're likely to eventually introduce ranged lock acquisition.

@blathers-crl

blathers-crl bot commented Apr 21, 2020

❌ The GitHub CI (Cockroach) build has failed on d4cbe154.


@nvanbenschoten
Member Author

All of the roachtests in CI failed to start with the error invalid version string 'd4cbe15'. I'm guessing this is related to some of the GitHub flakiness. I'll try again.

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/lockTableWaiterCancel3 branch from d4cbe15 to ebcba16 Compare April 21, 2020 17:27
@craig
Contributor

craig bot commented Apr 21, 2020

Canceled

@nvanbenschoten
Member Author

bors r+

@blathers-crl

blathers-crl bot commented Apr 21, 2020

❌ The GitHub CI (Cockroach) build has failed on ebcba168.


Collaborator

@sumeerbhola sumeerbhola left a comment


Reviewed 2 of 18 files at r7, 1 of 2 files at r11, 1 of 1 files at r12.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola)

@craig
Contributor

craig bot commented Apr 21, 2020

Build failed (retrying...)

@craig
Contributor

craig bot commented Apr 21, 2020

Build failed (retrying...)

@craig
Contributor

craig bot commented Apr 21, 2020

Build failed

6f81abe split off a new LockUpdate type from the existing Intent type.
This was an improvement because, to that point, the roles of the Intent
type had been seriously overloaded, leading to complexity and confusion.
6f81abe has a good commit message explaining this in detail.

This separation moved us in the right direction, but it has become clear
over the past month that it wasn't complete. Specifically, LockUpdate
was still used for two distinct roles:
- talking about updating existing locks
- talking about acquiring new locks

Using LockUpdate for both of these roles still left some room for
ambiguity. The type contained fields that clearly only made sense for
one of the two roles. For instance, IgnoredSeqnums was clearly only
useful for the first role. More concerning, the type contained fields
that arguably could have made sense for both roles but weren't actually
used for both. This was the case for the Durability field. Originally,
we planned on being able to resolve intents at specific durabilities,
but it became clear that this was not a good idea. Because of this, the
field was only actually used for the second role and was ignored by the
IntentResolver.

This commit addresses these concerns by splitting off a LockAcquisition
type from the LockUpdate type. LockUpdate now exclusively serves the
first role while LockAcquisition serves the second. This added type
safety avoids confusion and room for bugs.

NOTE: LockUpdate is not sent across the wire, so we're able to remove a
field from it without reserving the field number.
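
As a rough illustration of the two roles described above, the split might be pictured like this in Go. This is a hedged sketch only: the real LockUpdate and LockAcquisition types are protobuf messages in pkg/roachpb/data.proto, and the field names and types below are simplified stand-ins, not the actual definitions.

```go
package main

import "fmt"

// Illustrative sketches only: the real LockUpdate and LockAcquisition types are
// protobuf messages in pkg/roachpb/data.proto, and their exact fields differ.

// lockAcquisitionSketch captures the second role: a transaction acquiring a
// lock with a given durability over a span of keys. Durability belongs here.
type lockAcquisitionSketch struct {
	span       string // key span being locked (simplified to a string)
	txn        string // acquiring transaction (simplified)
	durability string // e.g. replicated vs. unreplicated
}

// lockUpdateSketch captures the first role: updating existing locks. Ignored
// sequence numbers belong here, while a durability field does not.
type lockUpdateSketch struct {
	span           string
	txn            string
	ignoredSeqNums []int32
}

func main() {
	acq := lockAcquisitionSketch{span: "a", txn: "txn1", durability: "unreplicated"}
	upd := lockUpdateSketch{span: "a", txn: "txn1", ignoredSeqNums: []int32{2}}
	fmt.Printf("acquire: %+v\nupdate:  %+v\n", acq, upd)
}
```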
These were unused and it's not clear when they would be useful.
They were an added burden to the implementation of lockTable,
so it's better to remove them.

This field was used, but none of the uses were necessary. The field was
an added burden to the implementation of lockTable, so it's better to
remove it.

Strictly renames and adds comments. Just a refactor.

This makes some of the logic easier to understand.
This commit removes a TODO that described a potential scenario in which
a request-only dependency cycle could deadlock due to a simultaneous
transaction abort. I realized after writing a full fix (<add ref>) that
such a deadlock was not achievable in practice due to the semantics of
the lockTable.

> It may appear that there is a bug here in the handling of request-only
> dependency cycles. If such a cycle was broken by simultaneously aborting
> the transactions responsible for each of the requests, there would be no
> guarantee that an aborted pusher would notice that its own transaction
> was aborted before it notices that its pushee's transaction was aborted.
> For example, in the simplest case, imagine two requests deadlocked on
> each other. If their transactions are both aborted and each push notices
> the pushee is aborted first, they will both return here triumphantly and
> wait for the other to exit its lock wait-queues, leading to deadlock.
> Even if they eventually pushed each other again, there would be no
> guarantee that the same thing wouldn't happen.
>
> However, such a situation is not possible in practice because such a
> dependency cycle is never constructed by the lockTable. The lockTable
> assigns each request a monotonically increasing sequence number upon its
> initial entrance to the lockTable. This sequence number is used to
> straighten out dependency chains of requests such that a request only
> waits on conflicting requests with lower sequence numbers than its own
> sequence number. This behavior guarantees that request-only dependency
> cycles are never constructed by the lockTable. Put differently, all
> dependency cycles must include at least one dependency on a lock and,
> therefore, one call to pushLockTxn. Unlike pushRequestTxn, pushLockTxn
> actively removes the conflicting lock and removes the dependency when it
> determines that its pushee transaction is aborted. This means that the
> call to pushLockTxn will continue to make forward progress in the case of
> a simultaneous abort of all transactions behind the members of the cycle,
> preventing such a hypothesized deadlock from ever materializing.
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/lockTableWaiterCancel3 branch from ebcba16 to 6ac38be Compare April 22, 2020 18:02
@nvanbenschoten
Member Author

bors r+

@craig
Contributor

craig bot commented Apr 22, 2020

Build failed (retrying...)

@craig
Contributor

craig bot commented Apr 22, 2020

Build succeeded

@craig craig bot merged commit a1296f9 into cockroachdb:master Apr 22, 2020
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/lockTableWaiterCancel3 branch April 23, 2020 02:01