kv/concurrency: remove TODO about impossible deadlock scenario #47616
Conversation
❌ The GitHub CI (Cockroach) build has failed on ae7781b4. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
Force-pushed from ae7781b to 561b783
Reviewed 18 of 18 files at r1, 3 of 3 files at r2, 14 of 15 files at r3, 4 of 4 files at r4, 2 of 2 files at r5, 1 of 1 files at r6.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/concurrency/lock_table.go, line 1122 at r4 (raw file):
g.key = l.key
g.mu.startWait = true
if reservedBySelfTxn {
I like the reduction in local variables in the refactorings.
pkg/kv/kvserver/concurrency/lock_table.go, line 1054 at r5 (raw file):
// reservation if the reservation has a lower seqNum. For reads, the
// non-transactional and transactional behavior is equivalent and
// handled later in this function.
The comment is now inconsistent with the change that removed sa == spanset.SpanReadWrite from the if-condition.
pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 454 at r6 (raw file):
// call to pushLockTxn will continue to make forward progress in the case of
// a simultaneous abort of all transactions behind the members of the cycle,
// preventing such a hypothesized deadlock from ever materializing.
An example here would be helpful. And it may help clarify the definition of "request-only dependency cycle", which is quite narrow. Perhaps something like:
req(1, txn1), req(1, txn2) are both waiting on a lock held by txn3, and they respectively hold a reservation on key "a" and key "b".
req(2, txn2) queues up behind the reservation on key "a" and req(2, txn1) queues up behind the reservation on key "b". Now the dependency cycle between txn1 and txn2 only involves requests, but some of the requests here also depend on a lock. So when both txn1 and txn2 are aborted, req(1, txn1) and req(1, txn2) will exit the lockTable, allowing req(2, txn1) and req(2, txn2) to acquire the reservations, at which point they no longer depend on each other.
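As a purely illustrative sketch of this scenario (the names simply mirror the example above; nothing here reflects the real lockTable data structures), the wait-for edges before and after the simultaneous abort look like this:

```go
package main

import "fmt"

func main() {
	// waitsOn maps each waiting request to the thing it currently waits on.
	waitsOn := map[string]string{
		"req(1, txn1)": "lock held by txn3",
		"req(1, txn2)": "lock held by txn3",
		"req(2, txn2)": `reservation on key "a" held by req(1, txn1)`,
		"req(2, txn1)": `reservation on key "b" held by req(1, txn2)`,
	}

	// Simultaneously aborting txn1 and txn2 makes req(1, txn1) and
	// req(1, txn2) exit the lockTable, handing their reservations to the
	// requests queued behind them.
	delete(waitsOn, "req(1, txn1)")
	delete(waitsOn, "req(1, txn2)")
	waitsOn["req(2, txn2)"] = `holds the reservation on key "a"`
	waitsOn["req(2, txn1)"] = `holds the reservation on key "b"`

	// No remaining request waits on another request, so the apparent
	// txn1 <-> txn2 cycle never materializes after the abort.
	for waiter, dep := range waitsOn {
		fmt.Printf("%s -> %s\n", waiter, dep)
	}
}
```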
pkg/roachpb/data.proto, line 569 at r1 (raw file):
// A LockAcquisition represents the action of a Transaction acquiring a lock
// with a specified durability level over a Span of keys.
Is this using a Span to be future-proof?
Force-pushed from 561b783 to d4cbe15
TFTR!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola)
pkg/kv/kvserver/concurrency/lock_table.go, line 1054 at r5 (raw file):
Previously, sumeerbhola wrote…
The comment is now inconsistent with the change that removed sa == spanset.SpanReadWrite from the if-condition.
Done.
pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 454 at r6 (raw file):
Previously, sumeerbhola wrote…
An example here would be helpful. And it may help clarify the definition of "request-only dependency cycle", which is quite narrow. Perhaps something like:
req(1, txn1), req(1, txn2) are both waiting on a lock held by txn3, and they respectively hold a reservation on key "a" and key "b".
req(2, txn2) queues up behind the reservation on key "a" and req(2, txn1) queues up behind the reservation on key "b". Now the dependency cycle between txn1 and txn2 only involves requests, but some of the requests here also depend on a lock. So when both txn1 and txn2 are aborted, req(1, txn1) and req(1, txn2) will exit the lockTable, allowing req(2, txn1) and req(2, txn2) to acquire the reservations, at which point they no longer depend on each other.
Done.
pkg/roachpb/data.proto, line 569 at r1 (raw file):
Previously, sumeerbhola wrote…
Is this using a Span to be future-proof?
Yes, we're likely to eventually introduce ranged lock acquisition.
❌ The GitHub CI (Cockroach) build has failed on d4cbe154. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
All of the roachtests in CI failed to start with the error
Force-pushed from d4cbe15 to ebcba16
Canceled
bors r+
❌ The GitHub CI (Cockroach) build has failed on ebcba168. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
Reviewed 2 of 18 files at r7, 1 of 2 files at r11, 1 of 1 files at r12.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola)
Build failed (retrying...)
Build failed (retrying...)
Build failed
6f81abe split off a new LockUpdate type from the existing Intent type. This was an improvement because, to that point, the roles of the Intent type had been seriously overloaded, leading to complexity and confusion. 6f81abe has a good commit message explaining this in detail.

This separation moved us in the right direction, but it has become clear over the past month that it wasn't complete. Specifically, LockUpdate was still used for two distinct roles:
- talking about updating existing locks
- talking about acquiring new locks

Using LockUpdate for both of these roles still left some room for ambiguity. The type contained fields that clearly only made sense for one of the two roles. For instance, IgnoredSeqnums was clearly only useful for the first role. More concerning, the type contained fields that arguably could have made sense for both roles but weren't actually used for both. This was the case for the Durability field. Originally, we planned on being able to resolve intents at specific durabilities, but it became clear that this was not a good idea. Because of this, the field was only actually used for the second role and was ignored by the IntentResolver.

This commit addresses these concerns by splitting off a LockAcquisition type from the LockUpdate type. LockUpdate now exclusively serves the first role while LockAcquisition serves the second. This added type safety avoids confusion and room for bugs.

NOTE: LockUpdate is not sent across the wire, so we're able to remove a field from it without reserving the field number.
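To make the two roles concrete, here is a rough sketch of the resulting split. It is illustrative only: the real messages are protobuf definitions in pkg/roachpb, and the placeholder types and exact field sets below are approximations inferred from the description above, not the generated code.

```go
package sketch

// Placeholder types standing in for the real roachpb/enginepb definitions.
type Span struct{ Key, EndKey []byte }
type TxnMeta struct{ ID [16]byte }
type Durability int
type TransactionStatus int
type IgnoredSeqNumRange struct{ Start, End int32 }

// LockAcquisition describes a transaction acquiring a new lock. Durability
// matters in this role because it says how the new lock is stored.
type LockAcquisition struct {
	Span       Span
	Txn        TxnMeta
	Durability Durability
}

// LockUpdate describes updating (e.g. resolving) locks that already exist.
// IgnoredSeqNums only makes sense in this role, and Durability is absent
// because the IntentResolver never consulted it.
type LockUpdate struct {
	Span           Span
	Txn            TxnMeta
	Status         TransactionStatus
	IgnoredSeqNums []IgnoredSeqNumRange
}
```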
These were unused, and it's not clear when they would be useful. They were an added burden to the implementation of lockTable, so it's better to remove them.
This field was used, but none of the uses were necessary. The field was an added burden to the implementation of lockTable, so it's better to remove it.
Strictly renames and added comments. Just a refactor.
This makes some of the logic easier to understand.
This commit removes a TODO that described a potential scenario in which a request-only dependency cycle could deadlock due to a simultaneous transaction abort. I realized after writing a full fix (<add ref>) that such a deadlock was not achievable in practice due to the semantics of the lockTable.

> It may appear that there is a bug here in the handling of request-only
> dependency cycles. If such a cycle was broken by simultaneously aborting
> the transactions responsible for each of the requests, there would be no
> guarantee that an aborted pusher would notice that its own transaction
> was aborted before it notices that its pushee's transaction was aborted.
> For example, in the simplest case, imagine two requests deadlocked on
> each other. If their transactions are both aborted and each push notices
> the pushee is aborted first, they will both return here triumphantly and
> wait for the other to exit its lock wait-queues, leading to deadlock.
> Even if they eventually pushed each other again, there would be no
> guarantee that the same thing wouldn't happen.
>
> However, such a situation is not possible in practice because such a
> dependency cycle is never constructed by the lockTable. The lockTable
> assigns each request a monotonically increasing sequence number upon its
> initial entrance to the lockTable. This sequence number is used to
> straighten out dependency chains of requests such that a request only
> waits on conflicting requests with lower sequence numbers than its own
> sequence number. This behavior guarantees that request-only dependency
> cycles are never constructed by the lockTable. Put differently, all
> dependency cycles must include at least one dependency on a lock and,
> therefore, one call to pushLockTxn. Unlike pushRequestTxn, pushLockTxn
> actively removes the conflicting lock and removes the dependency when it
> determines that its pushee transaction is aborted. This means that the
> call to pushLockTxn will continue to make forward progress in the case of
> a simultaneous abort of all transactions behind the members of the cycle,
> preventing such a hypothesized deadlock from ever materializing.
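The quoted comment leans on one ordering property: each request gets a monotonically increasing sequence number on entry, and a request only ever waits behind conflicting requests with lower numbers. A minimal sketch of that property (hypothetical names, not the actual lockTable implementation) shows why a wait-for relation made up solely of requests can never close into a cycle:

```go
package sketch

// request is a toy stand-in for a lockTable request.
type request struct {
	seqNum uint64 // assigned once, on first entrance to the lockTable
}

// lockTable hands out monotonically increasing sequence numbers.
type lockTable struct {
	nextSeqNum uint64
}

// enter registers a new request and assigns it the next sequence number.
func (lt *lockTable) enter() *request {
	lt.nextSeqNum++
	return &request{seqNum: lt.nextSeqNum}
}

// mayWaitBehind reports whether r is allowed to wait behind other. Because
// waiting is only ever permitted on strictly lower sequence numbers, the
// request-only wait-for relation is a strict order and cannot contain a
// cycle; any dependency cycle must therefore pass through a held lock and
// hence through a call to pushLockTxn.
func (r *request) mayWaitBehind(other *request) bool {
	return other.seqNum < r.seqNum
}
```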
Force-pushed from ebcba16 to 6ac38be
bors r+
Build failed (retrying...)
Build succeeded
This commit removes a TODO that described a potential scenario in which a request-only dependency cycle could deadlock due to a simultaneous transaction abort. I realized after writing a full fix (9a9182f) that such a deadlock was not achievable in practice due to the semantics of the lockTable.
The PR also includes a good amount of other cleanup that I was intending to land with the fix to this deadlock. For instance, it splits off a LockAcquisition type from the LockUpdate type. It also cleans up the internals of lock_table.go.