-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kv: fix conflict resolution for high-priority, non-txn'al requests #83366
kv: fix conflict resolution for high-priority, non-txn'al requests #83366
Conversation
Thanks for the prompt fix and explanation, I'll work on a backup level test with this fix. I've given the diff a look but I'll leave the review to @AlexTalks who is more familiar with this code. |
584f78e
to
7bd4477
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider printing the push reason(s) like we were discussing
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @adityamaru and @AlexTalks)
Fixes cockroachdb#83342. This commit reworks the behavior of non-transactional requests in lock wait-queues when the lock holder or the lock queue waiter has an extreme priority (min or max priority). In such cases, we allow the lock queue waiter to immediately push the lock holder out of its way, either by moving its timestamp to resolve a read-write conflict or aborting it to resolve a write-write conflict. This handling was broken in two ways for non-transactional requests. 1. these requests' priorities were not consulted when deciding whether to immediately push instead of temporarily delaying while waiting in the lock wait-queue. This meant that a high-priority, non-txn request might still wait for 50ms (kv.lock_table.coordinator_liveness_push_delay) before pushing a lower priority lock holder out of its way. 2. worse, it was possible that if these requests were not in the front of a lock wait-queue, they might never push. This was because we had logic that disabled a push if it was not needed for the purposes of checking liveness, detecting deadlocks, or enforcing timeouts. This commit resolves both of these issues. It also improves the testing of transaction priorities in the `kv/kvserver/concurrency` package. Finally, it consolidates the determination of when a pusher should be able to push/abort a pushee into a single location. Release note (bug fix): a bug in transaction conflict resolution which could allow backups to wait on long-running transactions has been resolved.
This commit adds the reason(s) for pushing a lock holder while in a lock wait-queue to trace events.
b43f7dc
to
f6a7a2c
Compare
Done. TFTR! bors r=andreimatei |
Build failed (retrying...): |
Build failed (retrying...): |
Build failed (retrying...): |
Build failed (retrying...): |
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 83ac099 to blathers/backport-release-21.2-83366: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 21.2.x failed. See errors above. error creating merge commit from 83ac099 to blathers/backport-release-22.1-83366: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 22.1.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
Fixes #83342.
This commit reworks the behavior of non-transactional requests in lock
wait-queues when the lock holder or the lock queue waiter has an extreme
priority (min or max priority). In such cases, we allow the lock queue
waiter to immediately push the lock holder out of its way, either by
moving its timestamp to resolve a read-write conflict or aborting it to
resolve a write-write conflict.
This handling was broken in two ways for non-transactional requests.
to immediately push instead of temporarily delaying while waiting in the
lock wait-queue. This meant that a high-priority, non-txn request might
still wait for 50ms (kv.lock_table.coordinator_liveness_push_delay)
before pushing a lower priority lock holder out of its way.
of a lock wait-queue, they might never push. This was because we had
logic that disabled a push if it was not needed for the purposes of
checking liveness, detecting deadlocks, or enforcing timeouts.
This commit resolves both of these issues. It also improves the testing
of transaction priorities in the
kv/kvserver/concurrency
package.Finally, it consolidates the determination of when a pusher should be
able to push/abort a pushee into a single location.
Release note (bug fix): a bug in transaction conflict resolution which
could allow backups to wait on long-running transactions has been
resolved.