concurrency: extend testing for waitQueueMaxLengthExceeded #104147

arulajmani · 2023-05-31T14:27:39Z

This patch adds a test for the rare scenario where a request re-scans the lock table and finds itself already waiting at a lock. We test the case where adding a new request to the lock's wait queue would cause the waitQueueMaxLengthExceeded state to be triggered -- however, this shouldn't happen, as the request has already been accounted for.

Note that this wasn't broken; we're only adding a test here to ensure we don't regress this behavior as we go about refactoring tryActiveWait. Tested using the diff below and confirmed the test does indeed fail:

--- a/pkg/kv/kvserver/concurrency/lock_table.go
+++ b/pkg/kv/kvserver/concurrency/lock_table.go
@@ -1706,7 +1706,7 @@ func (l *lockState) tryActiveWait(
        defer g.mu.Unlock()
        if str == lock.Intent {
                var qg *queuedGuard
-               if _, inQueue := g.mu.locks[l]; inQueue {
+               if _, inQueue := g.mu.locks[l]; inQueue && false {
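
As a rough illustration of the behavior under test, here is a minimal sketch in Go. It is a toy model, not the actual `lockState` code: the `waiter` and `lockQueue` types, the `enqueue` method, and its return values are all hypothetical names made up for this example. It only shows the idea that a request which is already in a lock's wait queue should not be counted a second time against the maximum queue length.

```go
package main

import "fmt"

// waiter is a hypothetical stand-in for a request waiting on a lock.
type waiter struct{ id int }

// lockQueue is a toy model of a single lock's wait queue. It is an
// assumption made for illustration and does not mirror CockroachDB's types.
type lockQueue struct {
	maxLen  int              // 0 means "no limit" in this sketch
	waiting map[*waiter]bool // membership check, like an "inQueue" lookup
	order   []*waiter        // queue order
}

func newLockQueue(maxLen int) *lockQueue {
	return &lockQueue{maxLen: maxLen, waiting: make(map[*waiter]bool)}
}

// enqueue adds w to the queue. A waiter that is already queued (for example,
// because it is re-scanning) is accepted without re-checking the length
// limit, since it has already been accounted for.
func (q *lockQueue) enqueue(w *waiter) (queued, exceeded bool) {
	if q.waiting[w] {
		return true, false
	}
	if q.maxLen > 0 && len(q.order) >= q.maxLen {
		return false, true
	}
	q.waiting[w] = true
	q.order = append(q.order, w)
	return true, false
}

func main() {
	q := newLockQueue(1)
	a, b := &waiter{id: 1}, &waiter{id: 2}
	fmt.Println(q.enqueue(a)) // first scan: a joins the queue
	fmt.Println(q.enqueue(a)) // re-scan: a is already queued, limit not hit
	fmt.Println(q.enqueue(b)) // a new waiter does exceed the limit
}
```

Dropping the membership check at the top of `enqueue` in this sketch is the analogue of the `inQueue && false` change in the diff above: the re-scanning waiter would then be rejected as exceeding the limit.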

Release note: None

Informs: #102210

cockroach-teamcity · 2023-05-31T14:27:50Z

arulajmani

@nvanbenschoten this was prompted by wondering why we didn't see test failures from the TODO you pointed out in https://reviewable.io/reviews/cockroachdb/cockroach/102973#-NVAqrhl0xO-JkJnGYXU.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)

pkg/kv/kvserver/concurrency/testdata/lock_table/queue_length_exceeded line 248 at r1 (raw file):

# that's already actively waiting at a lock, which is something that cannot
# happen outside of unit tests. tryActiveWait doesn't expect this, and doesn't
# handle this state transition -- we could teach it, but it would be just for

I could go either way on this one. Whatever we do here is going to be temporary given we're changing how tryActiveWait works.

nvanbenschoten

Reviewed all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani)

pkg/kv/kvserver/concurrency/testdata/lock_table/queue_length_exceeded line 247 at r2 (raw file):

which is something that cannot happen outside of unit tests.

How difficult would it be to make this test behave like production? These kinds of test-only states can add friction when trying to make future, unrelated changes.

arulajmani

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)

pkg/kv/kvserver/concurrency/testdata/lock_table/queue_length_exceeded line 247 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

which is something that cannot happen outside of unit tests.

How difficult would it be to make this test behave like production? These kinds of test-only states can add friction when trying to make future, unrelated changes.

It's doable if we run this test at the concurrency manager level, and not at the lock table level. The setup required to trigger this would be a bit involved, but it is doable. I can update the PR to make this change.

Maybe we should add an assertion in tryActiveWait to ensure a request isn't already actively waiting at the lock. That'll make it impossible to write such stuff in the future.

nvanbenschoten

Reviewed 1 of 1 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani)

pkg/kv/kvserver/concurrency/testdata/lock_table/queue_length_exceeded line 247 at r2 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

It's doable if we run this test at the concurrency manager level, and not at the lock table level. The setup required to trigger this would be a bit involved, but it is doable. I can update the PR to make this change.

Maybe we should add an assertion in tryActiveWait to ensure a request isn't already actively waiting at the lock. That'll make it impossible to write such stuff in the future.

I had missed that this was a lock_table test and not a concurrency_manager test. I'm more ok with this behaving in a non-prod manner, given the scope of the test framework.

arulajmani

TFTR!

bors r=nvanbenschoten

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)

pkg/kv/kvserver/concurrency/testdata/lock_table/queue_length_exceeded line 247 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I had missed that this was a lock_table test and not a concurrency_manager test. I'm more ok with this behaving in a non-prod manner, given the scope of the test framework.

FWIW, I'd initially written this test without realizing I could do this, and the setup felt needlessly complex. This is much easier/simpler to reason about, because it's testing the case we're interested in directly. So if you're okay with this, I'm happy to merge ahead.

craig · 2023-06-01T19:55:25Z

Build succeeded:

Bazel Essential CI (Cockroach)

arulajmani requested a review from nvanbenschoten May 31, 2023 14:27

arulajmani requested a review from a team as a code owner May 31, 2023 14:27

arulajmani commented May 31, 2023

arulajmani force-pushed the wait-queue-exceeded branch from bfe15b0 to 8372915 May 31, 2023 14:52

nvanbenschoten reviewed Jun 1, 2023

arulajmani commented Jun 1, 2023

nvanbenschoten approved these changes Jun 1, 2023

arulajmani commented Jun 1, 2023

craig bot merged commit db8257a into cockroachdb:master Jun 1, 2023
