concurrency: extend testing for waitQueueMaxLengthExceeded #104147

Merged on Jun 1, 2023 (1 commit)

arulajmani

This patch adds a test for the rare scenario where a request re-scans the lock table and finds itself already waiting at a lock. We test the case where adding a new request to the lock's wait queue would cause the waitQueueMaxLengthExceeded state to be triggered -- however, this shouldn't happen, as the request has already been accounted for.

Note that this wasn't broken; we're only adding a test here to ensure we don't regress this behavior as we go about refactoring tryActiveWait. Tested using the diff below and confirmed the test does indeed fail:

```
--- a/pkg/kv/kvserver/concurrency/lock_table.go
+++ b/pkg/kv/kvserver/concurrency/lock_table.go
@@ -1706,7 +1706,7 @@ func (l *lockState) tryActiveWait(
        defer g.mu.Unlock()
        if str == lock.Intent {
                var qg *queuedGuard
-               if _, inQueue := g.mu.locks[l]; inQueue {
+               if _, inQueue := g.mu.locks[l]; inQueue && false {
```
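
For readers unfamiliar with the lock table, here is a minimal, self-contained sketch of the behavior under test. It is a toy model, not the code in `lock_table.go`: the names `lockQueue`, `tryWait`, and `disableInQueueCheck` are hypothetical, and the exact length comparison (`>=`) is an assumption. It only illustrates why a request that is already queued must not be counted a second time, and why short-circuiting the `inQueue` check (as the diff does) makes a re-scan trip the max-length limit.

```go
package main

import "fmt"

// waitResult is a hypothetical outcome of trying to wait at a lock.
type waitResult int

const (
	waiting           waitResult = iota // request was added to the wait queue
	alreadyWaiting                      // request was found in the queue; nothing changed
	maxLengthExceeded                   // queue is full; request was rejected
)

// lockQueue is a toy stand-in for a lock's wait queue. It is not the
// real lockState type.
type lockQueue struct {
	maxLength           int          // 0 means unbounded (assumption)
	waiters             map[int]bool // IDs of requests currently queued
	disableInQueueCheck bool         // models the `inQueue && false` mutation
}

func newLockQueue(maxLength int) *lockQueue {
	return &lockQueue{maxLength: maxLength, waiters: map[int]bool{}}
}

// tryWait adds reqID to the queue unless it is already there or the
// queue is full. A request that is already queued is not counted again.
func (q *lockQueue) tryWait(reqID int) waitResult {
	if !q.disableInQueueCheck && q.waiters[reqID] {
		return alreadyWaiting
	}
	if q.maxLength > 0 && len(q.waiters) >= q.maxLength {
		return maxLengthExceeded
	}
	q.waiters[reqID] = true
	return waiting
}

func main() {
	// Correct behavior: the re-scan finds the request already queued.
	q := newLockQueue(1)
	fmt.Println(q.tryWait(1) == waiting)
	fmt.Println(q.tryWait(1) == alreadyWaiting)

	// With the check disabled, the same re-scan is rejected as if the
	// request were a new waiter.
	broken := newLockQueue(1)
	broken.disableInQueueCheck = true
	broken.tryWait(1)
	fmt.Println(broken.tryWait(1) == maxLengthExceeded)
}
```

In this model, the new test corresponds to the second call on `q`: a queue at its limit must still accept a re-scan from a request it already contains.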

Release note: None

Informs: #102210

@arulajmani arulajmani requested a review from nvanbenschoten May 31, 2023 14:27
@arulajmani arulajmani requested a review from a team as a code owner May 31, 2023 14:27

@arulajmani arulajmani left a comment

@nvanbenschoten this was prompted by the question of why we didn't see test failures from the TODO you pointed out in https://reviewable.io/reviews/cockroachdb/cockroach/102973#-NVAqrhl0xO-JkJnGYXU.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/concurrency/testdata/lock_table/queue_length_exceeded line 248 at r1 (raw file):

# that's already actively waiting at a lock, which is something that cannot
# happen outside of unit tests. tryActiveWait doesn't expect this, and doesn't
# handle this state transition -- we could teach it, but it would be just for

I could go either way on this one. Whatever we do here is going to be temporary given we're changing how tryActiveWait works.

@arulajmani arulajmani force-pushed the wait-queue-exceeded branch from bfe15b0 to 8372915 Compare May 31, 2023 14:52

@nvanbenschoten nvanbenschoten left a comment

Reviewed all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani)


pkg/kv/kvserver/concurrency/testdata/lock_table/queue_length_exceeded line 247 at r2 (raw file):

which is something that cannot happen outside of unit tests.

How difficult would it be to make this test behave like production? These kinds of test-only states can add friction when trying to make future, unrelated changes.

@arulajmani arulajmani left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/concurrency/testdata/lock_table/queue_length_exceeded line 247 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

which is something that cannot happen outside of unit tests.

How difficult would it be to make this test behave like production? These kinds of test-only states can add friction when trying to make future, unrelated changes.

It's doable if we run this test at the concurrency manager level, and not at the lock table level. The setup required to trigger this would be a bit involved, but it is doable. I can update the PR to make this change.

Maybe we should add an assertion in tryActiveWait to ensure a request isn't already actively waiting at the lock. That'll make it impossible to write such stuff in the future.

@nvanbenschoten nvanbenschoten left a comment

:lgtm:

Reviewed 1 of 1 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani)


pkg/kv/kvserver/concurrency/testdata/lock_table/queue_length_exceeded line 247 at r2 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

It's doable if we run this test at the concurrency manager level, and not at the lock table level. The setup required to trigger this would be a bit involved, but it is doable. I can update the PR to make this change.

Maybe we should add an assertion in tryActiveWait to ensure a request isn't already actively waiting at the lock. That'll make it impossible to write such stuff in the future.

I had missed that this was a lock_table test and not a concurrency_manager test. I'm more ok with this behaving in a non-prod manner, given the scope of the test framework.

@arulajmani arulajmani left a comment

TFTR!

bors r=nvanbenschoten

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/concurrency/testdata/lock_table/queue_length_exceeded line 247 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I had missed that this was a lock_table test and not a concurrency_manager test. I'm more ok with this behaving in a non-prod manner, given the scope of the test framework.

FWIW, I'd initially written this test without realizing I could do this, and the setup felt needlessly complex. This is much easier/simpler to reason about, because it's testing the case we're interested in directly. So if you're okay with this, I'm happy to merge ahead.

craig bot commented Jun 1, 2023

Build succeeded:

@craig craig bot merged commit db8257a into cockroachdb:master Jun 1, 2023