concurrency: extend testing for waitQueueMaxLengthExceeded #104147
@nvanbenschoten this was prompted by the question of why we didn't see test failures from the TODO you pointed out in https://reviewable.io/reviews/cockroachdb/cockroach/102973#-NVAqrhl0xO-JkJnGYXU.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/concurrency/testdata/lock_table/queue_length_exceeded
line 248 at r1 (raw file):
# that's already actively waiting at a lock, which is something that cannot
# happen outside of unit tests. tryActiveWait doesn't expect this, and doesn't
# handle this state transition -- we could teach it, but it would be just for
I could go either way on this one. Whatever we do here is going to be temporary, given we're changing how `tryActiveWait` works.
This patch adds a test for the rare scenario where a request re-scans the lock table and finds itself already waiting at a lock. We test the case where treating the request as a new waiter in the lock's wait queue would trigger the `waitQueueMaxLengthExceeded` state; however, this shouldn't happen, as the request has already been accounted for.

Note that this wasn't broken; we're only adding a test here to ensure we don't regress this behavior as we go about refactoring `tryActiveWait`.

Tested using the diff below and confirmed the test does indeed fail:

```diff
--- a/pkg/kv/kvserver/concurrency/lock_table.go
+++ b/pkg/kv/kvserver/concurrency/lock_table.go
@@ -1706,7 +1706,7 @@ func (l *lockState) tryActiveWait(
 	defer g.mu.Unlock()
 	if str == lock.Intent {
 		var qg *queuedGuard
-		if _, inQueue := g.mu.locks[l]; inQueue {
+		if _, inQueue := g.mu.locks[l]; inQueue && false {
```

Release note: None

Informs: cockroachdb#102210
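To make the accounting concrete, here is a minimal Go sketch of a single lock's wait queue with a length limit. It is a simplified model, not CockroachDB's `lockState`: `waitQueue`, `newWaitQueue`, and the integer request IDs are hypothetical, and `waitFor`/`waitQueueMaxLengthExceeded` only loosely mirror the lock table's wait states. What it illustrates is the role of the `inQueue` check from the diff: a request that re-scans and finds itself already queued must not be counted against the limit a second time.

```go
package main

import "fmt"

// waitState is a hypothetical stand-in for the lock table's wait states.
type waitState int

const (
	waitFor                    waitState = iota // request waits in the queue
	waitQueueMaxLengthExceeded                  // queue is full; request is rejected
)

// waitQueue is a simplified model of one lock's wait queue. It is not
// CockroachDB's lockState; it only captures the length accounting.
type waitQueue struct {
	maxLength int
	queued    map[int]bool // request ID -> present in the queue
}

func newWaitQueue(maxLength int) *waitQueue {
	return &waitQueue{maxLength: maxLength, queued: make(map[int]bool)}
}

// tryActiveWait adds reqID to the queue unless it is already there.
func (q *waitQueue) tryActiveWait(reqID int) waitState {
	// A request that re-scans and finds itself already queued has already
	// been counted against maxLength, so it must not trip the limit.
	if q.queued[reqID] {
		return waitFor
	}
	if len(q.queued) >= q.maxLength {
		return waitQueueMaxLengthExceeded
	}
	q.queued[reqID] = true
	return waitFor
}

func main() {
	q := newWaitQueue(2)
	fmt.Println(q.tryActiveWait(1) == waitFor)                    // true
	fmt.Println(q.tryActiveWait(2) == waitFor)                    // true; queue is now full
	fmt.Println(q.tryActiveWait(2) == waitFor)                    // true; re-scan of a queued request
	fmt.Println(q.tryActiveWait(3) == waitQueueMaxLengthExceeded) // true; new request is rejected
}
```

Disabling the `q.queued[reqID]` check in this model, the same way `inQueue && false` disables it in the real code, makes the third call return `waitQueueMaxLengthExceeded`, which is the kind of regression the new test is meant to catch.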
Force-pushed from bfe15b0 to 8372915.
Reviewed all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani)
pkg/kv/kvserver/concurrency/testdata/lock_table/queue_length_exceeded
line 247 at r2 (raw file):
which is something that cannot happen outside of unit tests.
How difficult would it be to make this test behave like production? These kinds of test-only states can add friction when trying to make future, unrelated changes.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/concurrency/testdata/lock_table/queue_length_exceeded
line 247 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
which is something that cannot happen outside of unit tests.
How difficult would it be to make this test behave like production? These kinds of test-only states can add friction when trying to make future, unrelated changes.
It's doable if we run this test at the concurrency manager level, and not at the lock table level. The setup required to trigger this would be a bit involved, but it is doable. I can update the PR to make this change.
Maybe we should add an assertion in `tryActiveWait` to ensure a request isn't already actively waiting at the lock. That'll make it impossible to write such stuff in the future.
Reviewed 1 of 1 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani)
pkg/kv/kvserver/concurrency/testdata/lock_table/queue_length_exceeded
line 247 at r2 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
It's doable if we run this test at the concurrency manager level, and not at the lock table level. The setup required to trigger this would be a bit involved, but it is doable. I can update the PR to make this change.
Maybe we should add an assertion in `tryActiveWait` to ensure a request isn't already actively waiting at the lock. That'll make it impossible to write such stuff in the future.
I had missed that this was a `lock_table` test and not a `concurrency_manager` test. I'm more OK with this behaving in a non-prod manner, given the scope of the test framework.
TFTR!
bors r=nvanbenschoten
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/concurrency/testdata/lock_table/queue_length_exceeded
line 247 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I had missed that this was a `lock_table` test and not a `concurrency_manager` test. I'm more OK with this behaving in a non-prod manner, given the scope of the test framework.
FWIW, I'd initially written this test without realizing I could do this, and the setup felt needlessly complex. This is much easier/simpler to reason about, because it's testing the case we're interested in directly. So if you're okay with this, I'm happy to merge ahead.
Build succeeded.