stress: failed test in cockroach/storage/storage.test: TestSplitSnapshotRace_SnapshotWins #8170
Comments
Looks like a deadlock on |
Root cause looks same as #8149. |
I don't know how this ties into the lock, but definitely looks like one |
Ah, here's the held mtc mutex:
So one Raft command isn't coming back. |
Yep, and it isn't coming back because the Raft transport also wants in on the action:
|
Essentially we're holding the RLock during SendNext, then in another thread we try to stop a store, which tries to get a write lock, blocking all future RLocks. But SendNext won't come back until Raft has had a chance to get another read lock at a later point. Frankly, I think this is pretty much beyond repair. The only "easy" fix I can think of is acquiring an |
See cockroachdb#7488 and cockroachdb#8170.

Attempting to acquire a write lock early in `stopStore` could lead to situations in which an outstanding Raft proposal never returned (due to address resolution calling back into `multiTestContext` with a RLock), but at the same time that write lock being stuck on a read lock held in `SendNext` which in turn waited on Raft:

    SendNext[hold RLock] -> Raft[want RLock]
               ʌ               /
                \             v
              stopStore[want Lock]

The solution (which I wasn't able to test, for the flakiness doesn't easily reproduce on my laptop and that's all I have available at the moment) is to acquire first a read lock to quiesce the stopper, which should tell everything downstream to let go of what they're trying to accomplish before the opportunity for deadlock presents itself.

I'm sure there will be another one, though.
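The ordering proposed in the commit message can be sketched as follows. This is an illustration under assumptions, not the actual change: `testContext`, `sendNext`, the `quiesce` channel and `run` are hypothetical names, and a closed channel stands in for quiescing the stopper. The point is only the order of operations: signal in-flight work to give up while holding a read lock, and take the write lock afterwards.

```go
package main

import (
	"fmt"
	"sync"
)

// testContext is a hypothetical stand-in for a test harness whose state is
// guarded by an RWMutex.
type testContext struct {
	mu      sync.RWMutex
	quiesce chan struct{} // closed to tell in-flight work to give up
	stopped bool
}

// sendNext holds the read lock while waiting for a reply. It returns true if
// a reply arrived and false if it was told to quiesce.
func (c *testContext) sendNext(reply <-chan struct{}) bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	select {
	case <-reply:
		return true
	case <-c.quiesce:
		return false
	}
}

// stopStore first quiesces under a read lock, which does not block other
// readers, and only then asks for the write lock.
func (c *testContext) stopStore() {
	c.mu.RLock()
	close(c.quiesce)
	c.mu.RUnlock()

	c.mu.Lock()
	c.stopped = true
	c.mu.Unlock()
}

// run starts a sendNext call whose reply never arrives, then stops the store.
func run() (gotReply, stopped bool) {
	c := &testContext{quiesce: make(chan struct{})}
	reply := make(chan struct{}) // never written to
	result := make(chan bool)
	go func() {
		result <- c.sendNext(reply)
	}()
	c.stopStore()
	gotReply = <-result
	return gotReply, c.stopped
}

func main() {
	gotReply, stopped := run()
	fmt.Println("got reply:", gotReply, "stopped:", stopped)
}
```

If `stopStore` took the write lock before closing `quiesce`, a `sendNext` call already holding the read lock would never be told to give up, and the write lock would wait forever.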
There's no longer much need to do all this manual mocking; we should move these tests to |
Binary: cockroach/static-tests.tar.gz sha: https://github.com/cockroachdb/cockroach/commits/b624f98deda333293ba82b5bf9bf603e6670728d
Stress build found a failed test:
Run Details:
Please assign, take a look and update the issue accordingly.