circleci: failed tests: TestReplicaRemovalCampaign #7386
This is easily reproducible with:
|
This was introduced in f447eb0; it looks like we're tickling something new here.
I'll revert that PR until we figure this out.
This reverts commit cf5fee1. See cockroachdb#7386.
Looking.
First results: In the test, we call But
I'll look a bit more into why this crashes our beta cluster even after the revert, with my starting point being that we destroy replica data on startup ( |
When we call |
Yes, that's the above conclusion. I want to understand the production crash.
-- Tobias
Mark Replicas as destroyed in `Replica.Destroy` and do not perform any further raft operations after destruction. This addresses the underlying bug in cockroachdb#7386 that was exacerbated by cockroachdb#7355. The specific sequence of events in cockroachdb#7386 was that a replica was queued for replication and the test then removed the replica from the store. Removal of a replica from a store removes it from the various queues, though that is done asynchronously. The replica queue merrily tries to process the now destroyed replica and as a first step tries to acquire the leader lease. This leader lease acquisition is done via a raft command, but the state underlying the replica has already been deleted. Boom!
Fixed in #7404.
The following test appears to have failed:
#19440:
Please assign, take a look and update the issue accordingly.