storage: add replicas to replica GC queue when remote reports an error #5789
When a command is proposed on a replica which has been removed from its group (due to a stale cached range descriptor), that command currently goes into raft-retry limbo until the replica is garbage collected. This appears to be happening in the CI logs posted to #6118, and is why the system is unable to progress past the …
This will be a very valuable addition. When we added consistency checking …
What I'm proposing operates at the level of raft messages and not application-level commands, so it won't really help for the consistency checker (unless you have an alternate design that could encompass both).
Gotcha. I don't have another proposal on the table.
From #6011: A second use case for this mechanism would be to explicitly acknowledge snapshots, so we can call ReportSnapshot once the snapshot has been applied, instead of optimistically reporting it as done as soon as the MsgSnap has been sent.
I don't think we can use this to report snapshot status more accurately (raft may internally decide to drop a snapshot on the floor, so detecting when a snapshot has been either applied or rejected is tricky), so let's focus on returning some sort of feedback for messages that are rejected because their replica ID is too old, and ensure that those ranges can be GC'd promptly. I'm not sure whether it's better to model this as a gRPC return stream (adding plumbing between RaftTransport and Store.handleRaftMessage) or as a separate RaftMessageRequest that doesn't map to an underlying etcd/raft message.
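For concreteness, here is a minimal Go sketch of the second option: a message type that carries feedback about a rejected request back to the origin store without wrapping an etcd/raft message. All type, field, and function names below are illustrative assumptions, not the actual cockroachdb API.

```go
package sketch

import "fmt"

// RaftErrorResponse is a hypothetical message that reports a rejected
// RaftMessageRequest back to the origin store. It does not wrap an
// etcd/raft message; it only identifies the replica whose message was
// rejected and why.
type RaftErrorResponse struct {
	RangeID       int64
	ToReplicaID   int // the (presumably stale) replica on the origin store
	FromReplicaID int // the replica that rejected the message
	Err           error
}

// newReplicaTooOldResponse covers the case discussed here: the sender's
// replica ID is older than what the receiver knows for the range, so the
// sender has likely been removed and should be considered for GC.
func newReplicaTooOldResponse(rangeID int64, staleReplicaID, fromReplicaID int) RaftErrorResponse {
	return RaftErrorResponse{
		RangeID:       rangeID,
		ToReplicaID:   staleReplicaID,
		FromReplicaID: fromReplicaID,
		Err:           fmt.Errorf("replica %d is too old for range %d", staleReplicaID, rangeID),
	}
}
```

Whether this rides on a bidirectional stream or a separate RPC, the useful property is that it names the range and the stale replica, which is all the origin store needs in order to queue that replica for GC.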
I was just thinking about this; the most straightforward solution is to add …
Why do you want to close the stream? The stream is per-node, but the message is per-replica. Using …
The only trouble is that the stream is unidirectional; sending one of these back necessarily terminates the stream.
Running the 1-to-3 rebalance test showed that we are still sometimes stalling for ~30 seconds; this could be caused by the GC queue taking a bit to actually GC replicas. In anticipation of cockroachdb#5789, we should soon be able to make the GC queue mostly event-driven. In the meantime, aggressive queue timings serve to rule out this factor when investigating upreplication stalls.
When the Raft transport stream returns an error we can use that error as a signal that the replica may need to be GC'd. Suggested in cockroachdb#8130. Fixes cockroachdb#5789.
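A rough sketch of how the origin store might consume that signal, assuming the transport hands per-range errors back to the store. Replica, Store, replicaGCQueue, and the method names here are pared-down stand-ins, not the real storage types.

```go
package sketch

// Replica and Store are pared-down stand-ins for the real storage types.
type Replica struct {
	RangeID int64
}

// replicaGCQueue holds replicas that should be checked for garbage
// collection soon, rather than waiting for the periodic scanner.
type replicaGCQueue struct {
	pending chan *Replica
}

// MaybeAdd enqueues the replica if there is room; dropping the hint is
// acceptable because the periodic scan still visits every replica.
func (q *replicaGCQueue) MaybeAdd(r *Replica) {
	select {
	case q.pending <- r:
	default:
	}
}

type Store struct {
	replicas map[int64]*Replica
	gcQueue  *replicaGCQueue
}

// handleTransportError is invoked when the outgoing Raft stream reports
// an error for a message sent by one of this store's replicas. The error
// is treated as a hint that the replica may have been removed from its
// range, so it is queued for a GC check.
func (s *Store) handleTransportError(rangeID int64) {
	if repl, ok := s.replicas[rangeID]; ok {
		s.gcQueue.MaybeAdd(repl)
	}
}
```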
From #5467:
Raft never gives a negative answer; either the command commits or you time out (and in general we use periodic retries of raft commands to ensure that it eventually commits). However, if you're on a replica which has been removed from the range, then none of the other replicas will talk to you.
So when a stale range descriptor cache directs a request to a removed replica, we don't handle it well at all. I think we're currently relying on SendNextTimeout to handle this at the rpc layer, but it leaves an orphaned goroutine behind (prior to #5551). Eventually the replicaGCQueue will see that the replica has been removed and cancel all of its outstanding operations (or at least it should), but that takes a long time.
I think we need to add some sort of error messaging to the raft transport, so when a message is rejected for being from an out of date replica, we can send a message to that replica that will trigger a replica GC.
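A sketch of what the receiving side of that error messaging could look like, under the same assumptions: the store rejects a message whose sender's replica ID is older than the newest one it has seen for that range, and reports the rejection back so the origin can queue the stale replica for GC. The receiver type and the report callback are hypothetical, not the actual Store.handleRaftMessage signature.

```go
package sketch

import "fmt"

// reportRejection stands in for whatever mechanism (a return stream or a
// separate request type) carries the rejection back to the origin store.
type reportRejection func(rangeID int64, staleReplicaID int, err error)

// receiver is a pared-down stand-in for the store on the receiving node.
type receiver struct {
	// newestReplicaID tracks, per range, the newest sender replica ID seen.
	newestReplicaID map[int64]int
	report          reportRejection
}

// handleRaftMessage rejects messages from replica IDs older than the
// newest one seen for the range and reports the rejection so the stale
// replica can be GC'd promptly instead of waiting for the periodic scan.
func (r *receiver) handleRaftMessage(rangeID int64, fromReplicaID int) error {
	if newest, ok := r.newestReplicaID[rangeID]; ok && fromReplicaID < newest {
		err := fmt.Errorf("replica %d is too old for range %d", fromReplicaID, rangeID)
		r.report(rangeID, fromReplicaID, err)
		return err
	}
	// ... normal raft message handling would continue here ...
	return nil
}
```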