storage: avoid reading uncommitted tail of Raft log when becoming leader #18601
Are we still planning on getting to this before the 2.0 release?
It's less crucial thanks to the quota pool (which limits the size of the uncommitted tail of the log), though it still has some value (we thought this was still worth doing even though this issue was created after the quota pool landed). I don't know if it's going to make the cut for 2.0, though. I think it ranks below fixing PreVote (#18151) as far as raft changes go.
I recall an incident post quota pool where we saw a very large uncommitted tail of the log due to re-proposals. I don't recall the details, but I think @a-robinson looked at this too and he has a fantastic memory for this stuff.
The incident I looked into (#15702) was pre-quota pool. A 40MB delete operation got re-proposed 66 times, kicking off an infinite cycle of raft elections. Even the first proposal triggered an election due to the high latency / low bandwidth, but if reproposals hadn't been allowed then things presumably wouldn't have spun so far out of control. Around the same time, we also saw it during the uncommon combination of dropping a large database and running a restore at the same time while running on terrible disks (#15681). #18199 happened post-quota pool, though. I think understanding @bdarnell's questions in #18199 (comment) would be helpful for properly prioritizing this. I'll personally remain worried about it unless we know it actually happened while they were running 1.0.x or we understand how the quota pool didn't prevent it.
The quota pool also doesn't prevent reproposals, and the Raft log could grow that way too.
I guess I should have checked the code. If that's the case, then consider me still fairly worried about this.
Scanning the uncommitted portion of the raft log to determine whether there are any pending config changes can be expensive. In cockroachdb/cockroach#18601, we've seen that a new leader can spend so much time scanning its log post-election that it fails to send its first heartbeats in time to prevent a second election from starting immediately. Instead of tracking whether a pending config change exists with a boolean, this commit tracks the latest log index at which a pending config change *could* exist. This is a less expensive solution to the problem, and the impact of false positives should be minimal since a newly-elected leader should be able to quickly commit the tail of its log.
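The idea in the commit message above can be illustrated with a minimal Go sketch. This is not the etcd/raft code: the `leader` type, its fields, and the function names are hypothetical, chosen only for illustration. The sketch assumes that a new leader conservatively sets its "pending config change" index to the last index of its log instead of scanning the uncommitted tail, and that a new config change is refused until the applied index has caught up to that bound.

```go
package main

import "fmt"

// leader is a hypothetical, heavily simplified stand-in for a raft leader.
// It tracks only the indexes needed to illustrate the technique.
type leader struct {
	lastIndex        uint64 // index of the last entry in the log
	applied          uint64 // index of the last applied entry
	pendingConfIndex uint64 // latest index at which a config change *could* be pending
}

// becomeLeader does no log scan. It assumes a config change might exist
// anywhere up to lastIndex, which can be a false positive.
func becomeLeader(lastIndex, applied uint64) *leader {
	return &leader{lastIndex: lastIndex, applied: applied, pendingConfIndex: lastIndex}
}

// proposeConfChange appends a config change only if no config change can be
// pending, i.e. everything up to pendingConfIndex has been applied.
func (l *leader) proposeConfChange() bool {
	if l.pendingConfIndex > l.applied {
		return false // a config change may still be pending; refuse
	}
	l.lastIndex++
	l.pendingConfIndex = l.lastIndex
	return true
}

// advanceApplied records that entries up to index have been applied.
func (l *leader) advanceApplied(index uint64) {
	if index > l.applied && index <= l.lastIndex {
		l.applied = index
	}
}

func main() {
	l := becomeLeader(100, 90)         // 10 unapplied entries in the tail
	fmt.Println(l.proposeConfChange()) // refused: the tail might hold a config change
	l.advanceApplied(100)              // the new leader catches up on its tail
	fmt.Println(l.proposeConfChange()) // accepted
	fmt.Println(l.proposeConfChange()) // refused until the new entry is applied
}
```

Becoming leader costs O(1) here regardless of the size of the uncommitted tail. The price is a possible false positive that delays config changes until the tail has been applied.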
Fixes cockroachdb#18601
Release note (bug fix): Fix a bug in which ranges could get stuck if the uncommitted raft log grew too large.
Picks up a cherry-picked version of etcd-io/etcd#9073, to fix cockroachdb#18601
Release note (bug fix): Fixes potential cluster unavailability after raft logs grow too large.
24889: cherrypick-1.1: build: Update etcd r=bdarnell a=bdarnell
Picks up a cherry-picked version of etcd-io/etcd#9073, to fix #18601
Release note (bug fix): Fixes potential cluster unavailability after raft logs grow too large.
Co-authored-by: Ben Darnell <[email protected]>
The work will all be upstream in etcd/raft. Filing an issue here for tracking purposes. Forked from a comment on #18199:
@petermattis says:
@bdarnell responds: