-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
backport-2.0: storage: prevent unbounded raft log growth without quorum #27868
backport-2.0: storage: prevent unbounded raft log growth without quorum #27868
Commits on Aug 6, 2018
-
storage: prevent unbounded raft log growth without quorum
Fixes cockroachdb#27772. This change adds safeguards to prevent cases where a raft log would grow without bound during loss of quorum scenarios. It also adds a new test that demonstrates that the raft log does not grow without bound in these cases. There are two cases that need to be handled to prevent the unbounded raft log growth observed in cockroachdb#27772. 1. When the leader proposes a command and cannot establish a quorum. In this case, we know the leader has the entry in its log, so there's no need to refresh it with `reasonTicks`. To avoid this, we no longer use `refreshTicks` as a leader. 2. When a follower proposes a command that is forwarded to the leader who cannot establish a quorum. In this case, the follower can't be sure (currently) that the leader got the proposal, so it needs to refresh using `reasonTicks`. However, the leader now detects duplicate forwarded proposals and avoids appending redundant entries to its log. It does so by maintaining a set of in-flight forwarded proposals that it has received during its term as leader. This set is reset after every leadership change. Both of these cases are tested against in the new TestLogGrowthWhenRefreshingPendingCommands. Without both of the safeguards introduced in this commit, the test fails. Release note (bug fix): Prevent unbounded growth of the raft log caused by a loss of quorum.
Configuration menu - View commit details
-
Copy full SHA for 20cfb25 - Browse repository at this point
Copy the full SHA 20cfb25View commit details -
storage: s/replica.mu.proposals/replica.mu.localProposals/
Release note: None
Configuration menu - View commit details
-
Copy full SHA for bc1e14a - Browse repository at this point
Copy the full SHA bc1e14aView commit details -
storage: cancel liveness update context on Stopper cancellation
Fixes cockroachdb#27878. The test was flaky because a NodeLiveness update was getting stuck when killing a majority of nodes in a Range. The fix was to tie NodeLiveness' update context to its stopper so that liveness updates are canceled when their node begins to shut down. This was a real issue that I suspect would occasionally make nodes hang on shutdown when their liveness range was becoming unavailable. There are probably other issues in the same class of bugs, but stressing `TestLogGrowthWhenRefreshingPendingCommands` isn't showing anything else. In doing so, the change needed to extend `stopper.WithCancel` into `stopper.WithCancelOnQuiesce` and `stopper.WithCancelOnStop`. Release note: None
Configuration menu - View commit details
-
Copy full SHA for 723f1af - Browse repository at this point
Copy the full SHA 723f1afView commit details -
stopper: avoid memory leak in Stopper.WithCancel methods
Before this change, `WithCancelOnQuiesce` and `WithCancelOnStop` were dangerous to use because they would never clean up memory. This meant that any use of the methods that happened more than a constant number of times would slowly leak memory. This was an issue in `client.Txn.rollback`. This change fixes the methods so that it's possible for callers to clean up after themselves. Added a warning to `Stopper.AddCloser` because this had a similar issue. Release note: None
Configuration menu - View commit details
-
Copy full SHA for b201df1 - Browse repository at this point
Copy the full SHA b201df1View commit details -
stopper: synchronize access to sCancels functions
Fixes cockroachdb#27908. Release note: None
Configuration menu - View commit details
-
Copy full SHA for e2adebe - Browse repository at this point
Copy the full SHA e2adebeView commit details -
stopper: Ensure that Closers and Ctxs are always cleaned up
Before this change the Stopper made no guarantee that `Closers` registered by the `AddCloser` method or `Contexts` tied to the `Stopper` with `WithCancelOn{Quiesce,Stop}` were ever properly cleaned up. In cases where these methods were called concurrently with the Stopper stopping, it was possible for the resources to leak. This change fixes this by handling cases where these methods are called concurrently with or after the Stopper has stopped. This allows them to provide stronger external guarantees. Release note: None
Configuration menu - View commit details
-
Copy full SHA for 7ffc9e1 - Browse repository at this point
Copy the full SHA 7ffc9e1View commit details