storage: bump LeaseAppliedIndex instead of reproposing #27054

Closed
tbg opened this issue Jun 28, 2018 · 2 comments
Labels
A-kv-replication Relating to Raft, consensus, and coordination. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@tbg
Member

tbg commented Jun 28, 2018

In #26830, we observed an instance in which the uncommitted portion of the Raft log grew to over 7GB in size, leading to an out-of-memory error when it was loaded into memory on election. This particular out-of-memory issue has been fixed, but the process by which the uncommitted Raft log grew to 7GB in the first place has not.

This likely happened because the split queue retried the split repeatedly, each time proposing on the order of 100MB into Raft. That is a problem in itself (partially addressed, see #25233), but a related piece of problematic code is the reproposal logic:

https://github.com/cockroachdb/cockroach/blob/master/pkg/storage/replica.go#L4396-L4417

which repeatedly creates copies of proposals that don't commit within a target interval. Consider the case in which Raft commit is significantly delayed or even stalled (lost quorum) -- the mechanism is essentially blowing up the uncommitted tail of the Raft log for no good reason.

This raises the question of why we're even trying to repropose these commands. We should simply propose a "no-op" through Raft that allocates a new LeaseAppliedIndex, after which the pending proposals below it are discarded (to be retried externally).

The semantics of that approach seem saner as we never duplicate large commands and as proposers already have to expect retries and are equipped to handle them.

@bdarnell this seems worth doing for 2.1, if you don't have any reservations. The change shouldn't be too big (most of the time will probably be spent fixing up the tests that test this through intimate knowledge of the current code).

@tbg tbg added the A-kv-replication Relating to Raft, consensus, and coordination. label Jun 28, 2018
@bdarnell
Contributor

I'm fine with doing this for reasonTicks reproposals. I worry that doing it for reasonNewLeader would be unnecessarily disruptive and lead to more time spent re-evaluating things that could simply be reproposals.

@tbg tbg added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jul 22, 2018
@tbg tbg added this to the Later milestone Jul 22, 2018
@petermattis petermattis removed this from the Later milestone Oct 5, 2018
@tbg
Member Author

tbg commented Oct 11, 2018

With the Raft log growth problems fixed, I don't think this needs to be done.

@tbg tbg closed this as completed Oct 11, 2018