storage: unexpected Raft re-proposals during split transaction #10160
You've ruled out …
The time is spent in the …
Looks like we're splitting …
Well, one part of the problem is definitely due to not campaigning the idle …
Eager campaigning of idle replicas is not being performed because the initial store used for bootstrapping the cluster is only used temporarily in …
Extend the change in cockroachdb#9550 to allow eagerly campaigning replicas after startup to include clusters created via TestCluster. Fixes cockroachdb#10160
@petermattis I'm reopening this issue. Recall in the original description that we sometimes experience 10s delays due to split queue mishaps. This is the cause of @tamird's issue #10184. What's happening is as follows:
So then we need to wait for the next scan interval for the range to be reconsidered for splitting.
The unexpected reproposals were fixed. Now to fix the slow splits.
After splitting a range, further split work might be required if the zone config changed, a table was created concurrently, or the range was much larger than the target size. Before this change, TestSplitAtTableBoundary had a wide variance in how long it would take: sometimes it would complete in less than a second, while other times it would take 10-20 seconds. With this change it reliably completes in less than a second. Fixes cockroachdb#10160
This doesn't help #9624. That seems to be a different problem. I'll take a look.
@bdarnell this simple test (just add it to the `pkg/sql` directory) exhibits behavior I'm not understanding. I have a triplicated cluster and create a table. I then wait for the table to be split along the expected boundary. Most times I run it, it takes 3-5s waiting for Raft reproposals. I've done a fair bit of digging, and what happens is very consistent: the lost Raft batch includes just the start of the txn, which adjusts the LHS `RangeDescriptor`. After a few hours tracking it this far, I felt it'd be more reasonable to turn this over to the expert. Sometimes it takes about 10s to run (seems to be a race related to not adding to the split queue), and other times it takes 25s to run (not sure about that case, as it's rare and I lost the logs).