kv/kvserver: TestInitRaftGroupOnRequest failed #47551
cockroach-teamcity added labels on Apr 16, 2020: C-test-failure (Broken test, automatically or manually discovered), O-robot (Originated from a bot), branch-provisional_202004152236_v20.1.0-rc.2
Duplicate of #47231.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue on Apr 17, 2020
Fixes cockroachdb#42808.
Fixes cockroachdb#44146.
Fixes cockroachdb#47020.
Fixes cockroachdb#47551.
Fixes cockroachdb#47231.

Disable async intent resolution. Async intent resolution can lead to flakiness in the test because it allows the intents written by the split transaction to be resolved at any time, including after the nodes are restarted. The intent resolution on the RHS's local range descriptor intent can both wake up the RHS range's Raft group and result in the wrong replica acquiring the lease.

I was always seeing this in conjunction with the log line:

```
kv/kvserver/intentresolver/intent_resolver.go:746 failed to gc transaction record: could not GC completed transaction anchored at /Local/Range/Table/50/RangeDescriptor: node unavailable; try another peer
```

Before the fix, the test failed almost immediately when stressed on a roachprod cluster. After the fix, I've never seen it flake:

```
576962 runs so far, 0 failures, over 19m35s
```

I think this may have gotten more flaky after we began batching intent resolution, as this batching also introduced a delay to the async task. I'll backport this to the past few release branches.
craig bot pushed a commit that referenced this issue on Apr 18, 2020
47625: kv: deflake TestInitRaftGroupOnRequest r=nvanbenschoten a=nvanbenschoten

Fixes #42808.
Fixes #44146.
Fixes #47020.
Fixes #47551.
Fixes #47231.

Disable async intent resolution. Async intent resolution can lead to flakiness in the test because it allows the intents written by the split transaction to be resolved at any time, including after the nodes are restarted. The intent resolution on the RHS's local range descriptor intent can both wake up the RHS range's Raft group and result in the wrong replica acquiring the lease.

I was always seeing this in conjunction with the log line:

```
kv/kvserver/intentresolver/intent_resolver.go:746 failed to gc transaction record: could not GC completed transaction anchored at /Local/Range/Table/50/RangeDescriptor: node unavailable; try another peer
```

Before the fix, the test failed almost immediately when stressed on a roachprod cluster. After the fix, I've never seen it flake:

```
576962 runs so far, 0 failures, over 19m35s
```

I think this may have gotten more flaky after we began batching intent resolution, as this batching also introduced a delay to the async task. I'll backport this to the past few release branches.

Co-authored-by: Nathan VanBenschoten <[email protected]>
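To make the race described in the commit message concrete, here is a minimal toy model in Go. It is only a sketch under stated assumptions: `testingKnobs`, `intentResolver`, `resolveAsync`, `flush`, and `wakeupsAfterRestart` are hypothetical names invented for illustration and are not CockroachDB's actual types or the code changed in the PR. The model shows how a delayed background task can touch a range after the test has restarted its nodes, and how a knob that drops async work removes that late side effect.

```go
package main

import "fmt"

// testingKnobs is a hypothetical stand-in for a store's testing knobs.
type testingKnobs struct {
	// DisableAsyncIntentResolution, when true, makes the toy resolver
	// drop async resolution requests instead of queueing them.
	DisableAsyncIntentResolution bool
}

// intentResolver is a toy model, not CockroachDB's intent resolver.
type intentResolver struct {
	knobs   testingKnobs
	pending []string // intents queued for background resolution
	wakeups []string // ranges touched when a queued intent is resolved
}

// resolveAsync queues an intent for later resolution unless the knob is set.
func (r *intentResolver) resolveAsync(intent string) {
	if r.knobs.DisableAsyncIntentResolution {
		return
	}
	r.pending = append(r.pending, intent)
}

// flush models the background task firing at an arbitrary later time,
// for example after the test has already restarted its nodes.
func (r *intentResolver) flush() {
	r.wakeups = append(r.wakeups, r.pending...)
	r.pending = nil
}

// wakeupsAfterRestart runs the toy scenario and returns how many ranges
// were touched by resolution work that ran after the restart.
func wakeupsAfterRestart(knobs testingKnobs) int {
	r := &intentResolver{knobs: knobs}
	r.resolveAsync("RHS range descriptor intent") // split transaction commits
	// Nodes restart here; the test expects the RHS Raft group to stay dormant.
	r.flush() // the delayed async task finally runs
	return len(r.wakeups)
}

func main() {
	fmt.Println("knob off:", wakeupsAfterRestart(testingKnobs{}))
	fmt.Println("knob on:", wakeupsAfterRestart(testingKnobs{DisableAsyncIntentResolution: true}))
}
```

With the knob off, the queued intent is resolved after the simulated restart and touches the RHS range. With the knob on, nothing is queued, so the test's assumption that the range stays dormant holds.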
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue on Apr 18, 2020
This was referenced Apr 18, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue on Apr 18, 2020
(kv/kvserver).TestInitRaftGroupOnRequest failed on provisional_202004152236_v20.1.0-rc.2@778550737c285409cc1d9c6d96584fc985071c79:
More
Parameters:
Related:
See this test on roachdash
powered by pkg/cmd/internal/issues