kvserver: [dnm] add RACv2 failing split/merge/migrate repros #135339
Closed
kvoli wants to merge 3 commits into cockroachdb:master from kvoli:241115.rac-repro-send-queue-split-snap
Commits on Nov 18, 2024
kvserver: [dnm] add TestFlowControlSendQueueRangeSplitSnap test
This test currently fails, while also churning snapshots towards the RHS range post-split. It appears that the RHS post-split replica on the store which has no available tokens (from the leader, n1's perspective) is effectively wedged.

```bash
dev test pkg/kv/kvserver -v --vmodule='replica_raft=1,kvflowcontroller=2,replica_proposal_buf=1,raft_transport=2,kvflowdispatch=1,kvadmission=1,kvflowhandle=1,work_queue=1,replica_flow_control=1,tracker=1,client_raft_helpers_test=1,raft=1,admission=1,replica_flow_control=1,work_queue=1,replica_raft=1,replica_proposal_buf=1,raft_transport=2,kvadmission=1,work_queue=1,replica_flow_control=1,client_raft_helpers_test=1,range_controller=2,token_counter=2,token_tracker=2,processor=2,kvflowhandle=1' -f TestFlowControlSendQueueRangeSplitSnap --show-logs --timeout=120s
```

Epic: none
Release note: None
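The `--vmodule` value in the command above lists several modules more than once (for example `replica_raft=1` and `work_queue=1`), always with the same level. A minimal sketch of trimming it, assuming the repeats are redundant; `dedupe_vmodule` is a hypothetical helper and not part of this PR:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): drop repeated entries from a
# comma-separated --vmodule spec, keeping the first occurrence of each.
dedupe_vmodule() {
  printf '%s\n' "$1" | tr ',' '\n' | awk 'NF && !seen[$0]++' | paste -sd, -
}

# The spec exactly as it appears in the commit message.
vmodule='replica_raft=1,kvflowcontroller=2,replica_proposal_buf=1,raft_transport=2,kvflowdispatch=1,kvadmission=1,kvflowhandle=1,work_queue=1,replica_flow_control=1,tracker=1,client_raft_helpers_test=1,raft=1,admission=1,replica_flow_control=1,work_queue=1,replica_raft=1,replica_proposal_buf=1,raft_transport=2,kvadmission=1,work_queue=1,replica_flow_control=1,client_raft_helpers_test=1,range_controller=2,token_counter=2,token_tracker=2,processor=2,kvflowhandle=1'

# Print the flag with duplicates removed; paste it into the dev test command.
echo "--vmodule='$(dedupe_vmodule "$vmodule")'"
```

The helper only rewrites the string; it does not change which modules are logged or at what level.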
Commit: 22e3767
kvserver: [dnm] add TestFlowControlSendQueueRangeMerge test
This adds a new RACv2 integration test, `TestFlowControlSendQueueRangeMerge`. The test exhausts the tokens on s3 (s1,s2,s3) by issuing 5 MiB / 4 MiB of writes to the RHS and blocking admission on n3.

The test currently fails on the merge with:

```
testcluster.go:760: merging at /Table/Max: /Table/Max: merge unexpected error: kv/kvserver/replica_command.go:962: merge failed: waiting for all right-hand replicas to catch up: operation "waiting for merge application" timed out after 5.001s (given timeout 5s): grpc: context deadline exceeded [code 4/DeadlineExceeded]
(1) attached stack trace
  -- stack trace:
  | github.com/cockroachdb/cockroach/pkg/server.(*testServer).MergeRanges
  | 	testserver.go:2039
  | github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).MergeRanges
  | 	pkg/testutils/testcluster/testcluster.go:750
  | github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).MergeRangesOrFatal
  | 	pkg/testutils/testcluster/testcluster.go:758
  | github.com/cockroachdb/cockroach/pkg/kv/kvserver_test.TestFlowControlSendQueueRangeMerge
  | 	pkg/kv/kvserver/flow_control_integration_test.go:5163
  | testing.tRunner
  | 	GOROOT/src/testing/testing.go:1689
  | runtime.goexit
  | 	src/runtime/asm_arm64.s:1222
Wraps: (2) /Table/Max: merge unexpected error: kv/kvserver/replica_command.go:962: merge failed: waiting for all right-hand replicas to catch up: operation "waiting for merge application" timed out after 5.001s (given timeout 5s): grpc: context deadline exceeded [code 4/DeadlineExceeded]
```

Run with the following for more logging:

```
dev test pkg/kv/kvserver -v --vmodule='replica_raft=1,kvflowcontroller=2,replica_proposal_buf=1,raft_transport=2,kvflowdispatch=1,kvadmission=1,kvflowhandle=1,work_queue=1,replica_flow_control=1,tracker=1,client_raft_helpers_test=1,raft=1,admission=1,replica_flow_control=1,work_queue=1,replica_raft=1,replica_proposal_buf=1,raft_transport=2,kvadmission=1,work_queue=1,replica_flow_control=1,client_raft_helpers_test=1,range_controller=2,token_counter=2,token_tracker=2,processor=2,kvflowhandle=1' -f TestFlowControlSendQueueRangeMerge --show-logs --timeout=30s
```

Epic: none
Release note: None
Commit: 2ba2ff2
kvserver: [dnm] add TestFlowControlSendQueueRangeMigrate test
Add a new RACv2 integration test, `TestFlowControlSendQueueRangeMigrate`.

This test fails, but not by timing out (it will time out with the args below, due to the timeout flag specified). Instead, s1 and s3 on the scratch range appear to ping-pong messages back and forth, like:

```
-- Test timed out at 2024-11-16 00:39:36 UTC --
I241116 00:39:37.221610 4048 kv/kvserver_test/client_raft_helpers_test.go:104 [T1,Vsystem,n3] 725 [raft] r69 Raft message 1->3 MsgApp Term:6 Log:6/25 Commit:27
I241116 00:39:37.222264 4068 kv/kvserver_test/client_raft_helpers_test.go:104 [T1,Vsystem,n1] 726 [raft] r69 Raft message 3->1 MsgAppResp Term:6 Log:0/25 Commit:25
I241116 00:39:38.221271 4048 kv/kvserver_test/client_raft_helpers_test.go:104 [T1,Vsystem,n3] 727 [raft] r69 Raft message 1->3 MsgApp Term:6 Log:6/25 Commit:27
I241116 00:39:38.221876 4068 kv/kvserver_test/client_raft_helpers_test.go:104 [T1,Vsystem,n1] 728 [raft] r69 Raft message 3->1 MsgAppResp Term:6 Log:0/25 Commit:25
I241116 00:39:38.971603 4048 kv/kvserver_test/client_raft_helpers_test.go:104 [T1,Vsystem,n3] 729 [raft] r69 Raft message 1->3 MsgApp Term:6 Log:6/25 Commit:27
I241116 00:39:38.972018 4068 kv/kvserver_test/client_raft_helpers_test.go:104 [T1,Vsystem,n1] 730 [raft] r69 Raft message 3->1 MsgAppResp Term:6 Log:0/25 Commit:25
```

The timeout is on these lines, calling `kvserver.waitForApplication(..)` here:

- https://github.com/kvoli/cockroach/blob/dd5456fea47451d5b47e10ca40991808d7e47780/pkg/kv/kvserver/replica_write.go#L289-L292

Run with the following for more logging:

```bash
dev test pkg/kv/kvserver -v --vmodule='replica_raft=1,kvflowcontroller=2,replica_proposal_buf=1,raft_transport=2,kvflowdispatch=1,kvadmission=1,kvflowhandle=1,work_queue=1,replica_flow_control=1,tracker=1,client_raft_helpers_test=1,raft=1,admission=1,replica_flow_control=1,work_queue=1,replica_raft=1,replica_proposal_buf=1,raft_transport=2,kvadmission=1,work_queue=1,replica_flow_control=1,client_raft_helpers_test=1,range_controller=2,token_counter=2,token_tracker=2,processor=2,kvflowhandle=1' -f TestFlowControlSendQueueRangeMigrate --show-logs --timeout=60s
```

Epic: none
Release note: None
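One way to confirm the back-and-forth pattern in a longer log is to count how often each identical Raft message payload appears. This is a rough sketch only: `count_repeated_raft_msgs` is a hypothetical helper, not part of this PR, and it assumes the log lines keep the `Raft message ...` shape shown in the commit message.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): read test log output on stdin,
# keep only the text from "Raft message" onward, and count identical
# payloads, most frequent first.
count_repeated_raft_msgs() {
  sed -n 's/.*\(Raft message .*\)$/\1/p' | sort | uniq -c | sort -rn
}

# Demo on the six log lines quoted in the commit message.
count_repeated_raft_msgs <<'EOF'
I241116 00:39:37.221610 4048 kv/kvserver_test/client_raft_helpers_test.go:104 [T1,Vsystem,n3] 725 [raft] r69 Raft message 1->3 MsgApp Term:6 Log:6/25 Commit:27
I241116 00:39:37.222264 4068 kv/kvserver_test/client_raft_helpers_test.go:104 [T1,Vsystem,n1] 726 [raft] r69 Raft message 3->1 MsgAppResp Term:6 Log:0/25 Commit:25
I241116 00:39:38.221271 4048 kv/kvserver_test/client_raft_helpers_test.go:104 [T1,Vsystem,n3] 727 [raft] r69 Raft message 1->3 MsgApp Term:6 Log:6/25 Commit:27
I241116 00:39:38.221876 4068 kv/kvserver_test/client_raft_helpers_test.go:104 [T1,Vsystem,n1] 728 [raft] r69 Raft message 3->1 MsgAppResp Term:6 Log:0/25 Commit:25
I241116 00:39:38.971603 4048 kv/kvserver_test/client_raft_helpers_test.go:104 [T1,Vsystem,n3] 729 [raft] r69 Raft message 1->3 MsgApp Term:6 Log:6/25 Commit:27
I241116 00:39:38.972018 4068 kv/kvserver_test/client_raft_helpers_test.go:104 [T1,Vsystem,n1] 730 [raft] r69 Raft message 3->1 MsgAppResp Term:6 Log:0/25 Commit:25
EOF
```

A payload whose count keeps growing while its `Log:` and `Commit:` values stay fixed suggests the same message is being resent without the follower making progress.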
Commit: 9b3a6c0