kvserver: [dnm] add RACv2 failing split/merge/migrate repros #135339

Closed

Commits on Nov 18, 2024

  1. kvserver: [dnm] add TestFlowControlSendQueueRangeSplitSnap test

    This test currently fails and also churns snapshots towards the RHS
    range post-split. The post-split RHS replica on the store that has no
    available tokens (from the perspective of the leader, n1) appears to
    be effectively wedged.
    
    ```bash
    dev test pkg/kv/kvserver -v --vmodule='replica_raft=1,kvflowcontroller=2,replica_proposal_buf=1,raft_transport=2,kvflowdispatch=1,kvadmission=1,kvflowhandle=1,work_queue=1,replica_flow_control=1,tracker=1,client_raft_helpers_test=1,raft=1,admission=1,replica_flow_control=1,work_queue=1,replica_raft=1,replica_proposal_buf=1,raft_transport=2,kvadmission=1,work_queue=1,replica_flow_control=1,client_raft_helpers_test=1,range_controller=2,token_counter=2,token_tracker=2,processor=2,kvflowhandle=1' -f  TestFlowControlSendQueueRangeSplitSnap --show-logs --timeout=120s
    ```
    
    Epic: none
    Release note: None
    kvoli committed Nov 18, 2024
    22e3767
  2. kvserver: [dnm] add TestFlowControlSendQueueRangeMerge test

    This adds a new RACv2 integration test,
    `TestFlowControlSendQueueRangeMerge`.
    
    The test exhausts the tokens on s3 (s1,s2,s3) by issuing 5 MiB / 4 MiB
    of writes to the RHS and blocking admission on n3.
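
    As a rough illustration of why blocked admission leaves a send queue
    behind, here is a toy Go model of a single leader-to-follower stream.
    It is a sketch, not the RACv2 implementation: the `sendStream` type
    and its methods are hypothetical, and it assumes "5 MiB / 4 MiB" means
    5 MiB of writes against a 4 MiB token pool, issued here as five 1 MiB
    writes.

    ```go
    package main

    import "fmt"

    const MiB = 1 << 20

    // sendStream is a hypothetical, simplified model of one stream's
    // token accounting. It is not the kvflowcontrol implementation.
    type sendStream struct {
    	tokens int64 // available send tokens, in bytes
    	queued int64 // bytes held back in the send queue
    }

    // write sends as many bytes as the available tokens allow and
    // queues the remainder.
    func (s *sendStream) write(size int64) {
    	sent := size
    	if sent > s.tokens {
    		sent = s.tokens
    	}
    	s.tokens -= sent
    	s.queued += size - sent
    }

    // admit models the follower admitting size bytes: the tokens are
    // returned, and queued bytes are sent with whatever tokens are
    // now available.
    func (s *sendStream) admit(size int64) {
    	s.tokens += size
    	drain := s.queued
    	if drain > s.tokens {
    		drain = s.tokens
    	}
    	s.tokens -= drain
    	s.queued -= drain
    }

    func main() {
    	// Assumption: a 4 MiB token pool for the stream to s3.
    	s := sendStream{tokens: 4 * MiB}
    	// Assumption: 5 MiB of writes issued as five 1 MiB writes.
    	for i := 0; i < 5; i++ {
    		s.write(1 * MiB)
    	}
    	fmt.Printf("tokens=%d MiB queued=%d MiB\n", s.tokens/MiB, s.queued/MiB)
    }
    ```

    With admission blocked on n3, `admit` is never called in this model,
    so the last 1 MiB stays queued.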
    
    The test currently fails on the merge with:
    
    ```
    testcluster.go:760: merging at /Table/Max: /Table/Max: merge unexpected error: kv/kvserver/replica_command.go:962: merge failed: waiting for all right-hand replicas to catch up: operation "waiting for merge application" timed out after 5.001s (given timeout 5s): grpc: context deadline exceeded [code 4/DeadlineExceeded]
            (1) attached stack trace
              -- stack trace:
              | github.com/cockroachdb/cockroach/pkg/server.(*testServer).MergeRanges
              |     testserver.go:2039
              | github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).MergeRanges
              |     pkg/testutils/testcluster/testcluster.go:750
              | github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).MergeRangesOrFatal
              |     pkg/testutils/testcluster/testcluster.go:758
              | github.com/cockroachdb/cockroach/pkg/kv/kvserver_test.TestFlowControlSendQueueRangeMerge
              |     pkg/kv/kvserver/flow_control_integration_test.go:5163
              | testing.tRunner
              |     GOROOT/src/testing/testing.go:1689
              | runtime.goexit
              |     src/runtime/asm_arm64.s:1222
            Wraps: (2) /Table/Max: merge unexpected error: kv/kvserver/replica_command.go:962: merge failed: waiting for all right-hand replicas to catch up: operation "waiting for merge application" timed out after 5.001s (given timeout 5s): grpc: context deadline exceeded [code 4/DeadlineExceeded]
    ```
    
    Run with the following for more logging:
    
    ```bash
    dev test pkg/kv/kvserver -v --vmodule='replica_raft=1,kvflowcontroller=2,replica_proposal_buf=1,raft_transport=2,kvflowdispatch=1,kvadmission=1,kvflowhandle=1,work_queue=1,replica_flow_control=1,tracker=1,client_raft_helpers_test=1,raft=1,admission=1,replica_flow_control=1,work_queue=1,replica_raft=1,replica_proposal_buf=1,raft_transport=2,kvadmission=1,work_queue=1,replica_flow_control=1,client_raft_helpers_test=1,range_controller=2,token_counter=2,token_tracker=2,processor=2,kvflowhandle=1' -f  TestFlowControlSendQueueRangeMerge --show-logs --timeout=30s
    ```
    
    Epic: none
    Release note: None
    kvoli committed Nov 18, 2024
    2ba2ff2
  3. kvserver: [dnm] add TestFlowControlSendQueueRangeMigrate test

    Add a new RACv2 integration test, `TestFlowControlSendQueueRangeMigrate`.
    
    This test fails, though not by timing out on its own (with the
    arguments below it does time out, because of the `--timeout` flag).
    Instead, s1 and s3 on the scratch range appear to ping-pong the same
    messages back and forth:
    
    ```
    -- Test timed out at 2024-11-16 00:39:36 UTC --
    I241116 00:39:37.221610 4048 kv/kvserver_test/client_raft_helpers_test.go:104  [T1,Vsystem,n3] 725   [raft] r69 Raft message 1->3 MsgApp Term:6 Log:6/25 Commit:27
    I241116 00:39:37.222264 4068 kv/kvserver_test/client_raft_helpers_test.go:104  [T1,Vsystem,n1] 726   [raft] r69 Raft message 3->1 MsgAppResp Term:6 Log:0/25 Commit:25
    I241116 00:39:38.221271 4048 kv/kvserver_test/client_raft_helpers_test.go:104  [T1,Vsystem,n3] 727   [raft] r69 Raft message 1->3 MsgApp Term:6 Log:6/25 Commit:27
    I241116 00:39:38.221876 4068 kv/kvserver_test/client_raft_helpers_test.go:104  [T1,Vsystem,n1] 728   [raft] r69 Raft message 3->1 MsgAppResp Term:6 Log:0/25 Commit:25
    I241116 00:39:38.971603 4048 kv/kvserver_test/client_raft_helpers_test.go:104  [T1,Vsystem,n3] 729   [raft] r69 Raft message 1->3 MsgApp Term:6 Log:6/25 Commit:27
    I241116 00:39:38.972018 4068 kv/kvserver_test/client_raft_helpers_test.go:104  [T1,Vsystem,n1] 730   [raft] r69 Raft message 3->1 MsgAppResp Term:6 Log:0/25 Commit:25
    ```
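
    The repetition in the trace above can be checked mechanically. The
    sketch below parses the `Raft message` lines and counts identical
    messages. The `raftMsg` type and the helper names are hypothetical,
    and it assumes the `Log:` field is formatted as term/index.

    ```go
    package main

    import (
    	"fmt"
    	"regexp"
    	"strconv"
    )

    // raftMsg holds the fields of one "Raft message" trace line.
    // All fields are comparable, so the struct can be used as a map key.
    type raftMsg struct {
    	From, To int
    	Type     string
    	Term     int
    	LogTerm  int // assumed: first number of the Log: field
    	LogIndex int // assumed: second number of the Log: field
    	Commit   int
    }

    var raftMsgRE = regexp.MustCompile(
    	`Raft message (\d+)->(\d+) (\w+) Term:(\d+) Log:(\d+)/(\d+) Commit:(\d+)`)

    // parseRaftMsg extracts a raftMsg from a log line, or returns false
    // if the line does not contain a Raft message trace.
    func parseRaftMsg(line string) (raftMsg, bool) {
    	m := raftMsgRE.FindStringSubmatch(line)
    	if m == nil {
    		return raftMsg{}, false
    	}
    	n := make([]int, len(m))
    	for i, s := range m {
    		if i == 0 || i == 3 {
    			continue // whole match and message type are not numbers
    		}
    		// The regexp only admits digits; overflow is ignored here.
    		n[i], _ = strconv.Atoi(s)
    	}
    	return raftMsg{
    		From: n[1], To: n[2], Type: m[3], Term: n[4],
    		LogTerm: n[5], LogIndex: n[6], Commit: n[7],
    	}, true
    }

    // repeats counts how often each distinct message appears.
    func repeats(lines []string) map[raftMsg]int {
    	counts := make(map[raftMsg]int)
    	for _, l := range lines {
    		if msg, ok := parseRaftMsg(l); ok {
    			counts[msg]++
    		}
    	}
    	return counts
    }

    func main() {
    	// Message portions of the six trace lines above.
    	trace := []string{
    		"[raft] r69 Raft message 1->3 MsgApp Term:6 Log:6/25 Commit:27",
    		"[raft] r69 Raft message 3->1 MsgAppResp Term:6 Log:0/25 Commit:25",
    		"[raft] r69 Raft message 1->3 MsgApp Term:6 Log:6/25 Commit:27",
    		"[raft] r69 Raft message 3->1 MsgAppResp Term:6 Log:0/25 Commit:25",
    		"[raft] r69 Raft message 1->3 MsgApp Term:6 Log:6/25 Commit:27",
    		"[raft] r69 Raft message 3->1 MsgAppResp Term:6 Log:0/25 Commit:25",
    	}
    	for msg, n := range repeats(trace) {
    		fmt.Printf("%+v seen %d times\n", msg, n)
    	}
    }
    ```

    On the six lines above, both the `MsgApp` and the `MsgAppResp` appear
    three times with unchanged `Log` and `Commit` values, which is
    consistent with s3 making no progress.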
    
    The timeout occurs in the call to `kvserver.waitForApplication(..)`
    on these lines:
    - https://github.com/kvoli/cockroach/blob/dd5456fea47451d5b47e10ca40991808d7e47780/pkg/kv/kvserver/replica_write.go#L289-L292
    
    Run with the following for more logging:
    
    ```bash
    dev test pkg/kv/kvserver -v --vmodule='replica_raft=1,kvflowcontroller=2,replica_proposal_buf=1,raft_transport=2,kvflowdispatch=1,kvadmission=1,kvflowhandle=1,work_queue=1,replica_flow_control=1,tracker=1,client_raft_helpers_test=1,raft=1,admission=1,replica_flow_control=1,work_queue=1,replica_raft=1,replica_proposal_buf=1,raft_transport=2,kvadmission=1,work_queue=1,replica_flow_control=1,client_raft_helpers_test=1,range_controller=2,token_counter=2,token_tracker=2,processor=2,kvflowhandle=1' -f  TestFlowControlSendQueueRangeMigrate --show-logs --timeout=60s
    ```
    
    Epic: none
    Release note: None
    kvoli committed Nov 18, 2024
    9b3a6c0