CI Failure (Timeout waiting for 'delete_records' operation validation) in RandomNodeOperationsTest.test_node_operations #11942

Closed
andijcr opened this issue Jul 7, 2023 · 2 comments
Assignees
Labels
ci-failure kind/bug Something isn't working

Comments

Contributor

andijcr commented Jul 7, 2023

https://buildkite.com/redpanda/vtools/builds/8405#0189302a-912b-425a-9a6a-fcce6f6d213a

Module: rptest.tests.random_node_operations_test
Class:  RandomNodeOperationsTest
Method: test_node_operations
Arguments:
{
  "enable_controller_snapshots": true,
  "enable_failures": true,
  "num_to_upgrade": 0
}
====================================================================================================
test_id:    rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.num_to_upgrade=0.enable_controller_snapshots=True
status:     FAIL
run time:   3 minutes 38.456 seconds


    TimeoutError("Timeout waiting for {'type': 'delete_records', 'properties': {'truncate_points': {'fuzzy-operator-6890-ibkfcm': {0: 384}, 'fuzzy-operator-6890-izkprl': {0: 403}}}} operation validation")
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 481, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/root/tests/rptest/utils/mode_checks.py", line 63, in f
    return func(*args, **kwargs)
  File "/root/tests/rptest/services/cluster.py", line 82, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/random_node_operations_test.py", line 318, in test_node_operations
    self.admin_fuzz.wait(20, 180)
  File "/root/tests/rptest/services/admin_ops_fuzzer.py", line 835, in wait
    wait_until(check, timeout_sec=timeout, backoff_sec=2)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 53, in wait_until
    raise e
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 44, in wait_until
    if condition():
  File "/root/tests/rptest/services/admin_ops_fuzzer.py", line 823, in check
    raise self.error
  File "/root/tests/rptest/services/admin_ops_fuzzer.py", line 664, in thread_loop
    self.execute_one()
  File "/root/tests/rptest/services/admin_ops_fuzzer.py", line 726, in execute_one
    raise e
  File "/root/tests/rptest/services/admin_ops_fuzzer.py", line 708, in execute_one
    wait_until(
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 57, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError: Timeout waiting for {'type': 'delete_records', 'properties': {'truncate_points': {'fuzzy-operator-6890-ibkfcm': {0: 384}, 'fuzzy-operator-6890-izkprl': {0: 403}}}} operation validation
@andijcr andijcr added kind/bug Something isn't working ci-failure labels Jul 7, 2023
Contributor Author

andijcr commented Jul 7, 2023

Different test, same failure:
https://buildkite.com/redpanda/vtools/builds/8405#0189302a-9128-4e83-a300-d741175894bc
ControllerUpgradeTest.test_updating_cluster_when_executing_operations

====================================================================================================
test_id:    rptest.tests.controller_upgrade_test.ControllerUpgradeTest.test_updating_cluster_when_executing_operations
status:     FAIL
run time:   1 minute 42.122 seconds


    TimeoutError("Timeout waiting for {'type': 'delete_records', 'properties': {'truncate_points': {'fuzzy-operator-179-hmuyzj': {0: 4}}}} operation validation")
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/root/tests/rptest/services/cluster.py", line 82, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/controller_upgrade_test.py", line 113, in test_updating_cluster_when_executing_operations
    admin_fuzz.wait(num_executed_before_restart + 2, 240)
  File "/root/tests/rptest/services/admin_ops_fuzzer.py", line 835, in wait
    wait_until(check, timeout_sec=timeout, backoff_sec=2)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 53, in wait_until
    raise e
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 44, in wait_until
    if condition():
  File "/root/tests/rptest/services/admin_ops_fuzzer.py", line 823, in check
    raise self.error
  File "/root/tests/rptest/services/admin_ops_fuzzer.py", line 664, in thread_loop
    self.execute_one()
  File "/root/tests/rptest/services/admin_ops_fuzzer.py", line 726, in execute_one
    raise e
  File "/root/tests/rptest/services/admin_ops_fuzzer.py", line 708, in execute_one
    wait_until(
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 57, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError: Timeout waiting for {'type': 'delete_records', 'properties': {'truncate_points': {'fuzzy-operator-179-hmuyzj': {0: 4}}}} operation validation

Contributor Author

andijcr commented Jul 7, 2023

#11944 (comment)

@graphcareful graphcareful self-assigned this Jul 7, 2023
graphcareful pushed a commit to graphcareful/redpanda that referenced this issue Jul 7, 2023
- This change relies on rpk instead of kcl to issue delete records,
because the rpk ducktape client harness raises an exception whenever a
command does not exit 0, which causes the AdminOperationsFuzzer retry
logic to kick in. Previously, the retry logic would never kick in, which
was causing some CI tests to fail, notably tests like the
PartitionBalancerTests where it is highly likely that a partition will
change leadership during the test.

- Fixes: redpanda-data#11950
- Fixes: redpanda-data#11942
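
The mechanism the commit message describes can be sketched roughly as follows. This is a minimal illustration, not the actual rptest code: `CommandFailed`, `run_checked`, `execute_with_retries`, and `flaky_delete_records` are hypothetical names, and the child Python process merely stands in for a CLI call that fails once (for example, because partition leadership moved) and then succeeds.

```python
import subprocess
import sys
import time


class CommandFailed(Exception):
    """Hypothetical error type: a CLI command exited with a non-zero status."""


def run_checked(argv):
    """Run a command and raise if it does not exit 0, so a failure is never silent."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:
        raise CommandFailed(
            f"{argv!r} exited with {proc.returncode}: {proc.stderr.strip()}")
    return proc.stdout


def execute_with_retries(operation, retries=3, backoff_sec=0.0):
    """Call operation() until it succeeds or the attempts are exhausted."""
    last_error = None
    for _ in range(retries):
        try:
            return operation()
        except CommandFailed as e:
            # Only a raised exception reaches this branch; a wrapper that
            # swallowed the exit code would never trigger a retry.
            last_error = e
            time.sleep(backoff_sec)
    raise last_error


attempts = []


def flaky_delete_records():
    # Stand-in for a delete-records CLI call (assumption): the first
    # attempt exits 1, every later attempt exits 0.
    attempts.append(1)
    code = 1 if len(attempts) == 1 else 0
    script = f"import sys; print('ok'); sys.exit({code})"
    return run_checked([sys.executable, "-c", script])


result = execute_with_retries(flaky_delete_records)
```

If `run_checked` returned the output without checking the exit status, `execute_with_retries` would accept the first failed attempt as a success, and the later validation step would wait until it timed out.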