Timeout failure waiting on Admin Fuzzer in RandomNodeOperationsTest.test_node_operations #8142

Closed
NyaliaLui opened this issue Jan 10, 2023 · 5 comments · Fixed by #8176

@NyaliaLui
Contributor

NyaliaLui commented Jan 10, 2023

https://buildkite.com/redpanda/redpanda/builds/20900#01859b7e-8b42-40b9-a7cb-b93d70f8b1f3/6-2729

Module: rptest.tests.random_node_operations_test
Class:  RandomNodeOperationsTest
Method: test_node_operations
Arguments:
{
  "enable_failures": true
}
test_id:    rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True
status:     FAIL
run time:   4 minutes 21.145 seconds
 
    TimeoutError('')
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 476, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/root/tests/rptest/utils/mode_checks.py", line 63, in f
    return func(*args, **kwargs)
  File "/root/tests/rptest/services/cluster.py", line 35, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/random_node_operations_test.py", line 105, in test_node_operations
    admin_fuzz.wait(20, 180)
  File "/root/tests/rptest/services/admin_ops_fuzzer.py", line 588, in wait
    wait_until(check, timeout_sec=timeout, backoff_sec=2)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 53, in wait_until
    raise e
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 44, in wait_until
    if condition():
  File "/root/tests/rptest/services/admin_ops_fuzzer.py", line 576, in check
    raise self.error
  File "/root/tests/rptest/services/admin_ops_fuzzer.py", line 469, in thread_loop
    wait_until(validate_result,
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 57, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError
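The traceback shows that the final TimeoutError does not come from the outer `admin_fuzz.wait(20, 180)` timing out on its own. Instead, `check` in admin_ops_fuzzer.py re-raises `self.error`, which the worker recorded when the inner `wait_until(validate_result, ...)` in `thread_loop` timed out. The toy model below illustrates that two-level pattern. It is a sketch, not the fuzzer's code: `ToyFuzzer`, `WaitTimeout` and the simplified `wait_until` are hypothetical stand-ins for the real admin_ops_fuzzer and ducktape helpers.

```python
import threading
import time


class WaitTimeout(Exception):
    """Stand-in for ducktape.errors.TimeoutError in this sketch."""


def wait_until(condition, timeout_sec, backoff_sec):
    """Simplified poller: exceptions raised by `condition` propagate immediately."""
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        if condition():
            return
        time.sleep(backoff_sec)
    raise WaitTimeout(f"condition not met within {timeout_sec}s")


class ToyFuzzer:
    """Hypothetical model: a worker validates an op, the test thread waits on it."""

    def __init__(self, validate, validate_timeout_sec):
        self.error = None  # first exception seen by the worker thread
        self.done = False
        self._validate = validate
        self._validate_timeout_sec = validate_timeout_sec
        self._thread = threading.Thread(target=self._thread_loop, daemon=True)

    def start(self):
        self._thread.start()

    def _thread_loop(self):
        try:
            # If the operation silently failed, validation can never pass,
            # so this inner wait is the one that actually times out.
            wait_until(self._validate, self._validate_timeout_sec, backoff_sec=0.01)
            self.done = True
        except Exception as e:
            self.error = e

    def wait(self, timeout_sec):
        def check():
            if self.error is not None:
                raise self.error  # surface the worker's error in the test thread
            return self.done

        wait_until(check, timeout_sec, backoff_sec=0.01)
```

With `validate=lambda: False`, `wait()` raises the worker's `WaitTimeout` (the one mentioning the validation timeout) long before its own outer timeout expires, which mirrors how the failure above surfaces.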
@NyaliaLui NyaliaLui added kind/bug Something isn't working ci-failure labels Jan 10, 2023
@NyaliaLui NyaliaLui changed the title Timeout failure in RandomNodeOperationsTest.test_node_operations.enable_failures=True Timeout failure waiting on Admin Fuzzer in RandomNodeOperationsTest.test_node_operations/ ControllerUpgradeTest.test_updating_cluster_when_executing_operations Jan 10, 2023
@NyaliaLui NyaliaLui changed the title Timeout failure waiting on Admin Fuzzer in RandomNodeOperationsTest.test_node_operations/ ControllerUpgradeTest.test_updating_cluster_when_executing_operations Timeout failure waiting on Admin Fuzzer in RandomNodeOperationsTest.test_node_operations / ControllerUpgradeTest.test_updating_cluster_when_executing_operations Jan 10, 2023
@rystsov
Contributor

rystsov commented Jan 10, 2023

Timeout is a generic error; let's not group failures from different tests under the same issue unless we've dug up the true root cause.

@rystsov rystsov changed the title Timeout failure waiting on Admin Fuzzer in RandomNodeOperationsTest.test_node_operations / ControllerUpgradeTest.test_updating_cluster_when_executing_operations Timeout failure waiting on Admin Fuzzer in RandomNodeOperationsTest.test_node_operations Jan 10, 2023
@rystsov
Contributor

rystsov commented Jan 10, 2023

For ControllerUpgradeTest.test_updating_cluster_when_executing_operations we already have #8083

@redpanda-data redpanda-data deleted a comment from NyaliaLui Jan 10, 2023
@rystsov
Contributor

rystsov commented Jan 10, 2023

@rystsov
Contributor

rystsov commented Jan 10, 2023

Found the problem

[INFO  - 2023-01-10 12:06:56,234 - admin_ops_fuzzer - execute - lineno:308]: Creating allow cluster describe ACL for user: fuzzy-operator-5552-user-kzltfh
[DEBUG - 2023-01-10 12:06:56,234 - rpk - _execute - lineno:639]: Executing command: ['/var/lib/buildkite-agent/builds/buildkite-amd64-xfs-builders-i-00cc1cc57e48b0bdc-1/redpanda/redpanda/vbuild/redpanda_installs/ci/bin/rpk', 'acl', 'create', '--allow-principal', 'User:fuzzy-operator-5552-user-kzltfh', '--operation', 'describe', '--cluster', '--brokers', 'docker-rp-1:9092,docker-rp-10:9092,docker-rp-2:9092,docker-rp-9:9092']
[DEBUG - 2023-01-10 12:06:56,261 - rpk - _execute - lineno:652]: 
PRINCIPAL                             HOST  RESOURCE-TYPE  RESOURCE-NAME  RESOURCE-PATTERN-TYPE  OPERATION  PERMISSION  ERROR
User:fuzzy-operator-5552-user-kzltfh  *     CLUSTER        kafka-cluster  LITERAL                DESCRIBE   ALLOW       NOT_CONTROLLER

[INFO  - 2023-01-10 12:06:56,261 - admin_ops_fuzzer - validate - lineno:317]: Validating user fuzzy-operator-5552-user-kzltfh ACL is present
[DEBUG - 2023-01-10 12:06:56,262 - rpk - _execute - lineno:639]: Executing command: ['/var/lib/buildkite-agent/builds/buildkite-amd64-xfs-builders-i-00cc1cc57e48b0bdc-1/redpanda/redpanda/vbuild/redpanda_installs/ci/bin/rpk', 'acl', 'list', '--brokers', 'docker-rp-9:9092,docker-rp-2:9092,docker-rp-10:9092,docker-rp-1:9092']
[DEBUG - 2023-01-10 12:06:56,286 - rpk - _execute - lineno:652]: 
PRINCIPAL                             HOST  RESOURCE-TYPE  RESOURCE-NAME  RESOURCE-PATTERN-TYPE  OPERATION  PERMISSION  ERROR
User:fuzzy-operator-5552-user-budvse  *     CLUSTER        kafka-cluster  LITERAL                DESCRIBE   ALLOW
User:fuzzy-operator-5552-user-fgstoc  *     CLUSTER        kafka-cluster  LITERAL                DESCRIBE   ALLOW

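The rpk wrapper doesn't parse the output of acl_create_allow_cluster, so it treats a failed operation (note the NOT_CONTROLLER value in the ERROR column above) as a success. The ACL for fuzzy-operator-5552-user-kzltfh therefore never appears in `rpk acl list`, the fuzzer falls into an endless validation loop, and it eventually times out.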
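A minimal sketch of a stricter check, assuming the rpk table format shown in the log above: cells separated by two or more spaces, and the ERROR column left blank on success. The names `parse_acl_table`, `check_acl_create`, `acl_present` and `RpkAclError` are hypothetical. They are not the rptest API, and this is not the code from the actual fix.

```python
import re


class RpkAclError(Exception):
    """Hypothetical error raised when `rpk acl create` reports a failure."""


def parse_acl_table(output: str) -> list[dict[str, str]]:
    """Parse the aligned table printed by `rpk acl create` / `rpk acl list`.

    Assumes cells are separated by 2+ spaces and that a blank trailing ERROR
    cell (the success case) is simply missing from the row.
    """
    lines = [line for line in output.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        cells = re.split(r"\s{2,}", line.strip())
        # Pad missing trailing cells (e.g. an empty ERROR column) with "".
        cells += [""] * (len(header) - len(cells))
        rows.append(dict(zip(header, cells)))
    return rows


def check_acl_create(output: str) -> None:
    """Fail on anything but success instead of assuming the op worked."""
    rows = parse_acl_table(output)
    if not rows:
        raise RpkAclError("rpk acl create printed no result rows")
    errors = [row.get("ERROR", "") for row in rows if row.get("ERROR")]
    if errors:
        raise RpkAclError(f"rpk acl create failed: {errors}")


def acl_present(list_output: str, user: str) -> bool:
    """Validation step: is there any ACL row for User:<user>?"""
    return any(row.get("PRINCIPAL") == f"User:{user}"
               for row in parse_acl_table(list_output))
```

Fed the `rpk acl create` output from the log, `check_acl_create` raises `RpkAclError` mentioning NOT_CONTROLLER instead of letting the failure through. Fed the `rpk acl list` output, `acl_present(..., "fuzzy-operator-5552-user-kzltfh")` returns False, which is why validation could never succeed.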

rystsov added a commit to rystsov/redpanda that referenced this issue Jan 10, 2023
andrewhsu pushed a commit to andrewhsu/redpanda that referenced this issue Jan 11, 2023
@rystsov rystsov self-assigned this Jan 11, 2023
rystsov added a commit to rystsov/redpanda that referenced this issue Jan 12, 2023
Parse the output and fail on anything but success, so that
failures no longer slip through as successes

fixes redpanda-data#8142
rystsov added a commit to rystsov/redpanda that referenced this issue Jan 13, 2023
Parse the output and fail on anything but success, so that
failures no longer slip through as successes

fixes redpanda-data#8142