CI Failure: ManyPartitionsTest.test_many_partitions timeout in kgo_repeater_service.py #8288

Closed
abhijat opened this issue Jan 18, 2023 · 1 comment
Labels
ci-failure kind/bug Something isn't working

Comments

@abhijat
Contributor

abhijat commented Jan 18, 2023

Module: rptest.scale_tests.many_partitions_test
Class:  ManyPartitionsTest
Method: test_many_partitions

https://buildkite.com/redpanda/vtools/builds/5295#0185c159-c9c2-4ea8-9a27-107193993aa2

[ERROR - 2023-01-17 21:07:18,620 - cluster - wrapped - lineno:41]: Test failed, doing failure checks...
Traceback (most recent call last):
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 35, in wrapped
    r = f(self, *args, **kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/scale_tests/many_partitions_test.py", line 715, in test_many_partitions
    self._test_many_partitions(compacted=False)
  File "/home/ubuntu/redpanda/tests/rptest/scale_tests/many_partitions_test.py", line 874, in _test_many_partitions
    repeater.await_group_ready()
  File "/home/ubuntu/redpanda/tests/rptest/services/kgo_repeater_service.py", line 188, in await_group_ready
    wait_until(group_ready, timeout_sec=120, backoff_sec=10)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/utils/util.py", line 58, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError
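For context on the failing frame: wait_until polls a predicate (here group_ready) until it returns true or the deadline passes, then raises TimeoutError chained to the last exception it recorded. With timeout_sec=120 and backoff_sec=10, the group gets at most about a dozen polls, and fewer if each group_ready call is itself slow. The following is a simplified sketch of such a polling loop, not ducktape's actual code. wait_until_sketch, WaitTimeoutError and FakeClock are hypothetical names, and the clock and sleep are injectable so the example runs instantly.

```python
import time


class WaitTimeoutError(Exception):
    """Hypothetical stand-in for ducktape.errors.TimeoutError."""


def wait_until_sketch(condition, timeout_sec, backoff_sec, err_msg="",
                      clock=time.monotonic, sleep=time.sleep):
    """Poll `condition` until it returns a truthy value or `timeout_sec` elapses.

    Simplified sketch, not ducktape's code. This version keeps polling when
    `condition` raises and chains the last such error onto the timeout.
    `clock` and `sleep` are injectable for testing.
    """
    deadline = clock() + timeout_sec
    last_exception = None
    while clock() < deadline:
        try:
            if condition():
                return
            last_exception = None  # a clean "not yet" clears any earlier error
        except Exception as e:
            last_exception = e
        sleep(backoff_sec)
    raise WaitTimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception


class FakeClock:
    """Virtual clock so the example does not really wait."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


if __name__ == "__main__":
    clock = FakeClock()
    polls = []

    def group_ready():
        polls.append(clock())  # record when each poll happens
        return False  # the group never becomes ready

    try:
        wait_until_sketch(group_ready, timeout_sec=120, backoff_sec=10,
                          clock=clock, sleep=clock.sleep)
    except WaitTimeoutError:
        print(len(polls))  # 12 polls, at t=0, 10, ..., 110
```

If group_ready never succeeds within that budget, the only signal is the bare TimeoutError seen above, which is why the error message carries no detail about the group's state.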

Note that there are other open incidents for this test, and one of them does mention a timeout, but the traceback here is very different: it appears to originate in the kgo repeater service (await_group_ready in kgo_repeater_service.py). The following error is also seen in the test log:

[WARNING - 2023-01-17 21:09:24,720 - service_registry - clean_all - lineno:69]: Error cleaning service <KgoVerifierProducer-1-281473481223728: num_nodes: 1, nodes: ['ip-172-31-2-33']>: root@ip-172-31-2-33: Command 'rm -r /tmp/KgoVerifierProducer*' returned non-zero exit status 1. Remote error message: b"rm: cannot remove '/tmp/KgoVerifierProducer*': No such file or directory\n"

This may need investigating to determine whether it has the same cause as #8098 and #8074, although no node crash or broken promise is seen here.
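The cleanup warning quoted above is most likely harmless: when a shell glob such as /tmp/KgoVerifierProducer* matches nothing, the shell passes the pattern through literally (the error message shows exactly that), so rm -r fails with "No such file or directory" and exits non-zero. In shell form, rm -rf avoids this, since -f ignores nonexistent operands. The sketch below shows the same "already clean is not an error" idea locally in Python; remove_matching is a hypothetical helper and not part of rptest or ducktape, which run the real command over SSH.

```python
import glob
import os
import shutil
import tempfile


def remove_matching(pattern):
    """Remove every file or directory matching the glob `pattern`.

    Hypothetical helper. Zero matches is treated as "already clean" rather
    than as an error, which is the behaviour `rm -r <glob>` lacks.
    Returns the list of paths that were removed.
    """
    removed = []
    for path in glob.glob(pattern):
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)  # directory tree, e.g. a service's work dir
        else:
            os.remove(path)  # regular file or symlink
        removed.append(path)
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        pattern = os.path.join(base, "KgoVerifierProducer*")
        print(remove_matching(pattern))  # [] -- nothing to clean, no error
        os.makedirs(os.path.join(base, "KgoVerifierProducer-1"))
        print(len(remove_matching(pattern)))  # 1
```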

In the same run, the compacted-partition version of this test, ManyPartitionsTest.test_many_partitions_compacted, failed with the following error:

[INFO  - 2023-01-17 21:34:24,980 - runner_client - log - lineno:278]: RunnerClient: rptest.scale_tests.many_partitions_test.ManyPartitionsTest.test_many_partitions_compacted: FAIL: AssertionError('Unable to determine group within set number of attempts')
Traceback (most recent call last):
  File "/home/ubuntu/redpanda/tests/rptest/services/kgo_repeater_service.py", line 188, in await_group_ready
    wait_until(group_ready, timeout_sec=120, backoff_sec=10)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/utils/util.py", line 58, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 35, in wrapped
    r = f(self, *args, **kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/scale_tests/many_partitions_test.py", line 711, in test_many_partitions_compacted
    self._test_many_partitions(compacted=True)
  File "/home/ubuntu/redpanda/tests/rptest/scale_tests/many_partitions_test.py", line 874, in _test_many_partitions
    repeater.await_group_ready()
  File "/home/ubuntu/redpanda/tests/rptest/services/kgo_repeater_service.py", line 196, in await_group_ready
    group = rpk.group_describe(self.group_name, summary=False)
  File "/home/ubuntu/redpanda/tests/rptest/clients/rpk.py", line 605, in group_describe
    assert rpk_group is not None, "Unable to determine group within set number of attempts"
AssertionError: Unable to determine group within set number of attempts
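In this second traceback the AssertionError from rpk.group_describe (called at line 196) is raised while the TimeoutError from line 188 is still being handled, which suggests that await_group_ready calls group_describe from its exception handler. The reported failure reason therefore hides the original timeout. The sketch below assumes that structure and shows how Python's implicit exception chaining produces this shape and how the timeout remains reachable through __context__. All names here (GroupTimeout, describe_group_with_retries, await_group_ready_sketch) are hypothetical and do not reproduce kgo_repeater_service.py or rpk.py.

```python
class GroupTimeout(Exception):
    """Hypothetical stand-in for the wait_until timeout."""


def describe_group_with_retries(lookup, attempts):
    """Bounded retry that asserts on failure, like the message seen in rpk.py."""
    group = None
    for _ in range(attempts):
        group = lookup()
        if group is not None:
            break
    assert group is not None, "Unable to determine group within set number of attempts"
    return group


def await_group_ready_sketch(wait, lookup):
    try:
        wait()
    except GroupTimeout:
        # Diagnostic describe inside the handler. If it raises, Python attaches
        # the GroupTimeout as __context__ ("During handling of the above
        # exception, another exception occurred").
        group = describe_group_with_retries(lookup, attempts=3)
        print(f"group state at timeout: {group}")
        raise  # otherwise re-raise the original timeout


if __name__ == "__main__":
    def never_ready():
        raise GroupTimeout("group not ready after 120s")

    def group_missing():
        return None  # the describe call cannot find the group either

    try:
        await_group_ready_sketch(never_ready, group_missing)
    except AssertionError as e:
        # The timeout that started the failure is still reachable here.
        print(type(e.__context__).__name__)  # GroupTimeout
```

Catching the describe failure inside the handler and logging it, rather than letting it propagate, would keep the timeout as the reported failure reason.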
@abhijat abhijat added kind/bug Something isn't working ci-failure labels Jan 18, 2023
@abhijat abhijat changed the title CI Failure: ManyPartitionsTest.test_many_partitions timeout in kgo_repeater_service.py CI Failure: ManyPartitionsTest.test_many_partitions timeout in kgo_repeater_service.py Jan 18, 2023
@jcsp
Contributor

jcsp commented Jan 20, 2023

This is a variant of #7405

@jcsp jcsp closed this as completed Jan 20, 2023