NodeCrash in ManyPartitionsTest.test_many_partitions/test_many_partitions_compacted #8098

Closed
dlex opened this issue Jan 6, 2023 · 3 comments

dlex (Contributor) commented Jan 6, 2023

Module: rptest.scale_tests.many_partitions_test
Class: ManyPartitionsTest
Method: test_many_partitions

on (arm64, VM) https://buildkite.com/redpanda/vtools/builds/5007#01858500-14b5-4276-87c4-449e1f6c4880

    <NodeCrash ip-172-31-32-115: Redpanda process unexpectedly stopped>
Traceback (most recent call last):
  File "/home/ubuntu/redpanda/tests/rptest/services/kgo_repeater_service.py", line 188, in await_group_ready
    wait_until(group_ready, timeout_sec=120, backoff_sec=10)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/utils/util.py", line 58, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 35, in wrapped
    r = f(self, *args, **kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/scale_tests/many_partitions_test.py", line 715, in test_many_partitions
    self._test_many_partitions(compacted=False)
  File "/home/ubuntu/redpanda/tests/rptest/scale_tests/many_partitions_test.py", line 869, in _test_many_partitions
    self._restart_stress(scale, topic_names, n_partitions,
  File "/home/ubuntu/redpanda/tests/rptest/scale_tests/many_partitions_test.py", line 478, in _restart_stress
    inter_restart_check()
  File "/home/ubuntu/redpanda/tests/rptest/scale_tests/many_partitions_test.py", line 846, in progress_check
    repeater.await_group_ready()
  File "/home/ubuntu/redpanda/tests/rptest/services/kgo_repeater_service.py", line 196, in await_group_ready
    group = rpk.group_describe(self.group_name, summary=False)
  File "/home/ubuntu/redpanda/tests/rptest/clients/rpk.py", line 490, in group_describe
    rpk_group = try_describe_group(group)
  File "/home/ubuntu/redpanda/tests/rptest/clients/rpk.py", line 438, in try_describe_group
    out = self._run_group(cmd)
  File "/home/ubuntu/redpanda/tests/rptest/clients/rpk.py", line 542, in _run_group
    return self._execute(cmd, stdin=stdin, timeout=timeout)
  File "/home/ubuntu/redpanda/tests/rptest/clients/rpk.py", line 654, in _execute
    raise RpkException(
rptest.clients.rpk.RpkException: RpkException<command /opt/redpanda/bin/rpk group --brokers ip-172-31-32-146:9092,ip-172-31-45-122:9092,ip-172-31-36-88:9092,ip-172-31-33-85:9092,ip-172-31-40-175:9092,ip-172-31-32-115:9092,ip-172-31-44-174:9092,ip-172-31-33-69:9092,ip-172-31-33-173:9092 describe repeat01 returned 1, output: all 1 DescribeGroups request failures, first error: broker replied that group repeat01 has broker coordinator 4, but did not reply with that broker in the broker list
>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 50, in wrapped
    self.redpanda.raise_on_crash()
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1487, in raise_on_crash
    raise NodeCrash(crashes)
rptest.services.utils.NodeCrash: <NodeCrash ip-172-31-32-115: Redpanda process unexpectedly stopped>

Also looks similar:
Module: rptest.scale_tests.many_partitions_test
Class: ManyPartitionsTest
Method: test_many_partitions_compacted

on (arm64, VM) https://buildkite.com/redpanda/vtools/builds/5047#0185838c-0b8d-4784-abe4-d77cf7b61b3e

    <NodeCrash (ip-172-31-37-85,ip-172-31-45-12) ip-172-31-37-85: Redpanda process unexpectedly stopped>
Traceback (most recent call last):
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 35, in wrapped
    r = f(self, *args, **kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/scale_tests/many_partitions_test.py", line 711, in test_many_partitions_compacted
    self._test_many_partitions(compacted=True)
  File "/home/ubuntu/redpanda/tests/rptest/scale_tests/many_partitions_test.py", line 865, in _test_many_partitions
    self._single_node_restart(scale, topic_names, n_partitions)
  File "/home/ubuntu/redpanda/tests/rptest/scale_tests/many_partitions_test.py", line 442, in _single_node_restart
    wait_until(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/utils/util.py", line 58, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 50, in wrapped
    self.redpanda.raise_on_crash()
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1487, in raise_on_crash
    raise NodeCrash(crashes)
rptest.services.utils.NodeCrash: <NodeCrash (ip-172-31-37-85,ip-172-31-45-12) ip-172-31-37-85: Redpanda process unexpectedly stopped>
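
Both traces above share one pattern: a polling helper times out inside the test body, and while that error is being handled the wrapper in cluster.py raises NodeCrash, which is why the report leads with the crash rather than the timeout. The sketch below illustrates that chaining in plain Python. It is not the rptest or ducktape code: `PollTimeout`, `FakeTest`, `crashed_nodes`, and the body of `cluster` are hypothetical stand-ins, and only the `wait_until(condition, timeout_sec=..., backoff_sec=...)` call shape is taken from the traces.

```python
import functools
import time


class PollTimeout(Exception):
    """Hypothetical stand-in for ducktape.errors.TimeoutError."""


class NodeCrash(Exception):
    """Hypothetical stand-in for rptest.services.utils.NodeCrash."""


def wait_until(condition, timeout_sec, backoff_sec, err_msg=""):
    """Poll `condition` until it returns truthy or the deadline passes."""
    deadline = time.monotonic() + timeout_sec
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise PollTimeout(err_msg)
        time.sleep(backoff_sec)


def cluster(test_fn):
    """Assumed wrapper behaviour: on a test failure, check for crashes first."""
    @functools.wraps(test_fn)
    def wrapped(self, *args, **kwargs):
        try:
            return test_fn(self, *args, **kwargs)
        except Exception:
            # Raising inside this handler records the original error as
            # __context__, which Python prints as "During handling of the
            # above exception, another exception occurred".
            if self.crashed_nodes:
                raise NodeCrash(self.crashed_nodes)
            raise
    return wrapped


class FakeTest:
    crashed_nodes = ["node-a"]  # pretend one broker process has died

    @cluster
    def test_progress(self):
        wait_until(lambda: False, timeout_sec=0.05, backoff_sec=0.01,
                   err_msg="group never became ready")
```

Read this way, the TimeoutError (and, in the first trace, the RpkException) are likely symptoms of the node going down, so the Redpanda logs on the crashed node are the place to look for the root cause.
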
dlex added the kind/bug and ci-failure labels on Jan 6, 2023
dlex changed the title from "NodeCrash in ManyPartitionsTest.test_many_partitions" to "NodeCrash in ManyPartitionsTest.test_many_partitions/test_many_partitions_compacted" on Jan 6, 2023

jcsp (Contributor) commented Jan 9, 2023

Child of #7405, leaving open to make the test name searchable.

@NyaliaLui
Contributor

@ballard26
Contributor

Closing this issue as it's related to #8355

5 participants