[v23.2.x] CI Failure (Redpanda node docker-rp-19 failed to stop in 30 seconds) in ShutdownTest.test_timely_shutdown_with_failures #15295

Closed
michael-redpanda opened this issue Dec 4, 2023 · 5 comments
Labels
area/redpanda, ci-failure, kind/backport (PRs targeting a stable branch), kind/bug (Something isn't working), sev/low (Bugs which are non-functional paper cuts, e.g. typos, issues in log messages)

Comments

@michael-redpanda
Contributor

https://buildkite.com/redpanda/vtools/builds/10980#018c356d-9e3b-4936-a00b-6db91546a2b6

Module: rptest.tests.timely_shutdown_test
Class:  ShutdownTest
Method: test_timely_shutdown_with_failures
test_id:    rptest.tests.timely_shutdown_test.ShutdownTest.test_timely_shutdown_with_failures
status:     FAIL
run time:   3 minutes 32.134 seconds


    TimeoutError('Redpanda node docker-rp-19 failed to stop in 30 seconds')
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/root/tests/rptest/services/cluster.py", line 82, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/timely_shutdown_test.py", line 105, in test_timely_shutdown_with_failures
    self.redpanda.restart_nodes(leader)
  File "/root/tests/rptest/services/redpanda.py", line 882, in restart_nodes
    list(
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
    return fut.result(timeout)
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/root/tests/rptest/services/redpanda.py", line 883, in <lambda>
    executor.map(lambda n: self.stop_node(n, timeout=stop_timeout),
  File "/root/tests/rptest/services/redpanda.py", line 2738, in stop_node
    wait_until(
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 57, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError: Redpanda node docker-rp-19 failed to stop in 30 seconds
michael-redpanda added the kind/bug and ci-failure labels Dec 4, 2023
@michael-redpanda
Contributor Author

The node does appear to shut down:

INFO  2023-12-04 16:06:20,102 [shard 0] main - application.cc:369 - Shutdown complete.
DEBUG 2023-12-04 16:06:20,104 [shard 0] seastar - reactor::drain

but maybe it did not have enough time? The failure appeared to happen on the fourth iteration through the restart loop.

@michael-redpanda
Contributor Author

Do not use this issue to track dev failures. If you observe a similar failure on dev, create a new issue; this issue is only for backports.

@dotnwat
Member

dotnwat commented Dec 16, 2023

05:49
06:20

That is about a 30-second gap between when the stop appears to have been requested and when the node actually finished shutting down. So something about the shutdown was slow, but it missed the deadline by only a second or so.
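To make the arithmetic explicit, here is a minimal sketch that works only from the two minute:second stamps quoted above and the 30-second deadline from the failure message. The helper name `mmss_to_seconds` and the constant `STOP_DEADLINE_SEC` are made up for illustration and are not part of rptest. The stamps are truncated to whole seconds, so the real overrun could differ by up to about a second either way.

```python
# Illustrative only: names below are hypothetical, not rptest code.
STOP_DEADLINE_SEC = 30  # deadline quoted in the TimeoutError message


def mmss_to_seconds(stamp: str) -> int:
    """Convert a 'MM:SS' stamp into a count of seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


requested = mmss_to_seconds("05:49")  # when the stop looks to have been requested
finished = mmss_to_seconds("06:20")   # when "Shutdown complete." was logged

elapsed = finished - requested        # 31
overrun = elapsed - STOP_DEADLINE_SEC  # 1

print(f"shutdown took ~{elapsed}s, ~{overrun}s past the {STOP_DEADLINE_SEC}s deadline")
```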

This was also a debug build, so all sorts of odd slowness could occur.

Might be worth having a different timeout for debug vs release.
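A rough sketch of what that could look like, assuming the test can learn the build type as a string such as "debug" or "release". Everything here is hypothetical: `stop_timeout_for`, the constants, and the 2x multiplier are invented for illustration and do not exist in rptest. Only the 30-second release value comes from the failure message above.

```python
# Hypothetical helper; not part of rptest or ducktape.
RELEASE_STOP_TIMEOUT_SEC = 30  # the deadline seen in the failure message
DEBUG_TIMEOUT_MULTIPLIER = 2   # assumed slack factor for debug builds


def stop_timeout_for(build_type: str) -> int:
    """Return the stop deadline in seconds for the given build type.

    Debug builds get a longer deadline; anything else uses the release value.
    """
    if build_type.strip().lower() == "debug":
        return RELEASE_STOP_TIMEOUT_SEC * DEBUG_TIMEOUT_MULTIPLIER
    return RELEASE_STOP_TIMEOUT_SEC


print(stop_timeout_for("debug"))    # 60
print(stop_timeout_for("release"))  # 30
```

The traceback shows `stop_node(n, timeout=stop_timeout)` being called from `restart_nodes`, so a value like this could presumably be fed in at that point. How the build type would actually be detected is left open here.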

@michael-redpanda
Contributor Author

Marking sev/low as it may be a timeout issue with the test infrastructure.

michael-redpanda added the sev/low label Jan 16, 2024
@piyushredpanda
Contributor

Not seen in at least two months; closing.
