CI Failure (Timeout Error) in RpkRedpandaStartTest.test_rpc_tls_list #11694

Closed
r-vasquez opened this issue Jun 26, 2023 · 5 comments
Labels
ci-failure, kind/bug (Something isn't working), sev/low (Bugs which are non-functional paper cuts, e.g. typos, issues in log messages)

Comments

@r-vasquez
Contributor

https://buildkite.com/redpanda/redpanda/builds/31981#0188f955-b8c9-48f6-a74f-4a93560210d4/6-5676

Module: rptest.tests.rpk_start_test 
Class:  RpkRedpandaStartTest 
Method: test_rpc_tls_list
test_id:    rptest.tests.rpk_start_test.RpkRedpandaStartTest.test_rpc_tls_list
status:     FAIL
run time:   1 minute 5.802 seconds


    TimeoutError('Redpanda service docker-rp-10 failed to start within 60 sec using rpk')
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/root/tests/rptest/services/cluster.py", line 79, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/rpk_start_test.py", line 340, in test_rpc_tls_list
    self.redpanda._for_nodes(self.redpanda.nodes,
  File "/root/tests/rptest/services/redpanda.py", line 996, in _for_nodes
    return list(executor.map(cb, nodes))
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
    return fut.result(timeout)
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/root/tests/rptest/tests/rpk_start_test.py", line 338, in setup_and_start
    self.redpanda.start_node_with_rpk(node, clean_node=False)
  File "/root/tests/rptest/services/redpanda.py", line 2197, in start_node_with_rpk
    self.start_service(node, start_rp)
  File "/root/tests/rptest/services/redpanda.py", line 2227, in start_service
    start()
  File "/root/tests/rptest/services/redpanda.py", line 2188, in start_rp
    wait_until(
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 57, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError: Redpanda service docker-rp-10 failed to start within 60 sec using rpk

In the Redpanda log on docker-rp-10, we see that the internal RPC port is already in use:

INFO  2023-06-26 21:06:47,527 [shard 0] main - application.cc:367 - Shutdown complete.
ERROR 2023-06-26 21:06:47,528 [shard 0] main - application.cc:387 - Failure during startup: std::runtime_error (vectorized internal rpc protocol - Error attempting to listen on {://0.0.0.0:33145:PLAINTEXT}: std::__1::system_error (error system:98, posix_listen failed for address 0.0.0.0:33145: Address already in use))

If we check the debug logs, netstat shows a listener on 127.0.0.11:33145 with no PID/program name:

[DEBUG - 2023-06-26 21:07:44,985 - redpanda - is_node_ready - lineno:2076]: node docker-rp-10 not yet accepting connections
[DEBUG - 2023-06-26 21:07:45,988 - redpanda - is_node_ready - lineno:2076]: node docker-rp-10 not yet accepting connections
[WARNING - 2023-06-26 21:07:46,989 - redpanda - start_service - lineno:2231]: Failed to start on docker-rp-10, gathering node ps and netstat...
[DEBUG - 2023-06-26 21:07:46,989 - remoteaccount - _log - lineno:166]: root@docker-rp-10: Running ssh command: ps aux
...
[DEBUG - 2023-06-26 21:07:46,994 - remoteaccount - _log - lineno:166]: root@docker-rp-10: Running ssh command: netstat -panelot
[DEBUG - 2023-06-26 21:07:47,051 - redpanda - _log_node_process_state - lineno:2219]: Active Internet connections (servers and established)
[DEBUG - 2023-06-26 21:07:47,051 - redpanda - _log_node_process_state - lineno:2219]: Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode      PID/Program name     Timer
[DEBUG - 2023-06-26 21:07:47,051 - redpanda - _log_node_process_state - lineno:2219]: tcp        0      0 127.0.0.11:33145        0.0.0.0:*               LISTEN      0          88671      -                    off (0.00/0/0)
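
The two logs line up: error 98 is EADDRINUSE on Linux, and a socket already listening on a specific address (here 127.0.0.11:33145) is enough to make a bind to the wildcard address 0.0.0.0 on the same port fail. A minimal Python sketch of that behavior, assuming Linux bind semantics. It uses 127.0.0.1 and an OS-assigned port as stand-ins rather than the addresses from this run, and `wildcard_bind_errno` is a hypothetical helper name:

```python
import errno
import socket


def wildcard_bind_errno(port: int):
    """Try to bind 0.0.0.0:<port>; return the errno on failure, None on success."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind(("0.0.0.0", port))
        except OSError as exc:
            return exc.errno
    return None


# Stand-in for the unknown listener: bind a specific loopback address and
# let the OS pick a free port (port 0).
holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
holder.bind(("127.0.0.1", 0))
holder.listen(1)
held_port = holder.getsockname()[1]

# While the holder is listening, the wildcard bind on the same port fails.
conflict = wildcard_bind_errno(held_port)
print(conflict == errno.EADDRINUSE)

holder.close()
```

This only reproduces the symptom. It does not say which process owns the listener on 127.0.0.11 in the CI container.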
@piyushredpanda
Contributor

Sounds like an environment issue, @bharathv?

@bharathv
Contributor

I've seen this symptom occasionally in some CI failures. The lack of a PID doesn't help (I don't know how to interpret that). I'll keep an eye on this. Marking sev/low until then because this is mostly a test infra issue and not an RP bug.
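
The missing PID shows up as `-` in the PID/Program name column of the netstat line above. A small Python sketch for picking such entries out of the gathered output, assuming the column layout shown in this issue's `netstat -panelot` dump. `listeners_on_port` is a hypothetical helper, not part of rptest:

```python
def listeners_on_port(netstat_lines, port):
    """Return (local_address, pid_program) for each LISTEN socket on `port`."""
    found = []
    for line in netstat_lines:
        fields = line.split()
        # Assumed layout: Proto Recv-Q Send-Q Local Foreign State User Inode PID/Program ...
        if len(fields) < 9 or fields[5] != "LISTEN":
            continue
        local = fields[3]
        if local.rsplit(":", 1)[-1] == str(port):
            found.append((local, fields[8]))
    return found


# Sample lines copied from the debug log in this issue.
sample = [
    "Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode      PID/Program name     Timer",
    "tcp        0      0 127.0.0.11:33145        0.0.0.0:*               LISTEN      0          88671      -                    off (0.00/0/0)",
]

hits = listeners_on_port(sample, 33145)
# Entries whose owner column is "-" have no PID visible to netstat.
ownerless = [addr for addr, owner in hits if owner == "-"]
print(hits)
print(ownerless)
```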

bharathv added the sev/low label Jun 27, 2023
bharathv self-assigned this Jun 27, 2023
bharathv removed their assignment Jul 11, 2023
@BenPope
Member

BenPope commented Jul 24, 2023

Seen here in a different test: https://buildkite.com/redpanda/redpanda/builds/33736#0189787f-9fee-41f6-95f5-76eb949711b5

Hopefully it will help narrow down a previous failing test or something like that.

@twmb
Contributor

twmb commented Aug 22, 2023

Untagging area/rpk

@rystsov
Contributor

rystsov commented Sep 3, 2023

The issue was opened off a PR and hasn't happened on dev since then.
