
CI Failure (RPK failed FindCoordinator req) in RpkGroupCommandsTest.test_group_describe #12291

Closed · NyaliaLui opened this issue Jul 18, 2023 · 12 comments · Fixed by #12564
Labels: area/kafka, ci-failure, kind/bug, sev/low

Comments

@NyaliaLui (Contributor):

https://buildkite.com/redpanda/vtools/builds/8505#018966d7-6953-4067-afab-fa6498b0f9bb

Module: rptest.tests.rpk_group_test
Class:  RpkGroupCommandsTest
Method: test_group_describe
test_id:    rptest.tests.rpk_group_test.RpkGroupCommandsTest.test_group_describe
status:     FAIL
run time:   9.182 seconds


    RpkException('command /var/lib/buildkite-agent/builds/buildkite-amd64-xfs-builders-i-082c7dfe8830a81b7-1/redpanda/vtools/vbuild/redpanda_installs/ci/bin/rpk group -X brokers=docker-rp-20:9092,docker-rp-22:9092,docker-rp-21:9092 describe test-g1 -v returned 1, output: ', '02:55:28.186  DEBUG  sharded request  {"req": "FindCoordinator", "destinations": ["any"]}\n02:55:28.186  DEBUG  opening connection to broker  {"addr": "docker-rp-20:9092", "broker": "seed_0"}\n02:55:28.186  DEBUG  connection opened to broker  {"addr": "docker-rp-20:9092", "broker": "seed_0"}\n02:55:28.187  DEBUG  issuing api versions request  {"broker": "seed_0", "version": 3}\n02:55:28.187  DEBUG  wrote ApiVersions v3  {"broker": "seed_0", "bytes_written": 31, "write_wait": "16.409µs", "time_to_write": "16.118µs", "err": null}\n02:55:28.187  DEBUG  read ApiVersions v3  {"broker": "seed_0", "bytes_read": 296, "read_wait": "36.776µs", "time_to_read": "170.578µs", "err": null}\n02:55:28.187  DEBUG  connection initialized successfully  {"addr": "docker-rp-20:9092", "broker": "seed_0"}\n02:55:28.187  DEBUG  sharded request failed, resharding and reissuing  {"req": "FindCoordinator", "time_since_start": "1.090625ms", "tries": 0, "err": "broker is too old; the broker has already indicated it will not know how to handle the request"}\n02:55:28.187  DEBUG  sharded request  {"req": "FindCoordinator", "destinations": ["any"]}\n02:55:28.187  DEBUG  opening connection to broker  {"addr": "docker-rp-22:9092", "broker": "seed_1"}\n02:55:28.187  DEBUG  connection opened to broker  {"addr": "docker-rp-22:9092", "broker": "seed_1"}\n02:55:28.187  DEBUG  issuing api versions request  {"broker": "seed_1", "version": 3}\n02:55:28.187  DEBUG  wrote ApiVersions v3  {"broker": "seed_1", "bytes_written": 31, "write_wait": "8.351µs", "time_to_write": "15.301µs", "err": null}\n02:55:28.188  DEBUG  read ApiVersions v3  {"broker": "seed_1", "bytes_read": 296, "read_wait": "28.104µs", "time_to_read": "423.174µs", "err": null}\n02:55:28.188  DEBUG  connection initialized successfully  {"addr": "docker-rp-22:9092", "broker": "seed_1"}\n02:55:28.188  DEBUG  wrote FindCoordinator v3  {"broker": "seed_1", "bytes_written": 28, "write_wait": "857.859µs", "time_to_write": "17.294µs", "err": null}\n02:55:28.193  DEBUG  read FindCoordinator v3  {"broker": "seed_1", "bytes_read": 26, "read_wait": "44.775µs", "time_to_read": "5.287227ms", "err": null}\n02:55:28.193  DEBUG  sharded request  {"req": "DescribeGroups", "destinations": ["err"]}\nunable to describe groups: request DescribeGroups has 1 separate shard errors, first: COORDINATOR_NOT_AVAILABLE: The coordinator is not available.\n', 1, '')
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/root/tests/rptest/services/cluster.py", line 82, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/rpk_group_test.py", line 91, in test_group_describe
    wait_until(lambda: rpk.group_describe(group_1).state == "Stable",
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 53, in wait_until
    raise e
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 44, in wait_until
    if condition():
  File "/root/tests/rptest/tests/rpk_group_test.py", line 91, in <lambda>
    wait_until(lambda: rpk.group_describe(group_1).state == "Stable",
  File "/root/tests/rptest/clients/rpk.py", line 689, in group_describe
    rpk_group = try_describe_group(group)
  File "/root/tests/rptest/clients/rpk.py", line 637, in try_describe_group
    out = self._run_group(cmd)
  File "/root/tests/rptest/clients/rpk.py", line 792, in _run_group
    return self._execute(cmd, stdin=stdin, timeout=timeout)
  File "/root/tests/rptest/clients/rpk.py", line 910, in _execute
    raise RpkException(
rptest.clients.rpk.RpkException: RpkException<command /var/lib/buildkite-agent/builds/buildkite-amd64-xfs-builders-i-082c7dfe8830a81b7-1/redpanda/vtools/vbuild/redpanda_installs/ci/bin/rpk group -X brokers=docker-rp-20:9092,docker-rp-22:9092,docker-rp-21:9092 describe test-g1 -v returned 1, output:  error: 02:55:28.186  DEBUG  sharded request  {"req": "FindCoordinator", "destinations": ["any"]}
02:55:28.186  DEBUG  opening connection to broker  {"addr": "docker-rp-20:9092", "broker": "seed_0"}
02:55:28.186  DEBUG  connection opened to broker  {"addr": "docker-rp-20:9092", "broker": "seed_0"}
02:55:28.187  DEBUG  issuing api versions request  {"broker": "seed_0", "version": 3}
02:55:28.187  DEBUG  wrote ApiVersions v3  {"broker": "seed_0", "bytes_written": 31, "write_wait": "16.409µs", "time_to_write": "16.118µs", "err": null}
02:55:28.187  DEBUG  read ApiVersions v3  {"broker": "seed_0", "bytes_read": 296, "read_wait": "36.776µs", "time_to_read": "170.578µs", "err": null}
02:55:28.187  DEBUG  connection initialized successfully  {"addr": "docker-rp-20:9092", "broker": "seed_0"}
02:55:28.187  DEBUG  sharded request failed, resharding and reissuing  {"req": "FindCoordinator", "time_since_start": "1.090625ms", "tries": 0, "err": "broker is too old; the broker has already indicated it will not know how to handle the request"}
02:55:28.187  DEBUG  sharded request  {"req": "FindCoordinator", "destinations": ["any"]}
02:55:28.187  DEBUG  opening connection to broker  {"addr": "docker-rp-22:9092", "broker": "seed_1"}
02:55:28.187  DEBUG  connection opened to broker  {"addr": "docker-rp-22:9092", "broker": "seed_1"}
02:55:28.187  DEBUG  issuing api versions request  {"broker": "seed_1", "version": 3}
02:55:28.187  DEBUG  wrote ApiVersions v3  {"broker": "seed_1", "bytes_written": 31, "write_wait": "8.351µs", "time_to_write": "15.301µs", "err": null}
02:55:28.188  DEBUG  read ApiVersions v3  {"broker": "seed_1", "bytes_read": 296, "read_wait": "28.104µs", "time_to_read": "423.174µs", "err": null}
02:55:28.188  DEBUG  connection initialized successfully  {"addr": "docker-rp-22:9092", "broker": "seed_1"}
02:55:28.188  DEBUG  wrote FindCoordinator v3  {"broker": "seed_1", "bytes_written": 28, "write_wait": "857.859µs", "time_to_write": "17.294µs", "err": null}
02:55:28.193  DEBUG  read FindCoordinator v3  {"broker": "seed_1", "bytes_read": 26, "read_wait": "44.775µs", "time_to_read": "5.287227ms", "err": null}
02:55:28.193  DEBUG  sharded request  {"req": "DescribeGroups", "destinations": ["err"]}
unable to describe groups: request DescribeGroups has 1 separate shard errors, first: COORDINATOR_NOT_AVAILABLE: The coordinator is not available.
 returncode: 1>
@NyaliaLui added the kind/bug and ci-failure labels on Jul 18, 2023
@NyaliaLui (Contributor, Author):

02:55:28.187  DEBUG  sharded request failed, resharding and reissuing  {"req": "FindCoordinator", "time_since_start": "1.090625ms", "tries": 0, "err": "broker is too old; the broker has already indicated it will not know how to handle the request"}

This makes me think that either RPK is using an unsupported API version or we need to update the FindCoordinator API in our Kafka protocol.

@rystsov (Contributor) commented Jul 19, 2023:

@michael-redpanda (Contributor):

Able to reproduce in ducktape. Failed 2/10 times.

@michael-redpanda (Contributor):

My suspicion is that this is an unintended side effect of #12121. Marking as sev/low since this is probably a test issue and should be relatively straightforward to fix.

@michael-redpanda added the sev/low label on Jul 20, 2023
@graphcareful (Contributor):

Tracked this down to commit 61e2512, which causes rpk to exit 1 and print the output on stderr for certain failures.

Ducktape previously assumed this command would print COORDINATOR_NOT_AVAILABLE on stdout and not exit 1. Now that this is no longer the case, we should modify ducktape to react correctly to the new behavior of rpk group describe.
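A minimal sketch of what that ducktape-side change could look like, reusing the `RpkTool.group_describe`, `RpkException`, and `wait_until` names visible in the traceback above; the assumption that the stderr text surfaces via `str(e)` is mine, not confirmed by the source:

```python
# Sketch only: treat rpk's new exit-1 behavior as "coordinator not ready yet"
# instead of failing the test immediately.
from rptest.clients.rpk import RpkException

def group_is_stable(rpk, group):
    try:
        return rpk.group_describe(group).state == "Stable"
    except RpkException as e:
        # rpk now exits 1 and prints COORDINATOR_NOT_AVAILABLE on stderr;
        # assumed here to be included in the exception text.
        if "COORDINATOR_NOT_AVAILABLE" in str(e):
            return False
        raise

# Usage inside the test, mirroring the existing pattern:
# wait_until(lambda: group_is_stable(rpk, group_1), timeout_sec=30, backoff_sec=1)
```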

@dotnwat (Member) commented Jul 20, 2023:

Is the fix for this easy? It seems to be triggering quite a bit in CI (at least in one of my PRs).

@twmb (Contributor) commented Jul 20, 2023:

The sharding error happens because the client internally tries to send this as a batched FindCoordinator request, immediately sees that RP does not support that, and then splits the request and actually issues it. So, expect to see that message until RP supports batched FindCoordinator (v4+).
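For illustration, the retry shape described here looks roughly like the following Python sketch; it is not the actual client code, and every name in it is hypothetical:

```python
class UnsupportedVersionError(Exception):
    """Stand-in for the broker rejecting a request version (hypothetical)."""

def find_coordinators(broker, groups):
    # First attempt: one batched FindCoordinator (v4+) carrying all group names.
    try:
        return broker.find_coordinator_batched(groups)  # hypothetical batched call
    except UnsupportedVersionError:
        # "broker is too old": reshard into one single-key (v3) request per group
        # and reissue, which is the reshard-and-reissue DEBUG message seen above.
        return {g: broker.find_coordinator(g) for g in groups}
```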

@twmb (Contributor) commented Jul 20, 2023:

@graphcareful I should change rpk to not fail if there is a partial error -- this matches the old behavior. However, the old code would just print a failure here then exit 0, so perhaps the new behavior is correct.

@NyaliaLui (Contributor, Author):

The sharding error happens because the client internally tries to send this as a batched FindCoordinator request, immediately sees that RP does not support that, and then splits the request and actually issues it. So, expect to see that message until RP supports batched FindCoordinator (v4+).

Sounds like we need to update the FindCoordinator API in our Kafka protocol as well.

@graphcareful (Contributor) commented Jul 20, 2023:

IMO the fix would be as easy as including stdout in RpkException.msg; I can self-assign. I think we should keep this particular rpk command behaving the way it does: exiting 1 when there is an error is consistent with the other commands, right?
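A rough sketch of that suggestion; the constructor shape of `RpkException` here is an assumption inferred from the exception text above, not the actual definition in `rptest/clients/rpk.py`:

```python
class RpkException(Exception):
    """Sketch: carry stdout alongside stderr so callers can match on errors such
    as COORDINATOR_NOT_AVAILABLE regardless of which stream rpk wrote them to."""

    def __init__(self, msg, stderr="", returncode=None, stdout=""):
        self.stderr = stderr
        self.returncode = returncode
        self.stdout = stdout
        # Include both streams in the message so string matching keeps working.
        super().__init__(
            f"{msg} error: {stderr} stdout: {stdout} returncode: {returncode}")
```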

@graphcareful self-assigned this on Jul 20, 2023
@michael-redpanda removed their assignment on Jul 21, 2023
dotnwat added a commit to dotnwat/redpanda that referenced this issue Aug 2, 2023
Changes to rpk caused the COORDINATOR_NOT_AVAILABLE messages to be
printed to stderr (makes sense) instead of stdout. Updated to check for
this condition in stderr.

Fixes: redpanda-data#12291

Signed-off-by: Noah Watkins <[email protected]>
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this issue Aug 11, 2023
Changes to rpk caused the COORDINATOR_NOT_AVAILABLE messages to be
printed to stderr (makes sense) instead of stdout. Updated to check for
this condition in stderr.

Fixes: redpanda-data#12291

Signed-off-by: Noah Watkins <[email protected]>
(cherry picked from commit 231d9e4)
@rystsov (Contributor) commented Aug 15, 2023:

@rystsov reopened this on Aug 15, 2023
@piyushredpanda (Contributor):

Issue hasn't occurred for 2 months per Pandatriage; closing.
