Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI Failure (TimeoutError) in CloudStorageChunkReadTest.test_read_when_segment_size_smaller_than_chunk_size #11151

Closed
BenPope opened this issue Jun 1, 2023 · 5 comments · Fixed by #11161
Assignees
Labels
area/cloud-storage Shadow indexing subsystem ci-failure kind/bug Something isn't working

Comments

@BenPope
Copy link
Member

BenPope commented Jun 1, 2023

https://buildkite.com/redpanda/redpanda/builds/30349#018876e1-c48f-4b3f-9d5f-b75f75f1b0a8
https://buildkite.com/redpanda/redpanda/builds/30374#018877fa-3bc4-4373-82f9-2925e566f843
https://buildkite.com/redpanda/redpanda/builds/30353#0188770b-f994-4fd5-95ac-cbf5f7c83cde

Module: rptest.tests.cloud_storage_chunk_read_path_test
Class:  CloudStorageChunkReadTest
Method: test_read_when_segment_size_smaller_than_chunk_size
test_id:    rptest.tests.cloud_storage_chunk_read_path_test.CloudStorageChunkReadTest.test_read_when_segment_size_smaller_than_chunk_size
status:     FAIL
run time:   6 minutes 18.561 seconds


    TimeoutError(None)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/root/tests/rptest/services/cluster.py", line 79, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/cloud_storage_chunk_read_path_test.py", line 234, in test_read_when_segment_size_smaller_than_chunk_size
    rand_cons.wait(timeout_sec=300)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/services/service.py", line 261, in wait
    if not self.wait_node(node, end - now):
  File "/root/tests/rptest/services/kgo_verifier_services.py", line 181, in wait_node
    self._redpanda.wait_until(lambda: self._status.active is False or self.
  File "/root/tests/rptest/services/redpanda.py", line 1979, in wait_until
    wait_until(wrapped,
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 57, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError: None
FAIL test: CloudStorageChunkReadTest.test_read_when_segment_size_smaller_than_chunk_size (3/195 runs)
  failure at 2023-06-01T13:17:55.416Z: TimeoutError(None)
      in job https://buildkite.com/redpanda/redpanda/builds/30349#018876e1-c48f-4b3f-9d5f-b75f75f1b0a8
@BenPope
Copy link
Member Author

BenPope commented Jun 1, 2023

The timeouts seem to be rpc

TRACE 2023-06-01 13:28:54,543 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:28:54,577 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:28:54,801 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:28:54,805 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:28:54,888 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:28:54,892 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:28:55,031 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:28:55,033 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:01,985 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:01,986 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:02,152 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:02,153 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:02,511 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:02,517 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:02,632 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:02,643 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:02,987 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:03,014 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:03,331 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:03,354 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:03,910 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:03,912 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:04,219 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:04,374 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:04,536 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:04,547 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:04,699 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:29:04,701 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:31:06,236 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:31:06,359 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:31:33,088 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:31:33,149 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:31:44,823 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:31:44,882 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:31:48,165 [shard 1] rpc - error connecting to 172.16.16.2:9000 - seastar::timed_out_error (timedout)
TRACE 2023-06-01 13:31:48,170 [shard 1] rpc - Connection error: seastar::timed_out_error (timedout)

172.16.16.2 is minio-s3. There doesn't seem to be logs for that.

Locally I run the test 18 times:

ducktape version: 0.8.8
session_id:       2023-06-01--024
run time:         2 minutes 55.721 seconds
tests run:        18
passed:           18
failed:           0
ignored:          0
opassed:          0
ofailed:          0

Which is 9 of them in parallel, twice. So they take around 1min each.

@twmb
Copy link
Contributor

twmb commented Jun 2, 2023

@abhijat
Copy link
Contributor

abhijat commented Jun 2, 2023

https://buildkite.com/redpanda/redpanda/builds/30349#018876e1-c48f-4b3f-9d5f-b75f75f1b0a8 : timeout after 1719 messages
https://buildkite.com/redpanda/redpanda/builds/30374#018877fa-3bc4-4373-82f9-2925e566f843: timeout after 2000 messages
https://buildkite.com/redpanda/redpanda/builds/30353#0188770b-f994-4fd5-95ac-cbf5f7c83cde: timeout after 1800 messages
https://buildkite.com/redpanda/redpanda/builds/30361#01887794-3986-4462-aa01-2a184ec69d8f: timeout after 2000
messages

the test expects to be able to read 2500 messages in 5 minutes with a 16MiB cache. This often does not seem possible especially in debug mode.

This is a case of adjusting the test expectations, the test is simply to assert that reads work as expected when the segment size (512KiB) is below chunk size (16MiB) which does seem to work.

I will change the test expectations by lowering the number of random consumers.

@ztlpn
Copy link
Contributor

ztlpn commented Jun 14, 2023

@ztlpn ztlpn reopened this Jun 14, 2023
@abhijat
Copy link
Contributor

abhijat commented Jul 11, 2023

The specific failure was an instance of #6399

The consumer did not send fetch requests to redpanda:

docker-rp-6/redpanda.log:TRACE 2023-06-13 11:31:03,658 [shard 0] kafka - requests.cc:94 - [172.16.16.13:53280] processing name:api_versions, key:18, version:3 for KgoVerifierSeqConsumer-0-140346613486816, mem_units: 8128, ctx_size: 13
docker-rp-6/redpanda.log:TRACE 2023-06-13 11:31:03,663 [shard 0] kafka - requests.cc:94 - [172.16.16.13:53280] processing name:metadata, key:3, version:7 for KgoVerifierSeqConsumer-0-140346613486816, mem_units: 30822, ctx_size: 18
docker-rp-6/redpanda.log:TRACE 2023-06-13 11:31:03,669 [shard 1] kafka - requests.cc:94 - [172.16.16.13:53286] processing name:api_versions, key:18, version:3 for KgoVerifierSeqConsumer-0-140346613486816, mem_units: 8128, ctx_size: 13
docker-rp-6/redpanda.log:TRACE 2023-06-13 11:31:03,673 [shard 1] kafka - requests.cc:94 - [172.16.16.13:53286] processing name:metadata, key:3, version:7 for KgoVerifierSeqConsumer-0-140346613486816, mem_units: 30822, ctx_size: 18
docker-rp-6/redpanda.log:TRACE 2023-06-13 11:31:03,696 [shard 0] kafka - requests.cc:94 - [172.16.16.13:53302] processing name:api_versions, key:18, version:3 for KgoVerifierSeqConsumer-0-140346613486816, mem_units: 8128, ctx_size: 13
docker-rp-6/redpanda.log:TRACE 2023-06-13 11:31:03,700 [shard 0] kafka - requests.cc:94 - [172.16.16.13:53302] processing name:metadata, key:3, version:7 for KgoVerifierSeqConsumer-0-140346613486816, mem_units: 30822, ctx_size: 18
docker-rp-9/redpanda.log:TRACE 2023-06-13 11:31:03,679 [shard 0] kafka - requests.cc:94 - [172.16.16.13:42624] processing name:api_versions, key:18, version:3 for KgoVerifierSeqConsumer-0-140346613486816, mem_units: 8128, ctx_size: 13
docker-rp-9/redpanda.log:TRACE 2023-06-13 11:31:03,682 [shard 0] kafka - requests.cc:94 - [172.16.16.13:42624] processing name:list_offsets, key:2, version:4 for KgoVerifierSeqConsumer-0-140346613486816, mem_units: 8184, ctx_size: 42

@abhijat abhijat closed this as completed Jul 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cloud-storage Shadow indexing subsystem ci-failure kind/bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants