CI Failure (BadLogLines ptree_bad_path (No such node (DeleteResult))) in ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy #14014

Closed
andijcr opened this issue Oct 6, 2023 · 8 comments · Fixed by #14180 or #14848
Labels: area/cloud-storage (Shadow indexing subsystem), ci-failure, kind/bug (Something isn't working), sev/medium (Bugs that do not meet criteria for high or critical, but are more severe than low)

Comments

andijcr commented Oct 6, 2023

https://buildkite.com/redpanda/vtools/builds/9804#018afee9-476c-416b-b96d-5a2a2d6f674c

Module: rptest.tests.e2e_shadow_indexing_test
Class:  ShadowIndexingWhileBusyTest
Method: test_create_or_delete_topics_while_busy
Arguments:
{
  "cloud_storage_type": 1,
  "short_retention": true
}


<BadLogLines nodes=ip-172-31-2-8(2),ip-172-31-6-244(2),ip-172-31-5-105(2),ip-172-31-8-73(2),ip-172-31-12-62(2),ip-172-31-0-120(2) example="ERROR 2023-10-05 09:50:29,100 [shard 2:au  ] s3 - s3_client.cc:889 - DeleteObjects response parse failed: boost::wrapexcept<boost::property_tree::ptree_bad_path> (No such node (DeleteResult))">
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 269, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 481, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 142, in wrapped
    redpanda.raise_on_bad_logs(
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1244, in raise_on_bad_logs
    raise BadLogLines(bad_lines)
rptest.services.utils.BadLogLines: <BadLogLines nodes=ip-172-31-2-8(2),ip-172-31-6-244(2),ip-172-31-5-105(2),ip-172-31-8-73(2),ip-172-31-12-62(2),ip-172-31-0-120(2) example="ERROR 2023-10-05 09:50:29,100 [shard 2:au  ] s3 - s3_client.cc:889 - DeleteObjects response parse failed: boost::wrapexcept<boost::property_tree::ptree_bad_path> (No such node (DeleteResult))">



        {
            "title": "<BadLogLines nodes=ip-172-31-2-8(2),ip-172-31-6-244(2),ip-172-31-5-105(2),ip-172-31-8-73(2),ip-172-31-12-62(2),ip-172-31-0-120(2) example=\"ERROR 2023-10-05 09:50:29,100 [shard 2:au  ] s3 - s3_client.cc:889 - DeleteObjects response parse failed: boost::wrapexcept<boost::property_tree::ptree_bad_path> (No such node (DeleteResult))\">",
            "id": 15190,
            "ts": 1696520189.320468,
            "type": "cdt",
            "build": "release",
            "arch": "amd64",
            "link": "https://buildkite.com/redpanda/vtools/builds/9804"
        }
andijcr added the kind/bug, ci-failure, area/cloud-storage, and sev/medium labels on Oct 6, 2023

andijcr commented Oct 6, 2023

Marked sev/medium because the exception may be a symptom that we are not correctly handling an error response.

piyushredpanda assigned Lazin and abhijat, then unassigned Lazin, on Oct 8, 2023

abhijat commented Oct 10, 2023

It looks like the 200 OK response contains an error (quite possibly SlowDown, as there are many of them in the broker logs).

Some AWS operations can return an error with a 200 OK status, e.g. CopyObject (https://docs.aws.amazon.com/AmazonS3/latest/API/API_CopyObject.html), whose documentation says:

If the error occurs during the copy operation, the error response is embedded in the 200 OK response. This means that a 200 OK response can contain either a success or an error. If you call the S3 API directly, make sure to design your application to parse the contents of the response and handle it appropriately. If you use AWS SDKs, SDKs handle this condition. 

The DeleteObjects API documentation does not mention this, but quite possibly we are facing the same issue here. The AWS Rust SDK had a very similar issue recently:

awslabs/aws-sdk-rust#873

We might need to handle the 200 response as potentially containing an error. AFAIK there have been some changes to this area recently.
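
A minimal sketch of that idea, for illustration only: it is not the code in s3_client.cc, and the names `DeleteOutcome` and `parse_delete_objects_body` are hypothetical. It assumes a 200 OK body may have either `DeleteResult` or `Error` as its root element, and checks the root before reading any `DeleteResult` children instead of assuming the node exists.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class DeleteOutcome:
    """Hypothetical result type for this sketch (not a Redpanda type)."""
    # Set when the whole request failed, e.g. ("SlowDown", "...").
    request_error: Optional[Tuple[str, str]] = None
    # Keys reported as deleted.
    deleted: List[str] = field(default_factory=list)
    # Key -> error code for per-key failures inside a DeleteResult.
    failed: Dict[str, str] = field(default_factory=dict)


def _local(tag: str) -> str:
    """Drop an XML namespace prefix such as '{http://...}DeleteResult'."""
    return tag.rsplit("}", 1)[-1]


def _child_text(elem: ET.Element, name: str) -> str:
    """Return the text of the first direct child called `name`, or ''."""
    for child in elem:
        if _local(child.tag) == name:
            return child.text or ""
    return ""


def parse_delete_objects_body(body: str) -> DeleteOutcome:
    """Parse a 200 OK body without assuming the root is DeleteResult."""
    root = ET.fromstring(body)
    outcome = DeleteOutcome()
    kind = _local(root.tag)

    if kind == "Error":
        # The error is embedded in a 200 OK response: report it to the
        # caller instead of failing with a "no such node" parse error.
        outcome.request_error = (
            _child_text(root, "Code"),
            _child_text(root, "Message"),
        )
        return outcome

    if kind != "DeleteResult":
        raise ValueError(f"unexpected root element: {kind}")

    for child in root:
        name = _local(child.tag)
        if name == "Deleted":
            outcome.deleted.append(_child_text(child, "Key"))
        elif name == "Error":
            outcome.failed[_child_text(child, "Key")] = _child_text(child, "Code")
    return outcome
```

With this shape, a caller can treat a non-empty `request_error` like any other failed request (for example, back off and retry on SlowDown) rather than logging a parse failure.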

rockwotj commented

This happened again yesterday on nightly: https://buildkite.com/redpanda/vtools/builds/10097


ztlpn commented Oct 25, 2023

ztlpn reopened this on Oct 25, 2023
