Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-14523: [C++] Fix potential data loss in S3 multipart upload #11594

Closed
wants to merge 6 commits into from

Conversation

pitrou
Copy link
Member

@pitrou pitrou commented Nov 2, 2021

Work around a critical bug in the AWS SDK for C++ that fails to detect errors returned by CompleteMultipartUpload in the body of a 200 OK response:
aws/aws-sdk-cpp#658

@github-actions
Copy link

github-actions bot commented Nov 2, 2021

@github-actions
Copy link

github-actions bot commented Nov 2, 2021

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I notice their own workaround looked for a case where the root element was correct but the first child was Error which isn't handled here. I'm not sure what they were after and I couldn't find any documentation that such a response was possible so I think what you have is correct but figured I'd mention it in case you'd seen something like that before.

Comment on lines +608 to +610
stream.clear();
stream.seekg(pos);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think this is redundant since you cleared above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetErrorMarshaller()->Marshall will involve XML parsing over the response stream again.

cpp/src/arrow/filesystem/s3fs.cc Outdated Show resolved Hide resolved
cpp/src/arrow/filesystem/s3fs.cc Outdated Show resolved Hide resolved
@pitrou
Copy link
Member Author

pitrou commented Nov 3, 2021

@github-actions crossbow submit wheel

@github-actions
Copy link

github-actions bot commented Nov 3, 2021

Revision: 9ca9fd4

Submitted crossbow builds: ursacomputing/crossbow @ actions-1076

Task Status
verify-rc-wheels-linux-amd64 Github Actions
verify-rc-wheels-macos-10.15-amd64 Github Actions
verify-rc-wheels-macos-11-amd64 Github Actions
verify-rc-wheels-macos-11-arm64 Github Actions
verify-rc-wheels-windows Github Actions
wheel-macos-big-sur-cp310-arm64 Github Actions
wheel-macos-big-sur-cp310-universal2 Github Actions
wheel-macos-big-sur-cp38-arm64 Github Actions
wheel-macos-big-sur-cp39-arm64 Github Actions
wheel-macos-big-sur-cp39-universal2 Github Actions
wheel-macos-high-sierra-cp310-amd64 Github Actions
wheel-macos-high-sierra-cp36-amd64 Github Actions
wheel-macos-high-sierra-cp37-amd64 Github Actions
wheel-macos-high-sierra-cp38-amd64 Github Actions
wheel-macos-high-sierra-cp39-amd64 Github Actions
wheel-macos-mavericks-cp310-amd64 Github Actions
wheel-macos-mavericks-cp36-amd64 Github Actions
wheel-macos-mavericks-cp37-amd64 Github Actions
wheel-macos-mavericks-cp38-amd64 Github Actions
wheel-macos-mavericks-cp39-amd64 Github Actions
wheel-manylinux2010-cp310-amd64 Github Actions
wheel-manylinux2010-cp36-amd64 Github Actions
wheel-manylinux2010-cp37-amd64 Github Actions
wheel-manylinux2010-cp38-amd64 Github Actions
wheel-manylinux2010-cp39-amd64 Github Actions
wheel-manylinux2014-cp310-amd64 Github Actions
wheel-manylinux2014-cp310-arm64 TravisCI
wheel-manylinux2014-cp36-amd64 Github Actions
wheel-manylinux2014-cp36-arm64 TravisCI
wheel-manylinux2014-cp37-amd64 Github Actions
wheel-manylinux2014-cp37-arm64 TravisCI
wheel-manylinux2014-cp38-amd64 Github Actions
wheel-manylinux2014-cp38-arm64 TravisCI
wheel-manylinux2014-cp39-amd64 Github Actions
wheel-manylinux2014-cp39-arm64 TravisCI
wheel-windows-cp310-amd64 Github Actions
wheel-windows-cp36-amd64 Github Actions
wheel-windows-cp37-amd64 Github Actions
wheel-windows-cp38-amd64 Github Actions
wheel-windows-cp39-amd64 Github Actions

@pitrou
Copy link
Member Author

pitrou commented Nov 3, 2021

I notice their own workaround looked for a case where the root element was correct but the first child was Error which isn't handled here.

Hmm, I may try to detect that indeed.

long long) { // NOLINT runtime/int
auto& stream = http_resp->GetResponseBody();
const auto pos = stream.tellg();
const auto doc = Aws::Utils::Xml::XmlDocument::CreateFromXmlStream(stream);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's possible to get a partial XML response from S3 due to network failure, does this handle that and throw a 5xx/retry or will an XML error be thrown back up? https://github.com/boto/botocore/blob/04d1fae43b657952e49b21d16daa86378ddb4253/botocore/handlers.py#L141
Per AWS:

# In cases of network disruptions, we may end up with a partial
# streamed response from S3. We need to treat these cases as
# 500 Service Errors and try again.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be normally detected as an error by the AWS SDK, so no need to special-case this on our own (hopefully).

@pitrou pitrou force-pushed the ARROW-14523-s3-cmu-error-fix branch from 9ca9fd4 to 958dc16 Compare November 3, 2021 16:07
@pitrou
Copy link
Member Author

pitrou commented Nov 3, 2021

@github-actions crossbow submit wheel--cp39-

@github-actions
Copy link

github-actions bot commented Nov 3, 2021

Revision: 958dc16

Submitted crossbow builds: ursacomputing/crossbow @ actions-1078

Task Status
wheel-macos-big-sur-cp39-arm64 Github Actions
wheel-macos-big-sur-cp39-universal2 Github Actions
wheel-macos-high-sierra-cp39-amd64 Github Actions
wheel-macos-mavericks-cp39-amd64 Github Actions
wheel-manylinux2010-cp39-amd64 Github Actions
wheel-manylinux2014-cp39-amd64 Github Actions
wheel-manylinux2014-cp39-arm64 TravisCI
wheel-windows-cp39-amd64 Github Actions

@pitrou
Copy link
Member Author

pitrou commented Nov 3, 2021

@seittema Could you try out one the wheels produced just above? You'll find the download URL in the "upload artifacts" step of the corresponding build.

@kszucs
Copy link
Member

kszucs commented Nov 4, 2021

@pitrou could you please remove the excessive logging so we can cut the first release candidate for 6.0.1?

We can address additional problems during the verification process and create a new release candidate if required.

@pitrou
Copy link
Member Author

pitrou commented Nov 4, 2021

Will do.

Work around a critical bug in the AWS SDK for C++ that fails to detect errors
returned by CompleteMultipartUpload in the body of a 200 OK response:
aws/aws-sdk-cpp#658
@pitrou pitrou force-pushed the ARROW-14523-s3-cmu-error-fix branch from 958dc16 to 87f7573 Compare November 4, 2021 14:01
@pitrou pitrou closed this in 3626a08 Nov 4, 2021
@ursabot
Copy link

ursabot commented Nov 4, 2021

Benchmark runs are scheduled for baseline = 0ead7c9 and contender = 3626a08. 3626a08 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.31% ⬆️0.62%] ursa-thinkcentre-m75q
Supported benchmarks:
ursa-i9-9960x: langs = Python, R, JavaScript
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True

@pitrou pitrou deleted the ARROW-14523-s3-cmu-error-fix branch November 4, 2021 15:15
kszucs pushed a commit that referenced this pull request Nov 4, 2021
Work around a critical bug in the AWS SDK for C++ that fails to detect errors returned by CompleteMultipartUpload in the body of a 200 OK response:
aws/aws-sdk-cpp#658

Closes #11594 from pitrou/ARROW-14523-s3-cmu-error-fix

Authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants