Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

google-cloud-php-storage contains critical data corruption in \Google\Cloud\Storage\Connection\Rest->downloadObject() #7864

Closed
indreka opened this issue Nov 27, 2024 · 2 comments

Comments

@indreka
Copy link
Contributor

indreka commented Nov 27, 2024

Environment details

  • OS: Linux
  • PHP version: 8.1.29
  • Package name and version: google-cloud-php-storage:v1.33.1

Steps to reproduce

  1. try to fetch files from storage bucket when storage.googleapis.com is having intermittent issues and returns server error 500 or 503 and response body like "We encountered an internal error. Please try again." for first request and actual response for next request.
  2. \Google\Cloud\Storage\Connection\Rest->downloadObject returns mangled file content where the beginning of actual file data is replaced with text "We encountered an internal error. Please try again."

Code example

$content = $storageAdapter->read("path/to/someFile")["contents"];

Problematic code in downloadObject function is following:

        $requestOptions['restRetryListener'] = function (
            \Exception $e,
            $retryAttempt,
            &$arguments
        ) use (
            $resultStream,
            $requestedBytes,
            $invocationId
        ) {
            // if the exception has a response for us to use
            if ($e instanceof RequestException && $e->hasResponse()) {
                $msg = (string) $e->getResponse()->getBody();

                $fetchedStream = Utils::streamFor($msg);

                // add the partial response to our stream that we will return
                Utils::copyToStream($fetchedStream, $resultStream);

                // Start from the byte that was last fetched
                $startByte = intval($requestedBytes['startByte']) + $resultStream->getSize();
                $endByte = $requestedBytes['endByte'];

                // modify the range headers to fetch the remaining data
                $arguments[1]['headers']['Range'] = sprintf('bytes=%s-%s', $startByte, $endByte);
                $arguments[0] = $this->modifyRequestForRetry($arguments[0], $retryAttempt, $invocationId);
            }
        };

It tries to handle all RequestException cases by remembering $e->getResponse()->getBody() and assuming that these were the correct bytes and then tries to resume fetching by asking the remaining data (skipping number of bytes already remembered).
As a result, whenever internal server occurs in a way that some content is returned (not just header but actual body), that said content is injected into the file.
We found out by getting intermittent image format exceptions when fetching files from storage bucket during partial outage and suddenly random images got their first 51 bytes replaced with "We encountered an internal error. Please try again.".

We were quite surprised that instead of throwing exception during storage.googleapis.com partial outage, that retry/resume logic started randomly replacing data like this.

Same problematic code exists in current latest 1.44 version:

https://github.com/googleapis/google-cloud-php-storage/blob/ebdec855364c1df9e81755e9626e3ff4687263f4/src/Connection/Rest.php#L321

I understand that the point of that code is to resume file download (useful when there is some network error when 990MB of 1000MB file is downloaded), but any non-network-errors should get ignored, not injected into file content silently.

@indreka
Copy link
Contributor Author

indreka commented Dec 30, 2024

Is there any additional information I could provide? Or how to proceed from here?
I tried to look at the code and I can see there is a test class RetryConformanceTest with some setup in retry_tests.json but unable to run them locally.
What I can see, is that there is no test-case to handle error 500 that contains "We encountered an internal error. Please try again.".
testDownloadAsStreamForStartBytesGiven seems to only test for happy-case return-broken-stream-after-256K and does not check any error 500 or anything else. Also it does not verify actual content, only content size, which based on our experience, allows for data corruption.

I can also see that in RequestProcessorTrait.php there is mapping:

            case Code::UNKNOWN:
                $exception = Exception\ServerException::class;
                break;

            case Code::INTERNAL:
                $exception = Exception\ServerException::class;
                break;

One possible solution is to handle the downloadObject code like this (cannot confirm with conformance tests):

if ($e instanceof RequestException && $e->hasResponse() && $e->getResponse()->getStatusCode() >= 200 && $e->getResponse()->getStatusCode() < 300) {

If response header has arrived, then it is the beginning of the response which means any following body should be completely ignored if it is not in success status range as per https://developer.mozilla.org/en-US/docs/Web/HTTP/Status#informational_responses

@indreka
Copy link
Contributor Author

indreka commented Jan 10, 2025

Fix merged in PR #7970

@indreka indreka closed this as completed Jan 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant