-
Notifications
You must be signed in to change notification settings - Fork 438
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: attempt to only remember received content when the header is within success range by http spec #7970
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
Could @vishwarajanand, @JesseLovelace or someone on the Storage team confirm that this is what we want? |
@indreka one question I have is - if the Request has a status code between 200 and 300, would it even throw an exception? I assume not, and so I think this logic may be too restrictive. |
@bshaffer I did a quick local test where in apache I ran a script that did:
Then I did a sample script I executed from command line:
Then I triggered the command line script and in the middle of sleep, I used kill -9 to "surprise-kill" apache instance and got this result. So response status is indeed 200 while body contains partial message and exception is thrown because curl detected network-level issue. Using direct curl_init + curl_exec the output is different from guzzle, body is not returned from curl_exec, meaning Guzzle does something different (did not dig too deep there): I am not versatile with the conformance test system used in this project so maybe someone can hint how to do this sort of network interrupt test case (if it is possible at all). |
Wow @indreka thank you for your very thorough investigation! This does seem to indicate that the fix you've provided will stil allow for chunks to be added as expected for certain types of exceptions. I'd still like to confirm with the Google Cloud Storage team (and/or compare with other Google Cloud Storage client libraries) that these are the status codes we want to select for before merging, but I agree that this seems like the correct solution. Thanks again! |
@indreka as you've mentioned, it's hard to see what the current Conformance tests are actually testing. I've commented out that block of code entirely, and all the conformance tests still pass. I am going to wait to hear from @frankyn or @saranshdhingra to see if they can give me more context as to what that block of code is supposed to be doing, and if it's being tested anywhere. |
I can guess the intent of the retry based on the code: if 1GB download gets network error at 950MB then it is much more efficient to only ask the last 50MB than the full 1GB (n-th 1GB retry may encounter error again etc). Two-fold result: allow graceful resuming for intermittent errors and avoid excessive bandwidth usage. The apache surprise-kill example proved that Guzzle throws exception with status 200 and partial body so for pretty cases the code works quite well. I will check if I manage to add simple unittest (not conformance case) to cover this fix. |
Added a unittest that shows how the partial resuming is expected to work and shows that when statuscode check blindly accepts error message as content, retry request tries to do partial request that skips bytes from beginning and ends up corrupting the data. |
Co-authored-by: Brent Shaffer <[email protected]>
Any other feedback or comments/suggestions? |
@indreka I agree the data corruption is a pretty bad issue with this behavior. We have a release later today, and I will work to get confirmation from the Storage team so that we can merge by then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I believe the fix is indeed correct and follows from other Storage libraries. For example in Python lib, (although a little different context of batches) there's a pattern of having the error code checked for being in the range 200 to 300.
Co-authored-by: Vishwaraj Anand <[email protected]>
Co-authored-by: Vishwaraj Anand <[email protected]>
Style updates merged. Is all set now from my side or should I squash all changes into single commit? |
Remembering "We encountered an internal error. Please try again." for http response 500, 503 etc will result in data corruption otherwise.