-
Notifications
You must be signed in to change notification settings - Fork 460
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do
function panics when HTTPClient returns a nil response
#714
Do
function panics when HTTPClient returns a nil response
#714
Conversation
@agatamigalska-newstore Thanks a lot for opening a PR so quickly and adding a test. This one is quite specific to some Go internals so I'll defer to @brandur-stripe who can look next week and he'll follow up with feedback! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking a crack at the fix! I left a few comments below, but a couple general ones as well:
- Do we need to handle this condition outside of the retry loop as well? There's very similar code to what you fixed on https://github.com/stripe/stripe-go/blob/master/stripe.go#L354 which might also be prone to the same problem (say # of retries is set to 0).
- In case this condition does occur, it'd be nice to produce some information logging about it. On the lines following the ones you changed you'll notice we log this:
if s.LogLevel > 0 {
s.Logger.Printf("Request failed with: %s (error: %v)\n", string(resBody), err)
}
I believe that under your implementation, this will just produce:
Request failed with: (error: )
Which isn't very good. We should set err
with errors.New
to a message like "Received nil response which may indicate a timeout).
Lastly, do you understand why exactly this problem is happening? Go's HTTP client is supposed to return an error when a request times out. I wrote a simple test program to demonstrate this:
package main
import (
"fmt"
"net/http"
"time"
)
func main() {
client := &http.Client{
Timeout: 1 * time.Millisecond,
}
resp, err := client.Get("http://example.com")
fmt.Printf("resp = %+v\n", resp)
fmt.Printf("err = %+v\n", err)
}
Result:
$ go run main.go
resp = <nil>
err = Get http://example.com: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
(And relatedly, I sort of suspect that the test you added may not be testing for the case you're hoping for given that it's probably just handling the normal error that gets returned in the normal case.)
stripe.go
Outdated
err error | ||
) | ||
// Checking res instead of err because s.HTTPClient.Do(req) has one special case for Go 1 compatibility | ||
// when both the response and an error are returned |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we clarify this comment a bit? It should read something like "when both the response and error are returned as nil" I think?
Also, what exactly do you mean by "Go 1 compatibility"? All Go versions right now are Go 1 right? Do you mean Go 1.10 or something like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment was based on the following statement made in spripe-go's dependency go-1.10/src/net/http/client.go
(lines 602 - 605) :
// Special case for Go 1 compatibility: return both the response
// and an error if the CheckRedirect function failed.
// See https://golang.org/issue/3795
// The resp.Body has already been closed.
Current github repo don't have this comment anymore. Anyway, since I understood this library belongs to Go's core, I assumed that must make sense and didn't give it more thought. I still don't know what exactly they meant, so I updated my comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. I just took a look at that issue and it looks like they were talking about Go "1.0" (which was somewhat ambiguously just called "Go 1").
stripe_test.go
Outdated
@@ -118,6 +119,71 @@ func TestDo_Retry(t *testing.T) { | |||
assert.Equal(t, 2, requestNum) | |||
} | |||
|
|||
// Tests client retries when HTTP returns timeout. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we either elaborate on this comment or remove it? e.g. You could add something like "Timeouts can result in an edge whereby both a request's result and error come back as nil, which can fail the retry infrastructure if it's not handling it."
stripe_test.go
Outdated
// To run only this test: | ||
// | ||
// go test . -run TestDo_RetryOnTimeout -test.v | ||
// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I think this "run only this test" comment was copied from TestDo_Retry
above. Let's just remove this part. I included it in TestDo_Retry
because it gives you a hint at how to better understand what the test is doing, but in general there's sufficient instruction for this type of thing in the README.
stripe_test.go
Outdated
// previously had regressions in this behavior as we switched to HTTP/2 | ||
// and `Request` became non-reusable, so we want to check it with every | ||
// request. | ||
assert.Equal(t, "bar", r.Form.Get("foo")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's strip out this assert
and just rely on the one in TestDo_Retry
above (it and the comment above it were copied from there).
stripe_test.go
Outdated
defer func() { | ||
var recovery = recover() | ||
assert.Nil(t, recovery) | ||
}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's sufficient to just let the test framework catch the panic for us and fail the run if one occurs. Let's strip out this defer
.
stripe_test.go
Outdated
// request. | ||
assert.Equal(t, "bar", r.Form.Get("foo")) | ||
time.Sleep(timeout) | ||
requestNum++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, this is a little subtle, but there's a race here which is causing the Travis build to fail on this test. Check there for details, but you should also be able to reproduce it locally by using go test
with the -race
flag (which is what we do in Travis):
go test . -run TestDo_RetryOnTimeout -test.v -race
I believe what's happening is that you have multiple server Goroutines that are active in this handler concurrently. Because the client is timing out early, the first request is still active on the server when the second starts processing. The fix is pretty easy though: make sure to use a mutex around any variable accesses like this one that could be touched from multiple Goroutines.
@brandur-stripe Please see my update to the PR description, I added the initial motivation for this fix. For your general remarks:
I think it looks quite fine, doesn't it?
is probably confusing due to my misconception about Go principles. I thought it is a practice to return nil response if err is not nil, and not nil response if err is nil (like an XOR approach). |
The build for Go 1.8 has failed due to
Any ideas? |
@brandur-stripe Please review, thanks. |
@agatamigalska-newstore Ah! Okay, I misunderstood the premise of the PR a bit — re-reading the code now, I can see that we are not correctly handling the Which you mind changing your implementation a bit so that we very explicitly handle the case on diff --git a/stripe.go b/stripe.go
index e356177..dc6fcf1 100644
--- a/stripe.go
+++ b/stripe.go
@@ -351,17 +351,23 @@ func (s *BackendImplementation) Do(req *http.Request, body *bytes.Buffer, v inte
break
}
- resBody, err := ioutil.ReadAll(res.Body)
- res.Body.Close()
if err != nil {
if s.LogLevel > 0 {
- s.Logger.Printf("Cannot read response: %v\n", err)
+ s.Logger.Printf("Request failed with error: %v\n", err)
+ }
+ } else {
+ resBody, err := ioutil.ReadAll(res.Body)
+ res.Body.Close()
+ if err != nil {
+ if s.LogLevel > 0 {
+ s.Logger.Printf("Cannot read response: %v\n", err)
+ }
+ return err
}
- return err
- }
- if s.LogLevel > 0 {
- s.Logger.Printf("Request failed with: %s (error: %v)\n", string(resBody), err)
+ if s.LogLevel > 0 {
+ s.Logger.Printf("Request failed with: %s (error: %v)\n", string(resBody), err)
+ }
}
sleepDuration := s.sleepTime(retry) Regarding your test, I gave you a little bit of bad advice. Go actually has an atomic counter built in, so would you mind if we switched to just using that. Here's an example of using it, but your implementation should look something like this: var counter uint64
...
atomic.AddUint64(&counter, 1)
...
atomic.LoadUint64(&counter) Thanks! |
stripe_test.go
Outdated
var body = bytes.NewBufferString("foo=bar") | ||
var response testServerResponse | ||
|
||
// make sure that there was no panic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we strip this comment? I think it's somewhat self-explanatory that we don't want a panic.
stripe_test.go
Outdated
// timeout should not prevent retry | ||
assert.Equal(t, 2, counter.Get()) | ||
var recovery = recover() | ||
assert.Nil(t, recovery) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And can we take out the recover()
and subsequent assert? If there's a panic, the test run will fail automatically.
@agatamigalska-newstore Unfortunately an upstream change in Go's HTTP/2 package ended up breaking it for 1.8, and that took our build down with it. I dropped 1.8 from the build matrix in #716 and that's now merged, so you should be able to rebase on that to get things passing again. |
2515dfe
to
062c822
Compare
Thanks for your comments @brandur-stripe. The code is ready for the next round :) |
stripe.go
Outdated
return err | ||
} | ||
if s.LogLevel > 0 { | ||
s.Logger.Printf("Request failed with: %s (error: %v)\n", string(resBody), err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agatamigalska-newstore Sorry, I know this is code I gave you, but reading it again, I now realize that on this last log line err
will never be non-nil
because we're now handling that case above. It would also be kind of nice if we logged the response status code because that's one of the factors that the library will use to determine the retry.
Would you mind changing it to:
s.Logger.Printf("Request failed with body: %s (status: %v)\n", string(resBody), res.StatusCode)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And thanks for making all those other changes!
After this last one, this looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated as requested, this time I also read it instead of just copy-pasting ;)
stripe.go
Outdated
if s.LogLevel > 0 { | ||
s.Logger.Printf("Cannot read response: %v\n", err) | ||
} | ||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brandur-stripe This return statement was here before I started messing with this code, and I'm wondering - why not let it retry? It seems like the only case when (apart from shouldRetry() returning false) the retrial is not permitted, and I wonder why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect that it wasn't originally retried because it's kind of an "error inside of an error" condition, and also probably somewhat unlikely to occur (usually once you have the response you'll get back a response body).
If you want, you can start retrying it as well since you're right in that it'd be strictly more correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the return statement and amended logged messages. Please have a look @brandur-stripe
Excellent! Thanks for all the work here. |
Released as 53.0.1. |
Bumps [sorbet](https://github.com/sorbet/sorbet) from 0.5.10346 to 0.5.10382. - [Release notes](https://github.com/sorbet/sorbet/releases) - [Commits](https://github.com/sorbet/sorbet/commits) --- updated-dependencies: - dependency-name: sorbet dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This PR covers the scenario when
res, err = s.HTTPClient.Do(req)
results inres
that isnil
(https://github.com/stripe/stripe-go/blob/master/stripe.go#L341 ). This can happen for example in case of timeout.Do
function an additional check was added to prevent panicCloses #713
[EDIT] Initial motivation:
Integration tests in the codebase I'm working on are randomly failing with the following stack trace:
In this PR I'm trying to capture the situation in which this panic can happen in client.go:38. The best hypothesis was that res is nil, and so the attempt to access res.Body causes a panic.
The goal in the test was to simulate the scenario where HttpClient.Do() function returns nil response.
The timeout scenario for the test was chosen simply because, given the infrastructure used for testing stripe-go, it was the easiest to simulate.