Set a NopCloser request body with retry middleware #1016
Conversation
Hey @bamarni, first of all, thanks for your contribution 👍 I think it's fine for the moment. If we run into problems with errors sent by upstream, we can still iterate on it. Could you please rebase your branch on the current master? LGTM 👼 /cc @containous/traefik
Hi @SantoDE, you're welcome 😸 Maybe a cleaner solution would be to pass an explicit error handler to the oxy forwarder, one which, instead of putting the dial error into the HTTP response, would use the request context for example. Then we would only retry on this specific error. What do you think?
Hi @bamarni, that would probably be even better! The oxy forwarder already supports passing it an error handler, so why not? :-)
Agreed, I'll check this out soon.
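For context, wiring a custom error handler into the oxy forwarder (as suggested above) could look roughly like this. This is a sketch only: `forward.ErrorHandler` is the option oxy's forward package exposes, while `contextErrorHandler` and its behaviour are purely hypothetical here.

```go
package middlewares

import (
	"net/http"

	"github.com/containous/oxy/forward"
)

// contextErrorHandler is hypothetical: instead of writing the dial
// error into the HTTP response (oxy's default behaviour), it would
// surface the error through the request, e.g. via its context.
type contextErrorHandler struct{}

func (contextErrorHandler) ServeHTTP(w http.ResponseWriter, req *http.Request, err error) {
	// stash err somewhere the retry middleware can inspect it
}

// newForwarder wires the handler in via the forward.ErrorHandler option.
func newForwarder() (*forward.Forwarder, error) {
	return forward.New(forward.ErrorHandler(contextErrorHandler{}))
}
```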
Force-pushed from 5e53ade to 03c025c
Hey @bamarni, sadly, you have to rebase again :( As for your commits, I'm okay with splitting them into 2 commits, as long as the fixing commit references what it fixes in its message :-) ping @containous/traefik
Sure, just rebased.
Thanks. Now it really is LGTM 👼 /ping @containous/traefik
LGTM 👍
middlewares/retry.go (Outdated)
```diff
 attempts := 1
 for {
 	recorder := NewRecorder()
 	recorder.responseWriter = rw
 	retry.next.ServeHTTP(recorder, r)
-	if !isNetworkError(recorder.Code) || attempts >= retry.attempts {
+	if recorder.Error == nil || attempts >= retry.attempts {
```
@bamarni are you sure you only have network errors in there? Maybe I'm wrong but I don't see the filter anymore (even in the ErrorHandler).
Thanks for double checking. The previous filter was looking at the response status, as that's how the default oxy error handler reports failures, but my error handler passes the error as a new field in the recorder struct. If this field is not nil, it comes from the transport RoundTrip call, meaning it is a network error.
However, what I'm not sure about is whether the next attempt would be properly forwarded to a different backend and not the same one; I guess this depends on the load balancer configuration? With sticky sessions, for instance, retrying wouldn't really make sense.
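A minimal sketch of that approach, assuming a recorder struct roughly like the one this PR adds (the exact field names are guesses):

```go
package middlewares

import "net/http"

// Recorder wraps the downstream ResponseWriter and, in this sketch,
// carries the forwarding error reported by the custom error handler.
type Recorder struct {
	http.ResponseWriter
	Code  int
	Error error
}

// ServeHTTP satisfies oxy's utils.ErrorHandler interface: rather than
// translating err into a 502/504 response, it stores it so the retry
// loop can simply test recorder.Error != nil.
func (rec *Recorder) ServeHTTP(w http.ResponseWriter, req *http.Request, err error) {
	rec.Error = err
}
```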
> If this field is not nil, it comes from the transport RoundTrip call, meaning it is a network error.
I'm really not sure about that. In oxy, `err` is not nil in multiple places:
- https://github.com/containous/oxy/blob/master/forward/fwd.go#L164
- https://github.com/containous/oxy/blob/master/forward/fwd.go#L198
- https://github.com/containous/oxy/blob/master/forward/fwd.go#L265
- https://github.com/containous/oxy/blob/master/forward/fwd.go#L278
- https://github.com/containous/oxy/blob/master/forward/fwd.go#L288
Would you consider these as network errors too? At least from my point of view, if the forwarder doesn't manage to send a request or get a proper response from upstream, for whatever reason, I'd see it as a network error, as opposed to a valid HTTP response from upstream with a 5XX status code, which would currently be retried.
Shall I stick to the first commit and open another issue to discuss the exact behaviour we want for the retry feature?
> Would you consider these as network errors too?
Not all these errors are network errors. For example https://github.com/containous/oxy/blob/master/forward/fwd.go#L278.
Therefore, I think it would be better to revert to the old test `if !isNetworkError(recorder.Code) || attempts >= retry.attempts {` :)
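For reference, the "old test" relies on a helper along these lines, checking the status codes that oxy's default error handler writes on forwarding failures (a sketch; Traefik's actual helper may differ slightly):

```go
package middlewares

import "net/http"

// isNetworkError reports whether the recorded status is one of the
// codes oxy's default error handler produces for transport failures.
func isNetworkError(status int) bool {
	return status == http.StatusBadGateway || status == http.StatusGatewayTimeout
}
```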
I've reverted it; however, please note that this doesn't change the behaviour for the errors you're mentioning: they'd still lead to retries.
The errors I was tackling with this change are instead explicit HTTP errors from upstream. To compare using proxy_next_upstream semantics, the current behaviour is:
error timeout http_502 http_504 non_idempotent
And with my changes it would be :
error timeout non_idempotent
The motivation behind this is that imo `http_5XX + non_idempotent` is a bit aggressive for a default. If upstream sends an HTTP 504 for a POST request, it might still have altered a resource but failed to serve the response in a timely manner, and retrying might not be the best option, or at least that's hard to assume. Maybe even `non_idempotent` shouldn't be enabled by default? Or should the retry behaviour be configurable?
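One way such configurability could look, sketched purely as an illustration (none of these names exist in Traefik or this PR):

```go
package middlewares

import "net/http"

// RetryCondition is a hypothetical hook deciding whether a failed
// attempt should be retried.
type RetryCondition func(statusCode int, transportErr error) bool

// onTransportErrorOnly mirrors "error timeout non_idempotent": retry
// only when the transport failed, never on a genuine 5XX response.
func onTransportErrorOnly(statusCode int, transportErr error) bool {
	return transportErr != nil
}

// onTransportErrorOr5XX mirrors the current behaviour, which also
// retries on gateway error responses (http_502, http_504).
func onTransportErrorOr5XX(statusCode int, transportErr error) bool {
	return transportErr != nil ||
		statusCode == http.StatusBadGateway ||
		statusCode == http.StatusGatewayTimeout
}
```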
LGTM 👍
It seems like there are random failures in the build 😿 For example, mine failed on […]. There are multiple places with this kind of random wait period in the test suite, which are subject to failures. How about introducing generic helpers, for example […]?
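Such a helper could look something like this (a hypothetical sketch, not an existing Traefik test utility):

```go
package integration

import (
	"fmt"
	"time"
)

// tryUntil polls an assertion until it succeeds or the timeout
// elapses, replacing the fixed sleeps that make the suite flaky.
func tryUntil(timeout time.Duration, assert func() error) error {
	deadline := time.Now().Add(timeout)
	for {
		err := assert()
		if err == nil {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("condition not met after %s: %s", timeout, err)
		}
		time.Sleep(100 * time.Millisecond)
	}
}
```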
@bamarni
@emilevauge: cool, let me know if you need extra help. As for this PR, I've rebased it.
As the http client always closes the request body, this makes sure the request can be retried if needed. Fixes traefik#1008
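The idea behind the commit, as a minimal sketch (not the exact Traefik code): buffer the body once, then rewind it behind an `ioutil.NopCloser` before each attempt, so the transport's `Close()` cannot consume it for good.

```go
package middlewares

import (
	"bytes"
	"io/ioutil"
	"net/http"
)

// rewindableBody buffers the request body once and returns a function
// that resets r.Body to a fresh NopCloser over the buffered bytes, so
// the request can be replayed after the HTTP client closes the body.
func rewindableBody(r *http.Request) (rewind func(), err error) {
	body, err := ioutil.ReadAll(r.Body)
	if err != nil {
		return nil, err
	}
	rewind = func() {
		r.Body = ioutil.NopCloser(bytes.NewReader(body))
	}
	rewind() // install the replayable body for the first attempt
	return rewind, nil
}
```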
Force-pushed from ecabf0b to 86fd5b4
fixes #1008