-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Loki should not send "request was cancelled by the client" as an http response #4991
Comments
Thank @bboreham for the report. 499 is mostly for us to have a status code for SLO reason. I think the problem is this Line 127 in e573a4d
When we rollout this gets turned into 499, the frontend thinks it's a cancel, and propagate it back up (and actually cancel instead of retrying.). It should 500 and the frontend will retry. |
The root cause of the issue is that the connection to the ingester is canceled here https://github.com/grafana/dskit/blob/main/ring/client/pool.go#L130 because the health check of GRPC connection is failed and when another goroutine that serves the request from the user tries to close the connection, it leads to the error. |
Sorry, that may be the root cause of why you see a cancellation, but the error I described is in the logic of how errors are reported, not how they are caused. |
you are completely right, and I will fix the logic of how the error is reported, but I had to understand why the issue happens... And now I see that we can also fix the root cause of the issue by adding retry mechanism to health-check or something like this... |
Good find! Suggest filing that as another issue (in DSKit?). |
yes, I will report an issue in dskit. thanks. |
It is impossible for this to work if true - the way you cancel an http request is to close the socket, so if that has happened then the server cannot write the response.
But we do see such responses:
Such reports can come from here:
loki/pkg/util/server/error.go
Line 60 in e573a4d
One improvement is to go to all the places where it is called, like
loki/pkg/lokifrontend/frontend/transport/handler.go
Line 125 in e573a4d
and check if
r.Context()
has errored - if so then we should bail out and not write anything.If not, we know the client did not cancel; some internal condition caused this, so we can return a 500 or similar.
It appears this code was added in #4593 in response to #4502.
The text was updated successfully, but these errors were encountered: