handle_response errors with "read on closed response body" #4298

Closed
polarathene opened this issue Aug 22, 2021 · 9 comments · Fixed by #4388
Labels
bug 🐞 Something isn't working
Milestone

Comments

@polarathene

Following the docs for the reverse_proxy directive, using handle_response on a very basic JSON response endpoint results in failure, while modifying the header elsewhere in Caddy works as expected.

Caddyfile (v2.4.3):

:80 {
  
  # Works:
  #header Cache-Control "max-age=10"
  reverse_proxy localhost:8000 {
    handle_response {
      # Errors:
      header Cache-Control "max-age=10"
    }
    # Works:
    #header_down Cache-Control "max-age=10"
  }
}

Error logged:

2021/08/22 02:34:00.645 ERROR   http.handlers.reverse_proxy     reading from backend    {"error": "http: read on closed response body"}
2021/08/22 02:34:00.645 ERROR   http.handlers.reverse_proxy     aborting with incomplete response       {"error": "http: read on closed response body"}

The upstream still received the request and responded no differently AFAIK; in the browser network tools, Caddy sent the response headers but omitted the body.


For added context, the intention was to match on Content-Type: application/json rather than the URI; but AFAIK this is only available via the response, and thus can only be matched within handle_response? Since the response matcher is not required to reproduce the error, I have excluded it from the Caddyfile above.
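
For reference, the config I was aiming for looked roughly like this. The @json response matcher is my reading of the documented syntax and is shown only for context (it is not needed to reproduce the error):

:80 {
  reverse_proxy localhost:8000 {
    # response matcher (my reading of the docs, not verified):
    # match backend responses with Content-Type: application/json
    @json header Content-Type application/json
    handle_response @json {
      header Cache-Control "max-age=10"
    }
  }
}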

@polarathene
Author

cURL response if useful:

curl -v http://localhost/json
*   Trying ::1:80...
* Connected to localhost (::1) port 80 (#0)
> GET /json HTTP/1.1
> Host: localhost
> User-Agent: curl/7.74.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Cache-Control: max-age=10
< Content-Length: 18
< Content-Type: application/json
< Date: Sun, 22 Aug 2021 02:16:35 GMT
< Server: Caddy
< Server: Rocket
< 
* transfer closed with 18 bytes remaining to read
* Closing connection 0
curl: (18) transfer closed with 18 bytes remaining to read

@francislavoie
Member

That's not how handle_response works. It's meant to match some responses, and do something completely different instead of actually returning that response to the client. If you use handle_response with routes defined, then the response body will not be copied from the proxy response to the client response. As the docs say:

When this happens, the response from the backend is not written to the client, and the configured handle_response route will be executed instead, and it is up to that route to write a response.

Misuse aside, I'm not sure why that error is being logged. I can replicate the issue, but I'll need to dig further to understand why it's happening.

@francislavoie
Member

francislavoie commented Aug 22, 2021

Alright, I see the issue -- but again, what you were trying is not something we intended to support with this feature; it was designed only to handle empty responses OR to modify the status code of a response (and those are mutually exclusive operations).

So the issue is that we call res.Body.Close() just before running the handle_response routes, since by that point we know we don't need the response anymore (as per the design of the feature). But after the routes are run, if no error is emitted by the routes, reverseProxy() continues until it hits h.copyResponse(), which attempts to read from res.Body and yields the read on closed response body error.

This only happens if res.Body was non-empty. If res.Body was empty, then no error is logged (which is the case we tested for). We hadn't tested handling responses that did have a body, because that wasn't the intended use case for this feature.


Examples (for both, run curl localhost:8881); notice how the only thing that changes is what the upstream responds with:

First, this one works fine, with no errors, because :8882 returns no body (Caddy's default is to write an empty response):

{
	admin off
	debug
}

:8881 {
        reverse_proxy localhost:8882 {
                handle_response {
                        respond "handled" 200
                }
        }
}

:8882 {
        header Foo Bar
}

Secondly, this one emits errors, because the body is non-empty, and handle_response routes are defined:

{
	admin off
	debug
}

:8881 {
        reverse_proxy localhost:8882 {
                handle_response {
                        respond "handled" 200
                }
        }
}

:8882 {
        header Content-Type application/json
        respond `{"hello": "world"}`
}

As an aside, @mholt, I noticed that this section of code is probably no longer necessary because of the changes we made to initial flushing (the stuff pulled from stdlib):

// some apps need the response headers before starting to stream content with http2,
// so it's important to explicitly flush the headers to the client before streaming the data.
// (see https://github.com/caddyserver/caddy/issues/3556 for use case and nuances)
if h.isBidirectionalStream(req, res) {
	if wf, ok := rw.(http.Flusher); ok {
		wf.Flush()
	}
}

@francislavoie francislavoie added the bug 🐞 Something isn't working label Aug 22, 2021
@polarathene
Author

We hadn't tested handling responses that did have a body, because that wasn't the intended use case for this feature.

https://caddyserver.com/docs/caddyfile/directives/reverse_proxy#intercepting-responses

When this happens, the response from the backend is not written to the client, and the configured handle_response route will be executed instead, and it is up to that route to write a response.

My understanding was that modifying the response headers was valid, and as received by the client, it is. What I did not expect was that the body would be dropped. These docs do not communicate that.

The docs presently communicate that the handle_response route will write/modify the response to the client. It should probably be clearer that it only does so for the response headers and discards the associated response body.


Adding the header for JSON responses is thus not possible when matching by Content-Type? I must ensure the route being handled always returns a JSON response, since Caddy only allows matching against request headers if I want to preserve the body?

@francislavoie
Member

francislavoie commented Aug 22, 2021

My understanding was that modifying the response headers was valid, and as received by the client, it is. What I did not expect was that the body would be dropped. These docs do not communicate that.

It absolutely does. From the docs:

When this happens, the response from the backend is not written to the client

That's exactly what that means.

A new response may be written, but it is not the response from the backend.


Adding the header for JSON responses is thus not possible when matching by Content-Type?

It is possible to match; that's not the issue. The issue is that there's a response body, and that isn't currently supported (and wasn't intended to be).

The feature has one of two purposes:

  1. Conditionally replace the status code of the response.

  2. Take an empty response with special headers, and use those headers to write a response. The prototypical example for this is the X-Accel-Redirect feature from Nginx, which you can use this feature to mimic (i.e. the backend tells Caddy via that header: "actually, respond with this file from disk on my behalf, thanks"); see the rough sketch below.

Modifying the response in any other way is not an intended use case.
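
As a rough sketch of the second case (hypothetical config; the matcher, placeholder, and paths are illustrative assumptions, not copied from the docs):

reverse_proxy localhost:8000 {
        # assumed matcher syntax: match responses where the backend set X-Accel-Redirect
        @accel header X-Accel-Redirect *
        handle_response @accel {
                root * /srv/private-files
                # serve the file the backend named, on its behalf
                # (the placeholder name here is an assumption)
                rewrite * {rp.header.X-Accel-Redirect}
                file_server
        }
}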


I understand what you're trying to achieve, but all I'm saying is that it's not currently supported by this feature, and it would need some design discussion to determine the right approach to support it. It's complicated.

@polarathene
Author

It absolutely does.

I don't think that was communicated well.

The docs are clear that the backend response was not being returned directly to the client, but that handle_response would take responsibility for writing to the client, allowing for modification/replacement first.

A new response may be written, but it is not the response from the backend.

Why are the response headers carried over from the backend in this case, then? (My backend's Server header was retained.) It's not a new/fresh response in that sense.

All I was aware of as a user was that handle_response would allow intercepting the backend response to modify/transform it in flight before it is sent back to the client.

That is what I observed with the response headers; I only expected this to differ in the case of the file_server directive example, which clearly replaces the backend response with a different one. The dropped backend response body and the associated logged error just left me confused, even after reviewing the quoted docs.


Modifying the response in any other way is not an intended use case.

Understood, thanks! :)

all I'm saying is that it's not currently supported by this feature, and it would need some design discussion to determine the right approach to support it. It's complicated.

No worries, I don't need this to be supported; it was just a misunderstanding. I can either match the request URI (roughly sketched below) or have the backend add the header(s) in its response for Caddy 👍
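
The request-URI approach would look roughly like this (the path is just an example for my setup):

:80 {
  @json path /json*
  reverse_proxy localhost:8000
  # applied outside of handle_response, so the response body is untouched
  header @json Cache-Control "max-age=10"
}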

I do believe the docs could better communicate that the backend response headers persist, but the backend response body is discarded.

@francislavoie
Member

The docs are clear that the backend response was not being returned directly to the client, but that handle_response would take responsibility for writing to the client

Right, that's all true, but you're making the assumption that there's implicit passthrough behaviour. There's not (except for the status code replacement case which is a bit different).

(And now that I say that, I think we should change the status replacement case to use a subdirective called replace_status and deprecate passing a status code to handle_response, for clarity.)
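
Roughly something like this (hypothetical syntax for the proposed subdirective; the matcher and status codes are just examples):

reverse_proxy localhost:8000 {
        @notfound status 404
        # today: passing a status code to handle_response replaces the status
        # and otherwise passes the backend response through
        handle_response @notfound 410

        # proposed: the same behaviour via a dedicated subdirective
        #replace_status @notfound 410
}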

But sure. The wording can be improved.

Why are the response headers carried over from the backend in this case, then? (My backend's Server header was retained.) It's not a new/fresh response in that sense.

That's part of the buggy behaviour; it's not supposed to. If you don't actually run a terminal handler that writes a response (e.g. file_server, respond, or reverse_proxy), then it falls through to the rest of the logic that would normally run: it first copies the headers and status, but then dies when trying to copy the body, because the body is closed.

All I was aware of as a user was that handle_response would allow intercepting the backend response to modify/transform it in flight before it is sent back to the client.

Like I said earlier, it's not so much "modify/transform", more "make a new response".

@simaotwx
Contributor

By the way, I found that adding defer in the header config often solves the body-dropped issue:

# ...
header X-Some-Header example {
    defer
}
# ...

Just leaving this here in case someone is looking for it, as this is closely related.

@francislavoie
Member

@simaotwx I'm not sure what problem that fixes. What's a full config example where you saw that behaviour?

francislavoie added a commit that referenced this issue Oct 17, 2021
This should fix #4298.

Pretty obvious fix, just using a boolean to track whether we did close it ahead of time or not, before reaching the `h.copyResponse()` part.
@francislavoie francislavoie modified the milestones: v2.5.0, v2.4.6 Nov 9, 2021
mholt added a commit that referenced this issue Mar 9, 2022
reverseproxy: New `copy_response` handler for `handle_response` routes (#4391)

* reverseproxy: New `copy_response` handler for `handle_response` routes

Followup to #4298 and #4388.

This adds a new `copy_response` handler which may only be used in `reverse_proxy`'s `handle_response` routes, which can be used to actually copy the proxy response downstream. 

Previously, if `handle_response` was used (with routes, not the status code mode), it was impossible to use the upstream's response body at all, because we would always close the body, expecting the routes to write a new body from scratch.

To implement this, I had to refactor `h.reverseProxy()` to move all the code that came after the `HandleResponse` loop into a new function. This new function `h.finalizeResponse()` takes care of preparing the response by removing extra headers, dealing with trailers, then copying the headers and body downstream.

Since basically what we want `copy_response` to do is invoke `h.finalizeResponse()` at a configurable point in time, we need to pass down the proxy handler, the response, and some other state via a new `req.WithContext(ctx)`. Wrapping a new context is pretty much the only way we have to jump a few layers in the HTTP middleware chain and let a handler pick up this information. Feels a bit dirty, but it works.

Also fixed a bug with the `http.reverse_proxy.upstream.duration` placeholder, it always had the same duration as `http.reverse_proxy.upstream.latency`, but the former was meant to be the time taken for the roundtrip _plus_ copying/writing the response.

* Delete the "Content-Length" header if we aren't copying

Fixes a bug where the Content-Length will mismatch the actual bytes written if we skipped copying the response, so we get a message like this when using curl:

```
curl: (18) transfer closed with 18 bytes remaining to read
```

To replicate:

```
{
	admin off
	debug
}

:8881 {
	reverse_proxy 127.0.0.1:8882 {
		@200 status 200
		handle_response @200 {
			header Foo bar
		}
	}
}

:8882 {
	header Content-Type application/json
	respond `{"hello": "world"}` 200
}
```

* Implement `copy_response_headers`, with include/exclude list support

* Apply suggestions from code review

Co-authored-by: Matt Holt <[email protected]>
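
For reference, a minimal sketch of the intended usage (hypothetical config, adapted from the repro above):

```
:8881 {
	reverse_proxy 127.0.0.1:8882 {
		@200 status 200
		handle_response @200 {
			header Foo bar
			# explicitly copy the upstream response downstream instead of discarding it
			copy_response
		}
	}
}
```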