Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reverseproxy: Sanitize scheme and host on incoming requests #4237

Merged
merged 3 commits into from
Oct 26, 2021

Conversation

francislavoie
Copy link
Member

@francislavoie francislavoie commented Jul 6, 2021

See https://caddy.community/t/caddy-attempts-to-connect-downstream-server-via-https-although-not-configured-to-do-so/12765 for context, and later another report of the same problem in https://caddy.community/t/reverse-proxying-trying-to-make-a-tls-connection-to-an-http-only-backend/13954

Basically some wacky HTTP clients may sometimes make HTTP requests with request-line like this:

GET https://example.com/some/path
Host: example.com

Most well behaved clients make requests like this:

GET /some/path
Host: example.com

Technically, both are valid as per HTTP RFCs, but the former ends up breaking some assumptions that the code makes, like that the scheme is empty before being passed onto the reverse_proxy module.

The effect was that these requests would tell http.Client to use a different scheme than configured. For example, configured reverse_proxy to proxy over HTTP, but the scheme in the original request's request-line had https:// so it would try to proxy over HTTPS 😱

The docs for Request.URL do say:

// For server requests, the URL is parsed from the URI
// supplied on the Request-Line as stored in RequestURI.  For
// most requests, fields other than Path and RawQuery will be
// empty. (See RFC 7230, Section 5.3)

So it's definitely pretty unexpected to receive the scheme and host in the URL.

This change should just wipe out the scheme and host in the URL, which are otherwise covered by Request.Host and Request.TLS (and that's how all of the code in Caddy access the host and whether the request was secured).

I realize this is a risky change because it affects every request handled by Caddy. We should definitely tread lightly here. But I think this is the most logical and simplest way to solve this. I've adjusted the implementation to only do the sanitization in the reverseproxy package, and to restore it after reverseproxy is done.

@francislavoie francislavoie added the bug 🐞 Something isn't working label Jul 6, 2021
@francislavoie francislavoie added this to the v2.4.4 milestone Jul 6, 2021
@francislavoie francislavoie requested a review from mholt July 6, 2021 20:54
@mholt mholt added the under review 🧐 Review is pending before merging label Jul 19, 2021
@francislavoie francislavoie modified the milestones: v2.4.4, v2.5.0 Aug 20, 2021
@mholt
Copy link
Member

mholt commented Aug 20, 2021

As discussed in Slack, going to push this back to the 2.5 tree because we're not sure yet what the side-effects could be. We want more time to investigate, and I'd be more comfortable pushing a riskier fix into a minor release than a patch release.

@mholt
Copy link
Member

mholt commented Sep 30, 2021

What if we just did this sanitization in the reverse proxy code instead of globally for the whole HTTP server?

Just thinking of something safer and less scary first.

@francislavoie
Copy link
Member Author

We could, but that would be no different because we don't use a copy of http.Request to proxy upstream, we use the same request, so any modification to it would end up being kinda the same as doing it earlier because anything that reads the request after reverse_proxy runs would see the modified request.

Did you get a chance to reach out to any Go people to see if they agree this makes sense?

@mholt
Copy link
Member

mholt commented Oct 1, 2021

We could, but that would be no different

The difference is that the modification would only happen for the purposes of proxying to the upstream. After the proxy has handled the request, no other handlers will be handling it, so I'm not too worried about the mutated Request. If that is a problem, those values are very easy to restore (unlike headers, which we do kind of "snapshot" anyway). We do that here:

// we will need the original headers and Host value if
// header operations are configured; and we should
// restore them after we're done if they are changed
// (for example, changing the outbound Host header
// should not permanently change r.Host; issue #3509)
reqHost := r.Host
reqHeader := r.Header
defer func() {
r.Host = reqHost // TODO: data race, see #4038
r.Header = reqHeader // TODO: data race, see #4038
}()

Granted, this is racey, and we could probably find a better way to handle this, but I'm just saying that making a more localized change is less scary and might be just as effective.

@francislavoie francislavoie force-pushed the sanitize-request branch 2 times, most recently from 63161ba to 233070c Compare October 16, 2021 14:02
@francislavoie francislavoie changed the title caddyhttp: Sanitize scheme and host on incoming requests reverseproxy: Sanitize scheme and host on incoming requests Oct 16, 2021
@francislavoie
Copy link
Member Author

@mholt I updated the PR to move this logic to the reverseproxy package.

I'm not a fan of all this code being right in ServeHTTP, but it kinda needs to be in there because of the defer for cleanup. I tried moving this stuff to prepareRequest and returning a closure to be called like defer cleanup() but it doesn't work because we still need the reqHeader backup inside the retry loop.

Maybe you have ideas on how to refactor this?

Copy link
Member

@mholt mholt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is much less scary, thanks. I think this way of doing it is fine for now, maybe we can refactor later if we feel it would be beneficial.

My only request is minor, otherwise this is an approve. Good job.

modules/caddyhttp/reverseproxy/reverseproxy.go Outdated Show resolved Hide resolved
modules/caddyhttp/reverseproxy/reverseproxy.go Outdated Show resolved Hide resolved
Copy link
Member

@mholt mholt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you 😃

@mholt mholt merged commit f73f55d into master Oct 26, 2021
@mholt mholt removed the under review 🧐 Review is pending before merging label Oct 26, 2021
@francislavoie francislavoie deleted the sanitize-request branch October 26, 2021 20:59
@francislavoie francislavoie modified the milestones: v2.5.0, v2.4.6 Nov 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🐞 Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants