
Cannot visit some http1.1 website under caddy+forwardproxy setup #537

Closed
Chilledheart opened this issue Jul 19, 2023 · 9 comments

Chilledheart commented Jul 19, 2023

I was using naiveproxy to visit a website whose server has some particular settings. Through the proxy I got a 403 error; without the proxy, it returned OK, and calling curl directly from the server also returned OK.

Checked with the naiveproxy client:

[0719/145422.464929:WARNING:http_proxy_socket.cc(352)] GET http://www.haproxy.org/download/1.9/doc/proxy-protocol.txt HTTP/1.1
Host: www.haproxy.org
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Cookie: served=1:TCP:IPv6; served=1:TCP:IPv6

Checked with the server (with tcpdump):

07:57:33.793805 IP6 (flowlabel 0x8698c, hlim 64, next-header TCP (6) payload length: 561) epic-one.34614 > 2001:bc8:35ee:100::1.http: Flags [P.], cksum 0x4c27 (incorrect -> 0xd19a), seq 1:530, ack 1, win 43, options [nop,nop,TS val 4289579088 ecr 204303789], length 529: HTTP, length: 529
        GET http://www.haproxy.org/download/1.9/doc/proxy-protocol.txt HTTP/1.1
        Host: www.haproxy.org
        Cache-Control: max-age=0
        Upgrade-Insecure-Requests: 1
        User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
        Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
        Accept-Encoding: gzip, deflate
        Accept-Language: en-US,en;q=0.9
        Cookie: served=1:TCP:IPv6; served=1:TCP:IPv6


07:57:33.948309 IP6 (hlim 54, next-header TCP (6) payload length: 252) 2001:bc8:35ee:100::1.http > epic-one.34614: Flags [P.], cksum 0x205f (correct), seq 1:221, ack 530, win 232, options [nop,nop,TS val 204303944 ecr 4289579088], length 220: HTTP, length: 220
        HTTP/1.1 403 Forbidden
        content-length: 93
        cache-control: no-cache
        content-type: text/html
        set-cookie: served=1:TCP:IPv6

        <html><body><h1>403 Forbidden</h1>
        Request forbidden by administrative rules.
        </body></html>

07:57:33.948362 IP6 (flowlabel 0x8698c, hlim 64, next-header TCP (6) payload length: 32) epic-one.34614 > 2001:bc8:35ee:100::1.http: Flags [.], cksum 0x4a16 (incorrect -> 0x2963), ack 221, win 43, options [nop,nop,TS val 4289579242 ecr 204303944], length 0
07:57:34.731645 IP6 (flowlabel 0x8698c, hlim 64, next-header TCP (6) payload length: 486) epic-one.34614 > 2001:bc8:35ee:100::1.http: Flags [P.], cksum 0x4bdc (incorrect -> 0x8f61), seq 530:984, ack 221, win 43, options [nop,nop,TS val 4289580026 ecr 204303944], length 454: HTTP, length: 454
        GET http://www.haproxy.org/favicon.ico HTTP/1.1
        Host: www.haproxy.org
        Proxy-Connection: keep-alive
        User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
        Accept: image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8
        Referer: http://www.haproxy.org/download/1.9/doc/proxy-protocol.txt
        Accept-Encoding: gzip, deflate
        Accept-Language: en-US,en;q=0.9
        Cookie: served=1:TCP:IPv6


07:57:34.886416 IP6 (hlim 54, next-header TCP (6) payload length: 252) 2001:bc8:35ee:100::1.http > epic-one.34614: Flags [P.], cksum 0x1661 (correct), seq 221:441, ack 984, win 240, options [nop,nop,TS val 204304882 ecr 4289580026], length 220: HTTP, length: 220
        HTTP/1.1 403 Forbidden
        content-length: 93
        cache-control: no-cache
        content-type: text/html
        set-cookie: served=1:TCP:IPv6

        <html><body><h1>403 Forbidden</h1>
        Request forbidden by administrative rules.
        </body></html>

naiveproxy removes headers such as Proxy-Connection, but the server side (caddy+forwardproxy) appears to add it back. I checked the forwardproxy code; it should be removed by design.
I don't know how this happened. The request looks correct except for the Referer header, so I am still investigating.

caddy version:

v2.6.4 h1:2hwYqiRwk1tf3VruhMpLcYTg+11fCdr8S3jhNAdnPy8=

naiveproxy version:

naive 114.0.5735.91

klzgrad commented Aug 5, 2023

Root cause similar to http-party/node-http-proxy#529


Chilledheart commented Aug 5, 2023

Nice catch. I can verify that the primitive patch below works for this URL:

    auto absolute_prefix = buffer_.find("://", first_space + 1);
    if (absolute_prefix == std::string::npos) {
      ss << buffer_.substr(0, first_line_end) << "\r\n";
    } else {
      auto uri_start = buffer_.find('/', absolute_prefix + 3);
      ss << buffer_.substr(0, first_space) << " " << buffer_.substr(uri_start);
    }

EDIT: even better, this also takes care of a redirect URL in the request target that might itself contain 'http://' or 'https://'.

    std::ostringstream ss;
    // https://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1.2
    auto second_space = buffer_.find(' ', first_space + 1);
    if (second_space == std::string::npos || second_space >= first_line_end) {
      LOG(WARNING) << "Invalid Http Request: " << buffer_.substr(0, first_line_end);
      return ERR_INVALID_ARGUMENT;
    }

    if (buffer_[first_space + 1] == '*' || buffer_[first_space + 1] == '/') {
      ss << buffer_.substr(0, first_line_end) << "\r\n";
    } else {
      auto absolute_prefix = buffer_.find("://", first_space + 1);
      if (absolute_prefix == std::string::npos || absolute_prefix >= second_space) {
        LOG(WARNING) << "Invalid Uri: " << buffer_.substr(first_space + 1, second_space - (first_space + 1));
        return ERR_INVALID_ARGUMENT;
      }
      auto uri_start = buffer_.find('/', absolute_prefix + 3);
      if (uri_start == std::string::npos || uri_start >= second_space) {
        ss << buffer_.substr(0, first_space) << " / " << buffer_.substr(second_space + 1);
      } else {
        ss << buffer_.substr(0, first_space) << " " << buffer_.substr(uri_start);
      }
    }


klzgrad commented Aug 5, 2023

I use GURL to parse the request URI. See if you can review the following:

    size_t first_line_end = buffer_.find("\r\n");
    size_t first_space = buffer_.find(' ');
    bool is_http_1_0 = false;
    if (first_space == std::string::npos || first_space + 1 >= first_line_end) {
      LOG(WARNING) << "Invalid request: " << buffer_.substr(0, first_line_end);
      return ERR_INVALID_ARGUMENT;
    }
    size_t second_space = buffer_.find(' ', first_space + 1);
    if (second_space == std::string::npos || second_space >= first_line_end) {
      LOG(WARNING) << "Invalid request: " << buffer_.substr(0, first_line_end);
      return ERR_INVALID_ARGUMENT;
    }
    std::string method = buffer_.substr(0, first_space);
    std::string uri =
        buffer_.substr(first_space + 1, second_space - (first_space + 1));
    std::string version =
        buffer_.substr(second_space + 1, first_line_end - (second_space + 1));
    if (method == HttpRequestHeaders::kConnectMethod) {
      request_endpoint_ = HostPortPair::FromString(uri);
    } else {
      // postprobe endpoint handling
      is_http_1_0 = true;
    }
    size_t second_line = first_line_end + 2;
    HttpRequestHeaders headers;
    std::string headers_str;
    if (second_line < header_end) {
      headers_str = buffer_.substr(second_line, header_end - second_line);
      headers.AddHeadersFromString(headers_str);
    }
    if (is_http_1_0) {
      GURL url(uri);
      if (!url.is_valid()) {
        LOG(WARNING) << "Invalid URI: " << uri;
        return ERR_INVALID_ARGUMENT;
      }
      std::string host;
      int port;
      std::string host_str;
      if (headers.GetHeader(HttpRequestHeaders::kHost, &host_str)) {
        if (!ParseHostAndPort(host_str, &host, &port)) {
          LOG(WARNING) << "Invalid Host: " << host_str;
          return ERR_INVALID_ARGUMENT;
        }
        if (port == -1) {
          port = 80;
        }
      } else {
        if (!url.has_host()) {
          LOG(WARNING) << "Missing host: " << uri;
          return ERR_INVALID_ARGUMENT;
        }
        host = url.host();
        port = url.EffectiveIntPort();
        host_str = url.host();
        if (url.has_port()) {
          host_str.append(":").append(url.port());
        }
        headers.SetHeader(HttpRequestHeaders::kHost, host_str);
      }
      // Host is already known. Converts any absolute URI to relative.
      uri = url.path();
      if (url.has_query()) {
        uri.append("?").append(url.query());
      }
      request_endpoint_.set_host(host);
      request_endpoint_.set_port(port);
    }
    std::optional<PaddingType> padding_type = ParsePaddingHeaders(headers);
    if (!padding_type.has_value()) {
      return ERR_INVALID_ARGUMENT;
    }
    padding_detector_delegate_->SetClientPaddingType(*padding_type);
    if (is_http_1_0) {
      // Regenerates http headers to make sure we don't leak them to end servers
      HttpRequestHeaders sanitized_headers = headers;
      sanitized_headers.RemoveHeader(HttpRequestHeaders::kProxyConnection);
      sanitized_headers.RemoveHeader(HttpRequestHeaders::kProxyAuthorization);
      std::ostringstream ss;
      ss << method << " " << uri << " " << version << "\r\n"
         << sanitized_headers.ToString();
      if (buffer_.size() > header_end + 4) {
        ss << buffer_.substr(header_end + 4);
      }
      buffer_ = ss.str();
      // Skips padding write for raw http proxy
      completed_handshake_ = true;
      next_state_ = STATE_NONE;
      return OK;
    }
    buffer_ = buffer_.substr(header_end + 4);
    next_state_ = STATE_HEADER_WRITE;
    return OK;

@Chilledheart

Yeah, better.

@Chilledheart

https://github.com/klzgrad/naiveproxy/blob/c942f02b7cf2d8c9411fb052343adeba16dc3d89/src/net/tools/naive/http_proxy_server_socket.cc#L391C1-L395C6

I am not so sure, but what if the path component could be empty in an absoluteURI? What about:

    uri = url.has_path() ? url.path() : "/";
    if (url.has_query()) {
      uri.append("?").append(url.query());
    }

@Chilledheart

I cannot figure out the meaning of this comment in GURL:

  // Path, this is everything following the host name, stopping at the query of
  // ref delimiter (if any). Length will be -1 if unspecified. This includes
  // the preceeding slash, so the path on http://www.google.com/asdf" is
  // "/asdf". As a result, it is impossible to have a 0 length path, it will
  // be -1 in cases like "http://host?foo".
  // Note that we treat backslashes the same as slashes.


klzgrad commented Aug 6, 2023

@klzgrad klzgrad closed this as completed Aug 6, 2023