add http proxy support to request_uri() #112

Merged
merged 4 commits into from
Dec 23, 2017
36 changes: 34 additions & 2 deletions README.md
@@ -22,6 +22,8 @@ Production ready.

* [new](#new)
* [connect](#connect)
* [connect_proxy](#connect_proxy)
* [set_proxy_options](#set_proxy_options)
* [set_timeout](#set_timeout)
* [set_timeouts](#set_timeouts)
* [ssl_handshake](#ssl_handshake)
@@ -158,6 +160,24 @@ An optional Lua table can be specified as the last argument to this method to sp
* `pool`
: Specifies a custom name for the connection pool being used. If omitted, then the connection pool name will be generated from the string template `<host>:<port>` or `<unix-socket-path>`.

## connect_proxy

`syntax: ok, err = httpc:connect_proxy(proxy_uri, scheme, host, port)`

Attempts to connect to the web server through the given proxy server. The method accepts the following arguments:

* `proxy_uri` - Full URI of the proxy server to use (e.g. `http://proxy.example.com:3128/`). Note: Only `http` protocol is supported.
* `scheme` - The protocol to use between the proxy server and the remote host (`http` or `https`). If `https` is specified as the scheme, `connect_proxy()` makes a `CONNECT` request to establish a TCP tunnel to the remote host through the proxy server.
* `host` - The hostname of the remote host to connect to.
* `port` - The port of the remote host to connect to.

If an error occurs during the connection attempt, this method returns `nil` with a string describing the error. If the connection was successfully established, the method returns `1`.

There are a few key points to keep in mind when using this API:

* If the scheme is `https`, you need to perform the TLS handshake with the remote server manually using the `ssl_handshake()` method before sending any requests through the proxy tunnel.
* If the scheme is `http`, you need to ensure that the requests you send through the connection conform to [RFC 7230](https://tools.ietf.org/html/rfc7230), and especially to [Section 5.3.2](https://tools.ietf.org/html/rfc7230#section-5.3.2), which states that the request target must be in absolute form. In practice, this means that when you use `send_request()`, the `path` must be an absolute URI to the resource (e.g. `http://example.com/index.html` instead of just `/index.html`).
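
As a rough illustration of the two request-target forms involved, the plain-Lua sketch below builds the absolute-form target used for `http` requests through a proxy and the authority-form target used in a `CONNECT` request. The helper names `absolute_form` and `authority_form` are hypothetical and not part of the library API, and leaving out the port when it is 80 is an assumption of this sketch.

```lua
-- Hypothetical helpers for illustration only; not part of lua-resty-http.

-- Absolute-form (RFC 7230, Section 5.3.2): scheme, host, optional port and path.
local function absolute_form(scheme, host, port, path)
    if port == 80 then
        -- Assumption: the default HTTP port is left out of the target.
        return scheme .. "://" .. host .. path
    end
    return scheme .. "://" .. host .. ":" .. port .. path
end

-- Authority-form (RFC 7230, Section 5.3.3): host and port only, used with CONNECT.
local function authority_form(host, port)
    return host .. ":" .. port
end

print(absolute_form("http", "example.com", 80, "/index.html"))
-- http://example.com/index.html
print(absolute_form("http", "example.com", 8080, "/index.html"))
-- http://example.com:8080/index.html
print(authority_form("example.com", 443))
-- example.com:443
```

When the connection goes through an `http` proxy, a string like the first result is what would be passed as `path` to `send_request()`.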

## set_timeout

`syntax: httpc:set_timeout(time)`
@@ -192,6 +212,18 @@ Note that calling this instead of `close` is "safe" in that it will conditionall

In case of success, returns `1`. In case of errors, returns `nil, err`. In the case where the connection is conditionally closed as described above, returns `2` and the error string `connection must be closed`.

## set_proxy_options
Contributor

Should this function be added to the ToC?

Member

Yep, good spot, please add to the ToC.

Contributor Author

Added.


`syntax: httpc:set_proxy_options(opts)`

Configures an HTTP proxy to be used with this client instance. `opts` is a table that accepts the following fields:

* `http_proxy` - a URI of the proxy server to use for `http` requests.
* `https_proxy` - a URI of the proxy server to use for `https` requests.
* `no_proxy` - a comma-separated list of hosts that should not be proxied.

Note that proxy options are only applied when using the high-level `request_uri()` API.
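
To make the interplay of the three fields concrete, here is a minimal plain-Lua sketch of how a proxy could be chosen for a request. It is not the library's implementation: `pick_proxy` and `host_is_excluded` are hypothetical names, the proxy addresses are placeholders, and Lua string patterns are used so that the snippet runs outside OpenResty. The suffix matching assumes curl-like `no_proxy` semantics, where `local.com` also excludes `www.local.com`.

```lua
-- Hypothetical helpers for illustration only; not part of lua-resty-http.

-- Returns true when `host` (or one of its parent domains) is listed in `no_proxy`.
local function host_is_excluded(no_proxy, host)
    if not no_proxy then
        return false
    end
    if no_proxy == "*" then
        return true -- every host is excluded
    end

    -- Build a set of entries, dropping an optional leading "." (e.g. ".mit.edu").
    local set = {}
    for entry in no_proxy:gmatch("[^,]+") do
        set[(entry:gsub("^%.", ""))] = true
    end

    -- Strip one subdomain level at a time until only a single label is left.
    repeat
        if set[host] then
            return true
        end
        host = host:gsub("^[^.]+%.", "", 1)
    until not host:find(".", 1, true)

    return false
end

-- Returns the proxy URI to use for the request, or nil for a direct connection.
local function pick_proxy(opts, scheme, host)
    if not opts or host_is_excluded(opts.no_proxy, host) then
        return nil
    end
    if scheme == "http" then
        return opts.http_proxy
    end
    if scheme == "https" then
        return opts.https_proxy
    end
    return nil
end

local opts = {
    http_proxy = "http://127.0.0.1:3128",  -- placeholder address
    https_proxy = "http://127.0.0.1:3128", -- placeholder address
    no_proxy = "local.com,.mit.edu",
}
print(pick_proxy(opts, "http", "example.com"))   -- http://127.0.0.1:3128
print(pick_proxy(opts, "http", "www.local.com")) -- nil
```

Keeping the exclusion check separate from the scheme lookup makes it easy to see that `no_proxy` takes precedence over both proxy URIs.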

## get_reused_times

`syntax: times, err = httpc:get_reused_times()`
@@ -232,7 +264,7 @@ When the request is successful, `res` will contain the following fields:
* `status` The status code.
* `reason` The status reason phrase.
* `headers` A table of headers. Multiple headers with the same field name will be presented as a table of values.
* `has_body` A boolean flag indicating if there is a body to be read.
* `has_body` A boolean flag indicating if there is a body to be read.
* `body_reader` An iterator function for reading the body in a streaming fashion.
* `read_body` A method to read the entire body into a string.
* `read_trailers` A method to merge any trailers underneath the headers, after reading the body.
@@ -408,7 +440,7 @@ local res, err = httpc:request{
}
```

If `sock` is specified,
If `sock` is specified,

# Author

150 changes: 148 additions & 2 deletions lib/resty/http.lua
@@ -15,6 +15,8 @@ local tbl_concat = table.concat
local tbl_insert = table.insert
local ngx_encode_args = ngx.encode_args
local ngx_re_match = ngx.re.match
local ngx_re_gmatch = ngx.re.gmatch
local ngx_re_sub = ngx.re.sub
local ngx_re_gsub = ngx.re.gsub
local ngx_re_find = ngx.re.find
local ngx_log = ngx.log
@@ -787,7 +789,6 @@ function _M.request_pipeline(self, requests)
return responses
end


function _M.request_uri(self, uri, params)
params = tbl_copy(params or {}) -- Take by value

@@ -800,11 +801,55 @@ function _M.request_uri(self, uri, params)
if not params.path then params.path = path end
if not params.query then params.query = query end

local c, err = self:connect(host, port)
    -- See if we should use a proxy to make this request
    local proxy_uri = self:get_proxy_uri(scheme, host)

    -- Make the connection either through the proxy or directly
    -- to the remote host
    local c, err

    if proxy_uri then
        c, err = self:connect_proxy(proxy_uri, scheme, host, port)
    else
        c, err = self:connect(host, port)
    end

    if not c then
        return nil, err
    end

    if proxy_uri then
        if scheme == "http" then
            -- When a proxy is used, the target URI must be in absolute-form
            -- (RFC 7230, Section 5.3.2). That is, it must be an absolute URI
            -- to the remote resource with the scheme, host and an optional port
            -- in place.
            --
            -- Since _format_request() constructs the request line by concatenating
            -- params.path and params.query together, we need to modify the path
            -- to also include the scheme, host and port so that the final form
            -- is conformant to RFC 7230.
            if port == 80 then
                params.path = scheme .. "://" .. host .. path
            else
                params.path = scheme .. "://" .. host .. ":" .. port .. path
            end
        end

        if scheme == "https" then
            -- don't keep this connection alive as the next request could target
            -- any host and re-using the proxy tunnel for that is not possible
            self.keepalive = false
        end

        -- self:connect_proxy() set the host and port to point to the proxy server. As
        -- the connection to the proxy has been established, set the host and port
        -- to point to the actual remote endpoint at the other end of the tunnel to
        -- ensure the correct Host header is added to the requests.
        self.host = host
        self.port = port
    end

if scheme == "https" then
local verify = true
if params.ssl_verify == false then
@@ -914,5 +959,106 @@ function _M.proxy_response(self, response, chunksize)
until not chunk
end

function _M.set_proxy_options(self, opts)
    self.proxy_opts = tbl_copy(opts) -- Take by value
end

function _M.get_proxy_uri(self, scheme, host)
    if not self.proxy_opts then
        return nil
    end

    -- Check if the no_proxy option matches this host. Implementation adapted
    -- from lua-http library (https://github.com/daurnimator/lua-http)
    if self.proxy_opts.no_proxy then
        if self.proxy_opts.no_proxy == "*" then
            -- all hosts are excluded
            return nil
        end

        local no_proxy_set = {}
        -- wget allows domains in no_proxy list to be prefixed by "."
        -- e.g. no_proxy=.mit.edu
        for host_suffix in ngx_re_gmatch(self.proxy_opts.no_proxy, "\\.?([^,]+)") do
            no_proxy_set[host_suffix[1]] = true
        end

        -- From curl docs:
        -- matched as either a domain which contains the hostname, or the
        -- hostname itself. For example local.com would match local.com,
        -- local.com:80, and www.local.com, but not www.notlocal.com.
        --
        -- Therefore, we keep stripping subdomains from the host, compare
        -- them to the ones in the no_proxy list and continue until we find
        -- a match or until there's only the TLD left
        repeat
            if no_proxy_set[host] then
                return nil
            end

            -- Strip the next level from the domain and check if that one
            -- is on the list
            host = ngx_re_sub(host, "^[^.]+\\.", "")
        until not ngx_re_find(host, "\\.")
    end

    if scheme == "http" and self.proxy_opts.http_proxy then
        return self.proxy_opts.http_proxy
    end

    if scheme == "https" and self.proxy_opts.https_proxy then
        return self.proxy_opts.https_proxy
    end

    return nil
end


function _M.connect_proxy(self, proxy_uri, scheme, host, port)
    -- Parse the provided proxy URI
    local parsed_proxy_uri, err = self:parse_uri(proxy_uri, false)
    if not parsed_proxy_uri then
        return nil, err
    end

    -- Check that the scheme is http (https is not supported for
    -- connections between the client and the proxy)
    local proxy_scheme = parsed_proxy_uri[1]
    if proxy_scheme ~= "http" then
        return nil, "protocol " .. proxy_scheme .. " not supported for proxy connections"
    end

    -- Make the connection to the given proxy
    local proxy_host, proxy_port = parsed_proxy_uri[2], parsed_proxy_uri[3]
    local c, err = self:connect(proxy_host, proxy_port)
@mikz mikz Feb 14, 2018

@pintsized @sjakthol It should be possible to create a keepalive pool for each unique destination proxy+host.

tcpsocket:connect(host,port,options_table?) supports pool in the options_table.

So setting that to string.format("%s:%s/%s:%s", proxy_host, proxy_port, host, port) should do the trick and not reuse connections between different proxy/host combinations.

If my thinking is right I'm happy to propose a patch :)

Collaborator

Does that solve a real world problem?
I would've thought it would be up to the forwarding proxy to handle keepalives on upstream/downstream connections properly itself.

Well yes. Any high-performance system has to use keepalives or it will run out of ephemeral ports very soon.
Doing several thousand requests per second would run through all the ephemeral ports in a few seconds, making the machine unusable.

Member

Of course, keepalives are vital, but does having a unique pool for the proxied destination solve a real world problem? Surely the forwarding proxy is responsible for managing keepalives to the destination?

Yes the proxy will be responsible for connections between proxy and upstream. But the issue is between openresty and the proxy.

openresty  <> proxy <> upstream

Imagine 100 identical requests made serially to OpenResty. OpenResty would open 100 connections to the proxy, while the proxy, when using keepalive, would open just one connection to the upstream.

Now imagine 3k requests per second. That burns through ~40k ephemeral ports in less than 15 seconds. After that, OpenResty can no longer open new connections until the previous ones are recycled.

Using keepalives is vital in high-performance servers that connect to external TCP services.
Keepalives cap the number of open connections at the number of unique host:port pairs times the number of parallel connections. Without keepalives, a new connection is needed for every request.
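
The pool naming proposed earlier in this thread could look roughly like the sketch below. `proxy_pool_name` is a hypothetical helper, not part of the library, and the addresses are placeholders. The resulting string would be passed as the `pool` option of `tcpsocket:connect()`. The second helper only restates the port-exhaustion arithmetic from the comment above, using the thread's approximate figures.

```lua
-- Hypothetical helper: one keepalive pool per proxy/destination pair.
local function proxy_pool_name(proxy_host, proxy_port, host, port)
    return string.format("%s:%s/%s:%s", proxy_host, proxy_port, host, port)
end

-- Seconds until `ports` ephemeral ports are used up at `rate` new connections
-- per second, assuming no port is recycled in the meantime.
local function seconds_to_exhaust(ports, rate)
    return ports / rate
end

print(proxy_pool_name("127.0.0.1", 3128, "example.com", 443))
-- 127.0.0.1:3128/example.com:443
print(string.format("%.1f", seconds_to_exhaust(40000, 3000)))
-- 13.3
```

Because the destination is part of the pool name, a tunnel opened for one host would not be handed out for a request to a different host.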

    if not c then
        return nil, err
    end

    if scheme == "https" then
        -- Make a CONNECT request to create a tunnel to the destination through
        -- the proxy. The request-target and the Host header must be in the
        -- authority-form of RFC 7230 Section 5.3.3. See also RFC 7231 Section
        -- 4.3.6 for more details about the CONNECT request
        local destination = host .. ":" .. port
        local res, err = self:request({
            method = "CONNECT",
            path = destination,
            headers = {
                ["Host"] = destination
            }
        })

        if not res then
            return nil, err
        end

        if res.status < 200 or res.status > 299 then
            return nil, "failed to establish a tunnel through a proxy: " .. res.status
        end
    end

    return c, nil
end

return _M
29 changes: 28 additions & 1 deletion t/14-host-header.t
@@ -12,7 +12,7 @@ $ENV{TEST_COVERAGE} ||= 0;
our $HttpConfig = qq{
lua_package_path "$pwd/lib/?.lua;/usr/local/share/lua/5.1/?.lua;;";
error_log logs/error.log debug;
resolver 8.8.8.8;
resolver 8.8.8.8 ipv6=off;

init_by_lua_block {
if $ENV{TEST_COVERAGE} == 1 then
@@ -165,3 +165,30 @@ GET /a
[error]
--- response_body
Unable to generate a useful Host header for a unix domain socket. Please provide one.

=== TEST 6: Host header is correct when http_proxy is used
--- http_config
lua_package_path "$TEST_NGINX_PWD/lib/?.lua;;";
error_log logs/error.log debug;
resolver 8.8.8.8;
server {
listen *:8080;
}

--- config
location /lua {
content_by_lua '
local http = require "resty.http"
local httpc = http.new()
httpc:set_proxy_options({
http_proxy = "http://127.0.0.1:8080"
})
local res, err = httpc:request_uri("http://127.0.0.1:8081")
';
}
--- request
GET /lua
--- no_error_log
[error]
--- error_log
Host: 127.0.0.1:8081