-
-
Notifications
You must be signed in to change notification settings - Fork 626
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add http proxy support to request_uri() #112
Conversation
Thanks for this, will try to review shortly (I have a bit of a backlog to deal with). |
local res, err = self:request({ | ||
method = "CONNECT", | ||
path = destination, | ||
headers = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Miss User-Agent
, because request method will add a default User-Agent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was not able to parse your comment. Is there an issue with how User-Agent header is set in self:request()
for CONNECT requests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really sorry so late reply to you, I think this is a issue like you said.
Looks pretty good. I'd be worried about the HTTPS support though. Could you add tests using real TLS connection? Or I could contribute them if you want. You can easily embed SSL certs into the Test::Nginx test: https://github.com/3scale/apicast/blob/7df0f152b79eabddad15cfa6db0ff3f71d33bb18/t/003-apicast.t#L319-L358 |
lib/resty/http.lua
Outdated
@@ -776,11 +776,65 @@ function _M.request_uri(self, uri, params) | |||
if not params.path then params.path = path end | |||
if not params.query then params.query = query end | |||
|
|||
local c, err = self:connect(host, port) | |||
-- See if we should use a proxy to make this request | |||
local proxy_host, proxy_port; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trailing semicolon is pretty uncommon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bad habit from elsewhere. Will remove
lib/resty/http.lua
Outdated
if not c then | ||
return nil, err | ||
end | ||
|
||
if proxy_uri and scheme == "https" then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about extracting handling of the proxy_uri into own function?
Like
local c, err
if proxy_uri then
c, err = self:connect_proxy(proxy_uri, host, port, params)
else
c, err = self:connect(host, port)
end
Not only that would split the code into logical segments, but also would allow use of that function as low-level API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like a good idea. Let's see if I can find some time to refactor this one.
lib/resty/http.lua
Outdated
if proxy_uri and scheme == "http" then | ||
-- http proxies expect to see the full URI in the request line | ||
if port == 80 then | ||
params.path = scheme .. "://" .. host .. path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this the uri
parameter?
What I mean is is it necessary to create new string on every request? Wouldn't it be just fine to reuse the uri
parameter passed to this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference between uri
and this is that the uri
may contain a query string but this one does not.
Both the path and the query string was extracted from the URI earlier in the function and added to the params
table: https://github.com/pintsized/lua-resty-http/blob/master/lib/resty/http.lua#L801
Later on when the actual request is made, the code (https://github.com/pintsized/lua-resty-http/blob/master/lib/resty/http.lua#L275) appends params.query
to the value of params.path
to form the request line. If we were to set params.path = uri
, the query string extracted from the URI would be appended at the end of the uri causing all kinds of breakage.
So params.path
should be set to the uri without the query string. And that's what this concatenation here does.
I should probably add a comment here that explains why this is needed. It took me a few moments to figure out why did I do this here.
lib/resty/http.lua
Outdated
-- the proxy | ||
local destination = host .. ":" .. port | ||
local res, err = self:request({ | ||
method = "CONNECT", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be good to point to https://tools.ietf.org/html/rfc7231#section-4.3.6 and think about the Proxy-Authorization
header. Maybe extract that header from params
?
Thanks for the great comments. I will try to address them as soon as possible. |
On the topic of testing TLS: I tried to come up with a way to test a TLS connection over the proxy but didn't find a way to make it happen with the available tooling. In order to make a HTTPS connection, I would need to have a forward proxy available that knows how to create a transparent TCP tunnel between two hosts when it receives a That was the reason why https over a proxy was not tested. If you have an idea how to create a forward proxy that understands CONNECT requests and can create a TCP tunnel between the client here and an nginx serving https traffic, I will be more than happy to implement such a case. |
e36da86
to
0fb09c1
Compare
These changes introduce support for making HTTP requests through a forward proxy when using the high-level request_uri() API. The new set_proxy_options() function can be used to configure HTTP proxy settings. This method takes a table as a parameter that can include the following fields: * http_proxy - an URI to a proxy that should be used for http:// requests * https_proxy - an URI to a proxy that should be used for https:// requests * no_proxy - a comma separated list of hosts / IPs that should not go through the configured proxy When request_uri() is called with an http:// URI and http_proxy has been configured for the client instance, the connection will be established to the proxy host. Once the connection has been established, the HTTP request is sent to the proxy host as if the peer on the other end of the connection was the remote server. The only difference to the non-proxy case is that the path sent to the proxy is actually a full URI instead of a relative path. When request_uri() is called with an https:// URI and https_proxy has been configured for the client instance, the connection will be established to the proxy host. Once the connection has been established, we go ahead and perform a CONNECT request to open a TCP tunnel to the remote server. If the proxy gives a success response, to the CONNECT request, the client continues to perform the TLS handshake with the remote server and making the request as if there was no proxy configured. Some tests have been included which verify that the proxy options are interpreted correctly in different cases and that http_proxy works as real proxies expect. This is about as much there is to test without actually including a real forward proxy in the test harness. Manual testing has been performed against squid and tinyproxy. Fixes #63
0fb09c1
to
5c96e1f
Compare
Rebased on top of master & removed an unneeded semicolon from the code. Will try to get the larger changes done at some point in the coming weeks. |
…t_proxy() method This commit separates the logic that sets up the connection to the proxy server to a separate connect_proxy() method. This method is provided to users as a low-level API they can use similarly to the connect() method. The connect_proxy() will handle the connection establishment to the proxy server and performs the CONNECT request to setup a TCP tunnel to a https protected host. Similar to the connect() method, it is then up to the user to take care of the details that are relevant when using a proxy (i.e use absolute uris for http requests and perform a TLS handshake for https connections). There's also a new test case that verifies the CONNECT request is used properly to establish a tunnel to the remote server when TLS is used. Due to the limitations of the test framework, this case only considers the format of the outgoing CONNECT request and how the code handles errors sent by the proxy. Testing a full TLS tunnel is unfortunately not possible with the tools the test framework provides as it would require a real reverse proxy or a method of forwarding the TCP connection after the CONNECT request is received to a real web server that can talk TLS.
Finally managed to refactor the code a bit according to the suggested changes. There's now a low-level method connect_proxy() that is equivalent to the low-level connect() method for connections that use a proxy server. The commit message has more details about this change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome 🥇 👍 🎺
Hey guys, any chance to get this one merged? It perfectly solves the problem that we have, and it would be awesome to use it ;) |
I also tested this successfully. Great work! Any hope to have this merged and published soon? |
Hi all, many thanks for the work and testing on this. I'm really sorry for the slow response (recently started a new job). Looks good at a glance but I just need to find some time to understand the usage properly. I'll look at it as soon as I can. Thanks for your patience! |
@sjakthol this is awesome! From what I can see this doesn't seem to cover the |
@@ -192,6 +211,18 @@ Note that calling this instead of `close` is "safe" in that it will conditionall | |||
|
|||
In case of success, returns `1`. In case of errors, returns `nil, err`. In the case where the conneciton is conditionally closed as described above, returns `2` and the error string `connection must be closed`. | |||
|
|||
## set_proxy_options |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this function be added to the ToC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, good spot, please add to the ToC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good to me, although I've left a few small comments. The tests don't pass for me but I suspect for a minor reason. Happy to merge with this stuff resolved.
@@ -192,6 +211,18 @@ Note that calling this instead of `close` is "safe" in that it will conditionall | |||
|
|||
In case of success, returns `1`. In case of errors, returns `nil, err`. In the case where the conneciton is conditionally closed as described above, returns `2` and the error string `connection must be closed`. | |||
|
|||
## set_proxy_options |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, good spot, please add to the ToC.
lib/resty/http.lua
Outdated
@@ -914,5 +957,99 @@ function _M.proxy_response(self, response, chunksize) | |||
until not chunk | |||
end | |||
|
|||
function _M.set_proxy_options(self, opts) | |||
self.proxy_opts = opts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you modify self.proxy_opts
in other code paths, please take opts
by value so that user tables can be reused without being modified by the module.
e.g.
self.proxy_opts = tbl_copy(opts) -- Take by value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
lib/resty/http.lua
Outdated
local no_proxy_set = {} | ||
-- wget allows domains in no_proxy list to be prefixed by "." | ||
-- e.g. no_proxy=.mit.edu | ||
for host_suffix in self.proxy_opts.no_proxy:gmatch("%.?([^,]+)") do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In ngx_lua we're assured access to the ngx.re.*
regex functions which can be JIT compiled. At the risk of premature optimisation (I haven't tested and compared) I'm pretty sure gmatch
will be quite a bit slower here. Please consider changing these to use ngx.re.gmatch
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rewrote the logic to use ngx.re functions instead of built-in gmatch. I had to make some changes to the logic here as apparently the built-in gmatch and ngx.re.gmatch use completely different regex syntax. Please check carefully that I did not make any mistakes in the process (at leasts tests are passing for me).
lib/resty/http.lua
Outdated
-- matched as either a domain which contains the hostname, or the | ||
-- hostname itself. For example local.com would match local.com, | ||
-- local.com:80, and www.local.com, but not www.notlocal.com. | ||
for pos in host:gmatch("%f[^%z%.]()") do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above
# The reply cannot be successful or otherwise the client would start | ||
# to do a TLS handshake with the proxied host and that we cannot | ||
# do with these sockets | ||
--- tcp_reply |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test fails for me (on Arch Linux)
# got: 'CONNECT 127.0.0.1:443 HTTP/1.1
# Host: 127.0.0.1:443
# User-Agent: test_ua
#
# '
# expected: 'CONNECT 127.0.0.1:443 HTTP/1.1
# User-Agent: test_ua
# Host: 127.0.0.1:443
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh... It seems that the order of headers in the request is not guaranteed.
I changed the test to not care about the order of headers. Or at least I think I managed to do that. Please test if it works now.
Thanks for the review. I will try to fix those issues you spotted soonish. |
This includes: * Add set_proxy_options to README TOC. * Make a copy of proxy options gotten from the user. * Rewrite no_proxy matching to use ngx.re.gmatch and friends instead of Lua's built-in gmatch functions. Since the regular expression formats are completely different, the no_proxy matching process had to be modified a bit. * Fix 14-host-header.t on systems with ipv6 support by disabling ipv6 address resolution. * Make 16-http-proxy.t more reliable by ignoring the order of the headers in the CONNECT request (the order is not guaranteed). The new check uses a regular expression to check that the CONNECT line is correct and that the headers include correct Host in some position.
I believe the comments made here are now addressed. Except for the comment by @p0pr0ck5 about proxy_request() not being affected by proxy configuration: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 amazing work! Can't wait for this to get merged.
@pintsized I think this should be fine to merge too. It only affects the simple interface and if you haven't configured any proxy stuff there's just a couple of additional is nil conditional checks. |
ping @pintsized @hamishforbes, any movement here? we'd love to be able to leverage this :D |
Merged (finally!) and released via LuaRocks and OPM as v0.12. Thanks again for everyone's input, and patience. |
|
||
-- Make the connection to the given proxy | ||
local proxy_host, proxy_port = parsed_proxy_uri[2], parsed_proxy_uri[3] | ||
local c, err = self:connect(proxy_host, proxy_port) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pintsized @sjakthol It should be possible to create keepalive pool for each unique destination proxy+host.
tcpsocket:connect(host,port,options_table?)
supports pool
in the options_table
.
So setting that to string.format("%s:%s/%s:%s", proxy_host, proxy_port, host, port)
should do the trick and not reuse connections between different proxy/host combinations.
If my thinking is right I'm happy to propose a patch :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that solve a real world problem?
I would've thought it would be up to the forwarding proxy to handle keepalives on upstream/downstream connections properly itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well yes. Any high performance system has to use keepalives or is going to run out of ephemeral ports very soon.
Doing several thousand requests per second would run through all the ephemeral ports in few seconds making the machine unusable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, keepalives are vital, but does having a unique pool for the proxied destination solve a real world problem? Surely the forwarding proxy is responsible for managing keepalives to the destination?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the proxy will be responsible for connections between proxy and upstream. But the issue is between openresty and the proxy.
openresty <> proxy <> upstream
Imagine 100 same requests would be made serially to openresty. Openresty would open 100 connections to the proxy. And the proxy when using keepalive would open just one connection to the upstream.
Now imagine 3k requests per second. That burns through ~40k ephemeral ports in less than 15 seconds. Then openresty will no longer be able to open new connections until the previous ones are recycled.
Using keepalives is vital in high performance servers that connect to external tcp services.
Keepalives keep the number of opened connections to N of unique pairs of host:port times number of connections in parallel. Without keepalives new connection is needed for every request.
These changes introduce support for making HTTP requests through a
forward proxy when using the high-level request_uri() API.
The new set_proxy_options() function can be used to configure HTTP proxy
settings. This method takes a table as a parameter that can include the
following fields:
the configured proxy
When request_uri() is called with an http:// URI and http_proxy has been
configured for the client instance, the connection will be established to
the proxy host. Once the connection has been established, the HTTP request
is sent to the proxy host as if the peer on the other end of the connection
was the remote server. The only difference to the non-proxy case is that
the path sent to the proxy is actually a full URI instead of a relative
path.
When request_uri() is called with an https:// URI and https_proxy has been
configured for the client instance, the connection will be established to
the proxy host. Once the connection has been established, we go ahead and
perform a CONNECT request to open a TCP tunnel to the remote server. If
the proxy gives a success response, to the CONNECT request, the client
continues to perform the TLS handshake with the remote server and making
the request as if there was no proxy configured.
Some tests have been included which verify that the proxy options are
interpreted correctly in different cases and that http_proxy works as
real proxies expect. This is about as much there is to test without
actually including a real forward proxy in the test harness. Manual
testing has been performed against squid and tinyproxy.
Fixes #63