Create CDN compatible Websocket tunnels #390
Comments
Do said CDNs support domain fronting? Or is it not the same class of usage affected by this https://en.wikipedia.org/wiki/Domain_fronting#Disabling ?
Is this WebSocket or something else? |
This is not about domain fronting at all. We use our own domains. We just use the CDN to bypass the situation where only datacenters are connected to the global internet, while normal users only have access to a nation-wide intranet (which is also connected to the said CDN). I'm thinking of using the same mechanism as WebSocket (like v2ray does). That would work behind the CDN. Of course, that's just an idea, but it's the only thing I can think of to make this work behind a CDN. |
The main thing is, the CDN does not support CONNECT. We just need something that works with a GET, POST, or something like that. |
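To make the difference concrete, here is a minimal sketch of the two request shapes: a classic CONNECT request, and a GET with WebSocket upgrade headers that a CDN is more likely to forward. The `X-Connect-Host` header is borrowed from the Python backend posted later in this thread; the function names and example hostnames are assumptions for illustration only.

```python
import base64
import os


def connect_request(target: str) -> bytes:
    """Classic HTTP/1.1 CONNECT request (the kind CDNs refuse to forward)."""
    return (f"CONNECT {target} HTTP/1.1\r\n"
            f"Host: {target}\r\n\r\n").encode("ascii")


def upgrade_request(front_host: str, target: str, path: str = "/") -> bytes:
    """GET request that asks for a WebSocket upgrade and names the real
    destination in a custom header (X-Connect-Host is an assumption)."""
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {front_host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        f"X-Connect-Host: {target}",
    ]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


if __name__ == "__main__":
    print(connect_request("example.org:443").decode("ascii"))
    print(upgrade_request("cdn.example", "example.org:443").decode("ascii"))
```

The only thing the CDN sees in the second form is an ordinary GET to the fronting host; the tunnel target travels as a header.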
The often-requested "CDN feature" here is about obfuscating the SNI, which is domain fronting. The issue with the idea you describe is that CDNs would not welcome a long-standing connection tunnel, whatever protocol it uses, and it would not look like a typical use case or common traffic behavior. I think I saw some papers at net4people that mentioned long connections being unconditionally interrupted in Iran. Naiveproxy is really designed with the assumption that long connections would work, so this is the main mismatch. The other issue is that adding features to this C++ project costs much more than in a Go project, and the main feature, perfectly mimicking the Chrome net stack, isn't really proven by evidence to be the most important thing once you are past the level of having a utls stack set up and verified. The more important and fruitful work right now is more sophisticated traffic shaping, and this would happen much faster in Go than in C++. |
While they do abominable things with Internet traffic in Iran, other TLS based solutions have been working as well as one can expect under the circumstances, and our CDN hasn't been causing much of an issue so far. Obviously this is far from ideal, but we're trying to work with what we have. The reason I've been looking at naiveproxy has been mainly that it's not in common use in Iran and if I get it to work, I might be able to have it as a backup solution, because we already predict even harder days to come. Still, I understand what you're saying about this being more difficult to handle in the C++ codebase than in Go (even though I personally have much more experience in C++), and that you might not be interested in working on it. I might try and start working on a more sophisticated solution myself anyways, be it a naiveproxy fork, or something based on utls. Thanks for the help. I'm also having another issue with my current naiveproxy server, but I'm gonna have to open another ticket for that. |
If you're ready to put in the effort, I can give advice, review, and accept PRs. First, the code change needs to be minimal to keep the long-term maintenance cost down, so try to find the best places to modify existing behaviors to support new use cases. In your case, try to abuse the https:// proxy scheme as a wss:// scheme (Chrome can proxy a wss:// request, but cannot proxy stuff over a wss:// tunnel, which is what you're looking for). You can look at the http proxy client socket (for h1 wss) and the spdy proxy client socket (for h2 wss) and abuse them into dealing with upgrade headers instead of CONNECT headers. You can use proxy delegates to smuggle control data as headers with the proxy client socket so no API changes are needed. |
Having read the net4people post you mentioned in the other issue, I'm now not even sure if this is going to be worth the trouble if they are going as far as blocking Chrome's TLS fingerprint entirely. I need to see how this further develops, but if it turns out they are not actually going to permanently block use of Chrome, I'll definitely come back to this. Appreciate all the help. |
Following this |
Hardly. I've used C++ professionally for years of course, but I never call myself an expert. Anyways, I think I know enough C++ for this. What I don't know is the flow and structure of the code and this is a big codebase. For example, I don't even know how those functions you mentioned figure into this. Can you explain a bit more? |
https://source.chromium.org/chromium/chromium/src is immensely helpful in understanding a large codebase. Try clicking on functions to find back references. |
Thanks for the tips. So far I've managed to build naive with some logs put here and there to get a feel for things. I've read some of the stuff you sent and will read the rest later. Since I can't spend more than an hour or two per day on this, and not every day at that, I'm a bit slow, but hopefully I'll get there. |
Okay. I'm starting to slowly get the hang of this. I've actually got some nasty hacks that do work, at least with HTTP 1.1. I've got some issues with HTTP2 though and I need to debug. I was hoping I could get naiveproxy to dump a TLS key file to decrypt traffic in Wireshark. But looks like |
I'm almost (but not completely) certain at this point that WebSocket over HTTP/2 (RFC 8441) is not supported by Cloudflare (and one other CDN I tested). One sign of that is that Chrome itself chooses to use HTTP/1.1 for websockets when talking to a website behind CF, even though HTTP/2 is used for other content on the same website. My trouble is that with the hacky approach I wanted to use, naiveproxy chooses HTTP/2 to talk to the server (since it supports it), and then the CDN does not like what happens next and sends a 400 error back. One option might have been to disable HTTP/2 on the CDN side, but apparently CF does not allow you to do that on the free plan. I can think of two ways to get around this:
|
I think you can set alpn to http/1.1 only somewhere. |
Wouldn't that change the expected TLS signature of Chrome? |
But then again, that's probably what Chrome itself does when it wants to force HTTP/1.1... |
You can verify what chrome does by capturing the tls clienthello used for wss://. Chrome added support for ws over h2 recently but there is some config that makes this not so easily turned on. |
https://bugs.chromium.org/p/chromium/issues/detail?id=801564 reading this I think there is some server feature detection logic missing in naiveproxy. Should not require manual overriding of alpn. |
The main problem for me is that the CDN doesn't support it. In the browser, this is detected from h2 settings (I assume) and the browser switches back to 1.1. I assume that's how it works, because Chrome requires an existing H2 connection to the website, in order to use websocket-over-http2. In naiveproxy however, we just detect h2 support, which is not the same as websocket-over-h2 support. And since I am trying to make a normal h2 stream look like websocket, it fails with the CDN. Anyways, I'll take a look at the actual Chrome client hello, and if I see only HTTP/1.1 in ALPN, I might go on with that solution (if I can make it work, obviously!). |
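For reference, RFC 8441 has the server advertise WebSocket-over-HTTP/2 support through the `SETTINGS_ENABLE_CONNECT_PROTOCOL` setting (identifier `0x8`, value `1`). A minimal sketch of checking a raw SETTINGS frame payload for it follows; the helper names are hypothetical, and this is not how Chromium's code is structured, only an illustration of the wire-level check.

```python
import struct

# Setting identifier defined by RFC 8441.
SETTINGS_ENABLE_CONNECT_PROTOCOL = 0x8


def parse_settings(payload: bytes) -> dict:
    """Parse an HTTP/2 SETTINGS frame payload: repeated (16-bit id, 32-bit value)."""
    if len(payload) % 6 != 0:
        raise ValueError("SETTINGS payload length must be a multiple of 6")
    return {ident: value for ident, value in struct.iter_unpack("!HI", payload)}


def supports_websocket_over_h2(payload: bytes) -> bool:
    """True only if the peer explicitly enabled the extended CONNECT protocol."""
    return parse_settings(payload).get(SETTINGS_ENABLE_CONNECT_PROTOCOL, 0) == 1


if __name__ == "__main__":
    # A peer that sets MAX_CONCURRENT_STREAMS (0x3) but not 0x8.
    plain_h2 = struct.pack("!HI", 0x3, 100)
    print(supports_websocket_over_h2(plain_h2))
```

A server (or CDN edge) that speaks plain h2 but omits this setting is exactly the "h2 support is not the same as websocket-over-h2 support" case described above.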
Update: confirmed with wireshark. As expected, chrome sends a clienthello, with ALPN containing only http/1.1. |
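As a rough illustration of what "ALPN containing only http/1.1" means on the client side, here is a sketch using Python's standard `ssl` module. It will not reproduce Chrome's TLS fingerprint (that is the whole point of naiveproxy); it only shows the ALPN restriction itself. The function names are assumptions.

```python
import socket
import ssl


def http11_only_context() -> ssl.SSLContext:
    """TLS client context that offers only http/1.1 in the ALPN extension."""
    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols(["http/1.1"])
    return ctx


def negotiated_protocol(host: str, port: int = 443):
    """Connect to host and return the ALPN protocol the server selected."""
    ctx = http11_only_context()
    with socket.create_connection((host, port), timeout=10) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()
```

Calling `negotiated_protocol()` against a server that supports both h2 and HTTP/1.1 should return `"http/1.1"`, since h2 was never offered.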
Would you mind taking a look at this to see if it's an acceptable approach? It seems to be working with a quick backend I put together using python (which does not support padding protocol yet of course). |
Sure. Here goes.
#!/usr/bin/env python3
import socket
import select
from hashlib import sha1
from base64 import b64encode
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class MyHandler(BaseHTTPRequestHandler):
    # 101 Switching Protocols only exists in HTTP/1.1.
    protocol_version = 'HTTP/1.1'

    def do_GET(self):
        # Anything that is not a WebSocket upgrade naming a target host
        # gets the camouflage page.
        if self.headers.get('upgrade', '').lower() != 'websocket':
            self.return_camouflage()
            return
        if self.headers.get('connection', '').lower() != 'upgrade':
            self.return_camouflage()
            return
        key = self.headers.get('sec-websocket-key')
        if not key:
            self.return_camouflage()
            return
        connect_host = self.headers.get('x-connect-host')
        if not connect_host:
            self.return_camouflage()
            return

        # Complete the WebSocket handshake (RFC 6455 accept key).
        self.send_response(101)
        self.send_header('Upgrade', 'websocket')
        self.send_header('Connection', 'Upgrade')
        accept = key + '258EAFA5-E914-47DA-95CA-C5AB0DC85B11'
        accept = sha1(accept.encode('ascii')).digest()
        accept = b64encode(accept).decode('ascii')
        self.send_header('Sec-Websocket-Accept', accept)
        self.end_headers()
        # The connection is a raw tunnel from here on; never parse
        # another HTTP request from it.
        self.close_connection = True

        sock = self.connect(connect_host)
        if sock is None:
            return
        # Relay raw bytes both ways (no WebSocket framing). The sockets
        # stay blocking: select() says when a read will not block, and
        # sendall() must be allowed to block until everything is written.
        try:
            while True:
                ready, _, _ = select.select([self.request, sock], [], [])
                if self.request in ready:
                    # read1() does at most one recv() and also returns
                    # anything already buffered by the header parser.
                    chunk = self.rfile.read1(65536)
                    if not chunk:
                        break
                    sock.sendall(chunk)
                if sock in ready:
                    chunk = sock.recv(65536)
                    if not chunk:
                        break
                    self.wfile.write(chunk)
        finally:
            sock.close()

    def return_camouflage(self):
        page = b'<html><body>foobar</body></html>'
        self.send_response(200)
        self.send_header('Content-Type', 'text/html')
        self.send_header('Content-Length', str(len(page)))
        self.end_headers()
        self.wfile.write(page)

    def connect(self, hostname):
        # Default to port 80 when the header carries no explicit port.
        host, port = hostname, 80
        if ':' in hostname:
            host, port = hostname.rsplit(':', 1)
            port = int(port)
        sock = None
        for res in socket.getaddrinfo(host, port, socket.AF_UNSPEC,
                                      socket.SOCK_STREAM):
            af, socktype, proto, canonname, sa = res
            try:
                sock = socket.socket(af, socktype, proto)
            except OSError:
                sock = None
                continue
            try:
                sock.connect(sa)
            except OSError:
                sock.close()
                sock = None
                continue
            break
        return sock


def main():
    server = ThreadingHTTPServer(('localhost', 2000), MyHandler)
    server.serve_forever()


if __name__ == '__main__':
    main()
|
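The `Sec-Websocket-Accept` computation in the script above follows RFC 6455: append the fixed GUID to the client's key, SHA-1 the result, and base64-encode the digest. A standalone sketch, checked against the sample key from the RFC (the function name is just for illustration):

```python
from base64 import b64encode
from hashlib import sha1

# Fixed GUID from RFC 6455, section 1.3.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a Sec-WebSocket-Key."""
    digest = sha1((key + WS_GUID).encode("ascii")).digest()
    return b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Sample key from RFC 6455; prints s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
    print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))
```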
You can push your pr |
I need to see if I can make a better backend (hopefully an updated forwardproxy or something like that, though I've got no experience with golang), make sure paddings work, and then send in a PR. |
Can this faux websocket transit through CDNs with handshakes but without websocket framing? In the current design the faux websocket requires client and server side to create the opposite faux websocket with framing only but without handshakes. This is kind of inconvenient for integration with other proxy systems. |
I've tested it with Cloudflare, and it does work. We could add websocket framing, but I'm not sure if that's gonna be worth the trouble, since it would make the client more complex, and we'd still need a custom backend, to make it work like a CONNECT tunnel. So this seems to be the minimum implementation that allows travel through a CDN, unless of course other CDNs actually parse whole websocket streams, but I think that's unlikely. |
How would this make integration with other proxies inconvenient? I'm not sure I follow that part? |
It's not an RFC conforming implementation of wss so it would be confusing to use the name. Client and server network libraries would expect a web socket with framing. You can check v2ray and see if they use this websocket handshake only tls socket or full websocket. Without a conforming implementation it could be problematic for interoperability. If this is really needed, it will have to use a separate name, wss-handshake:// or something. |
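To show what "with framing" would add on top of the handshake, here is a minimal sketch of encoding one unfragmented, masked binary frame as an RFC 6455 client would. It is an illustration of the wire format only, not code from naiveproxy or v2ray, and the function name is an assumption.

```python
import os
import struct
from typing import Optional


def encode_client_frame(payload: bytes, mask_key: Optional[bytes] = None) -> bytes:
    """Encode a single masked binary frame (FIN=1, opcode=0x2) per RFC 6455."""
    if mask_key is None:
        mask_key = os.urandom(4)  # clients must mask with a fresh 4-byte key
    header = bytearray([0x80 | 0x2])  # FIN bit plus binary opcode
    length = len(payload)
    if length < 126:
        header.append(0x80 | length)            # MASK bit plus 7-bit length
    elif length < (1 << 16):
        header.append(0x80 | 126)               # 16-bit extended length follows
        header += struct.pack("!H", length)
    else:
        header.append(0x80 | 127)               # 64-bit extended length follows
        header += struct.pack("!Q", length)
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return bytes(header) + mask_key + masked


if __name__ == "__main__":
    print(encode_client_frame(b"Hello", bytes.fromhex("37fa213d")).hex())
```

Every chunk of tunnel data would need this wrapping on the client and the matching unwrapping on the server, which is the extra complexity being weighed in this discussion.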
Nice. Waiting for the new forward. Thanks for the great tools. |
This I agree with. Calling it something other than "wss" makes more sense. A complete websocket implementation is imo too much of a hassle for this purpose. And the backend might be beyond me with my current golang knowledge (or lack thereof!). The header logic I might be able to handle in golang; I'm yet to take an actual look at it though. Life has thrown a few wrenches in my way, and time has become even more scarce atm! |
What are those logs from? Looks like whatever server you're using is actually trying to parse the tunnel contents as websocket, which is obviously not going to work. |
Ah, okay. Makes sense then. Anyways, this is because we are not implementing websocket framing. Can you think of a case where that actually matters or causes a problem? Because otherwise, as I said before, I think that's just needless complication. |
I haven't gotten to that part yet (I'm being real slow, I know!), but I imagine I need to do that using a custom header since |
I don't see how encrypting the password is going to help with anything. In this case, the encrypted password would be the password to use then, and there'd be no practical difference. Anyone having that encrypted password (which is visible to the CDN) can use it to connect to the proxy. As to merging the Python logic with the C++ code, I'm not sure I understand this. How can the server logic be moved to the client? |
Okay. I think I understand now what you mean by using http_proxy_socket.cc. I would have rather modified forwardproxy, if only for my own learning, but I encountered two issues. First I tried creating a middleware, but the middleware can only (easily) change the request, I need to also change the response, and that is either not possible when calling forwardproxy's ServeHTTP, or at least only possible by passing a custom ResponseWriter. I then tried modifying forwardproxy itself, but the response seems to be weirdly missing three bytes at the beginning (I can only force curl to look at it by passing Anyways, I might try your method when I get a little bit of time. As to CDN seeing the credentials, I don't think there's any easy way of preventing that. Simply encoding the credentials with another key is not enough, since the CDN can use the encoded value as easily. We'd need some sort of challenge-response protocol in-place to prevent that, which is definitely outside the scope of what I'm doing. |
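For context on why simply encoding the credentials does not help while a challenge-response scheme would, here is a minimal HMAC-based sketch. It is not part of naiveproxy or forwardproxy; all names are hypothetical, and a real design would also have to make each challenge single-use and would cost an extra round trip.

```python
import hashlib
import hmac
import os


def make_challenge() -> bytes:
    """Server side: a fresh random nonce for each connection attempt."""
    return os.urandom(16)


def respond(secret: bytes, challenge: bytes) -> bytes:
    """Client side: prove knowledge of the secret without sending it."""
    return hmac.new(secret, challenge, hashlib.sha256).digest()


def verify(secret: bytes, challenge: bytes, response: bytes) -> bool:
    """Server side: constant-time check of the client's response."""
    return hmac.compare_digest(respond(secret, challenge), response)


if __name__ == "__main__":
    secret = b"shared-proxy-password"
    challenge = make_challenge()
    answer = respond(secret, challenge)
    print(verify(secret, challenge, answer))
```

An observer such as the CDN sees only the nonce and the response, and the response is useless against a different nonce. It does not stop an on-path CDN from taking over an already authenticated tunnel, so it narrows the problem rather than solving it.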
If it is a CDN inside the country, what stops the censor from controlling the CDN? If the censor does not control the CDN, the credentials leaking to the CDN is not part of the threat model. |
There's really nothing stopping them from controlling the CDN. We're actually not sure what the deal is. It could be incompetence (which is likely), or that they don't care to close all the holes, or that the folks working at the CDN provider want to help others by letting this loophole work. FWIW, they've recently started sending "fair use" warnings for all my servers, so maybe the good days are numbered. Anyways, I don't see a reasonable way of hiding things from the CDN in a simple proxy like naive. They could get us if they really want to, and then we'd just have to find another way. The good ol' cat and mouse game. |
If you mean by a malicious CDN, no idea. If by other third-parties, the path should be enough, right? And we could or could not have the username/password, or we could just send that as the path. But frankly, I'm a little stuck with the backend. No idea why my modified forwardproxy doesn't work (probably something stupid I did, but still!). You pointed out before something about being able to use naiveproxy as forwardproxy, but that doesn't seem to work for me either, even without wss mode (using the haproxy config as a frontend). haproxy forwards stuff to naive and then nothing is sent until connection times out. |
Not sure I understand the context. Is this with the python backend? If so, do you see any errors on the backend side? Does it happen with any web page you visit? |
This is very strange indeed. There are some extra spaces in those logs, but I'm assuming those are related to something with copying the logs. Does the python server print any errors while this happens? I haven't seen anything like this happen to me. FWIW, I'm starting to doubt if I am going to actually finish this. My modified golang server (caddy/forwardproxy) does weird things and I can't quite understand why. Also I'm not even sure we came up with an acceptable solution for the credentials issue either. |
I'm giving my golang backend one last try. I finally managed to implement it in a caddy middleware, which is nice. But it doesn't work! Inspecting the traffic using wireshark, I see an extra set of headers are sent after the 101 response. This extra response has a 200 status, but contains all the websocket headers we add in the middleware (it's also chunked, and I have no idea who does that). Some logging proves that the code to add the websocket headers is only called once. Anyways, the extra response obviously breaks everything. If you could take a look at it and see if you can spot any obvious issues, I'd be very grateful. You can see the changes here: klzgrad/forwardproxy@naive...grimpenmire:forwardproxy:wss I build this using
|
The unknown opcodes are expected, and as far as I understand, they should be completely harmless. There is no entity in between that actually attempts to parse the websocket protocol. Wireshark obviously does that of course, and that's why I just look at the raw data there. As to occasional freezes, the only thing I can think of is that the Python server I put together in a few minutes is probably far from an ideal server. For example, it's multithreaded, which is something that's generally to be avoided in Python. It's also likely that there are some situations that are not handled as gracefully as they should. Anyways, I'm sure the current issue with the golang backend will not be fixed even if we add appropriate websocket framing. There is clearly an extra set of headers there which would cause trouble in any case. |
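The "unknown opcode" reports can be reproduced by decoding the first byte of unframed tunnel data as if it were a WebSocket frame header. A small sketch, assuming the tunnelled bytes start with a TLS record (content type `0x16`), which is an assumption about the traffic and not something taken from the logs:

```python
# Opcodes defined by RFC 6455; everything else is reserved.
KNOWN_OPCODES = {
    0x0: "continuation",
    0x1: "text",
    0x2: "binary",
    0x8: "close",
    0x9: "ping",
    0xA: "pong",
}


def describe_first_byte(data: bytes) -> dict:
    """Interpret the first byte of a stream as a WebSocket frame header byte."""
    first = data[0]
    opcode = first & 0x0F
    return {
        "fin": bool(first & 0x80),
        "rsv": (first >> 4) & 0x07,
        "opcode": opcode,
        "name": KNOWN_OPCODES.get(opcode, "unknown"),
    }


if __name__ == "__main__":
    # First bytes of a TLS handshake record: type 0x16, version 0x0301.
    print(describe_first_byte(b"\x16\x03\x01"))
```

A parser that insists on real frames sees a reserved opcode and a set RSV bit here, which is harmless as long as nothing between client and backend actually parses the stream.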
|
Looks like you need to locate who is creating this 200 reply to proceed. |
Most CDN providers have problems connecting to the upstream over HTTP/2. Is there any way for naiveproxy to be compatible with this? |
I have given up on wss because it is not perfect: compared with CONNECT, it is unstable and the flow is severe. The CONNECT connection already feels very satisfactory, which also confirms why the author @klzgrad has never added WSS: it is unnecessary at present. |
Yeah, turned out to be too much of a trouble (not that I've spent a lot of time on it in the past couple of weeks!). Thanks for all the help folks. |
What about gRPC? Some CDN providers, like Cloudflare, can connect to the upstream (proxy server) over gRPC. User <--- HTTP/2 ---> CDN <--- gRPC ---> Proxy Server <---> Free Internet |
Don't hold illusions that it can be realized, it is impossible in naiveproxy. |
So, is it possible to use a CDN at all? Not limited to WebSocket. |
The author believes that adding faux websocket support is not realistic, because faux websocket does not reuse the websocket framework. If the framework were reused, the naive project would need to be greatly modified, and in actual tests the efficiency is far lower than HTTP/2. After my own tests, I agree with the author. So don't even think about adding websocket to naiveproxy. |
The built-in websocket framework of Chromium is itself a prototype and has not been developed in depth. After all, Chromium did not expect that someone would take its source code and turn it into an HTTP proxy. Note: Chrome or Chromium usually initiates websockets through JavaScript, so if you want to integrate the high-efficiency websocket framework, you have to integrate V8, which would greatly deform the naive code, and C++ is not as easy to work with as Go, because the Chromium source code structure itself has limitations. |
I don't mean adding websocket, but any request format that lets the CDN carry traffic for us, just like fetching the index.html page. |
Well, websocket is pretty much the only thing we can use to pass through CDNs, because CDNs don't support CONNECT tunnels that naive uses. |
As @grimpenmire pointed out, CDNs don't support CONNECT tunnels; they only carry requests based on GET, POST, or HEAD. If you want to use a CDN to pass CONNECT requests back to the origin server, your idea won't work. I also mentioned earlier why naive doesn't consider faux-websocket for ws communication, and in reality, the performance of faux-websocket is far worse than that of HTTP/2. If you truly want to use ws, I recommend other tools, as naive may not be suitable for you. |
I've looked through existing issues, and I know the current view of the maintainers for using naiveproxy behind a CDN. However, I want to make a new argument for this.
For the past couple of months, I've been setting up and maintaining proxy servers for people in Iran (mainly v2ray based ones). The folks in Iran are in the rather unique and unfortunate position that they have their access to the global Internet shut down at critical times (like when there are mass protests, as there has been in the last two months).
Crucially, the data centers inside the country still have Internet access even when residential and mobile customers do not. So what we've been doing is setting up TLS-based proxy servers and putting them behind a CDN inside the country. This has been a saving grace for us, and that's how we've managed to keep people connected.
So I'm trying to see if this can be made to work with naiveproxy. I know naiveproxy uses CONNECT tunnels, which are not supported by CDNs. So we need a workaround, for example an HTTP upgrade mechanism. I might want to take a stab at it myself if the maintainers are not interested in doing it, but I'd appreciate any pointers and ideas. I'm also interested to know if you'd still be against the idea given our use case.