NAT Busting for keynodes behind NAT layers #37

Closed
robert-cronin opened this issue Jun 1, 2020 · 12 comments · Fixed by #84

@robert-cronin
Contributor

We need to be able to connect two keynode pairs that might both be behind NAT layers and not exposed via public IPs.

The way I see it there are two options. We could either incorporate NAT traversal options directly into polykey (UDP hole punching, peer circuit relays, git sharing over ssh port forwarding?).

The other way is to assume all polykey nodes are discoverable on the same virtual network and provide mesh capability like ZeroTierOne does. We could even set up our own ZeroTierOne controller (instructions) and provide it as a public service, or incorporate it somehow into MatrixOS as a system service. I think the latter will end up being useful for other parts of MatrixOS.

@robert-cronin robert-cronin added the development Standard development label Jun 1, 2020
@robert-cronin robert-cronin self-assigned this Jun 1, 2020
@CMCDragonkai
Member

May require intermediate relay Pk keynodes.

Also Twilio offers STUN relays.

@robert-cronin
Contributor Author

May require intermediate relay Pk keynodes.

This was something I was thinking about, sort of like a hosting service for intermediate keynodes that we could put up on polykey.com or something like that. But I remember something about Twilio's STUN/TURN being good enough for a first attempt.

@robert-cronin
Contributor Author

After looking into STUN/TURN/ICE in more detail, there are a few concerns that make it less straightforward than originally thought. STUN/TURN/ICE work exclusively with the WebRTC framework, which only works in the browser environment. So, first of all, it makes more sense for this to be implemented in the PolyKey-Electron and PolyKey-Nativescript repos.

Secondly, when considering implementation in electron, we need to decide whether to use the renderer process (which is chromium and will work out of the box) or a custom headless process like puppeteer or electron-webrtc. The benefit of using the renderer process is that there is no extra overhead, but we do need to connect the webrtc data channel to the backend polykey agent.

Implementation in nativescript can be achieved via the nativescript-webrtc-plugin.

In any case, it is looking like we should make it BYO RTCPeerConnection for this library, possibly via an extra parameter in the PeerManager that conforms to a webrtc interface. Basically, we just need to be able to input a peer's public key and get back a connection.
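
A minimal sketch of what the "public key in, connection out" shape could look like. All names here (`PeerConnection`, `ConnectionProvider`, `loopbackProvider`, and this particular `PeerManager` signature) are hypothetical and not taken from js-polykey; the loopback provider only stands in for a real WebRTC-backed one.

```typescript
// Hypothetical names throughout; nothing here is the actual js-polykey API.

/** Minimal duplex channel that an injected provider must hand back. */
interface PeerConnection {
  send(data: Uint8Array): void;
  onMessage(handler: (data: Uint8Array) => void): void;
  close(): void;
}

/** "Input a peer's public key and get back a connection." */
type ConnectionProvider = (peerPublicKey: string) => Promise<PeerConnection>;

class PeerManager {
  private connections = new Map<string, PeerConnection>();
  private connect: ConnectionProvider;

  constructor(connect: ConnectionProvider) {
    this.connect = connect;
  }

  /** Reuse an open connection to the peer, or ask the provider for a new one. */
  async getConnection(peerPublicKey: string): Promise<PeerConnection> {
    const existing = this.connections.get(peerPublicKey);
    if (existing !== undefined) return existing;
    const created = await this.connect(peerPublicKey);
    this.connections.set(peerPublicKey, created);
    return created;
  }
}

/** In-memory provider that echoes sent data back to its own handlers. */
function loopbackProvider(): ConnectionProvider {
  return async (_peerPublicKey: string): Promise<PeerConnection> => {
    const handlers: Array<(data: Uint8Array) => void> = [];
    const connection: PeerConnection = {
      send: (data: Uint8Array) => handlers.forEach((handler) => handler(data)),
      onMessage: (handler: (data: Uint8Array) => void) => {
        handlers.push(handler);
      },
      close: () => {
        handlers.length = 0;
      },
    };
    return connection;
  };
}
```

With this shape, the Electron or NativeScript front end would pass in its own provider, and the library would never need to know whether WebRTC is behind it.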

@robert-cronin
Contributor Author

Looks like there is an issue with running electron-webrtc on nixos:
image
This looks like it cannot create a new electron process.

@robert-cronin
Contributor Author

robert-cronin commented Aug 11, 2020

There is also another issue with letting the agent communicate with the front-end electron-renderer process. Don't we want the polykey nodes (which are run by the agent as a daemon or by the electron-main process) to be available at all times for peer queries/requests? If you close the PolyKey window, it will be minimized to the system tray and the electron-renderer process will stop working. Also, for users who want to use pk via the shell, WebRTC won't work at all.

Also I've just learnt that puppeteer won't run WebRTC in headless mode. So it's looking like WebRTC and STUN/TURN/ICE are infeasible for what we want to do.

We should look into other ways of NAT traversal (BitTorrent, LibP2P; we could also just build a server ourselves, similar to https://github.com/codefresh-io/nat-traversal).

@robert-cronin
Contributor Author

robert-cronin commented Aug 13, 2020

I am currently in the process of building custom STUN/TURN functionality into js-polykey. I thought I should document my progress here.

So it looks like STUN/TURN is not strictly linked to WebRTC and there have been attempts to replicate the functionality in nodejs:
https://www.npmjs.com/package/stunsrv
https://github.com/summerwind/node-stun
https://github.com/enobufs/stun

I am currently trying to understand and adapt the last one to js-polykey, but that covers only the STUN server. For the TURN server, I think something more like the nat-traversal library would do. I have an example up and running on my LAN that uses it to expose a random port. It should work with the client behind a NAT, but I still need to test that with QEMU and NixOS testing.

STUN also requires 2 different public IP addresses as per this Stack Overflow post.
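
For reference, the STUN wire format itself is small. Below is a minimal sketch of encoding a Binding Request and decoding an IPv4 XOR-MAPPED-ADDRESS attribute value, following the layout in RFC 5389. The function names are made up for illustration and are not from any of the libraries linked above; attribute framing, IPv6 and message integrity are left out.

```typescript
// Header layout per RFC 5389: 2-byte message type, 2-byte message length,
// 4-byte magic cookie, 12-byte transaction ID (20 bytes total).
const STUN_BINDING_REQUEST = 0x0001;
const STUN_MAGIC_COOKIE = 0x2112a442;
const STUN_HEADER_LENGTH = 20;

/** Build a Binding Request with no attributes (illustrative helper). */
function encodeBindingRequest(transactionId: Uint8Array): Uint8Array {
  if (transactionId.length !== 12) {
    throw new RangeError("transaction ID must be 12 bytes");
  }
  const message = new Uint8Array(STUN_HEADER_LENGTH);
  const view = new DataView(message.buffer);
  view.setUint16(0, STUN_BINDING_REQUEST); // DataView writes big-endian by default
  view.setUint16(2, 0); // message length: no attributes follow
  view.setUint32(4, STUN_MAGIC_COOKIE);
  message.set(transactionId, 8);
  return message;
}

/**
 * Decode the 8-byte value of an IPv4 XOR-MAPPED-ADDRESS attribute:
 * reserved byte, family byte (0x01), X-Port, X-Address.
 */
function decodeXorMappedAddressV4(value: Uint8Array): { address: string; port: number } {
  if (value.length !== 8 || value[1] !== 0x01) {
    throw new RangeError("expected an 8-byte IPv4 XOR-MAPPED-ADDRESS value");
  }
  const view = new DataView(value.buffer, value.byteOffset, value.byteLength);
  // The port is XORed with the top 16 bits of the magic cookie.
  const port = view.getUint16(2) ^ (STUN_MAGIC_COOKIE >>> 16);
  // The address is XORed with the full cookie; >>> 0 keeps it unsigned.
  const raw = (view.getUint32(4) ^ STUN_MAGIC_COOKIE) >>> 0;
  const address = [raw >>> 24, (raw >>> 16) & 0xff, (raw >>> 8) & 0xff, raw & 0xff].join(".");
  return { address, port };
}
```

The decoded address and port are what the server saw as the packet's source, which is the public mapping a NATed peer would advertise to others.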

There is another protocol called ICE (Interactive Connectivity Establishment) but as far as I can tell, this is just a way to coordinate the use of STUN/TURN to establish a connection which I think we can do just in the PeerManager by utilizing both of these classes (StunServer/TurnServer).

Another point to be understood is that TURN is much more resource intensive than STUN and should be left as a last resort. TURN is a relay that takes up an existing port on the public machine for each connection that needs to be made and every packet is routed via that public machine. It should only be used if STUN cannot find a public ip:port address for each of the private nodes or negotiation fails for some other reason.
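
The "relay only as a last resort" ordering could be sketched roughly as below. The `Strategy` type and `establish` function are hypothetical names for illustration, not part of js-polykey or of the ICE specification; each strategy is assumed to reject when its path is unavailable.

```typescript
// Hypothetical types; the real PeerManager may look quite different.
type Endpoint = { host: string; port: number };

type Strategy = {
  name: "direct" | "hole-punch" | "relay";
  // Resolves with a usable endpoint, or rejects if this path is unavailable.
  attempt: () => Promise<Endpoint>;
};

/**
 * Try each strategy in the order given and return the first that works.
 * Callers list the cheap paths first and the relay last, so a relay port
 * is only consumed when everything else has failed.
 */
async function establish(
  strategies: Strategy[],
): Promise<{ via: string; endpoint: Endpoint }> {
  const failures: string[] = [];
  for (const strategy of strategies) {
    try {
      const endpoint = await strategy.attempt();
      return { via: strategy.name, endpoint };
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      failures.push(`${strategy.name}: ${reason}`);
    }
  }
  throw new Error(`all strategies failed (${failures.join("; ")})`);
}
```

Because later strategies are never attempted once one succeeds, the relay's per-connection port and bandwidth cost is only paid when negotiation actually fails.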

I've heard some people suggest that STUN/TURN is specifically for UDP but I don't see any reason why we couldn't adapt it to TCP.

I was going to develop some ASCII diagrams for our STUN/TURN impl, but I think there are already some awesome diagrams in this primer on WebRTC: https://www.html5rocks.com/en/tutorials/webrtc/infrastructure/#after-signaling-using-ice-to-cope-with-nats-and-firewalls

This was referenced Aug 18, 2020
@robert-cronin robert-cronin added this to the PolyKey MVP milestone Aug 18, 2020
@robert-cronin
Contributor Author

As per #84, NAT traversal will be limited to TCP hole punching and TURN functionality for now. STUN functionality has been separated into issue #85.

@robert-cronin
Contributor Author

Here is the general process for peer communication over a public TURN node:
Stage 1: peerA sets up a TURN relay connection to peerC

                     +-------------+
                     | TURN Server |
                     +-------------+
                            |
        Request Relay   +-------+
    +------------------>+ PeerC |
    |                   +-------+
    |                       |
    |                       |
    |                       |
    |                       |
+-------+                   |                       +-------+
| PeerA +<------------------+                       | PeerB |
+-------+   Route Packets                           +-------+
            To This Address

Stage 2: peerA opens a connection to the git server and relays all packets to and from peerC's TURN server

                Route Packets       +-------------+
      +---------------------------->+ TURN Server |
      |                             +-------------+
      |                                    |
      |                                +-------+
      |                                | PeerC |
      |                                +-------+
      |
      |
      |
      |
      |        +-------+                                           +-------+
      |  +-----+ PeerA +--------+                                  | PeerB |
      |  |     +-------+        |                                  +-------+
      |  |                      |
      V  v                      v
+-------------+          +------------+
| TURN Client |          | Git Server |
+-------------+          +------------+
      |                         |
      |                         |
      |                         |
      +-------<---------->------+
               Open Socket

Final stage: peerB wants to connect to peerA, so it queries the known, public intermediary, peerC. PeerC responds with peerA's relayed address, and peerB can now communicate with peerA via the TURN server on peerC just as it normally would.

                                                       Normal Git
                Route Packets       +-------------+    Request
      +---------------------------->+ TURN Server +<-------------------+
      |                             +------+------+                    |
      |                                    |                           |
      |                                +---+---+                       |
      |                                | PeerC |                       |
      |                                +---+---+                       |
      |                                    ^                           |
      |                                    |                           |
      |                                    |                           |
      |                                    |     Request PeerA         V
      |        +-------+                   |     Relay Address     +-------+
      |  +-----+ PeerA +--------+          +---------------------->+ PeerB |
      |  |     +-------+        |                                  +-------+
      |  |                      |
      V  v                      v
+--------+----+          +------+-----+
| TURN Client |          | Git Server |
+-----+-------+          +------+-----+
      |                         |
      |                         |
      |                         |
      +-------<---------->------+
               Open Socket
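
The bookkeeping that peerC does across these stages can be sketched as below. This is only an in-memory illustration under assumed names (`RelayNode`, `requestRelay`, `lookupRelay`, `releaseRelay` are all hypothetical); the actual packet forwarding of stage 2 is left out, and port allocation is simplified to a counter.

```typescript
// Hypothetical sketch of the relay bookkeeping on the public node (peerC).
type RelayAddress = { host: string; port: number };

class RelayNode {
  private relays = new Map<string, RelayAddress>();
  private host: string;
  private nextPort: number;

  constructor(host: string, firstPort: number) {
    this.host = host;
    this.nextPort = firstPort;
  }

  /** Stage 1: a NATed peer requests a relay and learns its public address. */
  requestRelay(peerPublicKey: string): RelayAddress {
    const existing = this.relays.get(peerPublicKey);
    if (existing !== undefined) return existing;
    // One public port per relayed peer; a real node would bind a socket here.
    const address: RelayAddress = { host: this.host, port: this.nextPort };
    this.nextPort += 1;
    this.relays.set(peerPublicKey, address);
    return address;
  }

  /** Final stage: another peer asks where to send requests for that peer. */
  lookupRelay(peerPublicKey: string): RelayAddress | undefined {
    return this.relays.get(peerPublicKey);
  }

  /** Forget the mapping when the relayed peer disconnects. */
  releaseRelay(peerPublicKey: string): boolean {
    return this.relays.delete(peerPublicKey);
  }
}
```

In this sketch, peerA would call `requestRelay` with its own public key, and peerB would later call `lookupRelay` with peerA's key before sending a normal git request to the returned address.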

@robert-cronin
Contributor Author

This does not really conform to the TURN RFCs or message protocols, so let's call it a custom NAT traversal protocol for now.

@CMCDragonkai
Member

CMCDragonkai commented Sep 15, 2020

PR #84 fixes this issue once it is merged. @robert-cronin, please summarize here which situations it covers, specifically which kinds of NAT architecture it busts, because there are multiple kinds of NAT architectures.

A new issue should be created to review the TURN and STUN specifications, compare them to our implementation, and see whether we need to cover other NAT architectures.

Main goals are:

  • Ensure that Polykey can work across all relevant NAT architectures (especially corporate networking environments, these NATs are often quite full-cone-symmetric sort of stuff)
  • Also if something in the spec sort of already implements something, conformance is still better than rolling our own, especially if we need to pass this work to someone else
  • Even if we are not entirely meeting the spec, we can be sure what we are missing out on or not
  • We may be deviating from the spec in terms of using GRPC for some of the messages
  • Also review how uTP and GRPC can interoperate with STUN and TURN
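
As a rough starting point for the "which NAT architectures" question, here is a sketch using the classic four-way NAT taxonomy from RFC 3489 and the usual rule of thumb for hole punching. The `NatType` names and the `holePunchLikely` function are illustrative assumptions; real NATs often do not fit these categories cleanly, so this is not a substitute for testing.

```typescript
// Classic taxonomy from RFC 3489, plus "open" for a peer with a public address.
type NatType =
  | "open"
  | "full-cone"
  | "restricted-cone"
  | "port-restricted-cone"
  | "symmetric";

/**
 * Rule of thumb: hole punching tends to fail when one side is symmetric
 * (a new external port per destination) and the other side filters by
 * source port (port-restricted cone or symmetric). Those pairs need a relay.
 */
function holePunchLikely(a: NatType, b: NatType): boolean {
  const isSymmetric = (t: NatType): boolean => t === "symmetric";
  const filtersByPort = (t: NatType): boolean =>
    t === "symmetric" || t === "port-restricted-cone";
  if (isSymmetric(a) && filtersByPort(b)) return false;
  if (isSymmetric(b) && filtersByPort(a)) return false;
  return true;
}
```

Under this heuristic, a symmetric NAT on both ends (plausible in corporate environments) is the case where only a relay would work.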

@robert-cronin
Contributor Author

I have created a new issue for supporting all NAT types so it doesn't get forgotten when this one is closed shortly: #100

@robert-cronin
Contributor Author

Issue for verification of STUN/TURN: #101
