Add NAT traversal utilities #84

Merged
merged 2 commits into from
Sep 17, 2020

Conversation

robert-cronin
Contributor

@robert-cronin robert-cronin commented Aug 18, 2020

This PR is essential for making polykey available to more users and for strengthening the p2p aspect of secret sharing. I feel this should be part of the MVP too, as peer sharing is essential to the whole polykey idea.

The bulk of this PR is adding functionality for a public keynode that is run on a machine that is exposed to the public internet for relay purposes. This can be broken down into 3 main functions that this public keynode can provide to private keynodes behind NAT layers and other obscurities:

  1. STUN (custom version that tries to locate a public address (after NAT) for both of the private nodes to talk to each other)
  2. TCP hole punching
  3. TURN server for relaying packets between the two nodes.
The three functions above are listed in the order in which they are tried when establishing connectivity between two private nodes.
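The ordering described above can be read as a fallback chain: try the cheapest mechanism first and only fall back to relaying when the others fail. The following is a minimal sketch of that idea, not the code in this PR; `Strategy`, `connectInOrder`, and the stand-in strategies are hypothetical names invented for illustration.

```typescript
// Hypothetical sketch: none of these identifiers come from the PR itself.
type Strategy<T> = { name: string; attempt: () => Promise<T> };

// Try each strategy in the given order and return the first success.
// Failure reasons are collected so the caller can see why each one failed.
async function connectInOrder<T>(
  strategies: Strategy<T>[],
): Promise<{ via: string; connection: T }> {
  const failures: string[] = [];
  for (const { name, attempt } of strategies) {
    try {
      return { via: name, connection: await attempt() };
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${name}: ${reason}`);
    }
  }
  throw new Error(`all strategies failed (${failures.join('; ')})`);
}

// Stand-ins for the three mechanisms, in the order listed above.
// Each one just simulates an outcome; no real networking happens here.
const demoStrategies: Strategy<string>[] = [
  { name: 'stun', attempt: async () => { throw new Error('no public address found'); } },
  { name: 'hole-punch', attempt: async () => { throw new Error('punch timed out'); } },
  { name: 'turn-relay', attempt: async () => 'relayed-socket' },
];
```

Calling `connectInOrder(demoStrategies)` in this sketch falls through the first two stand-ins and settles on the relay. A sequential chain is the simplest reading of "tried first"; later review comments in this thread discuss racing the attempts instead.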

Fixes #37
Fixes #86
Fixes #92

@robert-cronin robert-cronin added the development Standard development label Aug 18, 2020
@robert-cronin robert-cronin added this to the PolyKey MVP milestone Aug 18, 2020
@robert-cronin robert-cronin self-assigned this Aug 18, 2020
@robert-cronin
Contributor Author

robert-cronin commented Aug 18, 2020

STUN actually looks quite complicated, so I suggest we leave it as an enhancement and just go with TCP hole punching and TURN functionality for now.

And I should say that STUN doesn't always find the type of NAT the private node is behind.

#85

@robert-cronin robert-cronin force-pushed the nat-traversal branch 9 times, most recently from c33bfac to 4edd2f3 Compare August 26, 2020 07:11
@robert-cronin robert-cronin force-pushed the nat-traversal branch 3 times, most recently from 9ded781 to 493210d Compare August 31, 2020 01:28
@robert-cronin
Contributor Author

Forgoing the QEMU testing for now, I believe this PR is ready for review.

@CMCDragonkai
Member

I remember you said your implementation deviates from the standard.

Can you describe how your implementation deviates from STUN and TURN standards?

What features are missing or differently implemented?

Please reference the relevant pieces of the code too.

@CMCDragonkai
Member

There's a lot of files here.

Can you generate a call graph diagram of the new modules, isolated to the NAT functionality?

I think I mentioned things like:

Before.

But there's also xstate, which you need to use to model the protocol. Can you also post the resulting diagrams/visualizations in the PR as comments?

When I create such massive PRs, I always provide visual documentation along with it. Useful for later refactoring.

@robert-cronin
Contributor Author

There are some diagrams and explanation in the related issues, but I will copy them here for convenience and elaborate on the points of difference:

Here is the general process for peer communication over a public TURN node:
Stage 1: peerA sets up a TURN relay connection to peerC

                     +-------------+
                     | TURN Server |
                     +-------------+
                            |
        Request Relay   +-------+
    +------------------>+ PeerC |
    |                   +-------+
    |                       |
    |                       |
    |                       |
    |                       |
+-------+                   |                       +-------+
| PeerA +<------------------+                       | PeerB |
+-------+   Route Packets                           +-------+
            To This Address

Stage 2: peerA opens a connection to the git server and relays all packets to and from peerC's TURN server

                Route Packets       +-------------+
      +---------------------------->+ TURN Server |
      |                             +-------------+
      |                                    |
      |                                +-------+
      |                                | PeerC |
      |                                +-------+
      |
      |
      |
      |
      |        +-------+                                           +-------+
      |  +-----+ PeerA +--------+                                  | PeerB |
      |  |     +-------+        |                                  +-------+
      |  |                      |
      V  v                      v
+-------------+          +------------+
| TURN Client |          | Git Server |
+-------------+          +------------+
      |                         |
      |                         |
      |                         |
      +-------<---------->------+
               Open Socket

Final stage: peerB wants to connect to peerA, so it queries the known public intermediary, peerC. PeerC responds with peerA's relayed address, and peerB can now communicate with peerA via the TURN server on peerC just as it normally would.

                                                       Normal Git
                Route Packets       +-------------+    Request
      +---------------------------->+ TURN Server +<-------------------+
      |                             +------+------+                    |
      |                                    |                           |
      |                                +---+---+                       |
      |                                | PeerC |                       |
      |                                +---+---+                       |
      |                                    ^                           |
      |                                    |                           |
      |                                    |                           |
      |                                    |     Request PeerA         V
      |        +-------+                   |     Relay Address     +-------+
      |  +-----+ PeerA +--------+          +---------------------->+ PeerB |
      |  |     +-------+        |                                  +-------+
      |  |                      |
      V  v                      v
+--------+----+          +------+-----+
| TURN Client |          | Git Server |
+-----+-------+          +------+-----+
      |                         |
      |                         |
      |                         |
      +-------<---------->------+
               Open Socket
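The bookkeeping that peerC does across these stages can be sketched as a small lookup table: peerA registers and receives a relayed address, and peerB later asks for it. This is only an illustration under assumptions; `RelayRegistry`, `requestRelay`, `lookupRelay`, and the one-port-per-peer allocation are hypothetical and are not taken from the PR.

```typescript
// Hypothetical sketch: none of these names come from the PR itself.
interface RelayAddress {
  host: string;
  port: number;
}

// Keeps track of which relayed address the public node handed to each peer.
class RelayRegistry {
  private readonly relays = new Map<string, RelayAddress>();
  private readonly publicHost: string;
  private nextPort: number;

  constructor(publicHost: string, firstPort: number) {
    this.publicHost = publicHost;
    this.nextPort = firstPort;
  }

  // Stage 1: a private peer asks the public node for a relayed address.
  // Asking again returns the same address rather than allocating a new one.
  requestRelay(peerId: string): RelayAddress {
    const existing = this.relays.get(peerId);
    if (existing !== undefined) {
      return existing;
    }
    const address: RelayAddress = { host: this.publicHost, port: this.nextPort };
    this.nextPort += 1;
    this.relays.set(peerId, address);
    return address;
  }

  // Final stage: another peer asks where the given peer can be reached.
  // Returns undefined when that peer never requested a relay.
  lookupRelay(peerId: string): RelayAddress | undefined {
    return this.relays.get(peerId);
  }
}
```

In this sketch peerC would call `requestRelay('peerA')` when peerA connects and `lookupRelay('peerA')` when peerB asks; the actual packet forwarding in Stage 2 is out of scope here.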

@CMCDragonkai
Member

Ok, but please generate the call graphs as well. I need to map it to the files as well.

@CMCDragonkai
Member

Ok after going through the codebase. I have 131 comments on various parts of it.

One part I didn't review, which is the meat of this PR, is the actual TURN server and hole punching logic.

I want to go through that in detail with you together. The other comments above are all Quality of Life improvements and abstraction/modelling improvements and design improvements and standardisation and Quality Assurance stuff.

All stuff to be focused on to reduce the entropy of the codebase before moving ahead to add more features.

Let's schedule another time to go through it. Perhaps Monday or later.

First I need those UI case flow diagrams, and then we can go through it.

The very next PR I want to focus on:

  1. Quality of Life improvements and abstraction/modelling improvements and design improvements and standardisation and Quality Assurance stuff.
  2. Proper model based testing and property testing using xstate-test and fast-check.
  3. Spec sheet that can derive UML diagrams... and so on. Need to see how it all fits together.
  4. Nix based integration testing.

All in all, this is an amazing amount of work.

@CMCDragonkai
Member

CMCDragonkai commented Sep 14, 2020 via email

@robert-cronin
Contributor Author

Oh right yea, testing npm build?

On 14/09/2020 1:47 pm, Robbie Cronin wrote:

> In .gitlab-ci.yml:
>
>     +build_npm:
>     +  image: registry.gitlab.com/matrixai/engineering/maintenance/gitlab-runner
>     +  stage: build
>     +  script:
>     +    - nix-shell --packages nodejs --run "npm install"
>
> this is not to test `npm install`, this is to test building via npm, this can pick up a lot of errors in the code actually, including import errors/type errors. But like you said, it probably will be picked up in `nix-build` anyway, so I guess we could remove it.

Yeah, that's right, but I agree that the errors will just be picked up in the nix build anyway, so the npm build is kind of redundant.

@CMCDragonkai
Member

CMCDragonkai commented Sep 14, 2020 via email

@CMCDragonkai
Member

CMCDragonkai commented Sep 14, 2020 via email

@CMCDragonkai
Member

CMCDragonkai commented Sep 14, 2020 via email

@CMCDragonkai
Member

CMCDragonkai commented Sep 14, 2020 via email

@robert-cronin
Contributor Author

robert-cronin commented Sep 14, 2020

Yeah so perhaps the names can be PeerClient and PeerClientManager. Seems to make sense.

Yeah channels are definitely multiplexed over a single PeerConnection but you have to have a PeerConnection for each peer since they each have different addresses.

Usually we "multiplex channels" on a connection: there's one connection from one place to another, but there may be multiple channels. This is arbitrary, but that's how I've seen it being used. So PeerConnection is fine for this, actually.

But we need to differentiate client objects, which are containers of RPC methods, from client interfaces, which mostly contain types. If a client object wraps another client object, then the two should be differentiated as a lower level client and a higher level client. Usually I give shorter names to higher level clients and longer names to lower level clients if there's a name clash.

On 9/14/20 4:39 PM, Robbie Cronin wrote:

> In src/lib/peers/peer-connection/PeerConnection.ts:
>
>     +    this.peerManager = peerManager;
>     +
>     +    const pkiInfo = keyManager.PKIInfo;
>     +    if (pkiInfo.caCert && pkiInfo.cert && pkiInfo.key) {
>     +      this.credentials = grpc.credentials.createSsl(pkiInfo.caCert, pkiInfo.key, pkiInfo.cert);
>     +    } else {
>     +      this.credentials = grpc.credentials.createInsecure();
>     +    }
>     +  }
>     +
>     +  private async connectDirectly(): Promise {
>     +    // try to create a direct connection
>     +    if (this.getPeerInfo().connectedAddress) {
>     +      // direct connection attempt
>     +      const address = this.getPeerInfo().connectedAddress!;
>     +      const peerClient = new PeerClient(address.toString(), this.credentials);
>
> Just for clarity here, `PeerClient` is the client that is auto-generated by `protoc`, i.e. the actual grpc client. `PeerConnection` is the class abstracted over the top that contains the logic to select which of the 3 channels to use for connection: direct connection/relayed connection/UDP hole punched connection. Perhaps we could rename `PeerConnection` to something more appropriate or even add it into the `PeerManager` class but I think it is a good separation of concerns. The `PeerServer` is another abstraction over the grpc server stub. I will add a note to the domain modelling issue about renaming of these classes to something less confusing.

@robert-cronin
Contributor Author

Done!

Then call it promiseAny then!

On 9/14/20 4:56 PM, Robbie Cronin wrote:

> In src/lib/peers/peer-connection/PeerConnection.ts:
>
>     +      this.connected = true;
>     +      return peerClient;
>     +    } else if (!this.getPeerInfo().relayPublicKey) {
>     +      throw Error('peer does not have relay public key specified');
>     +    } else {
>     +      throw Error('peer is already connected');
>     +    }
>     +  }
>     +
>     +  async connectFirstChannel() {
>     +    return await new Promise((resolve, reject) => {
>     +      const promiseList = [this.connectDirectly(), this.connectHolePunch(), this.connectRelay()];
>     +
>     +      const errorList: Error[] = [];
>     +      for (const promise of promiseList) {
>     +        promise
>
> I remember now, async/await doesn't work here. The point is you don't want to wait for any particular promise, you want them to race as in `Promise.race`. The only issue with that convenience function is that it rejects when any promise rejects as well, here we want a resolve race but not a reject race. The `bluebird` promise library has this functionality in the form of `Promise.any(..)` but I decided to just implement this ourselves to save a dependency.
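The quoted hunk is cut off before the body of the loop, so the helper as merged is not visible in this thread. As a rough sketch of the "resolve race but not a reject race" behaviour described above, and assuming the `promiseAny` name suggested in the reply, it could look like the following. `AllRejectedError` is a hypothetical name, and the standard `Promise.any` (ES2021) offers the same semantics where it is available.

```typescript
// Hypothetical error type carrying every rejection reason, in input order.
class AllRejectedError extends Error {
  readonly errors: unknown[];
  constructor(errors: unknown[]) {
    super(`all ${errors.length} promises rejected`);
    this.name = 'AllRejectedError';
    this.errors = errors;
  }
}

// Resolve with the first promise that fulfils; reject only once every
// promise has rejected. Unlike Promise.race, an early rejection does not
// settle the result while other promises are still pending.
function promiseAny<T>(promises: Promise<T>[]): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    if (promises.length === 0) {
      reject(new AllRejectedError([]));
      return;
    }
    const errors: unknown[] = new Array(promises.length);
    let rejectedCount = 0;
    promises.forEach((promise, index) => {
      promise.then(resolve, (err: unknown) => {
        errors[index] = err;
        rejectedCount += 1;
        if (rejectedCount === promises.length) {
          reject(new AllRejectedError(errors));
        }
      });
    });
  });
}
```

With a helper like this, `connectFirstChannel` could reduce to `promiseAny([this.connectDirectly(), this.connectHolePunch(), this.connectRelay()])`, though whether the merged code was shaped that way is an assumption. Calling `resolve` more than once is harmless because only the first settlement of a promise takes effect.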

@CMCDragonkai
Member

CMCDragonkai commented Sep 14, 2020 via email

@robert-cronin
Contributor Author

robert-cronin commented Sep 14, 2020

Isn't a sonar ping a kind of scan? To follow the analogy, you're sending out interrogation packets (e.g. sound waves) and listening to what comes back, which tells you what is out there. In our broadcast situation, we're skipping the 'sending out' phase and just listening to what is out there on the multicast address.

I guess you could say it's more akin to ADS-B, since the broadcast is active and the listening is passive. So when we talk about our sonar ping, that function, as far as the user is concerned, is listening to the multicast address to see what/who is out there on the LAN, right?

@CMCDragonkai
Member

CMCDragonkai commented Sep 14, 2020 via email

@CMCDragonkai
Member

CMCDragonkai commented Sep 15, 2020 via email

@CMCDragonkai
Member

Reviewed the entire NAT stuff. Lots of things to work on. Please summarize the notes and things to do next with a new issue linked to #84.

image

@CMCDragonkai
Member

CMCDragonkai commented Sep 16, 2020 via email

@CMCDragonkai
Member

CMCDragonkai commented Sep 16, 2020 via email

@CMCDragonkai
Member

CMCDragonkai commented Sep 16, 2020 via email

@CMCDragonkai
Member

CMCDragonkai commented Sep 16, 2020 via email

@CMCDragonkai
Member

CMCDragonkai commented Sep 17, 2020 via email

@CMCDragonkai
Member

CMCDragonkai commented Sep 17, 2020 via email

@robert-cronin
Contributor Author

Okay I might go ahead and merge this now.

@robert-cronin robert-cronin merged commit 6af07f9 into master Sep 17, 2020
@robert-cronin robert-cronin deleted the nat-traversal branch September 17, 2020 03:53
Labels
development Standard development
2 participants