Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

To investigate: adding circuit relay information to multiaddr #1491

Closed
richard-ramos opened this issue Jan 14, 2023 · 26 comments
Closed

To investigate: adding circuit relay information to multiaddr #1491

richard-ramos opened this issue Jan 14, 2023 · 26 comments
Assignees

Comments

@richard-ramos
Copy link
Member

richard-ramos commented Jan 14, 2023

This might not be a problem in nwaku, but i'm reporting this just in case:

The status fleet has circuit relay enabled to facilitate hole punching,this means that nodes that nodes behind NAT or firewalls will have multiaddresses that follow this format: /ip4/7.7.7.7/tcp/55555/p2p/QmRelay/p2p-circuit/p2p/QmAlice, in which everything prior to the /p2p-circuit/ above is the address of the relay peer (a node from the fleet), which includes the transport address and their Peer ID QmRelay. After /p2p-circuit/ is the actual peer ID.

While using status-desktop / go-waku, I noticed that the information about the p2p-circuit is not part of the ENR, so nodes would advertise themselves in DiscV5 using information from the relay node instead of their own and not be discoverable then. Here's an example of the record data, using the above multiaddress as example

tcp | 55555   // Port of the circuit relay node
ip | 7.7.7.7 // IP of the circuit relay node
secp256k1 | QmRelay // The circuit relay secp256k1 public key, 33 bytes

In 31/WAKU2-ENR I read that we have a multiaddrs key which is used to store additional multiaddress information, so to fix this problem I'll add this information in that key (the p2p-circuit/p2p/QmAlice part).

cc: @cammellos @Menduist

@richard-ramos
Copy link
Member Author

richard-ramos commented Jan 14, 2023

Reading more about ENR, i see there's a limit of 300 bytes. Wouldn't it be better to have a new key called p2p-circuit to store the secp256k1 public key (33 bytes) instead of storing p2p-circuit/p2p/QmSomePublicKey in the multiaddrs field? it should save some bytes

@richard-ramos
Copy link
Member Author

richard-ramos commented Jan 14, 2023

Hm, after trying for a while, looks like storing only the p2p-circuit part complicates the code for multiaddrs field. If I understood correctly, it seems that p2p-circuit is a multiaddress separator. I saw code within go-libp2p that supports this theory. For example:

relayaddr, destaddr := ma.SplitFunc(m, func(c multiaddr.Component) bool {
   return c.Protocol().Code == ma.P_CIRCUIT
})

If we pass /ip4/7.7.7.7/tcp/55555/p2p/QmRelay/p2p-circuit/p2p/QmAlice to that function the result will be:

  • relayaddr: /ip4/7.7.7.7/tcp/55555/p2p/QmRelay
  • destaddr: /p2p-circuit/p2p/16Uiu2HAmET3N3P5t7DipHr8o2sY2jkFu61osqUkiubWjHDdxFwKC

Based on this, for experimenting I'll create a draft PR in go-waku that will modify the enr-record to have a p2p-circuit key, and using the example multiaddr above, the record would look like this:

tcp | 55555   // Port of the circuit relay node
ip | 7.7.7.7 // IP of the circuit relay node
secp256k1 | QmAlice // The node secp256k1 public key, 33 bytes
p2p-circuit | QmRelay// The circuit relay secp256k1 public key, 33 bytes  <-----

Let me know what you think

@jm-clius
Copy link
Contributor

Thanks, @richard-ramos. I think this makes sense as a possible extension to the ENR spec for p2p-circuit addresses (as the multiaddrs field will become to overloaded).

I'm still confused though how nodes were discovering each other without this? 😅 Afaik, hole punching is active and working, so how are these peers finding each other?

@richard-ramos
Copy link
Member Author

There's some magic going on there. I haven't figured it out yet but will update this issue once i find out what's going on!

@jm-clius
Copy link
Contributor

Right! Because an alternative here would be to do something like implement libp2p rendezvous discovery on multiaddrs as secondary discovery method without needing this encoding.

@Menduist
Copy link
Contributor

Isn't the multiaddrs field already used to send the full relayed address? I thought that was how it worked currently

300 bytes should be enough, you shouldn't advertise many addresses anyway. (2 or 3 max)

Long term it will make sense to switch to a libp2p discovery method either way, because the discv5 DHT is built with the assumption that everyone is reachable, if you have too many users only reachable through HP, that assumption starts to fail

@richard-ramos
Copy link
Member Author

Isn't the multiaddrs field already used to send the full relayed address? I thought that was how it worked currently

Currently it is only being used for ws and wss addresses.

@richard-ramos
Copy link
Member Author

I gotta mention that I find using multiaddrs field for circuit relay to be +- confusing. For example assuming you have a node that has websockets, secure websockets enabled and also has a circuit relay address, you might end up with a big variety of multiaddresses. For example, i just started desktop, and got the following list:

"/ip4/192.168.0.106/tcp/60000/p2p/16Uiu2HAmUVVrJo1KMw4QwUANYF7Ws4mfcRqf9xHaaGP87GbMuY2f",
"/ip4/127.0.0.1/tcp/60000/p2p/16Uiu2HAmUVVrJo1KMw4QwUANYF7Ws4mfcRqf9xHaaGP87GbMuY2f",
"/ip4/192.168.1.20/tcp/19710/p2p/16Uiu2HAmUVVrJo1KMw4QwUANYF7Ws4mfcRqf9xHaaGP87GbMuY2f",
"/dns4/node-02.gc-us-central1-a.status.prod.statusim.net/tcp/30303/p2p/16Uiu2HAmDQugwDHM3YeUp86iGjrUvbdw3JPRgikC7YoGBsT2ymMg/p2p-circuit/p2p/16Uiu2HAmUVVrJo1KMw4QwUANYF7Ws4mfcRqf9xHaaGP87GbMuY2f",
"/dns4/node-02.gc-us-central1-a.status.prod.statusim.net/tcp/443/wss/p2p/16Uiu2HAmDQugwDHM3YeUp86iGjrUvbdw3JPRgikC7YoGBsT2ymMg/p2p-circuit/p2p/16Uiu2HAmUVVrJo1KMw4QwUANYF7Ws4mfcRqf9xHaaGP87GbMuY2f"
"/dns4/node-01.gc-us-central1-a.wakuv2.test.statusim.net/tcp/8000/wss/p2p/16Uiu2HAmJb2e28qLXxT5kZxVUUoJt72EMzNGXB47Rxx5hw3q4YjS/p2p-circuit/p2p/16Uiu2HAmUVVrJo1KMw4QwUANYF7Ws4mfcRqf9xHaaGP87GbMuY2f"
"/dns4/node-01.gc-us-central1-a.wakuv2.test.statusim.net/tcp/30303/p2p/16Uiu2HAmJb2e28qLXxT5kZxVUUoJt72EMzNGXB47Rxx5hw3q4YjS/p2p-circuit/p2p/16Uiu2HAmUVVrJo1KMw4QwUANYF7Ws4mfcRqf9xHaaGP87GbMuY2f"

I'm not sure then what to put in the multiaddrs field

@Menduist
Copy link
Contributor

Right..
So the first 3 could be skipped since they are local address
But then indeed, you need to publish each public address of each relay, and you generally have at least 2 relays, so it can lead to quite an explosion

@richard-ramos
Copy link
Member Author

So the first 3 could be skipped since they are local address

Ah! this could be tied to this issue I created here: #1493 to make discv5 ignore private addresses.

it can lead to quite an explosion

Indeed!, It seems that I'll get a multiaddress per protocol for each node with circuit relay enabled I connect to?

@Menduist
Copy link
Contributor

Menduist commented Jan 16, 2023

Your node should only establish circuits with up to desiredRelays, which is 2 by default https://github.com/libp2p/go-libp2p/blob/82315917f76ca1e9e1c3aca3ad957080f0ac870c/p2p/host/autorelay/options.go#L41

/dns4/node-02.gc-us-central1-a.status.prod.statusim.net/tcp/30303/p2p/16Uiu2HAmDQugwDHM3YeUp86iGjrUvbdw3JPRgikC7YoGBsT2ymMg/p2p-circuit/p2p/16Uiu2HAmUVVrJo1KMw4QwUANYF7Ws4mfcRqf9xHaaGP87GbMuY2f
is 140 bytes long once encoded, so we can't put more than 2 in the ENR as-is (even 2 seems hard)

If you avoid sending your own key, and let the dialer set it, it becomes
/dns4/node-02.gc-us-central1-a.status.prod.statusim.net/tcp/30303/p2p/16Uiu2HAmDQugwDHM3YeUp86iGjrUvbdw3JPRgikC7YoGBsT2ymMg/p2p-circuit/
98 bytes, so you can comfortably fit 2, so we could do "one WSS on relay°1, one TCP on relay°2"

The next optimization would be to resolve the dns and send the ip instead,
/ip4/34.132.213.233/tcp/30303/p2p/16Uiu2HAmDQugwDHM3YeUp86iGjrUvbdw3JPRgikC7YoGBsT2ymMg/p2p-circuit
Is 52 bytes, so we could tightly fit the 4 addresses like this.
EDIT: mhh, that wouldn't work for WSS addresses since we need the hostname..

The last optimization would be to send the relay key once per relay, so we would have to send something like (or doesn't exist, this is just an example):
relay1: /ip4/34.132.213.233/tcp/30303/or/tcp/443/wss/p2p/16Uiu2HAmDQugwDHM3YeUp86iGjrUvbdw3JPRgikC7YoGBsT2ymMg
relay2: /ip4/34.132.213.233/tcp/30303/or/tcp/443/wss/p2p/16Uiu2HAmJb2e28qLXxT5kZxVUUoJt72EMzNGXB47Rxx5hw3q4YjS
This could handle up to 4 relays with multiple transport per relay.

Note that realistically, by the EOY, we'll have 4 transports enabled, so we have to be ready for that

@fryorcraken
Copy link
Collaborator

Regarding the usage of the mutliaddrs field, please check the spec:

  • The multiaddresses SHOULD NOT contain a peer id
  • For raw TCP & UDP connections details, EIP-778 pre-defined keys SHOULD be used;
  • To save space, multiaddrs key SHOULD only be used for connection details that cannot be represented using the EIP-778 pre-defined keys.
  • The 300 bytes size limit as defined by EIP-778 still applies; In practice, it is possible to encode 3 multiaddresses in ENR, more or less could be encoded depending on the size of each multiaddress.

Hopefully this answers most questions above.

Do note in the case of secure websocket, the fqdn and the port needs to be provided together at a given port is secured by a given SSL cert which in turns is valid for a given fqdn.

  • It would not make sense to include several WSS in a record: one node should be secure by one SSL cert reachable at one address
  • It would not make sense to include a plain WS address: these are useless

Finally, in case of p2p-circuit, I agree with @Menduist's proposal above with some addition:

  • I assume that you would not have an ENR containing WSS + p2p-circuit: if you have a wss adddress it means you have a port reachable from the internet, so you would not need to have a p2p-circuit value... right?
  • totally agree in trying to deduce as much info from the other ENR fields to not dupe info (e.g. not including your peer id as it can be deduced from secp256k1 field.

@richard-ramos
Copy link
Member Author

@fryorcraken: I assume that you would not have an ENR containing WSS + p2p-circuit: if you have a wss adddress it means you have a port reachable from the internet, so you would not need to have a p2p-circuit value... right?

In desktop i'm not enabling websockets, yet when I connected to a relay node, i got 2 addresses from it:

"/dns4/node-02.gc-us-central1-a.status.prod.statusim.net/tcp/30303/p2p/16Uiu2HAmDQugwDHM3YeUp86iGjrUvbdw3JPRgikC7YoGBsT2ymMg/p2p-circuit/p2p/16Uiu2HAmUVVrJo1KMw4QwUANYF7Ws4mfcRqf9xHaaGP87GbMuY2f",
"/dns4/node-02.gc-us-central1-a.status.prod.statusim.net/tcp/443/wss/p2p/16Uiu2HAmDQugwDHM3YeUp86iGjrUvbdw3JPRgikC7YoGBsT2ymMg/p2p-circuit/p2p/16Uiu2HAmUVVrJo1KMw4QwUANYF7Ws4mfcRqf9xHaaGP87GbMuY2f"

I guess a relay node will give you a p2p-circuit address for all protocols it supports?

@Menduist
Copy link
Contributor

Yes

@richard-ramos
Copy link
Member Author

richard-ramos commented Jan 27, 2023

@fryorcraken , @jm-clius , @Menduist

In waku-org/go-waku#431 i tried to create an ENR record with the following information.

  • IP: 192.168.1.241
  • TCP port: 60000
  • UDP port: 5000
  • Waku2: 0x0F
  • Multiaddrs:
    • /dns4/node-01.gc-us-central1-a.status.prod.statusim.net/tcp/30303/p2p/16Uiu2HAmDQugwDHM3YeUp86iGjrUvbdw3JPRgikC7YoGBsT2ymMg/p2p-circuit
    • /dns4/node-02.gc-us-central1-a.status.prod.statusim.net/tcp/30303/p2p/16Uiu2HAmDQugwDHM3YeUp86iGjrUvbdw3JPRgikC7YoGBsT2ymMg/p2p-circuit

As soon as I add a second it immediately complains about the record being longer than 300 bytes. Interestingly enough, those two circuit relay multiaddresses only represent 100 bytes in the ENR record, so I guess the rest of the bytes is spent into the other fields and the record signature.

For reference, if I use a single circuit relay multiaddress, then it's okay. Here's an example of ENR record with a single circuit relay address:

enr:-QEDuECWyBL4vrhd_Br5hsaCnOrgbgsc8XafODRl7kfJkemZACEW3-7ZkiLwiunhgq1Ip_qp42Q8e0tWdpjIJzPLD2znAYJpZIJ2NIJpcITAqAHxim11bHRpYWRkcnO4ZABiNjFub2RlLTAyLmdjLXVzLWNlbnRyYWwxLWEuc3RhdHVzLnByb2Quc3RhdHVzaW0ubmV0BnZfpQMnACUIAhIhAwtASPpiz5Gq3zuFuWF4MTviQVipi3HaQGytF-4wfbdvogKJc2VjcDI1NmsxoQMrcfD7f9fRK2mCeg0kM9DxbyNlhm5MkuqcSlbPFLBs2YN0Y3CC6mCDdWRwgsNQhXdha3UyDw

It can be decoded in https://enr-viewer.com/

@fryorcraken
Copy link
Collaborator

One of the downside of using ENR was definitely the size limit. We did consider Libp2p signed envelopes too but ENR was preferred as it is more field tested (I feel this should have been documented in the RFC).

Why do your two multiaddrs have the same peer id?

@richard-ramos
Copy link
Member Author

Why do your two multiaddrs have the same peer id?

I copy pasted one of the circuit-relay multiaddresses since I wanted to see if I could store at least 2 in the multiaddr (this because when we use circuit relay with the status.prod fleet nodes, the node receives 4 multiaddresses (2 for each circuit relay node). I guess for the time being I'll have to limit the number of multiaddresses stored in multiaddrs field to 1.

I wonder it makes sense to fork discv5 to increase this size limit to be able to store at least 2 circuit-relay addresses? I do not know what implications could that have

@fryorcraken
Copy link
Collaborator

I wonder it makes sense to fork discv5 to increase this size limit to be able to store at least 2 circuit-relay addresses? I do not know what implications could that have

I think there are few options on the table. Not sure of the implications either:

  1. Increase ENR size limit
  2. Register several ENR for one given node: Which means that split of info is needed between several ENR + consolidation is needed when getting ENR. Would also mean that one node can provide several ENR during discv5 process (is that possible?)
  3. Usage of signed envelope from libp2p instead of ENR.

@jm-clius @kaiserd thoughts? I know that @kaiserd really busy.

@jm-clius
Copy link
Contributor

Usage of signed envelope from libp2p instead of ENR.

I think we should consider adding a discovery mechanism that is more suitable for discovering signed peer records, specifically libp2p rendezvous discovery. IIRC we determined previously that it won't be too much of an effort on either the go-waku or nwaku side to add.

@fryorcraken
Copy link
Collaborator

@jm-clius What are the next step? Should we oipen an issue in vac research or rfc?

@jm-clius
Copy link
Contributor

jm-clius commented Feb 1, 2023

@fryorcraken captured it here: vacp2p/research#176

Basically, if we're roughly agreed on that issue that libp2p rendezvous is a good discovery method to add and will solve the ENR size issue, I think we can go ahead and assign ownership.

I've included a somewhat related suggestion of specifying a pubsubTopic/shard negotiation mechanism as well, which would work independently from discovery mechanisms, but allow us to be more selective in terms of nodes we connect to (with an eye on multiple shard scaling)

@richard-ramos
Copy link
Member Author

In the meantime I have updated the logic for the multiaddrs key in go-waku in this PR: waku-org/go-waku#431 to do the following:

  1. If the node is using WSS, it will only store the wss multiaddress, since it requires a domain name, and that would mean that the node is accessible.
  2. Otherwise, it will attempt to store as many circuit relay addresses as possible without exceeding the 300 bytes limit.
  3. Circuit relay addresses are stored following this format: /dns4/relay_domain_name/tcp/1234/p2p/relayPeerID/p2p-circuit (if using a IP, then it will start by /ip4/relay_ip/, obviously)

@richard-ramos
Copy link
Member Author

Update: I had to limit the number of multiaddresses stored in the ENR to not exceed a hard coded limit, due to crashes being reported in:

@kaiserd
Copy link
Contributor

kaiserd commented Mar 2, 2023

My suggestion for an ad-hoc solution for the Status MVP would be

  • using discv5 only on nodes that are discoverable without NAT whole punching
  • use rendezvous for the rest

Including rendezvous, we have to take sharding into consideration 51/WAKU2-RELAY-SHARDING.
Nodes could simply register their shard participation via REGISTER{my-app, {QmA, AddrA}} where my-app is the respective shard index.

As an alternative, we could increase the ENR size as an ad-hoc solution for the Status MVP.
We could do that because we use a dedicated Waku discovery network (independent of Eth discv5).
Still, we should keep the ENR size as small as possible for efficiency reasons,
and only include absolutely necessary info (e.g. what @Menduist suggested for p2p-circuit).

(I will think about this, have discussions, and open an issue to include the chosen solution in the respective RFC, most likely in: vacp2p/research#167 )

Future work on (hierarchical) capability discovery should solve this issue.
Shard cluster and shard participation would be modeled as a capability and not be part of the ENR.
We could move towards the topic discovery successor of discv5 (once it is specified), or take ideas from that,
and as @Menduist suggested, go towards a libp2p DHT based discovery solution.

@kaiserd
Copy link
Contributor

kaiserd commented Mar 2, 2023

One more idea (ad-hoc for the MVP): add a rendezvous support indicator field to the ENR.
This would connect the two discovery methods, leading to a two-stage approach that allows discovering (almost) every node.
The indicator field would increase the ENR size by a bit, but make p2p-circuit multiaddrs in the ENR obsolete.

@vpavlin
Copy link
Member

vpavlin commented Jul 15, 2023

Closing since Rendezvous is enabled by default when relay is enabled now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants