Ensure DHT clients remain discoverable #779
@jacobheun Some questions to improve my understanding of this issue.
Why do clients rank low in the connection manager?
Assuming Connection/RT decoupling gets implemented, will the server mark the client as "missing" but still keep it in its RT, and therefore test its connectivity to the client on refresh, assuming the client's refresh hasn't fired yet?
How would the peer record be used by the server? Will the peer record pushed by a client C to server S be used when another peer asks S if it knows about C? If yes, why do we need this, given that the client's re-connection to the server on a refresh adds it to the server's list of connected peers and therefore makes it discoverable?
Because as far as the server nodes are concerned, they're leeches and not high-priority connections. If the server can keep a connection to another server or to a client, it should keep the connection to the server, since it's more useful.
This is combined with server nodes not adding nodes identified as "dht clients" to their routing tables.
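A rough sketch of that second behavior, using stand-in types rather than the real go-libp2p-kad-dht API: the server checks the protocols a peer advertised via identify and skips routing-table insertion for anything that doesn't speak the DHT server protocol.

```go
package main

import "fmt"

// peerInfo is a hypothetical view of what a server learns about a peer via identify.
type peerInfo struct {
	ID        string
	Protocols []string
}

const dhtProtocolID = "/ipfs/kad/1.0.0"

// speaksDHT reports whether the peer advertised the DHT server protocol.
// Peers that don't are treated as DHT clients.
func speaksDHT(p peerInfo) bool {
	for _, proto := range p.Protocols {
		if proto == dhtProtocolID {
			return true
		}
	}
	return false
}

func maybeAddToRoutingTable(rt map[string]peerInfo, p peerInfo) {
	if !speaksDHT(p) {
		// DHT clients are not added: they can't answer queries, so keeping
		// them in the table would hand out dead ends to other peers.
		return
	}
	rt[p.ID] = p
}

func main() {
	rt := map[string]peerInfo{}
	maybeAddToRoutingTable(rt, peerInfo{ID: "server-A", Protocols: []string{dhtProtocolID}})
	maybeAddToRoutingTable(rt, peerInfo{ID: "client-B", Protocols: []string{"/ipfs/id/1.0.0"}})
	fmt.Println(len(rt)) // 1: only the server peer was added
}
```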
Yes
You are correct.
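To make the confirmed flow concrete, here is a minimal sketch with hypothetical types: client C pushes its signed record to server S, S keeps the freshest record per peer, and serves it when another peer asks about C, even after C's connection has been pruned.

```go
package main

import "fmt"

// signedRecord stands in for a signed peer record (envelope) pushed via identify.
type signedRecord struct {
	PeerID string
	Addrs  []string
	Seq    uint64
}

// server keeps the latest record per peer, as a stand-in for the peerstore.
type server struct {
	records map[string]signedRecord
}

// consumePush is what S does when client C pushes its record on connect/refresh.
func (s *server) consumePush(rec signedRecord) {
	if old, ok := s.records[rec.PeerID]; ok && old.Seq >= rec.Seq {
		return // ignore stale records
	}
	s.records[rec.PeerID] = rec
}

// findPeer is what S answers when another peer asks about C: even after C's
// connection is pruned, the stored record keeps C discoverable.
func (s *server) findPeer(id string) (signedRecord, bool) {
	rec, ok := s.records[id]
	return rec, ok
}

func main() {
	s := &server{records: map[string]signedRecord{}}
	s.consumePush(signedRecord{PeerID: "C", Addrs: []string{"/ip4/1.2.3.4/tcp/4001"}, Seq: 1})
	rec, ok := s.findPeer("C")
	fmt.Println(ok, rec.Addrs)
}
```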
Given the existence of the second sentence, the first is now extraneous, right? As part of our Kademlia search for ourselves we should be connecting to all of our closest peers anyway (to see if they know anyone closer).
Thanks for the reply.
Would you be able to link me to the issue where we've decided against adding clients to routing tables?
Yes, it's extraneous; I've removed it from the description.
@jacobheun is this feature meant to help users with publicly dialable addresses who have decided not to participate in the DHT (e.g. a Raspberry Pi behind a router with port forwarding), users without publicly dialable addresses (e.g. a standard laptop behind a NAT that may not have mDNS enabled), or both?
I believe it should be both. Any DHT client should still be discoverable, otherwise we risk losing that data. If it's impossible for the client to be dialed (no intermediary or relay), it probably should avoid using the DHT as a client and instead use a delegate node.
Are you sure? The current proposal of automatically switching from client to server mode based on whether a node is dialable or not doesn't abide by this strategy. Also, it doesn't seem unreasonable to me that a node that is behind a NAT might want to query the DHT (i.e. be a client) even though it's not able to be a server node. Maybe I don't understand what you mean by delegate nodes in this context. IIUC delegate nodes in a p2p network are for when resources (e.g. computing, file descriptors, etc.) are low or when certain platforms cannot fully support the protocol (e.g. they only support HTTP, but not arbitrary TCP or QUIC connections), not necessarily for when there are connectivity issues.
Given the above, is having a publicly undialable node that is no longer discoverable via the DHT a huge loss? They can be discovered locally via mDNS by the nodes that can actually connect to them.
This would mean we need to validate the addresses for a peer prior to adding them to the routing table, yes? What do you do when you don't understand an address that is publicly dialable by other nodes? Just because I can't dial them doesn't mean nobody else can dial them.
Discoverability of addresses isn't strictly related to routing tables, or even to being directly connected to the peer. As an example, everyone could put their addresses to … Validating that addresses are public should not be necessary (might need to do some thinking here to confirm). We could have three states we're worrying about:
If we wanted, we could enable the first two groups to advertise without allowing the third group to. We may not want to take this approach, but it's feasible.
While it's true that just because I can't dial them doesn't mean nobody else can, it's also not clear that they should be advertising their address in the DHT if they're not accessible. It also seems reasonable to me to expect this behavior for provider records.
@yusefnapora Can you please direct me to a PR where a peer pushes its signed address on Identify? Also, when/how are these records consumed by the receiver?
@aschmahmann
It seems likely that we'll merge the efficient query PR mostly as-is, and once the signed peer records PR is merged we'll go back and plug it in.
@aarshkshah1992 per #784 yes for |
Some design notes/concerns based on offline discussions with @yusefnapora and @aschmahmann:
Sir @Stebalien
As you mentioned, Identify will push the changed address to all peers we are connected to. I think the idea is that if we have "high confidence" that we're already connected to our K closest peers, then we may not need to do anything at all, because Identify will have already pushed the changes. AFAIU this is a pretty small optimization, since if we're already connected to the K closest peers then sending them a FindNode RPC (Kademlia paper terminology, used here to disambiguate from the …
A while ago, @raulk and I were thinking that the DHT would have a "strict mode" flag which would cause it to ignore unsigned addresses and only accept addrs from signed peer records. "Lenient" peers would prefer peer records, but would also accept unsigned addrs. At the time there were two basic approaches we were thinking about:
Option 1 saves a multistream-select round trip, but it would need branching / special-casing within the query logic of the DHT, which is already not the simplest code to follow. Option 2 could be done a few different ways: we could have one codebase that responds to both protocol IDs and changes behavior depending on which was negotiated, or we could have two completely separate DHT instances and put a facade in front to unify them into one.

However, we recently started talking about moving the "strict mode" flag to the peerstore instead of having it be local to the DHT. That would allow the peerstore to accept signed peer records, but also accept unsigned addrs for the same peer afterwards. We haven't implemented that yet, but if we do it will have implications for the DHT, since if the peerstore is in "strict mode", the DHT effectively will be in strict mode also, and we probably want to communicate that to other peers so they know whether to send us unsigned addrs or not.

So, to sort of answer your question, we're going to need some kind of "hybrid" / lenient mode during the transition to exchanging peer records, but we haven't fully worked out how to implement it.
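A sketch of how option 2's single-codebase variant might look, using stand-in handlers; the signed-records protocol ID here is purely illustrative, not a real ID.

```go
package main

import "fmt"

const (
	protoLegacy = "/ipfs/kad/1.0.0"
	protoSigned = "/ipfs/kad/signed/1.0.0" // assumed name for illustration only
)

// handleStream branches on whichever protocol multistream-select negotiated,
// so one codebase can serve both strict and lenient peers (option 2).
func handleStream(negotiated string, msg []byte) ([]byte, error) {
	switch negotiated {
	case protoSigned:
		// Strict path: only exchange signed peer records.
		return respondWithSignedRecords(msg), nil
	case protoLegacy:
		// Lenient path: fall back to unsigned addresses.
		return respondWithUnsignedAddrs(msg), nil
	default:
		return nil, fmt.Errorf("unknown protocol %q", negotiated)
	}
}

// Stand-ins for the two response paths.
func respondWithSignedRecords(msg []byte) []byte { return append([]byte("signed:"), msg...) }
func respondWithUnsignedAddrs(msg []byte) []byte { return append([]byte("unsigned:"), msg...) }

func main() {
	out, _ := handleStream(protoSigned, []byte("FIND_NODE"))
	fmt.Println(string(out))
}
```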
It occurs to me that "strict mode" has a slightly different meaning for the peerstore vs the strict mode we were considering for the DHT. Strict mode for the peerstore means that we will stop accepting unsigned addrs as long as we have any valid signed addrs for the peer, while lenient mode lets us always accept unsigned addrs. Strict mode for the DHT means that we only send and accept signed peer records, and never send or accept unsigned addrs. So the DHT is a bit "stricter" in the sense that a strict peerstore would still accept unsigned addrs up until it received a peer record for the peer, whereas a strict DHT would never accept unsigned addrs, even if we don't already know any addrs for the peer.
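A toy model of the peerstore-side distinction, with stand-in types rather than the real go-libp2p peerstore API: strict mode rejects unsigned addrs only once a signed record is known for that peer (a strict DHT, by contrast, would never accept them).

```go
package main

import "fmt"

// addrBook sketches the strict-peerstore behavior described above.
type addrBook struct {
	strict   bool
	signed   map[string][]string // peer -> addrs from signed records
	unsigned map[string][]string // peer -> unsigned addrs
}

func newAddrBook(strict bool) *addrBook {
	return &addrBook{
		strict:   strict,
		signed:   map[string][]string{},
		unsigned: map[string][]string{},
	}
}

func (ab *addrBook) addSigned(peer string, addrs []string) {
	ab.signed[peer] = addrs
}

// addUnsigned reports whether the addrs were accepted. In strict mode,
// unsigned addrs are rejected once any signed record exists for the peer.
func (ab *addrBook) addUnsigned(peer string, addrs []string) bool {
	if ab.strict {
		if _, haveSigned := ab.signed[peer]; haveSigned {
			return false // signed record wins; drop unsigned addrs
		}
	}
	ab.unsigned[peer] = addrs
	return true
}

func main() {
	ab := newAddrBook(true)
	fmt.Println(ab.addUnsigned("p1", []string{"/ip4/1.2.3.4/tcp/1"})) // true: no record yet
	ab.addSigned("p1", []string{"/ip4/1.2.3.4/tcp/1"})
	fmt.Println(ab.addUnsigned("p1", []string{"/ip4/5.6.7.8/tcp/1"})) // false: record known
}
```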
Thinking through this a bit, I don't think we need to increase the refresh rate to ensure that DHT clients can be found. I believe we started down that path before we finished thinking through dissociating routing tables from connection state. However, I think we need to do it anyways.

Closest peers (self query): not being connected to one's closest nodes is a pretty big issue for network stability. If servers don't know their closest peers, they'll give incorrect information to clients trying to perform queries. Given network churn, I'd rather err on the side of doing this too often. An hour is a long time. If our peers are acting correctly, they should connect to us when they join. However, something might go wrong, and waiting an hour is too long. Does this make sense? I haven't thought about this enough to be sure.

The rest of the buckets: in general, we want to keep our buckets full. Now that we keep a queue of peers to refill our buckets, this is less of an issue. However, I'm concerned about not being able to recover from a routing table issue for an hour. Example:
At this point we'll have some peers in each bucket, so nothing will trigger an instant refresh; however, our routing table will be mostly empty. Note: part of the solution here would be to not evict peers from the routing table until we've found a suitable replacement (see the sketch below). However, a more frequent refresh rate would also help.
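A sketch of that "don't evict until we've found a suitable replacement" mitigation, with stand-in types.

```go
package main

import "fmt"

// bucket is a stand-in for one k-bucket plus its replacement queue.
type bucket struct {
	peers      []string // live routing-table entries
	candidates []string // queued replacements discovered during normal work
}

// handleDead only drops a dead peer once a queued replacement exists, so a
// wave of disconnects can't hollow out the table between refreshes.
func (b *bucket) handleDead(dead string) {
	for i, p := range b.peers {
		if p != dead {
			continue
		}
		if len(b.candidates) == 0 {
			// No replacement yet: keep the entry (marked missing elsewhere)
			// rather than shrinking the bucket.
			return
		}
		b.peers[i] = b.candidates[0]
		b.candidates = b.candidates[1:]
		return
	}
}

func main() {
	b := &bucket{peers: []string{"p1", "p2"}, candidates: []string{"p3"}}
	b.handleDead("p1")
	fmt.Println(b.peers) // [p3 p2]: replaced
	b.handleDead("p2")
	fmt.Println(b.peers) // [p3 p2]: kept, no candidate left
}
```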
Because DHT clients usually won't be connected to their K closest nodes: they're useless to those nodes, so those nodes will disconnect from them. When a DHT client's addresses change, it doesn't really need to do a self-walk. Really, it just needs to get the top K peers from its routing table, calling …
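A sketch of that cheaper path, with illustrative names: on an address change, the client just redials the top K peers from its routing table and lets the identify push deliver the new addresses.

```go
package main

import "fmt"

// client is a stand-in for a DHT client node.
type client struct {
	routingTable []string // DHT servers we know, closest first
}

// ensureConnected stands in for dialing a peer; on a real host, (re)connecting
// triggers identify, and an address change triggers an identify push to all
// connected peers.
func ensureConnected(peer string) {
	fmt.Println("dialing", peer)
}

// onAddrsChanged reconnects to the K closest peers already in the routing
// table, instead of running a full self-walk, so the identify push (with the
// new signed addresses) actually reaches them.
func (c *client) onAddrsChanged(k int) {
	if k > len(c.routingTable) {
		k = len(c.routingTable)
	}
	for _, p := range c.routingTable[:k] {
		ensureConnected(p)
	}
}

func main() {
	c := &client{routingTable: []string{"s1", "s2", "s3", "s4"}}
	c.onAddrsChanged(3)
}
```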
Yes.
If we force these nodes to actually join the DHT to be findable, they'll end up degrading the performance of the DHT as a whole. That is, we're better off if unreliable and hard-to-dial nodes are selfish.
Because clients will proactively connect to "missing" DHT servers in their routing table anyway, thus making themselves discoverable. Makes complete sense.
So when a peer joins the network, it connects to the bootstrap peers, then immediately does a self-walk, thus discovering its closest peers; a new, closer peer will discover us as part of the same dance. We keep repeating this dance hourly to account for network churn (a close peer we knew of might have died, creating the possibility of discovering/adding another close peer/candidate), and we are concerned that the current 1-hour interval is too long relative to the intensity of our churn. Hmm... we can lower the interval, but we need to be mindful that it comes with increased bandwidth usage.
We do have an RT size threshold to trigger a refresh. Would increasing the threshold be a better solution than kicking off blanket refreshes more frequently? We could even add a per-bucket threshold to help buckets for higher CPLs stay full. Really, it's the buckets with higher CPLs we should be more concerned about, as there will be far fewer peers discovered for them in the normal course of work than for the ones with lower CPLs.
You mean the K closest peers from the network and not from the RT, right?
The bandwidth usage should be pretty minimal given that we already have connections; we just need to send a message to our closest peers asking if they know of any closer peers. The main issue is that these peers will return signed address records... A napkin calculation says: strictly less than 1 MiB per interval (size of a multiaddr: ~8-21 bytes). If we do the refresh once every 10 minutes, that's less than 2 KiB/s on average.

However, we can probably get away with just querying our closest alpha nodes; we don't actually need to query all 20. I wonder if we should just manually do that. If we get closer peers, then we can seed a query with the results from the closest alpha. Assuming an alpha of 3, that would yield ~256 bytes/s.
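For reference, a small program reproducing this napkin math. Only the ~8-21 byte multiaddr figure and the quoted totals come from the thread; the per-record size is an assumed figure picked to match them.

```go
package main

import "fmt"

func main() {
	const (
		recordBytes   = 2560  // assumed size of one signed peer record (envelope + addrs)
		peersPerReply = 20    // each queried peer returns its K=20 closest
		interval      = 600.0 // refresh every 10 minutes, in seconds
	)

	// Querying all 20 closest peers per refresh:
	full := 20 * peersPerReply * recordBytes
	fmt.Printf("all 20:  %d bytes/interval (just under 1 MiB), %.0f bytes/s\n",
		full, float64(full)/interval)

	// Querying only the closest alpha=3 peers:
	alpha := 3 * peersPerReply * recordBytes
	fmt.Printf("alpha=3: %d bytes/interval, %.0f bytes/s\n",
		alpha, float64(alpha)/interval)
}
```

With these assumptions the output matches the thread's figures: about 1,024,000 bytes per interval (~1707 bytes/s, under 2 KiB/s) for all 20 peers, and exactly 256 bytes/s at alpha = 3.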
Yes. However, we need to avoid repeatedly bootstrapping if we're in a small network.
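A sketch of the per-bucket threshold idea discussed above, with illustrative numbers: refresh only buckets whose fill drops below a fraction of capacity, which naturally prioritizes the sparse high-CPL buckets.

```go
package main

import "fmt"

// bucketsBelowThreshold returns the CPLs whose buckets should be refreshed:
// those whose fill ratio is below the threshold. High-CPL buckets matter most
// since normal traffic rarely replenishes them. All numbers are illustrative.
func bucketsBelowThreshold(fill []int, capacity int, threshold float64) []int {
	var needRefresh []int
	for cpl, n := range fill {
		if float64(n) < threshold*float64(capacity) {
			needRefresh = append(needRefresh, cpl)
		}
	}
	return needRefresh
}

func main() {
	// fill[i] = number of peers in the bucket for common-prefix-length i.
	fill := []int{20, 18, 9, 3, 1}
	fmt.Println(bucketsBelowThreshold(fill, 20, 0.5)) // [2 3 4]
}
```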
👍
From the RT. That is, DHT servers will frequently kill connections to clients. I'm saying that, when we go to do an identify push, we need to re-connect to our 20 closest nodes so they actually get that identify push.
Design notes
Problem: Servers will prune client connections frequently because they presumably rank low in the connection manager.
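A toy model of why that pruning happens, with stand-in types rather than the real connection manager: connections are scored by tags, servers tag peers that are useful to them (e.g. DHT servers in their routing table), and untagged client connections are dropped first when the limit is hit.

```go
package main

import (
	"fmt"
	"sort"
)

// conn is a stand-in for connection-manager state: each connection carries a
// score accumulated from tags. Plain DHT clients accumulate no tags.
type conn struct {
	peer  string
	score int
}

// prune drops the lowest-scoring connections first, which is why untagged
// DHT clients are the first to go when the server hits its connection limit.
func prune(conns []conn, limit int) []conn {
	sort.Slice(conns, func(i, j int) bool { return conns[i].score > conns[j].score })
	if len(conns) > limit {
		conns = conns[:limit]
	}
	return conns
}

func main() {
	conns := []conn{
		{"dht-server-A", 10}, // tagged: in our routing table
		{"dht-server-B", 10},
		{"client-C", 0}, // untagged leech: pruned first
	}
	fmt.Println(prune(conns, 2))
}
```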
Solution:
If we are a DHT client: … (edit: we should do this regardless)
Testing mechanics
Success Criteria