identify: Report observer addresses of peers that succeeded dial attempts #203
Comments
I'm not sure I understand this issue correctly, but here is my understanding of the Identify operation. As per the libp2p spec, the Identify protocol implementation in libp2p keeps a cache of remote peer addresses so it can provide them when dialing peers, which is why unreachable addresses are removed from that list. The observed address, however, is always reported back. In litep2p, peer addresses are discovered entirely through the Kademlia DHT routing table, without caching the remote peer's listen addresses in the Identify protocol implementation. So, IMO, we shouldn't modify the Identify protocol implementation in litep2p. If we need to check the reachability of external addresses after applying the "many peers have seen the same address" heuristic, it should be done using a different protocol, similar to AutoNAT.
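To make the "many peers have seen the same address" heuristic concrete, here is a minimal sketch. The type `ObservedAddressVotes`, its `report` method, and the threshold are hypothetical names chosen for illustration; they are not part of litep2p or libp2p, and plain strings stand in for `PeerId` and `Multiaddr`.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical helper: counts how many distinct peers reported the same
/// observed address. Strings stand in for `PeerId` and `Multiaddr`.
pub struct ObservedAddressVotes {
    /// Assumed number of distinct reporters required; not a litep2p setting.
    threshold: usize,
    /// Observed address -> set of peers that reported it.
    reporters: HashMap<String, HashSet<String>>,
}

impl ObservedAddressVotes {
    pub fn new(threshold: usize) -> Self {
        Self { threshold, reporters: HashMap::new() }
    }

    /// Record that `peer` observed us at `address`.
    /// Returns `true` once at least `threshold` distinct peers agree.
    pub fn report(&mut self, peer: &str, address: &str) -> bool {
        let peers = self.reporters.entry(address.to_string()).or_default();
        // Repeated reports from the same peer are counted only once.
        peers.insert(peer.to_string());
        peers.len() >= self.threshold
    }
}
```

An address that passes this check is only a candidate external address; whether it is actually reachable would still need a separate probe, as suggested above.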
Also, the heuristic of not reporting back failed addresses won't work for restricted cone NATs. In that case, a dial attempt by a peer that was previously dialed by the peer behind the NAT will succeed, while no other peer will be able to reach the peer behind the NAT using the discovered address and port. AutoNAT tries to solve this issue by using a different IP to probe the addresses.
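The restricted cone case can be illustrated with a toy model. This is a sketch under simplifying assumptions: `RestrictedConeNat` is a hypothetical type that tracks only remote IPs, ignoring ports, mapping lifetimes, and everything else a real NAT does.

```rust
use std::collections::HashSet;
use std::net::IpAddr;

/// Toy model of a restricted cone NAT: inbound traffic is accepted only
/// from remote IPs the internal host has already sent traffic to.
#[derive(Default)]
pub struct RestrictedConeNat {
    contacted: HashSet<IpAddr>,
}

impl RestrictedConeNat {
    /// The peer behind the NAT dials `remote`, which opens the filter for that IP.
    pub fn record_outbound(&mut self, remote: IpAddr) {
        self.contacted.insert(remote);
    }

    /// Would a dial coming from `from` reach the peer behind the NAT?
    pub fn allows_inbound(&self, from: IpAddr) -> bool {
        self.contacted.contains(&from)
    }
}
```

In this model, a peer that was dialed first sees its dial-back succeed and would keep reporting the address as good, while a dial from any other IP is dropped.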
… error reporting (#206)

The purpose of this PR is to pave the way for making the Identify protocol more robust, which is currently linked with the low number of peers and connective issues over a long period of time
- paritytech/polkadot-sdk#4925

This PR adds a coherent `DialError` that exposes the minimal information users need to know about dial failures.
- paritytech/polkadot-sdk#5239

A new litep2p event is added for reporting multiple dial errors that occur on different protocols back to the user:

```rust
/// A list of multiple dial failures.
ListDialFailures {
    /// List of errors.
    ///
    /// Depending on the transport, the address might be different for each error.
    errors: Vec<(Multiaddr, DialError)>,
},
```

This event eases the debugging of substrate connectivity issues. At the same time, it can be used in a future PR to inform back to the Identify protocol which self-reported addresses of some peers are unreachable:
- #203

### Next Steps

- Add more tests
- Warp sync + sync full nodes since this is touching individual transports

### Future Work

- The overarching `litep2p::Error` needs a closer look and a refactoring:
  - #204
  - #128
- ConnectionError event for individual transports can be simplified:
  - #205
- I've observed some inconsistencies in handling TCP vs WebSocket connection timeouts. I believe that we can have another pass and share even more code between them:
  - #70

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
Correlate `DialFailure` and `ListDialFailures` attempts with the Identify response provided to peers. The addresses the node could not dial should be removed from the list of addresses we provide back to the peer.
This ensures the remote peer has a healthy view of its addresses and leads to better connectivity over time.
Libp2p uses a similar approach, caching individual peer addresses and removing the addresses the node failed to dial.
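As a rough sketch of the proposal, assuming dial failures are tracked per peer and consulted when building the address list reported back: `DialOutcomeTracker` and its methods are hypothetical names, not existing litep2p API, and plain strings stand in for `PeerId` and `Multiaddr`.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical tracker of addresses this node failed to dial, per peer.
/// Strings stand in for `PeerId` and `Multiaddr`.
#[derive(Default)]
pub struct DialOutcomeTracker {
    failed: HashMap<String, HashSet<String>>,
}

impl DialOutcomeTracker {
    /// Record the addresses carried by a `DialFailure` or
    /// `ListDialFailures` style event for `peer`.
    pub fn record_failures<I>(&mut self, peer: &str, addresses: I)
    where
        I: IntoIterator<Item = String>,
    {
        self.failed.entry(peer.to_string()).or_default().extend(addresses);
    }

    /// A later successful dial clears the failure mark for that address.
    pub fn record_success(&mut self, peer: &str, address: &str) {
        if let Some(failed) = self.failed.get_mut(peer) {
            failed.remove(address);
        }
    }

    /// Addresses of `peer` that are not marked as failed, in their
    /// original order. Only these would be provided back to the peer.
    pub fn reportable(&self, peer: &str, known: &[String]) -> Vec<String> {
        match self.failed.get(peer) {
            Some(failed) => known
                .iter()
                .filter(|addr| !failed.contains(*addr))
                .cloned()
                .collect(),
            None => known.to_vec(),
        }
    }
}
```

Clearing the mark on a later success is a design choice of this sketch; a real implementation might instead expire failures after some time.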