Cannot retrieve content via the DHT #1984
Labels:
- exp/wizard: Extensive knowledge (implications, ramifications) required
- kind/bug: A bug in existing code (including security flaws)
- kind/resolved-in-helia
- P0: Critical: Tackled by core team ASAP
- status/in-progress
This is a tracking issue for not being able to get content via the DHT. The backstory is as follows:
IPFS listens to libp2p peer discovery events and opens a connection to a peer whenever this event gets fired (we call this "dialing"). Up until now JS IPFS has only ever discovered ~10 bootstrap nodes, maybe a few more peers on the local network via MDNS and a few dozen or so more peers via websocket/webrtc rendezvous servers (if they were configured).
Dialing a peer on discovery preemptively opens a connection so that when we need to talk to a peer for a specific reason (dialing with a protocol) the backing connection is already there.
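Roughly, the pattern looks like this (a stub node stands in for js-libp2p here; the `peer:discovery` event name matches the real one, everything else is illustrative only):

```js
'use strict'
const { EventEmitter } = require('events')

// Stand-in for a libp2p node: emits 'peer:discovery' and exposes a
// callback-style dial(). This mirrors the API of the time in shape only.
class StubNode extends EventEmitter {
  dial (peerInfo, callback) {
    // pretend the transport handshake succeeds
    setImmediate(() => callback(null))
  }
}

const node = new StubNode()

// Dial-on-discovery: preemptively open a backing connection so a later
// dial with a protocol finds one already established.
node.on('peer:discovery', (peerInfo) => {
  node.dial(peerInfo, (err) => {
    if (err) console.error('dial to %s failed', peerInfo.id, err)
  })
})

node.emit('peer:discovery', { id: 'QmSomePeer' })
```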
When the DHT implementation landed, this caused many, many more peer discovery events. Simply by participating in the DHT you learn about the peers "closest" to you, and we also do a "random walk" that lets us discover even more peers.
To give you an idea, we'd gone from discovering less than 100 peers in total to discovering thousands of peers every minute.
Dialing to ALL of those peers concurrently was a big problem. Our CPU usage was going through the roof and not coming back down. We couldn't fetch content and our node would explode after just a few minutes. Not only that, but due to the async nature of the connection process we were dialing the same peers multiple times.
Libp2p has the concept of a connection manager, which should have been dropping connections above a certain threshold, but our connection tracking was broken, so the connection manager thought it had 80 peers when in reality it had around 3,000.
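The watermark logic itself is simple; the failure was in the count feeding it. A minimal sketch of the idea (hypothetical shape and value metric, not the actual connection manager API):

```js
// Once the tracked connection count exceeds a high watermark, keep the
// most valuable connections and close the rest, down to a low watermark.
// This only works if `connections` reflects reality; ours said ~80 when
// the real number was ~3,000, so pruning never kicked in.
function prune (connections, lowWatermark, highWatermark) {
  if (connections.length <= highWatermark) {
    return { keep: connections, drop: [] }
  }
  const sorted = connections.slice().sort((a, b) => b.value - a.value)
  return { keep: sorted.slice(0, lowWatermark), drop: sorted.slice(lowWatermark) }
}

const { keep, drop } = prune(
  [{ id: 'QmA', value: 2 }, { id: 'QmB', value: 9 }, { id: 'QmC', value: 5 }],
  1, // low watermark
  2  // high watermark
)
console.log(keep.map(c => c.id)) // => ['QmB']
console.log(drop.map(c => c.id)) // => ['QmC', 'QmA']
```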
Fixing this alone wasn't going to fix the main problem so we implemented the concept of a dialing queue, which allowed us to consolidate multiple dials to the same peer and gain some control over the concurrency.
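A minimal sketch of what consolidation plus a concurrency limit means (hypothetical names, not the actual libp2p-switch dialer):

```js
class DialQueue {
  constructor (concurrency, dialFn) {
    this.concurrency = concurrency
    this.dialFn = dialFn     // (peerId) => Promise<Connection>
    this.pending = new Map() // peerId -> in-flight promise, consolidates dials
    this.queue = []          // jobs waiting for a free slot
    this.running = 0
  }

  // Returns a promise for the connection. Concurrent calls for the same
  // peer all share one promise instead of triggering duplicate dials.
  dial (peerId) {
    if (this.pending.has(peerId)) return this.pending.get(peerId)
    const p = new Promise((resolve, reject) => {
      this.queue.push({ peerId, resolve, reject })
      this._next()
    }).finally(() => this.pending.delete(peerId))
    this.pending.set(peerId, p)
    return p
  }

  // Start queued dials while we have spare concurrency.
  _next () {
    while (this.running < this.concurrency && this.queue.length > 0) {
      const { peerId, resolve, reject } = this.queue.shift()
      this.running++
      this.dialFn(peerId)
        .then(resolve, reject)
        .finally(() => {
          this.running--
          this._next()
        })
    }
  }
}

const queue = new DialQueue(5, (peerId) =>
  new Promise((resolve) => setTimeout(() => resolve('conn to ' + peerId), 10))
)
console.log(queue.dial('QmA') === queue.dial('QmA')) // => true: one dial, shared
```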
This actually went through a few iterations, during which we added blacklisting so that we don't keep attempting to dial peers we can't reach.
This was great for freeing up the queue but it came with its own problems: connection managers on other nodes were closing connections, and we were then blacklisting those nodes because the connection closed with an error. We had to implement a fix to not blacklist a peer once an initial connection had been established, which sounds simple in hindsight but was not obvious at the time!
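The rule we landed on, as a minimal sketch (hypothetical names throughout):

```js
// Peers we failed to reach on the *initial* dial.
const blacklist = new Set()

function onDialOutcome (peerId, err, wasEverConnected) {
  if (err && !wasEverConnected) {
    // never managed to connect at all: stop retrying this peer
    blacklist.add(peerId)
  }
  // an error after a successful connection (e.g. the remote's connection
  // manager dropped us) is normal churn: do NOT blacklist
}

function shouldDial (peerId) {
  return !blacklist.has(peerId)
}

onDialOutcome('QmNeverReached', new Error('dial timeout'), false)
onDialOutcome('QmWasConnected', new Error('remote closed'), true)
console.log(shouldDial('QmNeverReached')) // => false
console.log(shouldDial('QmWasConnected')) // => true
```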
By introducing the queuing system and alleviating some of that CPU pressure, we had also created a different problem for ourselves: purposeful dials to peers we already had connections to were being queued behind thousands of dials to new peers that were potentially very slow, or completely undialable. So even for a peer we were already connected to, a dial could take a long time to reach the front of the queue.
Communicating with an existing peer was so slow that we actually ended up losing ALL of our peers over time as the connection managers on other nodes dropped the unused connections (which we then blacklisted 🙄).
FYI, having no peers in a p2p network is a really bad place to be: our node was like a brick, spending all its time trying to connect to peers that were unavailable or slow.
We eventually came to the conclusion that we needed to somehow distinguish between dials to existing peers and dials to new peers. Dials without a protocol typically appeared to be just for establishing a connection, whereas dials with a protocol imply a need for specific communication. We called the former "cold calls", and these went into a separate, de-prioritized queue with a limit on how many calls it can hold. Dials made when that queue is full are aborted immediately.
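A minimal sketch of that split (the limit and all names are illustrative, not the real dialer code):

```js
// Illustrative limit; the real number was tuned, not this.
const MAX_COLD_CALLS = 50

// Route a dial to the right queue. Protocol dials are purposeful and go
// to the main queue; cold calls go to the bounded, de-prioritized queue
// and are aborted immediately when it's full.
function enqueueDial (mainQueue, coldQueue, peerId, protocol, callback) {
  if (protocol) {
    mainQueue.push({ peerId, protocol, callback })
    return
  }
  if (coldQueue.length >= MAX_COLD_CALLS) {
    // rather than let a speculative dial starve real work, fail it now
    return callback(new Error('cold call queue is full'))
  }
  coldQueue.push({ peerId, callback })
}

const mainQueue = []
const coldQueue = []
enqueueDial(mainQueue, coldQueue, 'QmA', '/ipfs/bitswap/1.1.0', () => {})
enqueueDial(mainQueue, coldQueue, 'QmB', null, (err) => {
  if (err) console.log(err.message) // only fires when the cold queue is full
})
```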
Things got much better, we had a good peer count and reasonable CPU usage but we still had one big problem, fetching content via the DHT was still not working.
We found out via some parallel work on MDNS interop that we had a problem with dialing ourselves. Discovered node addresses can include 127.0.0.1 and local network addresses like 10.x.x.x and 192.168.x.x. Those nodes may legitimately be running on the same computer or local network as your node, but they may also be on an entirely different local network. On top of that, we had a bug where addresses that were actually our own node's addresses weren't being filtered out of the list of addresses to dial for a peer.
So we'd dial ourselves and the initial connection would succeed, but then it would be dropped because our node would realise it wasn't connected to the node it was hoping to reach (different peer ID). The problem with this is that we wouldn't blacklist the peer, because the connection was established successfully, but for the same reason we also wouldn't try a different address for that peer. It means we can never get content from that node, and if we ever try again the same thing will happen.
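The missing piece was a filter that removes our own addresses from the dial list before dialing. A minimal sketch (plain string comparison for illustration; the real fix operates on multiaddrs):

```js
// Drop any discovered address that is actually one of our own listen
// addresses, so we never dial ourselves.
function removeOwnAddrs (dialAddrs, ownAddrs) {
  const own = new Set(ownAddrs)
  return dialAddrs.filter((addr) => !own.has(addr))
}

const toDial = removeOwnAddrs(
  ['/ip4/127.0.0.1/tcp/4002', '/ip4/10.0.0.5/tcp/4002'], // discovered for a peer
  ['/ip4/127.0.0.1/tcp/4002']                            // our own listen address
)
console.log(toDial) // => ['/ip4/10.0.0.5/tcp/4002']
```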
That's not all. We eventually realised that bitswap works as follows: ask the DHT who has the content, make a cold call to that node, wait for the peer connected event from libp2p, and then dial with the bitswap protocol so that "wantlists" can be exchanged and bits swapped.
Remember I said earlier that the cold call queue has a limit? Well, this is where our assumptions about cold calls break down. Even if bitswap finds a peer in the DHT that has the content, there's no guarantee we'll connect to that node: if the queue is full, the dial is aborted immediately. Bitswap doesn't care. I mean, it logs the dial error, but if that node never gets connected, bitswap can never resolve that item on its wantlist, unless it gets lucky and some other node that has the content becomes a peer.
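Putting the flow and the failure mode together as a sketch (all names hypothetical, including the `exchangeWantlist` helper; only the bitswap protocol id string is the real one):

```js
async function fetchBlock (node, cid) {
  const providers = await node.findProviders(cid)   // 1. ask the DHT who has it
  for (const provider of providers) {               //    (assumed to be an array)
    try {
      await node.dial(provider)                     // 2. cold call, no protocol
      // 3. once connected, dial with the bitswap protocol
      const stream = await node.dialProtocol(provider, '/ipfs/bitswap/1.1.0')
      return await exchangeWantlist(stream, cid)    // 4. swap wantlists and bits
    } catch (err) {
      // a cold call aborted because the queue was full lands here too;
      // nothing retries it, we just hope another provider works out
      continue
    }
  }
  throw new Error('no reachable provider for ' + cid)
}

// made-up placeholder for the wantlist exchange over the bitswap stream
async function exchangeWantlist (stream, cid) { /* ... */ }
```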
Since then, we've been working on adding connection priorities so that we can better distinguish between dials with a purpose (cold call or not) and dials that are made just because we've discovered a peer. This should allow cold calls from bitswap to be prioritised and hopefully 🤞 fix our issue with retrieving content via the DHT.
We've also been working on bringing automatic dials on peer discovery into libp2p so that libp2p can make an informed decision about whether to even try to connect to a peer based on low watermarks for connections and the current state of the queue. This should further improve our resource usage.
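A sketch of that decision (hypothetical names and numbers): only auto-dial a newly discovered peer when we're below the connection low watermark and the dial queue has headroom.

```js
function shouldAutoDial (connectionCount, lowWatermark, queueLength, maxQueueSize) {
  if (connectionCount >= lowWatermark) return false // enough peers already
  if (queueLength >= maxQueueSize) return false     // dialer is saturated
  return true
}

console.log(shouldAutoDial(10, 25, 12, 100))  // => true
console.log(shouldAutoDial(30, 25, 12, 100))  // => false: above the low watermark
console.log(shouldAutoDial(10, 25, 100, 100)) // => false: queue is full
```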
It's been a wild ride, but it's worth noting that we've actually made very few changes to the DHT implementation that was merged. This is mostly all changes to libp2p-switch to allow it to deal with this many connections; it simply hasn't had to deal with this number of connections before. I'll post updates here when I can.