IPFS loses swarm connection while pinning #5977
Comments
This looks like an issue we fixed recently: libp2p/go-libp2p-kad-dht#237 (comment) Would you be able to build IPFS from master and try reproducing?
If you provide me the commands for the Docker IPFS image, then yes, gladly :)
@markg85 you can just fetch the master tag from Docker Hub:
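For reference, pulling a master build might look roughly like this (a sketch only; the exact tag name is an assumption, since the original command was not captured in this thread):

# the tag name here is an assumption; check Docker Hub for the current master tags
docker pull ipfs/go-ipfs:master-latest
# verify which version/commit the container is actually running
docker exec ipfs_host ipfs version --all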
Ehh, okay. The master ipfs doesn't appear to be able to connect:
On which machine are you executing the command?
I'm executing the command on the cloud node (whose ID changed) towards the local one (which remained as-is). I'm trying to build go-ipfs locally now, just to see if that would work, as both would then be from master.
Thanks. Just one note: I think your issue could be the connection manager killing the session. You can try to increase the connection manager limits in the IPFS config: https://github.com/ipfs/go-ipfs/blob/master/docs/config.md
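A rough sketch of what raising those limits could look like (the values are arbitrary examples; the keys are the Swarm.ConnMgr settings documented in docs/config.md, and the daemon needs a restart afterwards for them to take effect):

# raise the connection manager watermarks and grace period (example values)
ipfs config --json Swarm.ConnMgr.LowWater 1000
ipfs config --json Swarm.ConnMgr.HighWater 2000
ipfs config Swarm.ConnMgr.GracePeriod 60s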
No, I won't. It currently is at the defaults, and that already causes the cloud provider to think I got hacked, due to thousands of connections in mere minutes, as if I'm attacking someone. I'm guessing that improved greatly with your p2p fixes and the recent bitswap fixes. At least, I hope it did :)
Note that the connection manager and the swarm dialer limit are distinct. The connection rate (in-flight dials) is governed by the swarm (what your cloud provider may be complaining about); that has improved with the DHT fixes. The connection manager is in charge of keeping open connections within bounds.
I'm sorry, but I can't get this working at all anymore now.
Is there anything I can add in terms of debug logging to help trace this thing?
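One way to get more detail (a sketch; subsystem names and useful levels vary between versions) is to bump the daemon's log level and watch the events while reproducing:

# raise the log level for all subsystems on the running daemon
ipfs log level all debug
# stream log events while retrying the failing swarm connect
ipfs log tail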
@markg85 was kind enough to pair with me on this. The issue is that despite having a static mapping in his router for IPFS on port 4001, current master was discovering a wrong public port (1024, weird). This led to his address in the DHT being incorrect, and dials failing because his NAT dropped the incoming traffic.
@raulk and I paired on IRC to debug this.
While I had port 4001 open and forwarded, it shows port 1025 in this case, which is wrong.
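For anyone comparing against their own node, the announced addresses can be inspected like this (a sketch; the addresses will of course differ per node):

# the Addresses list should contain the forwarded port (4001 here),
# not a randomly discovered port like 1024/1025
ipfs id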
@markg85 can you post the equivalent output from 0.4.18, please? Thanks again.
And as I just tested, 0.4.18 has the same issue.
Just a friendly reminder: both my local and remote machines now run 0.4.19! On my remote machine there is no 1024 port. Good! The local machine has a clean IPFS setup, data and config. Please take a look at this; it causes swarm connections to "sometimes" fail and "sometimes" work.
How can I raise the attention of the right people for this issue? I have a feeling the ones that need to know about this don't, which causes new releases to be shipped with the very same bug still present.
We are working on this, but it's just not the only thing we're working on fixing. @raulk is the right person.
I would suggest marking this as a blocker for the next release.
That's not going to get the problem fixed any faster; it will just delay other fixes.
I understand, but do know that this bug prevents making a connection at all. That little side effect alone should make it quite a high priority. On the other hand, I have it but others don't seem to be bothered by it at all, so it might just be occurring with some router vendors? Or some other special, non-obvious thing. And when just using IPFS (i.e. not running commands, but just using it to browse the "IPFS internet") there seems to be nothing wrong.
@markg85 I have the same issue with advertising wrong ports (ipfs id)
@remmerw That might be something, or perhaps something that makes investigating it easier for the devs. In my case however, I've only ever had one node running behind the router. Never more.
@markg85 @remmerw @raulk Seems like I've got a pretty similar issue (local desktop node failed to swarm connect to a remote cloud node)!
Some clues/findings:
@voidao that's likely unrelated to this issue. "Cloud" nodes don't have NAT issues. WRT this issue, the core problem is that IPFS doesn't know how you've configured your router. It has to guess as well as it can. It does this by:
Unfortunately, it doesn't look like either of those is working in this case. I'm going to close this in favor of libp2p/go-libp2p#559, as that's an actionable solution to this issue.
@Stebalien Thank you for the detailed explanation! It makes sense to me, and I guess it's caused by the router or something else in the NAT environment.
Hi,
I'm playing with IPFS and pinning, and I might have discovered an oddity involving pinning and swarm connections.
The setup is as follows (a sketch of the docker invocation is included after the list):
1 IPFS server on a cloud hosting provider
1 IPFS locally
Both are the latest IPFS version (0.4.18).
Both run with --routing=dhtclient
The server is running with IPFS_PROFILE=server
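A sketch of how such a cloud node could be started with the official image (the image tag, volume path, and port mapping here are illustrative assumptions, not taken from this report):

# illustrative: server profile applied via IPFS_PROFILE, DHT client routing passed to the daemon
docker run -d --name ipfs_host \
  -e IPFS_PROFILE=server \
  -v /data/ipfs:/data/ipfs \
  -p 4001:4001 \
  ipfs/go-ipfs:v0.4.18 daemon --migrate=true --routing=dhtclient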
Locally I added a large folder.
On the cloud I'm pinning that same folder.
On the cloud I'm grepping to see if I'm connected to my local machine:
docker exec ipfs_host ipfs swarm peers | grep CID
Locally, in the WebUI, I'm monitoring traffic to see when it's uploading.
This gives me quite notable gaps: https://i.imgur.com/1whgzx6.png
The server oftentimes quickly reconnects to the peer it is pinning from, but sometimes it takes a LONG while or just doesn't reconnect at all anymore (or so it seems). So long that I manually connect the peer to the swarm again on the server to resume uploading. As you can see in the image linked above, it had a lot of gaps and just ended up doing nothing.
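For reference, the manual reconnect is roughly this (a sketch; the local node's public IP and peer ID are placeholders):

# run on the cloud node; <LOCAL_PUBLIC_IP> and <LOCAL_PEER_ID> are placeholders
docker exec ipfs_host ipfs swarm connect /ip4/<LOCAL_PUBLIC_IP>/tcp/4001/ipfs/<LOCAL_PEER_ID>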
Both locally and on the cloud there were no internet connection issues that might have caused this. Also, it's very much repeatable. Just try the same setup yourself and you will probably see the same thing happening.
Also, most gaps happen to be spaced at roughly 90-second intervals. That might be a coincidence, as I ended up manually reconnecting over and over again till everything was pinned.
Best regards,
Mark