bug: mplex error upon reconnection #1485
Comments
Note that this happened because my laptop (running js-waku in the browser) went to sleep. Removing the …
Thanks for reporting!
With this kibana link you should be able to get the logs from …
Perhaps your …
Summarizing Tanguy's comments here: looking for "Too many connections for peer" in the logs helps to know when this is happening. It's expected, since the maximum number of connections per peer is 1.
Not sure how easy it would be to reproduce this scenario, but let's say I'm connected to around 50 nodes and suddenly my internet is gone. Since no disconnection was made, that connection is still being kept as my connection by all 50 peers. So if I come back online 5 minutes later, I won't be able to connect to any of these peers?
Noted. Will provide
I would not expect a js-waku peer to connect to 50 peers, but yes, I agree with you. @danisharora099 is starting to look into peer management on the js-waku side. We will have to investigate browser/libp2p behaviour around disconnection/signal loss. On the nwaku side, does it make sense to time out a connection after 10 min? Also, what would be the impact if the timeout is reduced? Finally, how does the timeout work? i.e., if data is sent over the connection, does it delay the timeout closure? How clever is the connection management in this respect?
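To illustrate the kind of liveness handling being discussed, here is a minimal TypeScript sketch of a client that periodically pings its peers and proactively closes connections that stop responding, instead of relying only on the remote side's idle timeout. The `pingPeer` and `closeConnection` helpers are hypothetical placeholders, not actual js-waku or js-libp2p APIs.

```typescript
// Hypothetical liveness loop: ping every connected peer and drop the
// connection if the peer stops answering. `listPeers`, `pingPeer` and
// `closeConnection` are placeholders for whatever primitives the node exposes.

type PeerId = string;

interface LivenessDeps {
  listPeers: () => PeerId[];                        // currently connected peers
  pingPeer: (peer: PeerId) => Promise<number>;      // resolves with RTT in ms, rejects on failure
  closeConnection: (peer: PeerId) => Promise<void>;
}

export function startLivenessLoop(
  deps: LivenessDeps,
  intervalMs = 30_000, // how often to check each peer
  timeoutMs = 5_000    // how long to wait for a ping reply
): () => void {
  const timer = setInterval(async () => {
    for (const peer of deps.listPeers()) {
      try {
        await Promise.race([
          deps.pingPeer(peer),
          new Promise((_, reject) =>
            setTimeout(() => reject(new Error("ping timeout")), timeoutMs)
          ),
        ]);
      } catch {
        // Peer did not answer in time: assume the connection is stale
        // (e.g. laptop slept, wifi dropped) and close it so a fresh dial
        // is possible later.
        await deps.closeConnection(peer).catch(() => undefined);
      }
    }
  }, intervalMs);

  return () => clearInterval(timer); // call to stop the loop
}
```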
Update: I can confirm that this does not happen by manually going offline, but there is a weird, inconsistent bug: the browser that went offline can send messages after reconnection, yet whether those messages are received (and whether it receives messages itself) appears random. Steps I followed:
It was then observed that, on spamming messages from w2, w1 seemed to receive only one out of every 5-6 messages sent, while w2 seems to receive almost all messages sent. I will post a consistently reproducible set of steps once I find them. Update: …
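A rough sketch of how such a repro could be made measurable: send numbered messages from one node and count which ones arrive at the other, so the loss rate after reconnection can be quantified. `sendMessage` and `onMessage` here are hypothetical placeholders for whatever publish/subscribe calls the two browser nodes (w1, w2) actually use.

```typescript
// Hypothetical delivery-rate check between two nodes (e.g. w2 sending, w1 receiving).
// `sendMessage` / `onMessage` stand in for the actual publish/subscribe API.

interface TestNode {
  sendMessage: (payload: Uint8Array) => Promise<void>;
  onMessage: (cb: (payload: Uint8Array) => void) => void;
}

export async function measureDelivery(
  sender: TestNode,
  receiver: TestNode,
  count = 20,
  settleMs = 5_000
): Promise<number> {
  const received = new Set<number>();
  receiver.onMessage((payload) => {
    const n = Number(new TextDecoder().decode(payload));
    if (!Number.isNaN(n)) received.add(n);
  });

  // Send `count` numbered messages.
  for (let i = 0; i < count; i++) {
    await sender.sendMessage(new TextEncoder().encode(String(i)));
  }

  // Give messages some time to propagate before counting.
  await new Promise((resolve) => setTimeout(resolve, settleMs));
  return received.size / count; // ~0.2 would match "1 out of 5-6 received"
}
```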
@danisharora099 are you still looking into this?
(Thanks @Ivansete-status for the ping.) Not this in particular, but we do have a number of issues that track similar work: waku-org/js-waku#2154 (check liveness of a node). Also note that we have done some work on the js-waku side (like keep-alive management, filter pings, etc.) that also goes toward improving reliability here. As for the current status of the original issue and the above repro, @adklempner did some investigations around liveness recently that might be more relevant here now.
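For reference, a minimal sketch of how keep-alive settings can be passed when creating a js-waku node. The `pingKeepAlive` / `relayKeepAlive` option names are assumptions based on older js-waku releases and may differ in the version you use; check the current @waku/sdk docs.

```typescript
// Sketch of enabling keep-alive pings when creating a js-waku node.
// Assumes @waku/sdk's `createLightNode` accepts `pingKeepAlive` /
// `relayKeepAlive` options (in seconds); option names may vary by version.
import { createLightNode } from "@waku/sdk";

const node = await createLightNode({
  defaultBootstrap: true,
  // Periodic libp2p pings keep the connection from being considered idle
  // and surface dead connections earlier after sleep or signal loss.
  pingKeepAlive: 30,
  // Periodic relay messages serve the same purpose on the relay layer.
  relayKeepAlive: 300,
});

await node.start();
```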
Problem
js-waku (js-libp2p) automatically reconnects to remote nodes after disconnection/network access loss.
I can see the re-connection happening but it fails during the identify protocol.
I am still trying to pin down the exact error but would be keen to know if nwaku logs (from the fleet) can be extracted using the local timestamps.
Also keen to know what information I need to provide to help extraction of said logs.
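One way to provide usable local timestamps is to log connection events with ISO timestamps on the js-waku side. Below is a minimal sketch, assuming the underlying js-libp2p instance is reachable (e.g. as `node.libp2p`) and emits `peer:connect` / `peer:disconnect` / `peer:identify` events; event names vary across js-libp2p versions, so treat them as assumptions.

```typescript
// Sketch: attach timestamped listeners to connection events so local browser
// logs can be correlated with fleet-side (nwaku) logs. Event names are
// assumptions and differ between js-libp2p versions; adjust as needed.

function logWithTimestamp(event: string, detail: unknown): void {
  console.log(`[${new Date().toISOString()}] ${event}`, detail);
}

export function traceConnections(libp2p: {
  addEventListener: (type: string, cb: (evt: CustomEvent) => void) => void;
}): void {
  for (const type of ["peer:connect", "peer:disconnect", "peer:identify"]) {
    libp2p.addEventListener(type, (evt) => logWithTimestamp(type, evt.detail));
  }
}
```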
Impact
I am adding the critical label because disconnection has been the main issue platforms have complained about with js-waku. Now that it makes sense to focus on it (as part of the js-waku refactor), I would like this to be resolved.
To reproduce
Go to https://examples.waku.org/eth-pm/ and turn your internet access off and on (e.g. toggle wifi). You should see that the Waku node tries to reconnect and successfully completes the libp2p noise handshake, but fails during the identify protocol.
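When reproducing, a small helper like the sketch below can mark the browser's own offline/online transitions in the console so they line up with the reconnection attempts and the identify failure. It only uses standard browser events, no js-waku specifics.

```typescript
// Sketch: timestamp the browser's offline/online transitions so they can be
// lined up with the reconnection / identify failure seen in the console.
export function traceConnectivity(): void {
  const mark = (state: string) =>
    console.log(`[${new Date().toISOString()}] network ${state}`);

  window.addEventListener("offline", () => mark("offline"));
  window.addEventListener("online", () => mark("online"));
}
```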
Expected behavior
Reconnection is successful
Screenshots/logs
Here are some logs; feel free to let me know what kind of information would be needed to find the nwaku fleet node logs:
nwaku version/commit hash
Whatever is deployed on
/dns4/node-01.ac-cn-hongkong-c.wakuv2.prod.statusim.net/tcp/8000/wss/p2p/16Uiu2HAm4v86W3bmT1BiH6oSPzcsSr24iDQpSN5Qa992BCjjwgrD
at the time of the issue creation.