bug(Gossipsub over WebRTC) Unpredictable Dial upgrade Timeout error #3665
Comments
To add to the troubleshooting: whenever I have the server dial the browser (which shouldn't work at all; it always fails because the browser is not listening), the chances of a successful handshake increase significantly. It's strange behaviour. It makes me wonder if there's a part of the WebRTC handshake from the server to the browser that doesn't always get called.
eprintln!("✔️ Connection Established to {peer_id} in {established_in:?} on {send_back_addr}");
// This dial shouldn't work, but strangely increases the odds of connecting...
let mut res = send_back_addr;
strip_peer_id(&mut res);
eprintln!("📞 Server dialing the browser {res}");
let dial_opts = DialOpts::unknown_peer_id()
    .address(res.clone())
    .build();
if let Err(e) = swarm.dial(dial_opts) {
    println!("❌ (Expected) Dialing error: {e:?}");
}
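The `strip_peer_id` helper called in the snippet above is not shown in this thread. As a rough illustration of what such a helper is expected to do, here is a minimal sketch that works on the textual form of a multiaddr instead of libp2p's `Multiaddr` type. The string-based signature and the example address in the note below are assumptions made for illustration, not the code from the experiment repo.

```rust
/// Hypothetical, string-based stand-in for the `strip_peer_id` helper.
/// Removes a trailing `/p2p/<peer id>` component from a textual multiaddr,
/// and leaves the address untouched when the peer id is not the last component.
fn strip_peer_id(addr: &mut String) {
    const P2P: &str = "/p2p/";
    if let Some(idx) = addr.rfind(P2P) {
        // Only strip when `/p2p/<id>` is the final component of the address.
        let is_last_component = {
            let tail = &addr[idx + P2P.len()..];
            !tail.is_empty() && !tail.contains('/')
        };
        if is_last_component {
            addr.truncate(idx);
        }
    }
}
```

With a made-up address such as `/ip4/127.0.0.1/udp/4001/webrtc/p2p/12D3KooWExample`, the sketch leaves `/ip4/127.0.0.1/udp/4001/webrtc`, which is the kind of address the server then tries to dial without a known peer id.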
Linking here: libp2p/universal-connectivity#3
Addressing gossipsub upgrade issues here: #3625
@thomaseizinger It's definitely a rust-libp2p issue. I've narrowed it down: it is happening on the rust server side, somewhere between:
and
Somehow gossipsub adds the peer, adds a subscriber, but then fails on the upgrade (but only most of the time...)
coming from rust-libp2p/swarm/src/connection.rs, lines 249 to 264 in fd09835
We don't actually perform any upgrade on our side other than negotiating the protocol. Can you run with …? This might be an issue on the JS end. If the stream upgrade there is not completed within 10s (which I think is our default timeout), then this error happens. Is the JS event loop perhaps blocked? Can you turn up logging on the JS side?
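To make the timeout behaviour described above concrete, here is a small std-only sketch: a negotiation step runs on another thread, and the caller gives up with a timeout error if no result arrives in time. This is not how rust-libp2p implements it (the real code is async and lives in the swarm's connection handling). `UpgradeError` and `negotiate_with_timeout` are hypothetical names, and the 10s figure mentioned in the thread is only the commenter's recollection of the default.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical error type for this sketch (not the rust-libp2p type).
#[derive(Debug, PartialEq)]
enum UpgradeError {
    /// The remote did not finish negotiating within the deadline.
    Timeout,
    /// The negotiation task ended without producing a result.
    Dropped,
}

/// Runs `negotiate` on a separate thread and waits at most `timeout` for it.
/// Returns the negotiated protocol name, or an error if the deadline passes.
fn negotiate_with_timeout<F>(negotiate: F, timeout: Duration) -> Result<String, UpgradeError>
where
    F: FnOnce() -> String + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the receiver has already given up, the send fails; ignore that.
        let _ = tx.send(negotiate());
    });
    match rx.recv_timeout(timeout) {
        Ok(protocol) => Ok(protocol),
        Err(mpsc::RecvTimeoutError::Timeout) => Err(UpgradeError::Timeout),
        Err(mpsc::RecvTimeoutError::Disconnected) => Err(UpgradeError::Dropped),
    }
}
```

The point of the sketch is that the side reporting the timeout is not necessarily the side that is slow: if the remote (here, the closure) stalls, the local side only ever sees `Timeout`, which matches the suggestion to look at the JS end.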
I think @mxinden has figured it out here: libp2p/universal-connectivity#3 (comment). Tracking here: #3690. I am so glad, this was really driving me crazy!
This can probably be closed now; I've not been seeing the issue like it was before.
Summary
I've been experimenting with the WebRTC transport and successfully got Ping to work between:
The next step was to escalate to gossipsub to build a simple chat demo between the same two systems. After much hacking, sometimes the two connect, but most times they do not.
Experiment Repo: https://github.com/DougAnderson444/rdy2serve
Expected behaviour
Launch a rust-server, launch a js-browser instance, have the two connect over WebRTC and exchange chat messages via gossipsub. I was expecting consistent behaviour of rust-libp2p gossipsub.
Actual behaviour
Rust-libp2p gossipsub over WebRTC is unpredictable. Most of the time it doesn't work... but then all of a sudden it will connect and chat just fine. I found that sometimes, after launching 5 or 6 browsers, the connection upgrades, which made me wonder whether there's a minimum mesh peer issue, but other times a single browser window connects just fine.
From what I can tell, when it fails, gossipsub is not upgrading the connection consistently.
Some consistent observations:
[libp2p_gossipsub::handler] Dial upgrade error Timeout
then a [libp2p_swarm] NegotiationTimeout
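Since the failure is intermittent, it can help to count how often the signature above shows up across repeated runs. The following is a minimal sketch, assuming the server's log output has been captured as text; `has_failure_signature` is a hypothetical helper, and it matches only on the two message fragments quoted above, not on any particular log line format.

```rust
/// Returns true if `log` contains the failure signature reported in this issue:
/// a gossipsub "Dial upgrade error Timeout" line followed, on some later line,
/// by a swarm "NegotiationTimeout" line.
fn has_failure_signature(log: &str) -> bool {
    const UPGRADE_TIMEOUT: &str = "[libp2p_gossipsub::handler] Dial upgrade error Timeout";
    const SWARM_TARGET: &str = "[libp2p_swarm]";
    const NEGOTIATION_TIMEOUT: &str = "NegotiationTimeout";

    let mut seen_upgrade_timeout = false;
    for line in log.lines() {
        if line.contains(UPGRADE_TIMEOUT) {
            seen_upgrade_timeout = true;
        } else if seen_upgrade_timeout
            && line.contains(SWARM_TARGET)
            && line.contains(NEGOTIATION_TIMEOUT)
        {
            return true;
        }
    }
    false
}
```

Running this over the captured log of each attempt gives a simple pass/fail tally, which makes it easier to tell whether a change (such as the extra server-side dial) really shifts the success rate.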
Typical failure (Server side rust)
Typical Success (Server side rust)
Typical Client side failure (browser JS)
Possible Solution
Troubleshoot the upgrade process, specifically for the WebRTC transport.
Version
0.51.1
Would you like to work on fixing this bug?
Maybe. I've troubleshot as far as I can. I'm willing to help more, but I'm at a loss as to what to try next.