fix: [web3] only ever send RPC socket messages when the socket is open #29195
Codecov Report
@@ Coverage Diff @@
## master solana-labs/solana#29195 +/- ##
=========================================
- Coverage 76.5% 76.3% -0.3%
=========================================
Files 54 56 +2
Lines 3129 3151 +22
Branches 470 475 +5
=========================================
+ Hits 2396 2406 +10
- Misses 568 578 +10
- Partials 165 167 +2
Whatever happened with this? @steveluscher
I'm pretty sure that it works @Jac0xb, but I don't have a local repro yet. I asked @laterbreh, @Disperito, and @gallynaut to help me give it a shot, but I think that all of them have moved on. If anyone is reading this and still has a repro of the bug, can you try to upgrade to an experimental release, by running
Alright, I'm sufficiently bullish on this change that I'm going to ship it tomorrow, even though I haven't been able to scare up someone with a repro to test it out.
@steveluscher thank you for debugging and working on this PR! I have some questions, now that I am experiencing this too. I would love to help debug this and get it eventually fixed, but to do so, I would like to better understand some context (especially in relation to this PR).
FWIW, the issue I am seeing is happening in the context of React Native.
I think we're having two issues here, potentially happening at the same time. The way I see it, uncaught exceptions are causing the code that handles the WebSocket connection/subscriptions to enter an infinite loop. Checking the state for connected and throwing an Error (or rejecting a promise) otherwise sounds like it doesn't change anything, but just throws a custom error instead of the one from an underlying
From my debugging, this is the line that is called infinitely as soon as the first call throws an error:
Was the idea here that it is RpcWebSocket that gets corrupted while there is an error, so it is better to check for the state ahead of calling its
If yes, I think a (better?) long-term fix could be to either:
There can be other errors happening with WebSockets as well, so guarding against
Also, another thing that I wanted to point out that might be relevant here: I noticed that you mentioned race conditions in one of the issues. I am currently observing a behavior where this PR doesn't actually work. The code itself is fine, but the
@@ -36,6 +35,7 @@ import fetchImpl, {Response} from './fetch-impl';
import {DurableNonce, NonceAccount} from './nonce-account';
import {PublicKey} from './publickey';
import {Signer} from './keypair';
import RpcWebSocketClient from './rpc-websocket'; |
We create RpcWebSocketClient like this later:
this._rpcWebSocket = new RpcWebSocketClient(this._rpcWsEndpoint, {
  autoconnect: false,
  max_reconnects: Infinity,
});
It seems that max_reconnects: Infinity is a bit misleading here, as it is going to be overwritten by the explicitly hardcoded max_reconnects: 5 in the rpc-websocket.ts file?
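To illustrate the concern above, here is a minimal sketch. It assumes the local wrapper spreads the caller's options first and sets its own max_reconnects afterwards; ClientOptions and mergeOptions are hypothetical names, and the actual merge in rpc-websocket.ts may differ.

```typescript
// Hypothetical option shape; only the two fields mentioned in this thread.
interface ClientOptions {
  autoconnect?: boolean;
  max_reconnects?: number;
}

// Assumed merge order: caller options first, hardcoded value last.
// In an object spread the later key wins, so the caller's value is discarded.
function mergeOptions(callerOptions: ClientOptions): ClientOptions {
  return {
    ...callerOptions,
    max_reconnects: 5,
  };
}

const effectiveOptions = mergeOptions({
  autoconnect: false,
  max_reconnects: Infinity,
});
console.log(effectiveOptions.max_reconnects); // 5, not Infinity
```

If the merge order were reversed (hardcoded default first, caller options spread last), the Infinity passed by the caller would take effect instead.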
Quick follow up to my previous comment, after further debugging:
It looks like there's already a notion of it in the code, it's just that it doesn't seem to work in this case:
The reason this doesn't work is that it checks for and that will return false only after the
In other words, this will return false-positive results for all the errors that happen while the connection was either open or connecting.
After debugging and console logging different connection states (hell yeah), I discovered that in my particular local scenario, the following happens:
Now any call to
And that will enter an infinite loop. That's why this PR has no effect (unfortunately) on the original issue.
It looks to me that the
Now, I started wondering why the
AFTER I COMMENTED OUT THIS CODE, EVERYTHING WORKS AS EXPECTED
It sounds like I may have entered a state where my connection keeps being created and automatically closed, which causes an error. After commenting out that entire code (disconnect on no subscriptions), the error completely went away!
Note there is another issue of WebSockets (and
TL;DR because this thread is really long now:
This is definitely not the right way to do it, but... I guess with a little bit of help we can make sure that its intent is preserved w/o causing this issue.
@grabbou, this is awesome debugging. Are you close to being able to share a repro so that I can comb over these findings too?
Hey, yes, I was taking my time since my workaround was working "just fine" and I was totally overdue with some fun... I will send something shortly!
I'm pretty sure that elpheria/rpc-websockets#138 fixes the source of corruption of the subscriptions state machine here.
Problem
Our rpc-websockets dependency does not guard against the underlying socket not being OPEN before it tries to call send() on the socket itself.
- If the readyState of a socket is CONNECTING and you try to call send(), it will fatal.
- If the readyState of a socket is CLOSING or CLOSED and you try to call send(), it will discard the call silently. A console error results sometime later, but in no event does it fatal your program.
All of this can lead to corruption of the internal state of the RPC; it can think that it's connected, able to send messages, and is owed responses, when it is not.
Summary of Changes
- Extend rpc-websockets and wrap the call() and notify() methods in code that checks the readyState before trying to call send().
Fixes #25578, #27167, solana-labs/solana-web3.js#1106.
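The guard described in the summary can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: SocketLike, GuardedRpcClient, and sendWhenOpen are hypothetical names, the socket is a stand-in so the sketch runs without a network, and the promises resolve as soon as the message is handed to the socket rather than with a server response.

```typescript
// Numeric value of WebSocket.OPEN in the WebSocket API
// (CONNECTING = 0, OPEN = 1, CLOSING = 2, CLOSED = 3).
const SOCKET_OPEN = 1;

// Hypothetical minimal socket shape, so the sketch runs without a network.
interface SocketLike {
  readyState: number;
  send(data: string): void;
}

type RpcMessage = {
  jsonrpc: '2.0';
  id?: number;
  method: string;
  params: unknown[];
};

// Illustrative wrapper; not the rpc-websockets client itself.
class GuardedRpcClient {
  private socket: SocketLike;
  private nextId = 0;

  constructor(socket: SocketLike) {
    this.socket = socket;
  }

  // Request that expects a response: guarded by the readyState check.
  call(method: string, params: unknown[] = []): Promise<void> {
    return this.sendWhenOpen({jsonrpc: '2.0', id: ++this.nextId, method, params});
  }

  // Notification (no id, no response expected): same guard.
  notify(method: string, params: unknown[] = []): Promise<void> {
    return this.sendWhenOpen({jsonrpc: '2.0', method, params});
  }

  // Reject up front instead of calling send() on a socket that is not OPEN.
  private sendWhenOpen(message: RpcMessage): Promise<void> {
    if (this.socket.readyState !== SOCKET_OPEN) {
      return Promise.reject(
        new Error(
          `Tried to send '${message.method}' but the socket was not OPEN ` +
            `(readyState was ${this.socket.readyState})`,
        ),
      );
    }
    this.socket.send(JSON.stringify(message));
    return Promise.resolve();
  }
}
```

Rejecting before send() is ever reached means the caller sees one explicit error for CONNECTING, CLOSING, and CLOSED alike, rather than an exception in one state and a silently dropped message in the others.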