-
Notifications
You must be signed in to change notification settings - Fork 4.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: the one where we fix client subscriptions #24473
fix: the one where we fix client subscriptions #24473
Conversation
Codecov Report
@@ Coverage Diff @@
## master solana-labs/solana#24473 +/- ##
===========================================
- Coverage 81.8% 75.1% -6.8%
===========================================
Files 632 38 -594
Lines 167499 2242 -165257
Branches 322 323 +1
===========================================
- Hits 137169 1685 -135484
+ Misses 30217 444 -29773
Partials 113 113 |
What I wouldn't give for a good 'ol fashioned @altano review on this diff, for old time's sake. |
web3.js/src/connection.ts
Outdated
function warnListenerRemovedWithoutActiveSubscription( | ||
id: number, | ||
label: string, | ||
) { | ||
console.warn( | ||
`You removed a(n) ${label} subscription with id \`${id}\` ` + | ||
'where no active subscription having that id could be . ' + | ||
'found. Verify that your code does not try to ' + | ||
'unsubscribe the same listener more than once.', | ||
); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This commit is from #24607. It shows up here because GitHub doesn't support stacked PRs properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking this on! Looks really good so far.
web3.js/src/connection.ts
Outdated
filter = sub.filter; | ||
callbacks.forEach(cb => { | ||
try { | ||
cb(...(publishArgs as [any, any])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be any[]
? Some callbacks only have one argument, and future callbacks could have more than 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had that initially, but TypeScript squaked:
A spread argument must either have a tuple type or be passed to a rest parameter. ts(2556)
This was a great review, @jstarry. I addressed each one of your comments in a commit on top of ‘fix: special case auto-disposing subscriptions like signatureSubscribe’ so that you can walk through the new stuff one by one. |
web3.js/src/connection.ts
Outdated
@@ -1958,33 +2041,15 @@ export type SignatureSubscriptionCallback = ( | |||
* Signature subscription options | |||
*/ | |||
export type SignatureSubscriptionOptions = { | |||
commitment?: Commitment; | |||
commitment?: Commitment; // Must default to `Connection._commitment` or 'finalized' otherwise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment directed at maintainers? If so, I think we should move this comment (and the others you've copied around) to the args: IWSRequestParams
param on _makeSubscription
. I don't think that the current comment makes it very clear why it must default to some commitment, but that explanation should be read by contributor who calls _makeSubscription
in a new function. In fact, it could generally describe that the client should set explicit defaults for parameters to ensure that subscriptions are hashed correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright! I'll move this around.
God, I hate optional configs.
Alright @jstarry, the review can continue from ‘fix: write documentation explaining how and why to apply a default commitment’ onward, now that I've squashed the heinous bugs that you caught. |
state: 'subscribing', | ||
}; | ||
const serverSubscriptionId: ServerSubscriptionId = | ||
(await this._rpcWebSocket.call(method, args)) as number; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While calling this method, the websocket could be closed and all subscriptions are set to 'pending'
, I think we should check for that before setting to 'subscribed'
here (this applies to unsubscribe as well).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woo, that sent me down a debugging rabbit hole. OK, I got to the bottom of this. Wow. From MDN.
The browser will throw an exception if you call send() when the connection is in the CONNECTING state. If you call send() when the connection is in the CLOSING or CLOSED states, the browser will silently discard the data.
I just presumed that if you invoke call()
when the socket is down, it would throw an exception and process the catch block below that sets the subscription back to pending
.
The ‘bare metal’ socket doesn't throw. It just drops the data, and never calls the callback.
ws = new WebSocket('wss://demo.piesocket.com/v3/channel_1?api_key=VCXCEuvhGcBDP7XhiJJUDvR1e1D3eiVjgZ9VRiaV¬ify_self')
ws.onopen = () => {
ws.close();
console.log('is `CLOSING`', ws.readyState === ws.CLOSING);
try {
ws.send(JSON.stringify('hello'));
console.log('sent, when I expected an error to be thrown');
} catch(e) {
console.error('legit caught an error', e);
}
};
But, the rpc-websockets client always throws if it doesn't receive a reply within a timeout.
https://github.com/elpheria/rpc-websockets/blob/trunk/src/lib/client.ts#L157-L172
That's the bug here. The try
block will only ever continue past await this._rpcWebSocket.call()
if a reply was received, in which case we're all good. The catch
block however might get called because of a disconnection-caused timeout. It's there where I need to make sure that we're still in the subscribing
state before I just hard shunt the subscription back to pending
. And that I can write a test for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright @jstarry, one more commit on top: ‘fix: bail on async…’ should fix this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure it can still technically happen if we get a reply and the socket is disconnected right after. My line of reasoning is that the async function will yield to the scheduler which could then process a close event before we continue on to setting the subscription to "subscribed". Maybe that's unlikely but I don't think it would hurt to add another if (!isCurrentConnectionStillActive())
check. In fact we could factor out a function called applySubscriptionTransition
which always checks if the current connection is active. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A socket could disconnect while we're literally in the middle of processing the reply, 100%. The thing is that the disconnect happens in a thread outside the JS engine (ie. in the browser's networking stack) so there's no possible way for it to show up on the single JavaScript thread until the next runloop.
The async _rpcWebSocket.call()
can only resume in one of two ways:
- throw (reject) when the reply timeout gets hit.
- resume execution (resolve) because it received a reply.
Presume, now, that _rpcWebSocket.call()
has yielded to the scheduler and is waiting for one of those two things to happen.
- If a close event happens before we get a reply, the only possible thing that could happen next is for the reply timeout to be hit†, which would trigger our
catch
which handles this properly now. - If a reply comes in, then we're definitely not closed, and every statement after the
await
will run synchronously until there are no more statements to evaluate.
Now, imagine that a close happened, in a real world sense, while that code after the await
was running (eg. I yanked out the network cable). The browser will learn of the close, for sure, but there's literally no way for the browser to schedule the close
event on the JS thread until after the reply is done being processed. The browser can fire the close
event into the JS thread all it wants, but it's not going to be able to preempt the code after the await
…
this._subscriptionsByHash[hash] = {
...subscription,
serverSubscriptionId,
state: 'subscribed',
};
this._subscriptionCallbacksByServerSubscriptionId[
serverSubscriptionId
] = subscription.callbacks;
await this._updateSubscriptions();
…by sneaking a call to _wsOnClose
in between those three lines.
JavaScript being single threaded is one of the worst things about JavaScript, but also definitely one of the best things about JavaScript.
† If it's literally possible to receive and process a data frame (reply) after having received a close
event on the JS thread, then everything I said above doesn't matter, I will eat my hat, and websockets are terrible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding from this gist is that even if an async rpc call receives and processes a reply, we might not pick up where we left off until after processing other events in the work queue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh totally, but other events will mostly touch other subscriptions, and the ones that do touch this subscription will perform no actions because it's in the 'subscribing'
state:
case 'subscribing':
case 'unsubscribing':
break;
Receiving a reply corresponds to this event firing in the rpc-websockets
library:
this.socket.addEventListener("message", ({data: message}) => { ... }
All of the code of that event handler fires sync, including this bit at the end:
if (message.error)
this.queue[message.id].promise[1](message.error)
else
this.queue[message.id].promise[0](message.result)
promise[0]
there corresponds to the resolve
of this._rpcWebSockets.call(...)
. When it's called with message.result
(the reply) then all of the code after our await
will run synchronously:
this._subscriptionsByHash[hash] = {
...subscription,
serverSubscriptionId,
state: 'subscribed',
};
this._subscriptionCallbacksByServerSubscriptionId[
serverSubscriptionId
] = subscription.callbacks;
await this._updateSubscriptions();
So essentially you can draw a straight line between the 'message'
event firing and all of that code running before yielding back to the browser. This means that the act of receiving a reply implies two things:
- you were definitely not
CLOSED
orCLOSING
at that moment (otherwise you wouldn't have been able to receive'message'
), and - no
'close'
event can sneak into the JS thread while you're running the reply processing code.
Take a look at these branches of connection-subscriptions.test.ts
:
Subscriptions
The `accountSubscribe` RPC method
attaching the first notification listener
✓ results in a subscription request being made to the RPC
then unsubscribing that listener before the subscription has been acknowledged by the server
once the subscription has been acknowledged by the server
✓ results in the subscription being torn down immediately
then having the socket connection drop unexpectedly
then unsubscribing that listener
upon the socket connection reopening
✓ does not result in a new subscription request being made to the RPC
upon the socket connection reopening
✓ results in a new subscription request being made to the RPC
then upon the prior subscription fataling (eg. because its timeout triggers)
✓ does not result in a new subscription request being made to the RPC
Those exercise what happens when you have an in-flight 'subscribing'
state and one of two things happens:
- the user decides they don't care anymore and call
remove*
- the connection drops
Thoughts? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So essentially you can draw a straight line between the 'message' event firing and all of that code running before yielding back to the browser. This means that the act of receiving a reply implies two things:
Ah, this makes a lot of sense. Looks good then, thanks :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there, just did a full pass so should be good to go after this round
state: 'subscribing', | ||
}; | ||
const serverSubscriptionId: ServerSubscriptionId = | ||
(await this._rpcWebSocket.call(method, args)) as number; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure it can still technically happen if we get a reply and the socket is disconnected right after. My line of reasoning is that the async function will yield to the scheduler which could then process a close event before we continue on to setting the subscription to "subscribed". Maybe that's unlikely but I don't think it would hurt to add another if (!isCurrentConnectionStillActive())
check. In fact we could factor out a function called applySubscriptionTransition
which always checks if the current connection is active. What do you think?
web3.js/src/connection.ts
Outdated
filter = sub.filter; | ||
callbacks.forEach(cb => { | ||
try { | ||
cb(...(callbackArgs as [any, any])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment explaining that callbackArgs
might not actually match this tuple but we just do this to avoid a typescript issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. What a mess: microsoft/TypeScript#47615
web3.js/src/connection.ts
Outdated
* - Otherwise, if the `Connection::commitment` is set, use that. | ||
* - Otherwise, set it to the RPC server default: `finalized`. | ||
* | ||
* This is extrememly important to ensure that these two fundamentally |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* This is extrememly important to ensure that these two fundamentally | |
* This is extremely important to ensure that these two fundamentally |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at long last 😉
O no. Famous last words. |
…ds from server subscription ids
…'t fatal (SQUASH THIS)
…mmitment (SQUASH THIS)
* chore: create a first-class type to distinguish client subscription ids from server subscription ids * chore: add fast-stable-stringify as a dependency to web3.js * fix: reimplement the subscription tracker as a state machine * test: updated tests to recognize that signatureUnsubscribe happens synchronously * chore: add sinon-chai so that we can make assertions on calling spies * test: coverage for the full range of subscription state transitions * fix: special case auto-disposing subscriptions like signatureSubscribe * fix: document Subscription type (SQUASH THIS) * fix: strict undefined checks (SQUASH THIS) * fix: naming (SQUASH THIS) * fix: move defaulting to source (SQUASH THIS) * fix: build RPC args first, then produce the subscription topic hash (SQUASH THIS) * fix: dispose handles no longer track whether they've been called (SQUASH THIS) * fix: shore up the auto-disposing sub tests now that double-free doesn't fatal (SQUASH THIS) * fix: write documentation explaining how and why to apply a default commitment (SQUASH THIS) * fix: skip subscriptions that have been deleted when recursing (SQUASH THIS) * fix: bail on async responses when the connection has been recycled (SQUASH THIS) * fix: typo in comment (SQUASH THIS) * chore: comment on why notification callbacks are ts-ignored * chore: start all the new stuff out as private members of the Connection class
Problem
Props to @lwus for bringing to my attention that:
Summary of Changes
This PR represents a pretty massive rewrite of the client subscriptions tracker. Let me lay out the goal of every commit:
ClientSubscriptionId
to disambiguate it from server subscription ids. This makes it more clear when you're unsubscribing a client handle for a given method/params (which there can be many of) and when you're unsubscribing a server handle for a given method/params (which there can only ever be one of)fast-stable-stringify
. We'll use this to identify a given method/params pair by its hash, so that we can pool a multitude of client subscriptions behind a single server subscription. There will be a 1:1 relationship between the hash of a subscription and theServerSubscriptionId
assigned to it. This library comes in hot at ~700 bytes gzipped, so I feel pretty good about that.pending
,subscribing
,subscribed
,unsubscribing
, andunsubscribed
). TheupdateSubscriptions
method is now solely responsible for managing the transitions from one state to the next.signatureSubscribe
. Read the tests to remind yourself of what makes signature subscriptions special in this regard.Test plan
What the mega refactor commit lacks in reviewability, I hope I've made up for in spades through test coverage. The tests in this commit cover everything from ‘what happens if you attach a subscription while you're in the middle of unsubscribing from an existing one’ and ‘what happens if the websocket goes down’ and ‘what happens if the unsubscribe request fatals.’ And all of this, without the need for a live validator. Check out a clean run and read through the test titles: https://gist.github.com/steveluscher/79f0c06c5c4d6648466668d47babf0f6
Let's bake this!
If you're reading this, and you rely on subscriptions in your dApp, please help me beta test this! Install the prerelease version of web3.js by using
yarn add @solana/web3.js@next
ornpm i --save @solana/web3.js@next
.Fixes #24472, #24176