Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: the one where we fix client subscriptions #24473

Merged
merged 20 commits into from
Apr 28, 2022
Merged

fix: the one where we fix client subscriptions #24473

merged 20 commits into from
Apr 28, 2022

Conversation

steveluscher
Copy link
Contributor

@steveluscher steveluscher commented Apr 19, 2022

Problem

Props to @lwus for bringing to my attention that:

  1. If you make two RPC subscriptions for fundamentally the same subscription (ie. same method, same params) only the first subscription's callback will ever fire.
  2. If you start with more than one client subscription for fundamentally the same RPC subscription, and then dispose of one of them, the client will unsubscribe all of them.
  3. If you make two RPC subscriptions for fundamentally the same subscription, we needlessly make two RPC subscription requests to the server, when only one is needed.

Summary of Changes

This PR represents a pretty massive rewrite of the client subscriptions tracker. Let me lay out the goal of every commit:

  • As preparation, we create a type called ClientSubscriptionId to disambiguate it from server subscription ids. This makes it more clear when you're unsubscribing a client handle for a given method/params (which there can be many of) and when you're unsubscribing a server handle for a given method/params (which there can only ever be one of)
  • Then we install fast-stable-stringify. We'll use this to identify a given method/params pair by its hash, so that we can pool a multitude of client subscriptions behind a single server subscription. There will be a 1:1 relationship between the hash of a subscription and the ServerSubscriptionId assigned to it. This library comes in hot at ~700 bytes gzipped, so I feel pretty good about that.
  • Next, is a giant, unreviewable overhaul of the subscription tracker. How to review it: first, take a look at the TypeScript types. I've created a tagged union of all of the possible subscriptions you can make, by method, and another tagged union of all of the states that a subscription can be in (pending, subscribing, subscribed, unsubscribing, and unsubscribed). The updateSubscriptions method is now solely responsible for managing the transitions from one state to the next.
  • The next commit is just a cleanup of the websocket tests and mocks to handle the special case that . Feel free to ignore it.
  • In the next commit we add a library to be able to spy on function calls and make expectations about them in test plans.
  • Read more about the next commit in the ‘test plan’ section below.
  • In the final commit, we augment the state machine to implement handling for ‘auto-disposing’ subscriptions like signatureSubscribe. Read the tests to remind yourself of what makes signature subscriptions special in this regard.

Test plan

What the mega refactor commit lacks in reviewability, I hope I've made up for in spades through test coverage. The tests in this commit cover everything from ‘what happens if you attach a subscription while you're in the middle of unsubscribing from an existing one’ and ‘what happens if the websocket goes down’ and ‘what happens if the unsubscribe request fatals.’ And all of this, without the need for a live validator. Check out a clean run and read through the test titles: https://gist.github.com/steveluscher/79f0c06c5c4d6648466668d47babf0f6

Let's bake this!

If you're reading this, and you rely on subscriptions in your dApp, please help me beta test this! Install the prerelease version of web3.js by using yarn add @solana/web3.js@next or npm i --save @solana/web3.js@next.

Fixes #24472, #24176

@steveluscher steveluscher added the javascript Pull requests that update Javascript code label Apr 19, 2022
@codecov
Copy link

codecov bot commented Apr 20, 2022

Codecov Report

Merging #24473 (17a5714) into master (8ba003a) will decrease coverage by 6.7%.
The diff coverage is n/a.

❗ Current head 17a5714 differs from pull request most recent head b3f08b6. Consider uploading reports for the commit b3f08b6 to get more accurate results

@@             Coverage Diff             @@
##           master   solana-labs/solana#24473       +/-   ##
===========================================
- Coverage    81.8%    75.1%     -6.8%     
===========================================
  Files         632       38      -594     
  Lines      167499     2242   -165257     
  Branches      322      323        +1     
===========================================
- Hits       137169     1685   -135484     
+ Misses      30217      444    -29773     
  Partials      113      113               

@steveluscher steveluscher changed the title The one where we fix client subscriptions fix: the one where we fix client subscriptions Apr 20, 2022
@steveluscher steveluscher marked this pull request as ready for review April 22, 2022 00:03
@steveluscher
Copy link
Contributor Author

What I wouldn't give for a good 'ol fashioned @altano review on this diff, for old time's sake.

Comment on lines 2123 to 2134
function warnListenerRemovedWithoutActiveSubscription(
id: number,
label: string,
) {
console.warn(
`You removed a(n) ${label} subscription with id \`${id}\` ` +
'where no active subscription having that id could be . ' +
'found. Verify that your code does not try to ' +
'unsubscribe the same listener more than once.',
);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit is from #24607. It shows up here because GitHub doesn't support stacked PRs properly.

Copy link
Member

@jstarry jstarry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on! Looks really good so far.

web3.js/src/connection.ts Outdated Show resolved Hide resolved
web3.js/src/connection.ts Outdated Show resolved Hide resolved
web3.js/src/connection.ts Outdated Show resolved Hide resolved
web3.js/src/connection.ts Outdated Show resolved Hide resolved
web3.js/src/connection.ts Outdated Show resolved Hide resolved
web3.js/src/connection.ts Outdated Show resolved Hide resolved
web3.js/src/connection.ts Show resolved Hide resolved
web3.js/src/connection.ts Show resolved Hide resolved
web3.js/src/connection.ts Show resolved Hide resolved
filter = sub.filter;
callbacks.forEach(cb => {
try {
cb(...(publishArgs as [any, any]));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be any[]? Some callbacks only have one argument, and future callbacks could have more than 2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had that initially, but TypeScript squaked:

A spread argument must either have a tuple type or be passed to a rest parameter. ts(2556)

@steveluscher
Copy link
Contributor Author

This was a great review, @jstarry. I addressed each one of your comments in a commit on top of ‘fix: special case auto-disposing subscriptions like signatureSubscribe’ so that you can walk through the new stuff one by one.

@steveluscher steveluscher requested a review from jstarry April 26, 2022 01:33
@@ -1958,33 +2041,15 @@ export type SignatureSubscriptionCallback = (
* Signature subscription options
*/
export type SignatureSubscriptionOptions = {
commitment?: Commitment;
commitment?: Commitment; // Must default to `Connection._commitment` or 'finalized' otherwise
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment directed at maintainers? If so, I think we should move this comment (and the others you've copied around) to the args: IWSRequestParams param on _makeSubscription. I don't think that the current comment makes it very clear why it must default to some commitment, but that explanation should be read by contributor who calls _makeSubscription in a new function. In fact, it could generally describe that the client should set explicit defaults for parameters to ensure that subscriptions are hashed correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright! I'll move this around.

God, I hate optional configs.

web3.js/src/connection.ts Show resolved Hide resolved
web3.js/src/connection.ts Show resolved Hide resolved
web3.js/src/connection.ts Show resolved Hide resolved
web3.js/src/connection.ts Show resolved Hide resolved
web3.js/src/connection.ts Show resolved Hide resolved
@steveluscher
Copy link
Contributor Author

Alright @jstarry, the review can continue from ‘fix: write documentation explaining how and why to apply a default commitment’ onward, now that I've squashed the heinous bugs that you caught.

state: 'subscribing',
};
const serverSubscriptionId: ServerSubscriptionId =
(await this._rpcWebSocket.call(method, args)) as number;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While calling this method, the websocket could be closed and all subscriptions are set to 'pending', I think we should check for that before setting to 'subscribed' here (this applies to unsubscribe as well).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woo, that sent me down a debugging rabbit hole. OK, I got to the bottom of this. Wow. From MDN.

The browser will throw an exception if you call send() when the connection is in the CONNECTING state. If you call send() when the connection is in the CLOSING or CLOSED states, the browser will silently discard the data.

I just presumed that if you invoke call() when the socket is down, it would throw an exception and process the catch block below that sets the subscription back to pending.

The ‘bare metal’ socket doesn't throw. It just drops the data, and never calls the callback.

ws = new WebSocket('wss://demo.piesocket.com/v3/channel_1?api_key=VCXCEuvhGcBDP7XhiJJUDvR1e1D3eiVjgZ9VRiaV&notify_self')
ws.onopen = () => {
    ws.close();
    console.log('is `CLOSING`', ws.readyState === ws.CLOSING);
    try {
        ws.send(JSON.stringify('hello'));
        console.log('sent, when I expected an error to be thrown');
    } catch(e) {
        console.error('legit caught an error', e);
    }
};

But, the rpc-websockets client always throws if it doesn't receive a reply within a timeout.

https://github.com/elpheria/rpc-websockets/blob/trunk/src/lib/client.ts#L157-L172

That's the bug here. The try block will only ever continue past await this._rpcWebSocket.call() if a reply was received, in which case we're all good. The catch block however might get called because of a disconnection-caused timeout. It's there where I need to make sure that we're still in the subscribing state before I just hard shunt the subscription back to pending. And that I can write a test for.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright @jstarry, one more commit on top: ‘fix: bail on async…’ should fix this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure it can still technically happen if we get a reply and the socket is disconnected right after. My line of reasoning is that the async function will yield to the scheduler which could then process a close event before we continue on to setting the subscription to "subscribed". Maybe that's unlikely but I don't think it would hurt to add another if (!isCurrentConnectionStillActive()) check. In fact we could factor out a function called applySubscriptionTransition which always checks if the current connection is active. What do you think?

Copy link
Contributor Author

@steveluscher steveluscher Apr 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A socket could disconnect while we're literally in the middle of processing the reply, 100%. The thing is that the disconnect happens in a thread outside the JS engine (ie. in the browser's networking stack) so there's no possible way for it to show up on the single JavaScript thread until the next runloop.

The async _rpcWebSocket.call() can only resume in one of two ways:

  • throw (reject) when the reply timeout gets hit.
  • resume execution (resolve) because it received a reply.

Presume, now, that _rpcWebSocket.call() has yielded to the scheduler and is waiting for one of those two things to happen.

  • If a close event happens before we get a reply, the only possible thing that could happen next is for the reply timeout to be hit†, which would trigger our catch which handles this properly now.
  • If a reply comes in, then we're definitely not closed, and every statement after the await will run synchronously until there are no more statements to evaluate.

Now, imagine that a close happened, in a real world sense, while that code after the await was running (eg. I yanked out the network cable). The browser will learn of the close, for sure, but there's literally no way for the browser to schedule the close event on the JS thread until after the reply is done being processed. The browser can fire the close event into the JS thread all it wants, but it's not going to be able to preempt the code after the await

this._subscriptionsByHash[hash] = {
  ...subscription,
  serverSubscriptionId,
  state: 'subscribed',
};
this._subscriptionCallbacksByServerSubscriptionId[
  serverSubscriptionId
] = subscription.callbacks;
await this._updateSubscriptions();

…by sneaking a call to _wsOnClose in between those three lines.

JavaScript being single threaded is one of the worst things about JavaScript, but also definitely one of the best things about JavaScript.


† If it's literally possible to receive and process a data frame (reply) after having received a close event on the JS thread, then everything I said above doesn't matter, I will eat my hat, and websockets are terrible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding from this gist is that even if an async rpc call receives and processes a reply, we might not pick up where we left off until after processing other events in the work queue

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh totally, but other events will mostly touch other subscriptions, and the ones that do touch this subscription will perform no actions because it's in the 'subscribing' state:

case 'subscribing':
case 'unsubscribing':
  break;

Receiving a reply corresponds to this event firing in the rpc-websockets library:

this.socket.addEventListener("message", ({data: message}) => { ... }

All of the code of that event handler fires sync, including this bit at the end:

if (message.error)
  this.queue[message.id].promise[1](message.error)
else
  this.queue[message.id].promise[0](message.result)

promise[0] there corresponds to the resolve of this._rpcWebSockets.call(...). When it's called with message.result (the reply) then all of the code after our await will run synchronously:

this._subscriptionsByHash[hash] = {
  ...subscription,
  serverSubscriptionId,
  state: 'subscribed',
};
this._subscriptionCallbacksByServerSubscriptionId[
  serverSubscriptionId
] = subscription.callbacks;
await this._updateSubscriptions();

So essentially you can draw a straight line between the 'message' event firing and all of that code running before yielding back to the browser. This means that the act of receiving a reply implies two things:

  • you were definitely not CLOSED or CLOSING at that moment (otherwise you wouldn't have been able to receive 'message'), and
  • no 'close' event can sneak into the JS thread while you're running the reply processing code.

Take a look at these branches of connection-subscriptions.test.ts:

Subscriptions
  The `accountSubscribe` RPC method
    attaching the first notification listener
      ✓ results in a subscription request being made to the RPC
      then unsubscribing that listener before the subscription has been acknowledged by the server
        once the subscription has been acknowledged by the server
          ✓ results in the subscription being torn down immediately
      then having the socket connection drop unexpectedly
        then unsubscribing that listener
          upon the socket connection reopening
            ✓ does not result in a new subscription request being made to the RPC
        upon the socket connection reopening
          ✓ results in a new subscription request being made to the RPC
          then upon the prior subscription fataling (eg. because its timeout triggers)
            ✓ does not result in a new subscription request being made to the RPC

Those exercise what happens when you have an in-flight 'subscribing' state and one of two things happens:

  • the user decides they don't care anymore and call remove*
  • the connection drops

Thoughts? :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So essentially you can draw a straight line between the 'message' event firing and all of that code running before yielding back to the browser. This means that the act of receiving a reply implies two things:

Ah, this makes a lot of sense. Looks good then, thanks :)

@steveluscher steveluscher requested a review from jstarry April 28, 2022 01:29
Copy link
Member

@jstarry jstarry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there, just did a full pass so should be good to go after this round

web3.js/src/connection.ts Show resolved Hide resolved
state: 'subscribing',
};
const serverSubscriptionId: ServerSubscriptionId =
(await this._rpcWebSocket.call(method, args)) as number;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure it can still technically happen if we get a reply and the socket is disconnected right after. My line of reasoning is that the async function will yield to the scheduler which could then process a close event before we continue on to setting the subscription to "subscribed". Maybe that's unlikely but I don't think it would hurt to add another if (!isCurrentConnectionStillActive()) check. In fact we could factor out a function called applySubscriptionTransition which always checks if the current connection is active. What do you think?

web3.js/src/connection.ts Outdated Show resolved Hide resolved
filter = sub.filter;
callbacks.forEach(cb => {
try {
cb(...(callbackArgs as [any, any]));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment explaining that callbackArgs might not actually match this tuple but we just do this to avoid a typescript issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. What a mess: microsoft/TypeScript#47615

web3.js/src/connection.ts Show resolved Hide resolved
* - Otherwise, if the `Connection::commitment` is set, use that.
* - Otherwise, set it to the RPC server default: `finalized`.
*
* This is extrememly important to ensure that these two fundamentally
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* This is extrememly important to ensure that these two fundamentally
* This is extremely important to ensure that these two fundamentally

@steveluscher steveluscher requested a review from jstarry April 28, 2022 06:11
jstarry
jstarry previously approved these changes Apr 28, 2022
Copy link
Member

@jstarry jstarry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at long last 😉

@steveluscher
Copy link
Contributor Author

at long last 😉

O no. Famous last words.

@mergify mergify bot dismissed jstarry’s stale review April 28, 2022 19:34

Pull request has been modified.

@steveluscher steveluscher merged commit 2e617ba into solana-labs:master Apr 28, 2022
jeffwashington pushed a commit to jeffwashington/solana that referenced this pull request Jun 29, 2022
* chore: create a first-class type to distinguish client subscription ids from server subscription ids

* chore: add fast-stable-stringify as a dependency to web3.js

* fix: reimplement the subscription tracker as a state machine

* test: updated tests to recognize that signatureUnsubscribe happens synchronously

* chore: add sinon-chai so that we can make assertions on calling spies

* test: coverage for the full range of subscription state transitions

* fix: special case auto-disposing subscriptions like signatureSubscribe

* fix: document Subscription type (SQUASH THIS)

* fix: strict undefined checks (SQUASH THIS)

* fix: naming (SQUASH THIS)

* fix: move defaulting to source (SQUASH THIS)

* fix: build RPC args first, then produce the subscription topic hash (SQUASH THIS)

* fix: dispose handles no longer track whether they've been called (SQUASH THIS)

* fix: shore up the auto-disposing sub tests now that double-free doesn't fatal (SQUASH THIS)

* fix: write documentation explaining how and why to apply a default commitment (SQUASH THIS)

* fix: skip subscriptions that have been deleted when recursing (SQUASH THIS)

* fix: bail on async responses when the connection has been recycled (SQUASH THIS)

* fix: typo in comment (SQUASH THIS)

* chore: comment on why notification callbacks are ts-ignored

* chore: start all the new stuff out as private members of the Connection class
@steveluscher steveluscher deleted the the-one-where-we-fix-client-subscriptions branch December 7, 2022 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
javascript Pull requests that update Javascript code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Subscribing/unsubscribing from the same subscription multiple times yields incorrect behavior
2 participants