Skip to content
This repository has been archived by the owner on Oct 19, 2024. It is now read-only.

RRR: Reconnection & Request Reissuance #2181

Merged
merged 17 commits into from
Mar 1, 2023
Merged

Conversation

prestwich
Copy link
Collaborator

@prestwich prestwich commented Feb 23, 2023

Adds automatic reconnection and request-reissuance logic to WebSockets

Motivation

Users keep making a fuss about it and also it's generally a good thing to have :)

Solution

WS now consists of 2 tasks:

  1. WsBackend - manages dispatching raw values to the WebSocket
  2. RequestManager - manages caching request bodies until the request has resolved

WsBackend dispatches requests and routes responses and notifications. It also has a simple ping-based keepalive (when not compiled to wasm), to prevent inactivity from triggering server-side closes

RequestManager holds a BackendDriver, to communicate with the current backend. Reconnection is accomplished by instantiating a new WsBackend and swapping out the manager's BackendDriver

The RequestManager holds copies of all pending requests (as InFlight), and active subscriptions (as ActiveSub). When reconnection occurs, all pending requests are re-dispatched to the new backend, and all active subs are re-subscribed

In order to provide continuity of subscription IDs to the client, the RequestManager also keeps a SubscriptionManager. This struct manages the relationship between the u64 request ID, and U256 server-side subscription ID. It does this by aliasing the server ID to the request ID, and returning the Request ID to the caller (hiding the server ID in the SubscriptionManager internals.) Giving the caller a "fake" subscription id allows the subscription to behave consistently across reconnections

The behavior is accessed by the WsClient frontend, which implements JsonRpcClient. The WsClient is cloneable (no need for an arc 😄). It communicates to the request manager via a channel, and receives notifications in a shared map for the client to retrieve

The RequestManager shuts down and drops when all WsClient instances have been dropped (because all UnboundedSender instances will have dropped).

The WsBackend shuts down when instructed to by the RequestManager or when the RequestManager drops (because the inbound channel will close)

Note on use of mutexes

This work does NOT use async mutexes, as we never need to hold past an await point. Because each use of mutex is fast (simple hashmap insert/remove), we're not concerned about blocking on locking

Limitations:

  • reconnection may cause dropped events in subscriptions, as the new sub may resume at a later height. This seems unavoidable? 🤔
  • faking the subscription ID requires an extra deser and reser on each eth_subscribe call so that the manager can inspect the sub id
  • memory usage will be higher, as we're keeping a copy of in-flight request params while they're in flight, and a permanent copy of the subscription params

Follow-up work:

Because BackendDriver does not contain any WS-specific behavior, we can add a RequestManager in front of IPC channels to add the same reconnection + reissuance logic. This doesn't seem high priority, as WS is much more prone to ephemeral failures than IPC

PR Checklist

  • Added Tests
  • Added Documentation
  • Updated the changelog
  • Breaking changes

@prestwich prestwich added the enhancement New feature or request label Feb 23, 2023
@prestwich prestwich requested a review from gakonst February 23, 2023 21:56
@prestwich prestwich self-assigned this Feb 23, 2023
@prestwich prestwich force-pushed the prestwich/ws-reconnect branch from 7390950 to 22a0e33 Compare February 25, 2023 16:10
@gakonst
Copy link
Owner

gakonst commented Feb 27, 2023

@mattsse PTAL - I'll cut a release after we get this PR in

@prestwich prestwich marked this pull request as ready for review February 27, 2023 23:12
@prestwich
Copy link
Collaborator Author

going to move all previous WS behavior under legacy_ws and add a feature flag to use it

Copy link
Owner

@gakonst gakonst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Much needed feature. Looking for a pass from @mattsse before merging.

ethers-providers/src/rpc/transports/mod.rs Outdated Show resolved Hide resolved
ethers-providers/src/rpc/transports/ws/backend.rs Outdated Show resolved Hide resolved
ethers-providers/src/rpc/transports/ws/mod.rs Show resolved Hide resolved
ethers-providers/src/rpc/transports/ws/manager.rs Outdated Show resolved Hide resolved
@prestwich prestwich force-pushed the prestwich/ws-reconnect branch from 69be2fc to eb6976e Compare February 28, 2023 03:01
@gakonst gakonst changed the title [WIP] RRR: Reconnection & Request Reissuance RRR: Reconnection & Request Reissuance Feb 28, 2023
Copy link
Collaborator

@mattsse mattsse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I understood this right, a WsClient now has two different tasks?
I haven't looked into it much, but that's kind of weird I guess.
manager, backend, backenddriver is a but confusing.

it would be great to have some docs on the various channel fields because there are like 4 different channels used in this setup.

this now auto-reconnects on any error error caused by ws connection?

ethers-providers/src/rpc/transports/ws/mod.rs Show resolved Hide resolved
ethers-providers/src/rpc/transports/ws/mod.rs Show resolved Hide resolved
Comment on lines +368 to +379
None => if let Err(e) = self.reconnect().await {
break Err(e);
}
Copy link
Collaborator

@mattsse mattsse Feb 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could totally be wrong here because it wasn't obvious to which component to_handle is wired here. but it looks like the other half belongs to WsBackend?

I believe there's an issue with how the WsBackend task resolves, which could cause two reconnects if the task exits because of an error?
first the channel will be None, then the self.backend.error receives the error message and reconnects again?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there's an issue with how the WsBackend task resolves, which could cause two reconnects if the task exits because of an error?

No, only 1 select branch is reachable in any pass through the loop. So if the to_handle channel is dead, the WS will reconnect. Then by the time the loop restarts, the BackendDriver will be different, and the error channel will be fresh and good

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could totally be wrong here because it wasn't obvious to which component to_handle is wired here. but it looks like the other half belongs to WsBackend?

the naming convention is you send a chunk of work to the handler via handler. the handler accepts work that it should handle via to_handle

same convention with dispatcher and to_dispatch. you send to the dispatching task via a dispatcher channel. the dispatcher accepts the requests via to_dispatch

Copy link
Collaborator Author

@prestwich prestwich Feb 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can think of the BackendDriver as control and ownership of the running backend task. It has the ability to shut down that task, send/receive messages from it, and dropping the driver causes the task to shut down automatically

The communication flow is

client ----instructions---> manager
manager -----requests-----> backend
backend <-----WS Protocol-----> server
backend -----PubSubItem----> manager
manager ---sub notifications---> sub channels
manager ---sub channels---> client
manager ---responses----> request future channels

this would go well in a swim lanes but I don't have time to make one rn

@prestwich
Copy link
Collaborator Author

if I understood this right, a WsClient now has two different tasks?

yeah, because reconnection requires the connection to be disposable, there must be >=2 tasks. One for the connection, and 1 for detecting and fixing connection failures.

it would be great to have some docs on the various channel fields because there are like 4 different channels used in this setup.

added comments to all struct fields

this now auto-reconnects on any error error caused by ws connection?

yes, up to a fixed number of retries. It also includes a keepalive loop when not running in wasm

@prestwich prestwich force-pushed the prestwich/ws-reconnect branch from 209d37b to 128c54a Compare March 1, 2023 00:06
@prestwich prestwich force-pushed the prestwich/ws-reconnect branch from 128c54a to 669e4fb Compare March 1, 2023 00:32
Copy link
Owner

@gakonst gakonst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattsse merging cautiously; if anybody experiences issues they can use the legacy-ws feature. This is a bit hard to unit test further, open to ideas here.

#[cfg(not(target_arch = "wasm32"))]
tokio::spawn(fut);
}
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could add a reconnects(..) builder method to override the default, but can do in followup

@gakonst gakonst merged commit 73636a9 into master Mar 1, 2023
@gakonst gakonst deleted the prestwich/ws-reconnect branch March 1, 2023 01:26
@shrimpynuts
Copy link

@prestwich Is it reasonable to try to keep a subscription through WsClient alive indefinitely?

If yes, would the best way be to just instantiate a new WsClient every time the RequestManger reaches the reconnect limit?

@prestwich
Copy link
Collaborator Author

Is it reasonable to try to keep a subscription through WsClient alive indefinitely?

Yes it's pretty reasonable

If yes, would the best way be to just instantiate a new WsClient every time the RequestManger reaches the reconnect limit?

You should just set an arbitrarily high reconnection count :) u64::max should be fine

jqphu pushed a commit to refract-team/ethers-rs that referenced this pull request Mar 28, 2023
* wip: ws2

* ws2 backend compiles

* refactor: rename PubSubItem and BackendDriver

* feature: dispatch request to end subscription

* refactor: move ws2 to ws, fix reconnection and deser on subs

* chore: improve use of tracing in manager

* refactor: feature legacy_ws to enable backwards compatibility

* nit: mod file ordering

* docs: copy PR description to ws structs

* fixes: remove unused macros file, remove err formats

* docs: add comments to struct fields

* docs: comment client struct fields

* chore: changelog

* fix: unused imports in ws_errors test

* docs: missing comment

Co-authored-by: Georgios Konstantopoulos <[email protected]>

* fix: legacy-ws feature in root crate, hyphen not underscore

* fix: a couple bad imports/exports

---------

Co-authored-by: Georgios Konstantopoulos <[email protected]>
@wminshew
Copy link
Contributor

reconnection may cause dropped events in subscriptions, as the new sub may resume at a later height. This seems unavoidable? 🤔

dropping events seems fine to me but out of curiosity why is it unavoidable? couldn't SubscriptionManager keep note of the most recent block each sub has received an event from & poll from there on re-connect?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants