
gRPC sample clients should automatically re-connect and re-subscribe if connection is dropped #38

Closed
linuskendall opened this issue Feb 2, 2023 · 6 comments

@linuskendall
Contributor

If the connection to the gRPC server is dropped, the clients should automatically reconnect and resubmit their SubscribeRequests.

This means the clients need to cache the SubscribeRequest so that they can resubmit it after reconnecting.

This is important because if the gRPC node goes down we want to route to backups, and we can only do that properly if the client handles reconnection.

  • Rust client
  • Golang client
  • Node JS client
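
A rough sketch of what this could look like for the Rust client, assuming a tonic-generated client; GeyserClient, SubscribeRequest, subscribe, and handle_update are illustrative names rather than the actual sample-client API:

use std::time::Duration;
use tokio::time::sleep;

// Sketch of a reconnect loop: cache the SubscribeRequest and resubmit it on every (re)connect.
async fn run_with_reconnect(endpoint: String, request: SubscribeRequest) {
    loop {
        match GeyserClient::connect(endpoint.clone()).await {
            Ok(mut client) => match client.subscribe(request.clone()).await {
                Ok(response) => {
                    let mut stream = response.into_inner();
                    // Consume updates until the stream ends or errors, then fall through and reconnect.
                    loop {
                        match stream.message().await {
                            Ok(Some(update)) => handle_update(update),
                            Ok(None) => break, // server closed the stream
                            Err(status) => {
                                eprintln!("stream error: {status}");
                                break;
                            }
                        }
                    }
                }
                Err(status) => eprintln!("subscribe failed: {status}"),
            },
            Err(err) => eprintln!("connect failed: {err}"),
        }
        // Back off before reconnecting so a down server is not hammered.
        sleep(Duration::from_secs(1)).await;
    }
}

A real client would replace the fixed one-second sleep with exponential backoff and a retry budget, as discussed in the comments below.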
@shuimuliang
Contributor

Rust version:

Some thoughts:

  • The retry logic should live at the framework level, not at the application level.
  • Keep the tower crate and build a retry middleware on top of it; don't replace tower with volo, so that developers get a consistent integration experience.
  • The retry config should be flexible: a retry “budget” that allows only a certain number of retries over time prevents the clients from DDoSing the server. Provide an easy-to-use policy with good defaults.

tower = { git = "https://github.com/tower-rs/tower.git", branch = "lucio/standard-policy2", version = "0.4.12", features = ["retry", "timeout"] }

Similar to how the timeout middleware is applied:

use std::time::Duration;
use tower::timeout::Timeout;

let channel = Channel::from_static("http://[::1]:50051").connect().await?;
let timeout_channel = Timeout::new(channel, Duration::from_millis(1000)); // requests fail after 1s
let mut client = GreeterClient::new(timeout_channel);
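
For the retry middleware itself, a rough sketch of what a budget-style policy could look like against the stock tower 0.4 Policy trait (the pinned branch above may differ); the Attempts counter is illustrative, not a full time-based budget:

use tower::retry::{Policy, Retry};

// Illustrative policy: allow at most N retries per request.
#[derive(Clone)]
struct Attempts(usize);

impl<Req: Clone, Res, E> Policy<Req, Res, E> for Attempts {
    type Future = std::future::Ready<Self>;

    fn retry(&self, _req: &Req, result: Result<&Res, &E>) -> Option<Self::Future> {
        match result {
            Ok(_) => None, // success: stop retrying
            Err(_) if self.0 > 0 => Some(std::future::ready(Attempts(self.0 - 1))),
            Err(_) => None, // budget exhausted
        }
    }

    fn clone_request(&self, req: &Req) -> Option<Req> {
        Some(req.clone()) // a retry is only possible if the request can be cloned
    }
}

// let retry_channel = Retry::new(Attempts(3), timeout_channel);
// let mut client = GreeterClient::new(retry_channel);

clone_request is exactly where this approach runs into trouble with tonic, as the next comment explains.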

@shuimuliang
Contributor

Give up on tower::retry entirely and write our own retry service from scratch, following linkerd's approach:
https://github.com/linkerd/linkerd2-proxy/tree/main/linkerd/http-retry

Reason: unable to integrate tower::retry with tonic because http::Request is not Clone.
Discussion: hyperium/tonic#733
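
To illustrate the blocker, a standalone sketch (not the actual Policy method) of what a generic clone_request would have to do for a tonic request:

// tower::retry has to clone the request to re-send it, but a tonic request wraps a
// one-shot streaming body, and http::Request<BoxBody> has no Clone impl.
fn clone_request(_req: &http::Request<tonic::body::BoxBody>) -> Option<http::Request<tonic::body::BoxBody>> {
    // The body stream has already been handed over; there is nothing to duplicate,
    // so a generic policy can only return None and never actually retries.
    None
}

linkerd's http-retry works around this by buffering the request body so it can be replayed, which is why it is referenced here.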

@shuimuliang
Contributor

The default ExponentialBackoff intervals:

use backoff::{backoff::Backoff, ExponentialBackoff};

let mut exp = ExponentialBackoff::default();

// current_interval grows by 1.5x each step, capped at 60s
for _ in 0..20 {
    dbg!(exp.current_interval);
    exp.next_backoff();
}

// [backoff_retry/src/lib.rs:11] exp.current_interval = 500ms
// [backoff_retry/src/lib.rs:11] exp.current_interval = 750ms
// [backoff_retry/src/lib.rs:11] exp.current_interval = 1.125s
// [backoff_retry/src/lib.rs:11] exp.current_interval = 1.6875s
// [backoff_retry/src/lib.rs:11] exp.current_interval = 2.53125s
// [backoff_retry/src/lib.rs:11] exp.current_interval = 3.796875s
// [backoff_retry/src/lib.rs:11] exp.current_interval = 5.6953125s
// [backoff_retry/src/lib.rs:11] exp.current_interval = 8.54296875s
// [backoff_retry/src/lib.rs:11] exp.current_interval = 12.814453125s
// [backoff_retry/src/lib.rs:11] exp.current_interval = 19.221679687s
// [backoff_retry/src/lib.rs:11] exp.current_interval = 28.83251953s
// [backoff_retry/src/lib.rs:11] exp.current_interval = 43.248779295s
// [backoff_retry/src/lib.rs:11] exp.current_interval = 60s
// [backoff_retry/src/lib.rs:11] exp.current_interval = 60s
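
For a long-lived subscription client the defaults likely need tuning; a minimal sketch using the backoff crate's public fields (the values are only examples):

use std::time::Duration;
use backoff::ExponentialBackoff;

// Keep retrying forever (max_elapsed_time = None) and cap the interval at 30s instead of 60s.
let policy = ExponentialBackoff {
    max_interval: Duration::from_secs(30),
    max_elapsed_time: None,
    ..ExponentialBackoff::default()
};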

@shuimuliang
Contributor

RetryClient implementation in Solana's BigTable storage (condensed excerpt below):
https://github.com/solana-labs/solana/blob/e14d0638e78d780558d308d3124243016ad23005/storage-bigtable/src/bigtable.rs#L258

pub struct BigTable<F: FnMut(Request<()>) -> InterceptedRequestResult> {
    client: bigtable_client::BigtableClient<InterceptedService<tonic::transport::Channel, F>>,
    timeout: Option<Duration>,
}

pub struct BigTableConnection {
    access_token: Option<AccessToken>,
    channel: tonic::transport::Channel,
    timeout: Option<Duration>,
}

impl BigTableConnection {
    pub async fn new(timeout: Option<Duration>) -> Result<Self> {
        // connect_lazy(): the channel only dials when the first request is sent,
        // so a dropped connection is re-established on the next call
        let channel = tonic::transport::Channel::from_shared(format!("http://{}", endpoint))
            .map_err(|err| Error::InvalidUri(endpoint, err.to_string()))?
            .connect_lazy();
        Ok(Self { access_token: None, channel, timeout })
    }

    // Every operation builds a fresh client; cloning the channel is cheap
    pub fn client(&self) -> BigTable<impl FnMut(Request<()>) -> InterceptedRequestResult> {
        let client = BigtableClient::with_interceptor(self.channel.clone(), /* access-token interceptor elided */);
        BigTable { client, timeout: self.timeout }
    }

    // Each put/get wraps the RPC in a backoff-driven retry, rebuilding the client per attempt
    pub async fn put_foo(&self) -> Result<usize> {
        backoff::future::retry(policy, || async {
            let mut client = self.client();
            Ok(client.foo().await?)
        })
        .await
    }
}

(screenshot attached, 2023-02-09 15:51)

@shuimuliang shuimuliang linked a pull request Feb 11, 2023 that will close this issue
@shuimuliang
Contributor

client test args:

client --endpoint https://ams17.rpcpool.com:443 \
  --x-token 1000000000000000000000000004 \
  --accounts \
  --accounts-account SysvarC1ock11111111111111111111111111111111

@shuimuliang
Contributor

test client retry:
(attachment)

@shuimuliang shuimuliang removed a link to a pull request Feb 14, 2023
@shuimuliang shuimuliang linked a pull request Feb 14, 2023 that will close this issue