Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to renew the connection upon transient error #304

Closed
wants to merge 1 commit into from

Conversation

da-kami
Copy link
Member

@da-kami da-kami commented Mar 11, 2021

@thomaseizinger this is an attempt to fix #303

I don't think this solution would work yet but I am not exactly sure how to rearrange the code to make it work.
I thought I put a Draft PR up :)

Currently clippy complains about:

warning: value assigned to `stream` is never read
  --> swap/src/kraken.rs:46:41
   |
46 | ...                   stream = connection::new().await?;
   |                       ^^^^^^
   |
   = note: `#[warn(unused_assignments)]` on by default
   = help: maybe it is overwritten before being read?

@da-kami da-kami requested a review from thomaseizinger March 11, 2021 01:56
match e {
// Connection closures and websocket errors will be retried
connection::Error::ConnectionClosed => {
// Try to renew the connection in case of websocket failure
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Try to renew the connection in case of websocket failure
// Try to renew the connection in case of connection failure

@thomaseizinger
Copy link
Contributor

I don't think this solution would work yet but I am not exactly sure how to rearrange the code to make it work.
I thought I put a Draft PR up :)

Currently clippy complains about:

Well clippy is right. You are over-writing the local stream variable and then returning from the closure before ever using stream again. That doesn't do anything.

Generally, I don't see a functional change in this PR. Unless I am mistaken, you just replaced the map_err with a manual match statement but the code path is still the same I think.

I am wondering why the current code doesn't work because connection::Error::Websocket errors should always be mapped to Transient. Is there maybe a codepath in backoff where it completely gives up even with Transient errors?

I am also wondering why we never see the info log about retrying.

@thomaseizinger
Copy link
Contributor

I am wondering why the current code doesn't work because connection::Error::Websocket errors should always be mapped to Transient. Is there maybe a codepath in backoff where it completely gives up even with Transient errors?

Reading backoff's source code, I think I found the problem and posted up a fix here: #305

@da-kami
Copy link
Member Author

da-kami commented Mar 11, 2021

I don't think this solution would work yet but I am not exactly sure how to rearrange the code to make it work.
I thought I put a Draft PR up :)
Currently clippy complains about:

Well clippy is right. You are over-writing the local stream variable and then returning from the closure before ever using stream again. That doesn't do anything.

Generally, I don't see a functional change in this PR. Unless I am mistaken, you just replaced the map_err with a manual match statement but the code path is still the same I think.

I am wondering why the current code doesn't work because connection::Error::Websocket errors should always be mapped to Transient. Is there maybe a codepath in backoff where it completely gives up even with Transient errors?

I am also wondering why we never see the info log about retrying.

😬 Yeah, my solution does not really do much.
Still wondering if the current code makes sense without trying to re-connect. But maybe I just lack the understanding of websocket connections there.

We currently map Connection and WebSocket errors to be Transient:

/// Maps a [`connection::Error`] to a backoff error, effectively defining our
/// retry strategy.
fn to_backoff(e: connection::Error) -> backoff::Error<anyhow::Error> {
use backoff::Error::*;
match e {
// Connection closures and websocket errors will be retried
connection::Error::ConnectionClosed => Transient(anyhow::Error::from(e)),
connection::Error::WebSocket(_) => Transient(anyhow::Error::from(e)),
// Failures while parsing a message are permanent because they most likely present a
// programmer error
connection::Error::Parse(_) => Permanent(anyhow::Error::from(e)),
}
}

But we never re-establish a connection upon those transient error, but use the stream as it was before:

async move {
let mut stream = connection::new().await?;
while let Some(update) = stream.try_next().await.map_err(to_backoff)? {
let send_result = rate_update.send(Ok(update));
if send_result.is_err() {
return Err(backoff::Error::Permanent(anyhow!(
"receiver disconnected"
)));
}
}
Err(backoff::Error::Transient(anyhow!("stream ended")))
}

Wouldn't it make sense to re-connect on the stream if we get a Connection or WebWocket error or is that just not how websocket connections work?

@thomaseizinger
Copy link
Contributor

But we never re-establish a connection upon those transient error, but use the stream as it was before:

That is not true.

All of the above code is within the retry closure. The very first thing that happens is establishing a connection. If we fail, we ? which bails out of the closure with an Err that is either backoff::Transient or backoff::Permanent and defines whether retry will call the closure again.

@da-kami
Copy link
Member Author

da-kami commented Mar 11, 2021

But we never re-establish a connection upon those transient error, but use the stream as it was before:

That is not true.

All of the above code is within the retry closure. The very first thing that happens is establishing a connection. If we fail, we ? which bails out of the closure with an Err that is either backoff::Transient or backoff::Permanent and defines whether retry will call the closure again.

Yeah, you're right. The backoff block actually wraps the complete async block... Forget what I said 😅

@da-kami da-kami closed this Mar 11, 2021
@thomaseizinger thomaseizinger deleted the renew-connection-on-transient-errors branch September 3, 2021 06:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Kraken websocket stream closes connection unexpectedly
2 participants