feat(transport): retry layer #849
Conversation
crates/transport/src/layers/retry.rs (outdated)
    return should_retry_json_rpc_error(&resp);
}

// some providers send invalid JSON RPC in the error case (no `id:u64`), but the
how very kind of them
this would be true ONLY of http, right? as in WS we would REQUIRE the ID in order to determine which response got rate limited?
does that imply that we should resolve this in the HTTP transports by writing the ID into the resp if it's missing?
actually i like this idea more now. basically try this as a fallback deser in the http transports, and insert the ID if it works
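As a rough illustration of that fallback idea (not the actual alloy transport code; patch_missing_id and the serde_json-based handling here are hypothetical):

use serde_json::{json, Value};

/// Sketch: some providers return a JSON-RPC error body without an `id` field.
/// Over HTTP we know exactly which request the body answers, so the transport
/// could patch the missing `id` in before handing it to the normal deserializer.
fn patch_missing_id(body: &str, request_id: u64) -> Result<Value, serde_json::Error> {
    let mut value: Value = serde_json::from_str(body)?;
    // Only touch bodies that look like an error response but lack an `id`.
    if value.get("error").is_some() && value.get("id").is_none() {
        if let Some(obj) = value.as_object_mut() {
            obj.insert("id".to_string(), json!(request_id));
        }
    }
    Ok(value)
}

fn main() {
    // Invalid-per-spec error body, as some providers send it.
    let body = r#"{"jsonrpc":"2.0","error":{"code":429,"message":"rate limited"}}"#;
    let patched = patch_missing_id(body, 1).unwrap();
    assert_eq!(patched["id"], 1);
}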
this is what we've been using in foundry
we could rethink some of the decisions here, but overall this has been working well. for example, there's an argument for removing the CUPS budgeting entirely and instead relying entirely on rate-limit error responses
there are a few native retry types in tower like https://tower-rs.github.io/tower/tower/retry/budget/index.html#reexport.TpsBudget
but they don't really fit our use case
crates/transport/src/layers/retry.rs (outdated)
fn should_retry_json_rpc_error(error: &ErrorPayload) -> bool {
    let ErrorPayload { code, message, .. } = error;
    // alchemy throws it this way
    if *code == 429 {
excuse me this is insane
this is a good inclusion. the amount of stupid wrangling makes me sad
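For context, a self-contained sketch of the kind of checks such a function ends up doing; the local ErrorPayload stand-in and the specific codes and message substrings below are illustrative, not the exact list from this PR:

/// Minimal stand-in for the real `ErrorPayload` type, just for illustration.
struct ErrorPayload {
    code: i64,
    message: String,
}

/// Illustrative rate-limit check over a JSON-RPC error payload. The concrete
/// codes and messages providers use vary; these are examples only.
fn should_retry_json_rpc_error(error: &ErrorPayload) -> bool {
    let ErrorPayload { code, message } = error;
    // some providers put the HTTP status straight into the RPC error code
    if *code == 429 {
        return true;
    }
    // others only signal the limit in the message text
    let msg = message.to_lowercase();
    msg.contains("rate limit") || msg.contains("too many requests")
}

fn main() {
    let err = ErrorPayload { code: -32005, message: "Too Many Requests".into() };
    assert!(should_retry_json_rpc_error(&err));
}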
crates/transport/src/layers/retry.rs (outdated)
the batch retry logic is very hard to follow; it's not immediately obvious what this does.
I suggest we don't do any of this at first and just retry the entire call, to make some progress on the pr
crates/transport/src/error.rs (outdated)
@@ -110,3 +111,65 @@ impl HttpError {
        false
    }
}

/// Extension trait to implement methods for [`RpcError<TransportErrorKind, E>`].
pub trait RpcErrorExt {
why do we need this?
Created this to implement the is_retryable_err and backoff_hint methods specifically on RpcError<TransportErrorKind>, and not on the generic RpcError<E, ErrResp> type.
but this is only implemented for RpcError?
could we at least make this pub(crate)?
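A rough sketch of the trait shape under discussion, with a placeholder error enum standing in for RpcError<TransportErrorKind> so the example is self-contained (the variants and method bodies are illustrative, not the PR's actual logic):

use std::time::Duration;

/// Placeholder transport error, standing in for `RpcError<TransportErrorKind>`.
enum TransportError {
    RateLimited { retry_after: Option<Duration> },
    ConnectionReset,
    InvalidRequest,
}

/// Sketch of the extension-trait shape: extra helpers hung off the concrete
/// error type instead of the generic `RpcError<E, ErrResp>`.
trait RpcErrorExt {
    /// Whether the retry layer should try the call again.
    fn is_retryable_err(&self) -> bool;
    /// Backoff the server asked for, if it told us.
    fn backoff_hint(&self) -> Option<Duration>;
}

impl RpcErrorExt for TransportError {
    fn is_retryable_err(&self) -> bool {
        matches!(self, Self::RateLimited { .. } | Self::ConnectionReset)
    }

    fn backoff_hint(&self) -> Option<Duration> {
        match self {
            Self::RateLimited { retry_after } => *retry_after,
            _ => None,
        }
    }
}

fn main() {
    let err = TransportError::RateLimited { retry_after: Some(Duration::from_secs(1)) };
    assert!(err.is_retryable_err());
    assert_eq!(err.backoff_hint(), Some(Duration::from_secs(1)));
    assert!(!TransportError::InvalidRequest.is_retryable_err());
}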
crates/transport/src/layers/retry.rs (outdated)
for r in batch_res {
    if r.is_error() {
        batch_errs.push(r);
    } else {
        batch_success.push(r.clone());
        // Remove corresponding request from the batch
        req.remove_by_id(&r.id);

        if req.is_empty() {
            let response_packet =
                ResponsePacket::from(batch_success.clone());
            batch_success.clear();
            batch_errs.clear();

            return Ok(response_packet);
        }
    }
}

let e = batch_errs.first().unwrap().payload.as_error().unwrap().clone();
what does this do?
batch handling looks very complex, and this only uses the first batch_err, but there could be many
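One way to address that, sketched with stand-in types (the BatchItem struct and the 429-only criterion are illustrative, not the PR's actual types): consider every failed call in the batch rather than only the first.

/// Stand-in for a single response in a batch, just enough for the sketch.
struct BatchItem {
    is_error: bool,
    error_code: i64,
}

/// Rather than inspecting only the first error, decide to retry if *any*
/// failed call in the batch looks retryable (illustrative criterion only).
fn batch_needs_retry(responses: &[BatchItem]) -> bool {
    responses
        .iter()
        .filter(|r| r.is_error)
        .any(|r| r.error_code == 429)
}

fn main() {
    let batch = vec![
        BatchItem { is_error: false, error_code: 0 },
        BatchItem { is_error: true, error_code: -32601 }, // method not found: don't retry
        BatchItem { is_error: true, error_code: 429 },    // rate limited: retry
    ];
    assert!(batch_needs_retry(&batch));
}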
crates/transport/src/layers/retry.rs (outdated)
if !batch_success.is_empty() {
    // Join batch_success and batch_errs
    let mut batch_calls = batch_success;
    batch_calls.append(&mut batch_errs);
    return Ok(ResponsePacket::from(batch_calls));
doesn't this mess with the request ordering?
@mattsse ptal. I've made some changes to maintain the ordering of the responses corresponding to the incoming requests. A few highlights:
the batch retry logic is a bit complex and not easy to follow, and imo this feature isn't super important. I'd rather make some progress on this pr and exclude it entirely, because this has been open for a while now.
it turns out batch requests don't necessarily have ordering guarantees:
https://docs.alchemy.com/reference/batch-requests#what-is-a-batch-request
so we can actually drastically simplify this by partitioning over success/errors and only retrying the failed calls.
but I'd like to do that separately and get this included without batch retry support
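A minimal sketch of that partition idea, using stand-in types rather than the real ResponsePacket machinery (split_batch and BatchItem are hypothetical names):

/// Stand-in for one response out of a batch.
struct BatchItem {
    id: u64,
    is_error: bool,
}

/// Split the batch into successes and failures in one pass, keep the
/// successes, and feed only the failed request ids back into the retry logic.
fn split_batch(responses: Vec<BatchItem>) -> (Vec<BatchItem>, Vec<u64>) {
    let (ok, failed): (Vec<_>, Vec<_>) = responses.into_iter().partition(|r| !r.is_error);
    let retry_ids = failed.iter().map(|r| r.id).collect();
    (ok, retry_ids)
}

fn main() {
    let responses = vec![
        BatchItem { id: 1, is_error: false },
        BatchItem { id: 2, is_error: true },
        BatchItem { id: 3, is_error: false },
    ];
    let (ok, retry_ids) = split_batch(responses);
    assert_eq!(ok.len(), 2);
    assert_eq!(retry_ids, vec![2]);
}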
crates/transport/src/layers/retry.rs (outdated)
if r.is_error() {
    batch_responses.insert(r.id.clone(), r.clone());
this looks redundant?
lgtm
Motivation
Port foundry's layer into the transport crate.
We've had quite a few requests for a retry layer from users, and it would also remove a bunch of code from Foundry.
Closes #717
Closes #140
Solution
Port from Foundry, still WIP.
Closes #288
PR Checklist