-
Notifications
You must be signed in to change notification settings - Fork 589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set promise on exceptions in dispatch_method_once
#8519
Conversation
src/v/rpc/rpc_server.cc
Outdated
@@ -104,7 +104,13 @@ rpc_server::send_reply(ss::lw_shared_ptr<server_context_impl> ctx, netbuf buf) { | |||
|
|||
ss::future<> rpc_server::send_reply_skip_payload( | |||
ss::lw_shared_ptr<server_context_impl> ctx, netbuf buf) { | |||
co_await ctx->conn->input().skip(ctx->get_header().payload_size); | |||
try { | |||
co_await ctx->conn->input().skip(ctx->get_header().payload_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could you add a comment that we might expect an exception here if the connection is closed? Otherwise it's not immediately obvious why this is necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, will add
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it might make sense to handle this back in rpc_server::dispatch_method_once
?
src/v/rpc/rpc_server.cc
Outdated
ctx->body_parse_exception(std::current_exception()); | ||
co_return; | ||
} | ||
ctx->signal_body_parse(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously this was signaled after the send_reply()
finished. I imagine this reordering means that we allow waiters to proceed before responding to clients. Is that important to this fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not particularly important to the fix. And yeah, it would allow for waiters to continue before the reply is sent. In this case it would allow for more requests to be read from the connection before a reply is sent.
We currently have three code paths in dispatch_method_once
- For unsupported version
- For unknown methods
- For known methods
For 1 & 2 we currently only signal waiters after a reply is sent.
For 3 though we signal the waiters right after consuming & parsing all input for the request. Long before calling the service method and sending the reply.
The goal of this change was to have 1 & 2 match the behavior of 3. Though there may be a case for them to be different. If you think it's out of scope for this PR though I can start a separate PR for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks for the explanation! It makes sense that we'd be able to / want to signal early. I'm also not sure if this was originally implemented intentionally, so it might be worth poking others familiar in the area, but the rationale for standardizing on 3 makes sense to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CC: @dotnwat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks for the explanation! It makes sense that we'd be able to / want to signal early.
i agree with @andrwng that it makes sense, and i think you are doing the right thing here @ballard26.
would allow for waiters to continue before the reply is sent.
is this optimization the only reason for changing the order of operations?
I'm also not sure if this was originally implemented intentionally,
however, i think we need to be very conservative in the rpc server because we've had countless issues with race conditions and haven't had time to revamp/rearch rpc. so anything that potentially alters the scheduling is a red flag. in this case it seems completely benign, but its also a rare error case so the optmiization doesn't seem worth the cost of working through managing the concerns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point about the issues in the past with race conditions. I'll go ahead and revert things to handle the exception with a then_wrapped
in the rpc_server::dispatch_method_once
in order to preserve the scheduling order. The optimization I made does seem outside the scope of a PR to fix an issue.
2eb18d8
to
9c53bc3
Compare
2259fea
9c53bc3
to
2259fea
Compare
src/v/rpc/rpc_server.cc
Outdated
// If the connection is closed then this can throw an exception. | ||
// In that case we want to catch and forward it to avoid a broken | ||
// promise. | ||
co_await ctx->conn->input().skip(ctx->get_header().payload_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it might make sense to handle this back in rpc_server::dispatch_method_once ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a comment here https://github.com/redpanda-data/redpanda/pull/8519/files#r1091520039 about why I'm handling it here. Basically if I handle it in rpc_server::dispatch_method_once
with a finally
or similar I can only set the promise after the reply is sent or an exception is thrown. Which is why I moved the handling here. Happy to move the handling to rpc_server::dispatch_method_once
if desired though.
56edc56
to
dae07f4
Compare
src/v/rpc/service.h
Outdated
// `permanent_memory_reservation` will return an exception. | ||
// We intercept it here to avoid a broken promise. | ||
ctx.body_parse_exception(e); | ||
std::rethrow_exception(e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it is more idiomatic here outside a co-routine context to return ss::make_exception_future<>(e)
. the reason is that seastar will end up having to catch this again and repackage it into an exceptional future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, switching to it now.
Changes the logic of `dispatch_method_once` to avoid a broken promise when the connection is closed before the input is skipped for unknown methods and unknown versions. This ensures that we are always setting the body_parse promise after reading in all the input and before we send a reply.
dae07f4
to
e47ab0d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
restarted ci. seems like some transient issue in ci. |
ok restarted again. seems the ci fix has landed now |
/backport v22.3.x |
Failed to run cherry-pick command. I executed the below command:
|
@ballard26 , could you look into backporting this when you have a chance? |
/backport v22.3.x |
Changes the logic of
dispatch_method_once
to avoid a broken promise when the connection is closed before the input is skipped for unknown methods and unknown versions. This ensures that we are always setting the body_parse promise after reading in all the input and before we send a reply.Fixes #8518, Fixes #8074
Backports Required
UX Changes
Release Notes