Set promise on exceptions in `dispatch_method_once` #8519

ballard26 · 2023-01-31T03:24:41Z

Changes the logic of `dispatch_method_once` to avoid a broken promise when the connection is closed before the input is skipped for unknown methods and unknown versions. This ensures that we always set the body_parse promise after reading in all the input and before we send a reply.
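As a rough illustration of the failure mode described above, the sketch below uses standard C++ `std::promise` rather than Seastar, so it is an analogy and not the actual `rpc_server` code. The names `skip_input`, `skip_without_forwarding`, `skip_with_forwarding`, and `observe` are hypothetical. It shows that a promise destroyed without a value or an exception leaves its waiter with a broken promise, while forwarding the caught exception gives the waiter the real error.

```cpp
#include <cassert>
#include <exception>
#include <future>
#include <stdexcept>

// Hypothetical stand-in for skipping unread request input.
// It throws when the simulated connection is already closed.
void skip_input(bool connection_closed) {
    if (connection_closed) {
        throw std::runtime_error("connection closed");
    }
}

// Problematic shape: when skip_input throws, nothing is stored in the
// promise, so destroying it marks the shared state as a broken promise.
std::future<void> skip_without_forwarding(bool connection_closed) {
    std::promise<void> body_parse;
    std::future<void> waiter = body_parse.get_future();
    try {
        skip_input(connection_closed);
        body_parse.set_value();
    } catch (...) {
        // The error is dropped here; the promise stays unset.
    }
    return waiter; // body_parse is destroyed when the function returns
}

// Fixed shape: the caught exception is forwarded to the waiter.
std::future<void> skip_with_forwarding(bool connection_closed) {
    std::promise<void> body_parse;
    std::future<void> waiter = body_parse.get_future();
    try {
        skip_input(connection_closed);
        body_parse.set_value();
    } catch (...) {
        body_parse.set_exception(std::current_exception());
    }
    return waiter;
}

enum class outcome { ok, broken_promise, forwarded_error };

// Classifies what a waiter observes when it consumes the future.
outcome observe(std::future<void> f) {
    try {
        f.get();
        return outcome::ok;
    } catch (const std::future_error& e) {
        return e.code() == std::future_errc::broken_promise
                 ? outcome::broken_promise
                 : outcome::forwarded_error;
    } catch (const std::runtime_error&) {
        return outcome::forwarded_error;
    }
}
```

Under these assumptions, `observe(skip_without_forwarding(true))` reports a broken promise, whereas `observe(skip_with_forwarding(true))` reports the forwarded connection error.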

Fixes #8518, Fixes #8074

Backports Required

UX Changes

Release Notes

none

andrwng · 2023-01-31T06:28:20Z

src/v/rpc/rpc_server.cc

@@ -104,7 +104,13 @@ rpc_server::send_reply(ss::lw_shared_ptr<server_context_impl> ctx, netbuf buf) {

 ss::future<> rpc_server::send_reply_skip_payload(
  ss::lw_shared_ptr<server_context_impl> ctx, netbuf buf) {
-    co_await ctx->conn->input().skip(ctx->get_header().payload_size);
+    try {
+        co_await ctx->conn->input().skip(ctx->get_header().payload_size);


nit: could you add a comment that we might expect an exception here if the connection is closed? Otherwise it's not immediately obvious why this is necessary

Good catch, will add

i think it might make sense to handle this back in rpc_server::dispatch_method_once ?

andrwng · 2023-01-31T06:36:04Z

src/v/rpc/rpc_server.cc

+        ctx->body_parse_exception(std::current_exception());
+        co_return;
+    }
+    ctx->signal_body_parse();


Previously this was signaled after the send_reply() finished. I imagine this reordering means that we allow waiters to proceed before responding to clients. Is that important to this fix?

Not particularly important to the fix. And yeah, it would allow for waiters to continue before the reply is sent. In this case it would allow for more requests to be read from the connection before a reply is sent.

We currently have three code paths in `dispatch_method_once`:

1. For unsupported versions
2. For unknown methods
3. For known methods

For 1 & 2 we currently only signal waiters after a reply is sent.
For 3, though, we signal the waiters right after consuming and parsing all input for the request, long before calling the service method and sending the reply.

The goal of this change was to have 1 & 2 match the behavior of 3, though there may be a case for them to be different. If you think it's out of scope for this PR, I can start a separate PR for it.
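The ordering difference between the three paths can be sketched as a small synchronous model. This is only an illustration of the description above, not the real dispatch code: `request_kind`, `dispatch_order`, `signals_before_reply`, and the step names are all hypothetical labels.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

enum class request_kind { unsupported_version, unknown_method, known_method };

using event_log = std::vector<std::string>;

// Records the order of steps for each path as described in the comment
// above (the behavior before this change). Step names are illustrative.
event_log dispatch_order(request_kind kind) {
    event_log log;
    switch (kind) {
    case request_kind::unsupported_version:
    case request_kind::unknown_method:
        log.push_back("skip_payload");
        log.push_back("send_reply");
        log.push_back("signal_body_parse"); // waiters released last
        break;
    case request_kind::known_method:
        log.push_back("parse_payload");
        log.push_back("signal_body_parse"); // waiters released early
        log.push_back("call_service_method");
        log.push_back("send_reply");
        break;
    }
    return log;
}

// True when waiters are released before the reply is sent.
bool signals_before_reply(const event_log& log) {
    auto signal = std::find(log.begin(), log.end(), "signal_body_parse");
    auto reply = std::find(log.begin(), log.end(), "send_reply");
    return signal < reply;
}
```

In this model only `request_kind::known_method` satisfies `signals_before_reply`, which is the asymmetry the change set out to remove.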

I see, thanks for the explanation! It makes sense that we'd be able to / want to signal early. I'm also not sure if this was originally implemented intentionally, so it might be worth poking others familiar with the area, but the rationale for standardizing on 3 makes sense to me.

CC: @dotnwat

> I see, thanks for the explanation! It makes sense that we'd be able to / want to signal early.

i agree with @andrwng that it makes sense, and i think you are doing the right thing here @ballard26.

> would allow for waiters to continue before the reply is sent.

is this optimization the only reason for changing the order of operations?

> I'm also not sure if this was originally implemented intentionally,

however, i think we need to be very conservative in the rpc server because we've had countless issues with race conditions and haven't had time to revamp/rearchitect rpc. so anything that potentially alters the scheduling is a red flag. in this case it seems completely benign, but it's also a rare error case, so the optimization doesn't seem worth the cost of working through the concerns?

Fair point about the past issues with race conditions. I'll go ahead and revert things to handle the exception with a `then_wrapped` in `rpc_server::dispatch_method_once` in order to preserve the scheduling order. The optimization I made does seem outside the scope of a PR to fix an issue.

dotnwat · 2023-01-31T23:07:54Z

src/v/rpc/rpc_server.cc

+        // If the connection is closed then this can throw an exception.
+        // In that case we want to catch and forward it to avoid a broken
+        // promise.
+        co_await ctx->conn->input().skip(ctx->get_header().payload_size);


i think it might make sense to handle this back in rpc_server::dispatch_method_once ?

I made a comment here https://github.com/redpanda-data/redpanda/pull/8519/files#r1091520039 about why I'm handling it here. Basically, if I handle it in `rpc_server::dispatch_method_once` with a `finally` or similar, I can only set the promise after the reply is sent or an exception is thrown, which is why I moved the handling here. Happy to move the handling to `rpc_server::dispatch_method_once` if desired though.

dotnwat · 2023-02-01T00:58:04Z

src/v/rpc/service.h

+              // `permanent_memory_reservation` will return an exception.
+              // We intercept it here to avoid a broken promise.
+              ctx.body_parse_exception(e);
+              std::rethrow_exception(e);


nit: it is more idiomatic here, outside a coroutine context, to return `ss::make_exception_future<>(e)`. the reason is that seastar will end up having to catch this again and repackage it into an exceptional future.

Good point, switching to it now.
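The difference between the two styles can be sketched with standard C++ futures. This is an analogy only: `make_failed_future` is a hypothetical helper that plays the role of `ss::make_exception_future<>`, and `invoke_and_wrap` is a hypothetical stand-in for the framework code that has to catch a thrown exception and repackage it. None of these names come from Seastar or Redpanda.

```cpp
#include <cassert>
#include <exception>
#include <future>
#include <stdexcept>
#include <utility>

// Hypothetical helper: build a future that has already failed, without
// throwing anything through the call stack.
std::future<void> make_failed_future(std::exception_ptr e) {
    std::promise<void> p;
    p.set_exception(e);
    return p.get_future();
}

// Rethrow style: notify the waiter, then throw. Some caller further up
// must catch the exception and turn it back into a failed future.
std::future<void>
reserve_rethrow(std::promise<void>& body_parse, std::exception_ptr e) {
    body_parse.set_exception(e);
    std::rethrow_exception(e);
}

// Return style: notify the waiter, then hand back a failed future directly.
std::future<void>
reserve_return_failed(std::promise<void>& body_parse, std::exception_ptr e) {
    body_parse.set_exception(e);
    return make_failed_future(e);
}

// Hypothetical stand-in for the framework: catches anything thrown by fn
// and repackages it, which is the extra work the return style avoids.
template <typename Fn>
std::future<void> invoke_and_wrap(Fn&& fn) {
    try {
        return fn();
    } catch (...) {
        return make_failed_future(std::current_exception());
    }
}

// True when consuming the future yields a std::runtime_error.
bool fails_with_runtime_error(std::future<void> f) {
    try {
        f.get();
    } catch (const std::runtime_error&) {
        return true;
    } catch (...) {
        return false;
    }
    return false;
}
```

Both styles leave the waiter and the caller with the same error; the return style simply skips the throw-and-catch round trip that `invoke_and_wrap` models here.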

dotnwat

nice!

dotnwat · 2023-02-01T06:37:25Z

restarted ci. seems like some transient issue in ci.

dotnwat · 2023-02-01T16:00:22Z

ok restarted again. seems the ci fix has landed now

ballard26 · 2023-02-03T21:35:55Z

In ducktape-build-release-clang-amd64-1-0 the failure appears to be #8589
In ducktape-build-debug-clang-amd64-1-0 the failure is #8621

vshtokman · 2023-02-09T17:35:21Z

/backport v22.3.x

vbotbuildovich · 2023-02-09T17:36:12Z

Failed to run cherry-pick command. I executed the below command:

git cherry-pick -x e47ab0dccb51a1a07e351219c94eb30575d42d82

Workflow run logs.

vshtokman · 2023-02-28T15:13:57Z

@ballard26, could you look into backporting this when you have a chance?

vshtokman · 2023-04-28T17:26:33Z

/backport v22.3.x

ballard26 requested review from dotnwat and andrwng January 31, 2023 03:24

github-actions bot added the area/redpanda label Jan 31, 2023

piyushredpanda requested a review from michael-redpanda January 31, 2023 05:28

andrwng reviewed Jan 31, 2023

ballard26 force-pushed the broken-promise-rpc branch from 2eb18d8 to 9c53bc3 Compare January 31, 2023 07:03

ballard26 requested a review from andrwng January 31, 2023 07:04

andrwng previously approved these changes Jan 31, 2023

michael-redpanda previously approved these changes Jan 31, 2023

ballard26 dismissed stale reviews from michael-redpanda and andrwng via 2259fea January 31, 2023 22:53

ballard26 force-pushed the broken-promise-rpc branch from 9c53bc3 to 2259fea Compare January 31, 2023 22:53

ballard26 requested review from andrwng and michael-redpanda January 31, 2023 22:54

dotnwat reviewed Jan 31, 2023

ballard26 requested a review from dotnwat January 31, 2023 23:25

ballard26 force-pushed the broken-promise-rpc branch 3 times, most recently from 56edc56 to dae07f4 Compare February 1, 2023 00:27

dotnwat reviewed Feb 1, 2023

ballard26 force-pushed the broken-promise-rpc branch from dae07f4 to e47ab0d Compare February 1, 2023 01:49

ballard26 requested a review from dotnwat February 1, 2023 01:55

dotnwat approved these changes Feb 1, 2023

andrwng approved these changes Feb 1, 2023

michael-redpanda approved these changes Feb 1, 2023

dotnwat merged commit db61371 into redpanda-data:dev Feb 7, 2023

This was referenced Apr 28, 2023

[v22.3.x] Broken promise (via BadLogLines) in ManyPartitionsTest.test_many_partitions #10460

Closed

[v22.3.x] Set promise on exceptions in dispatch_method_once #10461

Merged
