
kafka: Abort fetch and list_offsets when client disconnects #12021

Merged: 12 commits into redpanda-data:dev on Aug 8, 2023

Conversation

@BenPope (Member) commented Jul 11, 2023

kafka: Abort fetch and list_offsets (timequery) when the client disconnects for timely resource cleanup.

  • Add an abort_source to connection_context
    • This needs to be sharded as some requests span arbitrary shards
  • Abort the abort_source when the socket is disconnected
  • Wire up the abort_source to the request_context and into:
    • fetch
    • list_offsets (timequery)

The primary goal is to improve the situation for #11799
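
As a rough illustration of the wiring above, here is a minimal sketch of how a handler might consult a connection-level abort source. The request_context/handle_fetch shapes below are hypothetical stand-ins, not the PR's actual API:

#include <seastar/core/abort_source.hh>
#include <seastar/core/future.hh>

namespace ss = seastar;

// Hypothetical stand-in for the request context carrying the
// connection-scoped, shard-local abort_source.
struct request_context {
    ss::abort_source& abort_source;
};

ss::future<> handle_fetch(request_context ctx) {
    if (ctx.abort_source.abort_requested()) {
        // Client already disconnected: return early instead of waiting
        // out fetch.wait.max.ms.
        return ss::make_ready_future<>();
    }
    // ... dispatch per-partition reads, passing the abort_source down so
    // long waits can be cut short on disconnect ...
    return ss::make_ready_future<>();
}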

Testing can be done with something like:

kcat -G foo -X fetch.wait.max.ms=60000

Then kill the client, either gracefully or with kill -9.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.2.x
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

Improvements

  • kafka: Abort fetch and list_offsets (timequery) when the client disconnects for timely resource cleanup.

@BenPope self-assigned this Jul 11, 2023
@BenPope (Member, Author) commented Jul 11, 2023

/cdt tests/rptest/scale_tests/tiered_storage_reader_stress_test.py

@BenPope (Member, Author) commented Jul 13, 2023

/cdt tests/rptest/scale_tests/tiered_storage_reader_stress_test.py

@BenPope (Member, Author) commented Jul 13, 2023

Changes in force push:

  • When wait_for_input_shutdown, use an exception type that satisfies net::is_disconnect_exception
  • The last commit

@@ -94,7 +94,9 @@ static ss::future<read_result> read_from_partition(
kafka_read_priority(),
std::nullopt,
std::nullopt,
std::nullopt);
config.abort_source.has_value()
Contributor:

why would config not always have one

@BenPope (Member, Author):

why would config not always have one

A good question. Ultimately, once all the call sites and the tests are wired up, it should not be optional.


namespace ssx {

class sharded_abort_source {
Contributor:

surprised this isn't already in seastar.

@BenPope (Member, Author):

application and controller both have a manual version of this; it might be worth replacing those.
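
For context, a minimal sketch of what such a wrapper might look like on top of ss::sharded and ss::abort_source; names and details are illustrative, and the PR's actual ssx::sharded_abort_source (which also subscribes to the server's abort source) differs:

#include <seastar/core/abort_source.hh>
#include <seastar/core/future.hh>
#include <seastar/core/sharded.hh>

namespace ss = seastar;

namespace ssx {

// Sketch: one abort_source per shard; abort fans out to all shards,
// while queries stay shard-local.
class sharded_abort_source {
    // ss::sharded requires a stop() method, which ss::abort_source lacks.
    struct wrapper {
        ss::abort_source as;
        ss::future<> stop() { return ss::make_ready_future<>(); }
    };

public:
    ss::future<> start() { return _as.start(); }
    ss::future<> stop() { return _as.stop(); }

    // Cross-core: abort the abort_source on every shard.
    ss::future<> request_abort() {
        return _as.invoke_on_all([](wrapper& w) { w.as.request_abort(); });
    }

    // Shard-local queries: no cross-core traffic.
    bool abort_requested() { return _as.local().as.abort_requested(); }
    ss::abort_source& local() { return _as.local().as; }

private:
    ss::sharded<wrapper> _as;
};

} // namespace ssx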

@@ -60,6 +60,11 @@ std::ostream& operator<<(std::ostream& o, const log_reader_config& cfg) {
o << ", over_budget:" << cfg.over_budget;
o << ", strict_max_bytes:" << cfg.strict_max_bytes;
o << ", skip_batch_cache:" << cfg.skip_batch_cache;
o << ", abortable:" << cfg.abort_source.has_value();
o << ", aborted:"
<< (cfg.abort_source.has_value()
Contributor:

Right, it's unclear to me why we'd ever have a config without an abort source. It doesn't make sense for it to be optional, no?

@BenPope (Member, Author):

There are unit/fixture tests that don't wire up the correct lifecycle; that should be fixed, of course, but the optionality already existed.

@michael-redpanda (Contributor) left a comment

👍

@VladLazar (Contributor) left a comment

You'll have to include the other two commits from my branch, 116eb35 and 2b8df02, in order to keep the remote_partition assertion from triggering on abort.

@michael-redpanda (Contributor) left a comment

lgtm

@piyushredpanda (Contributor):

/ci-repeat 1

@BenPope (Member, Author) commented Jul 24, 2023

Rebased due to conflict in: #12304

@BenPope requested review from michael-redpanda and VladLazar and removed the request for jcsp, Jul 24, 2023 13:16
@BenPope (Member, Author) commented Aug 1, 2023

The failing tests are:

Module: rptest.tests.adjacent_segment_merging_test
Class:  AdjacentSegmentMergingTest
Method: test_reupload_of_local_segments
Arguments:
{
  "acks": 0,
  "cloud_storage_type": whatevs
}

This must be handled before the PR is merged. I can't reproduce it locally; the other acks values are fine.

@andrwng (Contributor) commented Aug 1, 2023

Quoting the above: "The failing tests are: rptest.tests.adjacent_segment_merging_test / AdjacentSegmentMergingTest.test_reupload_of_local_segments with acks=0 […] I can't reproduce it locally; the other acks values are fine."

I reopened #8457 and noted what looks to be happening. It doesn't look related to this PR (with acks=0 we end up producing less than required to reupload).

`wait_for_input_shutdown` allows detection of disconnection
of the peer via TCP keepalive.

Signed-off-by: Ben Pope <[email protected]>
BenPope and others added 10 commits August 3, 2023 10:32
The `sharded_abort_source` is subscribed to the
`net::server::abort_source`, so that it is aborted when the server
is aborted, but can also be manually aborted earlier.

Some requests such as fetch require results from several partitions,
which live on disparate cores; a sharded abort source enables that.

Signed-off-by: Ben Pope <[email protected]>
Abort the `abort_source` when a peer disconnects.

Use an exception type that satisfies `net::is_disconnect_exception`

Signed-off-by: Ben Pope <[email protected]>
Expose the `connection_context::abort_source` through an interface, in case
a different lifetime needs to be attached to the request in the future.

Signed-off-by: Ben Pope <[email protected]>
`remote_partition` can in theory receive an abort at any time. The most
common case would be a client disconnect. Previously, this was handled
by returning the reader to the `materialized_segment_state`. This is
dangerous since the abort may catch us at a bad time and return a reader
in an undefined state. We've seen this cause issues when the reader gets
reused.
An instance of partition_record_batch_reader_impl can be aborted by two
abort sources:
1. The abort source threaded in from the Kafka layer via
   `log_reader_config`
2. The abort source of `remote_partition` itself.

In both cases, the reader should exit. This patch achieves this by
monitoring both abort sources in the read loop and inserting strategic
checks between scheduling points. This approach is not ideal, but
`partition_record_batch_reader_impl` and `remote_segment_batch_reader`
are very closely coupled, which makes it difficult to approach things
differently.
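
As a rough sketch of the pattern this commit describes (helper name and wiring are illustrative, not the PR's actual code), a check like the following would be inserted after each scheduling point in the read loop:

#include <seastar/core/abort_source.hh>
#include <functional>
#include <optional>

namespace ss = seastar;

// Throw if either the Kafka-layer abort source (threaded in via the
// reader config, where it is optional) or remote_partition's own
// abort source has fired, so the reader exits promptly.
inline void throw_if_aborted(
  const std::optional<std::reference_wrapper<ss::abort_source>>& external,
  const ss::abort_source& internal) {
    if (external.has_value() && external->get().abort_requested()) {
        throw ss::abort_requested_exception{};
    }
    if (internal.abort_requested()) {
        throw ss::abort_requested_exception{};
    }
}

The throw_on_external_abort() call visible in the diff below plays this role after read_some() returns.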
@BenPope (Member, Author) commented Aug 3, 2023

Rebased due to a minor conflict in the includes of fetch.h, caused by #12460.

@@ -342,6 +355,8 @@ class partition_record_batch_reader_impl final
try {
auto result = co_await _reader->read_some(
deadline, *_ot_state);
throw_on_external_abort();
Member:

Wouldn't we really want to pass the abort source further down, into read_some? Otherwise, if we get stuck on any of the reads, we will never reach this.

Not saying this isn't an improvement, but I'm more interested in a general discussion, as I'm currently looking into a different case where we "leak" background futures that are stuck on reads/writes under load. It feels like we need more abort_sources in general, and we also need to make sure we pass them all the way down.

@BenPope (Member, Author) commented Aug 4, 2023:

That makes sense to me; I had assumed the remote_segment_batch_consumer took note of its abort_source, but it does not appear to.

Maybe something like this:

diff --git a/src/v/cloud_storage/remote_segment.cc b/src/v/cloud_storage/remote_segment.cc
index 87b3d9939b..c9e3fec90b 100644
--- a/src/v/cloud_storage/remote_segment.cc
+++ b/src/v/cloud_storage/remote_segment.cc
@@ -1270,6 +1270,12 @@ public:
             co_return stop_parser::yes;
         }
 
+        if (
+          _config.abort_source.has_value()
+          && _config.abort_source.value().get().abort_requested()) {
+            co_return stop_parser::yes;
+        }
+
         co_return stop_parser::no;
     }

@BenPope (Member, Author):

This PR adds a connection-level abort_source; it's possible that a request-level abort_source would also be useful, which could chain onto the connection-level one.

Contributor:

Most of the time, the abort_source will be triggered while the underlying remote_segment_batch_reader is waiting inside the remote_segment::hydrate call, and it won't be able to quit early. The latency there is in the ballpark of 1s, so I agree that this should be handled one level below.

Contributor:

I think it's actually desirable to finish the hydration even if the request got aborted. In all likelihood, the client will retry, so if we cancel the hydration we'd just be throwing away work.

Contributor:

I totally agree; the hydration should continue. But inside the read_some call there is a call to hydrate, which will block the fiber from being aborted. I'm not sure moving the reader to the evicted list will solve this: the reader will move, but the fiber that actually reads data for the fetch request will be stuck until the segment is downloaded. The hydrate method itself is just waiting on a future, and the hydration is done in the background, so it should be possible to abort the wait without canceling the hydration. The background hydration loop should be prepared for this and handle the broken_promise exception.

Contributor:

Maybe for this PR it's fine, but we should consider propagating the abort source further.

Contributor:

Maybe since you're destroying the reader it will get unstuck.

@@ -349,6 +380,7 @@ class connection_context final
sequence_id _next_response;
sequence_id _seq_idx;
map_t _responses;
ssx::sharded_abort_source _as;
Member:

Does having a ss::sharded object per connection context make connection creation quite expensive, since it forces a lot of cross-CPU talk?

Not sure it matters, but just wondering.

@BenPope (Member, Author) commented Aug 7, 2023:

To some extent, yes.

Cross-core calls occur:

  • start()
  • stop()
  • request_abort()

The trade-off is that every core has a local abort_source to query for:

  • abort_requested()
  • check()
  • subscribe()

It might be possible to create the per-core abort_source lazily:

  • Futurize the call to local()
  • If the local one is constructed, return it
  • Else cross-core call to request to construct it on the owning core
  • Maybe lock a mutex
  • Cross-core call (back to local) to actually construct it

It's a fair bit of complexity, and makes the interface more awkward.

@Lazin (Contributor) left a comment

LGTM overall. I'm not sure about the sharded abort source, since it will create an instance on every shard, and it will have to interact with all those fibers per fetch request.

@BenPope (Member, Author) commented Aug 7, 2023

LGTM overall. I'm not sure about the sharded abort source, since it will create an instance on every shard, and it will have to interact with all those fibers per fetch request.

Aside from starting and stopping the connection, it only interacts cross-core when aborting. If a fetch reads from multiple partitions on different cores, each core has a local instance to query. On the data path, there's almost nothing to do other than check a local bool to see whether the abort_source has been aborted.
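
To make the trade-off concrete, here is a hypothetical usage following the interface of the sharded_abort_source sketch earlier in this thread (the real ssx type may differ): cross-core work happens only at start/stop/abort, while the fetch path does a shard-local check.

#include <seastar/core/coroutine.hh>

// Assumes the ssx::sharded_abort_source sketch shown earlier and
// namespace ss = seastar.
ss::future<> connection_lifetime_example(ssx::sharded_abort_source& as) {
    co_await as.start();         // cross-core: one abort_source per shard
    // Per-partition fetch fibers on other shards do a cheap local check:
    //     if (as.abort_requested()) { /* stop reading early */ }
    co_await as.request_abort(); // cross-core: peer disconnected, abort all shards
    co_await as.stop();          // cross-core teardown
}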

@piyushredpanda merged commit 0393353 into redpanda-data:dev on Aug 8, 2023
25 checks passed
@vbotbuildovich (Collaborator):

/backport v23.2.x

@vbotbuildovich (Collaborator):

/backport v23.1.x

@vbotbuildovich (Collaborator):

Failed to run cherry-pick command. I executed the commands below:

git checkout -b backport-pr-12021-v23.2.x-431 remotes/upstream/v23.2.x
git cherry-pick -x 6f20ed23a6f70700655b94b8d0ddd645d6804393 f88fd9b5b441d383637c85a092803f896ae254e1 090951f344e41235710e27c9f4c9e88ab7c9916d ecb5c2fd4bf6ed8d93c1798afaf4d0620ed8db9f 3b5da87d2b85a30856d0f71ce5798fcd91382c1b c79a582bf57045b019906de8a969b638489c3b5b 28bd70b0ad4282a960c9ffd72392ce405c953713 7318818322c6ee60125c30f7ce5e39b1a45079ac 7e18eeb3c339c58880da204d4fe96c914e7f3c55 40b6fc348e4a9178ede0e5ccdcf1f285fb3ace33 99920be7aeadbdae39dd4df3e6a9ae30d368da01 bb46b94635e6cd2ff6973a7c7ffcfcb635dd6686

Workflow run logs.

@vbotbuildovich (Collaborator):

Failed to run cherry-pick command. I executed the commands below:

git checkout -b backport-pr-12021-v23.1.x-730 remotes/upstream/v23.1.x
git cherry-pick -x 6f20ed23a6f70700655b94b8d0ddd645d6804393 f88fd9b5b441d383637c85a092803f896ae254e1 090951f344e41235710e27c9f4c9e88ab7c9916d ecb5c2fd4bf6ed8d93c1798afaf4d0620ed8db9f 3b5da87d2b85a30856d0f71ce5798fcd91382c1b c79a582bf57045b019906de8a969b638489c3b5b 28bd70b0ad4282a960c9ffd72392ce405c953713 7318818322c6ee60125c30f7ce5e39b1a45079ac 7e18eeb3c339c58880da204d4fe96c914e7f3c55 40b6fc348e4a9178ede0e5ccdcf1f285fb3ace33 99920be7aeadbdae39dd4df3e6a9ae30d368da01 bb46b94635e6cd2ff6973a7c7ffcfcb635dd6686

Workflow run logs.

Comment on lines +291 to +299
std::exception_ptr eptr;
try {
    co_await ctx->start();
    co_await ctx->process();
} catch (...) {
    eptr = std::current_exception();
}
if (!eptr) {
    co_return co_await ctx->stop();
Member:

@rockwotj did you tell me about an upstream seastar utility for dealing with this pattern?

Contributor:

ss::coroutine::as_future

auto fut = co_await ss::coroutine::as_future(ctx->start());
if (!fut.failed()) {
  fut = co_await ss::coroutine::as_future(ctx->process());
}
if (fut.failed()) {
  co_return co_await ctx->stop();
}

Member:

ahhh right, thanks for the reminder.

@BenPope (Member, Author):

Nice(r).

@dotnwat (Member) left a comment

good stuff
