
[v23.2.x] kafka: Abort fetch and list_offsets when client disconnects #12981

Merged

BenPope merged 13 commits into redpanda-data:v23.2.x on Aug 24, 2023

Conversation

BenPope (Member) commented Aug 24, 2023

Backport of PR #12021
Backport of PR #12955
Fixes: #12649

BenPope and others added 13 commits August 24, 2023 08:54
`wait_for_input_shutdown` allows detection of peer disconnection
via TCP keepalive.

Signed-off-by: Ben Pope <[email protected]>
(cherry picked from commit 6f20ed2)
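
A minimal sketch of the mechanism described in the commit above, using raw POSIX `poll()` rather than Redpanda's actual Seastar-based code (the function name mirrors the commit; everything else is a hypothetical stand-in). With `SO_KEEPALIVE` enabled on the socket, a silently dead peer eventually surfaces here as an error event as well:

```cpp
#include <atomic>
#include <cerrno>
#include <poll.h>

// Block until the peer shuts down its side of the connection (or the socket
// errors out), then record the disconnect so in-flight work can be aborted.
void wait_for_input_shutdown(int fd, std::atomic<bool>& disconnected) {
    for (;;) {
        pollfd p{};
        p.fd = fd;
        p.events = POLLRDHUP; // Linux-specific: peer closed its write side
        if (::poll(&p, 1, /*timeout_ms=*/-1) < 0) {
            if (errno == EINTR) {
                continue; // interrupted by a signal: retry
            }
            return; // poll() itself failed: stop watching
        }
        if (p.revents & (POLLRDHUP | POLLHUP | POLLERR)) {
            disconnected.store(true, std::memory_order_release);
            return;
        }
    }
}
```
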
Signed-off-by: Ben Pope <[email protected]>
(cherry picked from commit f88fd9b)
The `sharded_abort_source` is subscribed to the
`net::server::abort_source`, so that it is aborted when the server
is aborted, but can also be manually aborted earlier.

Some requests such as fetch require results from several partitions,
which live on disparate cores; a sharded abort source enables that.

Signed-off-by: Ben Pope <[email protected]>
(cherry picked from commit 090951f)

Conflicts:
  src/v/kafka/server/connection_context.cc (due to coroutineization on dev)
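
A toy model of the `sharded_abort_source` shape described above (hypothetical stand-ins, not Redpanda's classes, and not thread-safe): one abort source per shard so fetch work running on other cores can check a local flag, with the whole set subscribed to the server-wide source and also abortable earlier by hand:

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <functional>
#include <vector>

constexpr std::size_t shard_count = 4; // stand-in for seastar::smp::count

// Toy single-purpose abort source: a flag plus abort callbacks.
class toy_abort_source {
public:
    void subscribe(std::function<void()> cb) { _subs.push_back(std::move(cb)); }
    void request_abort() {
        if (!_aborted.exchange(true)) {
            for (auto& cb : _subs) {
                cb();
            }
        }
    }
    bool abort_requested() const { return _aborted.load(); }

private:
    std::atomic<bool> _aborted{false};
    std::vector<std::function<void()>> _subs;
};

class toy_sharded_abort_source {
public:
    // Subscribing to the server's abort source means a server shutdown
    // aborts every shard-local source automatically.
    explicit toy_sharded_abort_source(toy_abort_source& server_as) {
        server_as.subscribe([this] { request_abort(); });
    }
    // Manual abort (e.g. on client disconnect) can fire earlier than shutdown.
    void request_abort() {
        for (auto& shard_as : _per_shard) {
            shard_as.request_abort();
        }
    }
    // Work running on a given core checks only its local source.
    toy_abort_source& local(std::size_t shard) { return _per_shard[shard]; }

private:
    std::array<toy_abort_source, shard_count> _per_shard;
};
```
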
Abort the `abort_source` when a peer disconnects.

Use an exception type that satisfies `net::is_disconnect_exception`.

Signed-off-by: Ben Pope <[email protected]>
(cherry picked from commit ecb5c2f)
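
A sketch of the idea in model form (names other than `net::is_disconnect_exception` are hypothetical): the abort carries a dedicated exception type, so whoever observes it can distinguish a client disconnect from a genuine failure:

```cpp
#include <exception>
#include <stdexcept>
#include <utility>

// Exception type signalling an "expected" abort: the client went away.
struct client_disconnected_exception : std::runtime_error {
    client_disconnected_exception()
      : std::runtime_error("client disconnected") {}
};

// Toy abort source that records the abort's reason alongside the abort.
class toy_abort_source {
public:
    void request_abort_ex(std::exception_ptr ex) {
        _reason = std::move(ex);
        _aborted = true;
    }
    bool abort_requested() const { return _aborted; }
    std::exception_ptr reason() const { return _reason; }

private:
    bool _aborted{false};
    std::exception_ptr _reason;
};

// On peer disconnect, abort with the disconnect exception so downstream code
// (and log classification) can tell this apart from genuine failures:
//   as.request_abort_ex(
//     std::make_exception_ptr(client_disconnected_exception{}));
```
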
Expose the `connection_context::abort_source` through an interface, in
case a request needs an abort source with its own lifetime in the future.

Signed-off-by: Ben Pope <[email protected]>
(cherry picked from commit 3b5da87)
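
A guess at the shape such an interface might take (all names hypothetical): handlers depend only on the interface, so the provider, and hence the abort source's lifetime, can change later without touching them:

```cpp
#include <atomic>

struct toy_abort_source {
    std::atomic<bool> aborted{false};
};

// Handlers see the abort source only through this interface, so where it
// lives (connection, request, elsewhere) can change without touching them.
class abort_source_provider {
public:
    virtual ~abort_source_provider() = default;
    virtual toy_abort_source& abort_source() = 0;
};

// Today the connection context owns the abort source and is the provider.
class toy_connection_context final : public abort_source_provider {
public:
    toy_abort_source& abort_source() override { return _as; }

private:
    toy_abort_source _as;
};
```
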
Signed-off-by: Ben Pope <[email protected]>
(cherry picked from commit c79a582)
Signed-off-by: Ben Pope <[email protected]>
(cherry picked from commit 28bd70b)
Signed-off-by: Ben Pope <[email protected]>
(cherry picked from commit 7318818)

Conflicts:
  src/v/kafka/server/handlers/fetch.h (includes)
`remote_partition` can in theory receive an abort at any time. The most
common case would be a client disconnect. Previously, this was handled
by returning the reader to the `materialized_segment_state`. This is
dangerous since the abort may catch us at a bad time and return a reader
in an undefined state. We've seen this cause issues when the reader gets
reused.

(cherry picked from commit 99920be)
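
A hypothetical model of the fix's shape, not the actual `remote_partition` code: a reader interrupted by an abort is destroyed rather than recycled, since it may have been caught between scheduling points in an unknown state:

```cpp
#include <memory>
#include <utility>
#include <vector>

struct toy_reader {
    void close() { /* release buffers, handles, ... */ }
};
using reader_ptr = std::unique_ptr<toy_reader>;

struct toy_segment_state {
    std::vector<reader_ptr> idle_readers;

    // A reader that was interrupted by an abort may be mid-operation, so its
    // internal state is unknown: destroy it instead of caching it for reuse.
    void maybe_return(reader_ptr r, bool aborted) {
        if (aborted) {
            r->close();
            return; // unique_ptr destroys the reader here
        }
        idle_readers.push_back(std::move(r)); // known-good: safe to reuse
    }
};
```
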
An instance of partition_record_batch_reader_impl can be aborted by two
abort sources:
1. The abort source threaded in from the Kafka layer via
   `log_reader_config`
2. The abort source of `remote_partition` itself.

In both cases, the reader should exit. This patch achieves this by
monitoring both abort sources in the read loop and inserting strategic
checks between scheduling points. This approach is not ideal, but
`partition_record_batch_reader_impl` and `remote_segment_batch_reader`
are very closely coupled, which makes it difficult to approach things
differently.

(cherry picked from commit bb46b94)
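
A modeled read step illustrating the pattern the commit above describes (the real reader is Seastar-based; these atomics are stand-ins for the two abort sources): check both sources before doing work and again after every potential scheduling point:

```cpp
#include <atomic>
#include <optional>

struct batch {};

// Stand-in for pulling the next batch; in the real code this can suspend,
// i.e. it is a scheduling point.
std::optional<batch> next_batch() { return batch{}; }

std::optional<batch> read_step(
  const std::atomic<bool>& kafka_aborted,
  const std::atomic<bool>& partition_aborted) {
    auto should_stop = [&] {
        // Either abort source firing means the reader must unwind.
        return kafka_aborted.load() || partition_aborted.load();
    };
    if (should_stop()) { // check before doing any work
        return std::nullopt;
    }
    auto b = next_batch(); // potential scheduling point
    if (should_stop()) { // re-check after every such point
        return std::nullopt;
    }
    return b;
}
```
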
This allows us to differentiate the reason for an abort and also avoid
"expected" error logs.

Fixes redpanda-data#12722
Fixes redpanda-data#12723

Co-Authored-By: Ben Pope <[email protected]>
(cherry picked from commit 7cdb6c0)

Conflicts:
  src/v/net/connection.cc (includes)
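
A sketch of the resulting logging policy (hypothetical logger; `is_disconnect_exception` here is a stand-in for `net::is_disconnect_exception`): an abort caused by a client disconnect is expected and logs at debug, while anything else stays an error:

```cpp
#include <cstdio>
#include <exception>
#include <stdexcept>

// Same disconnect exception as in the earlier sketch.
struct client_disconnected_exception : std::runtime_error {
    client_disconnected_exception()
      : std::runtime_error("client disconnected") {}
};

// Stand-in for net::is_disconnect_exception: classify exceptions that
// represent an expected client disconnect.
bool is_disconnect_exception(const std::exception_ptr& ex) {
    try {
        std::rethrow_exception(ex);
    } catch (const client_disconnected_exception&) {
        return true;
    } catch (...) {
        return false;
    }
}

void log_request_failure(const std::exception_ptr& ex) {
    if (is_disconnect_exception(ex)) {
        // Expected: the client went away; do not alarm operators.
        std::printf("DEBUG request aborted: client disconnected\n");
    } else {
        std::printf("ERROR request failed unexpectedly\n");
    }
}
```
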
@BenPope BenPope added the kind/backport PRs targeting a stable branch label Aug 24, 2023
@BenPope BenPope self-assigned this Aug 24, 2023
@BenPope BenPope added this to the v23.2.x-next milestone Aug 24, 2023
BenPope (Member, Author) commented Aug 24, 2023

/cdt
rp_version=build

@piyushredpanda piyushredpanda modified the milestones: v23.2.x-next, v23.2.7 Aug 24, 2023
@BenPope BenPope merged commit 3ad3652 into redpanda-data:v23.2.x Aug 24, 2023
10 checks passed
Labels: area/redpanda, kind/backport (PRs targeting a stable branch)