
[backport] [v23.1.x] tx/compaction: ensure last batch in a segment is not compacted away #13687

Merged
merged 5 commits into v23.1.x on Sep 28, 2023

Conversation

bharathv
Contributor

Backport of #13609

With this PR, control batches are not discarded but are compacted in the same manner as other data batches. This approach prevents the elimination of the last batch within a segment when it happens to be a control batch. If the final batch in a segment is discarded, and there are no subsequent data batches, it could lead to clients being unable to advance their consumed offset until the LSO, creating the perception of a stuck consumer unable to make progress.

However, this situation is not problematic when the last batch is an aborted data batch, because we can guarantee the presence of user-consumable batches after it (in the next segment): at minimum the abort control batch, and possibly other interleaved committed batches, ensuring continued consumer progress.
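The retention rule described above can be sketched as follows. This is a hypothetical illustration, not Redpanda's actual code: windowed compaction drops a batch from an older segment when its key is written again in newer segments, except that the last batch of a segment is always kept — the rule this PR extends to control batches.

```python
# Hypothetical sketch (not Redpanda's actual implementation): drop batches
# whose key reappears in newer segments, but never drop the last batch of
# the segment, so a trailing control batch survives compaction.

def compact_with_newer_keys(segment, newer_keys):
    """Drop batches whose key reappears later, but never the last batch.

    `segment` is a list of (offset, key, kind) tuples in offset order.
    """
    last = segment[-1]
    return [b for b in segment if b[1] not in newer_keys or b is last]

# A segment ending in a tx control batch; both keys are rewritten later.
segment = [(10, "k1", "data"), (11, "k2", "control")]
kept = compact_with_newer_keys(segment, newer_keys={"k1", "k2"})
print(kept)  # the trailing control batch at offset 11 survives
```

Without the last-batch rule, offset 11 would be eliminated, and a consumer parked at offset 11 would find nothing to fetch below the LSO, appearing stuck.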

Fixes #13639

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.2.x
  • v23.1.x
  • v22.3.x

Release Notes

Bug Fixes

  • Fixes a case where a control batch that is the last batch in a segment (and in the log) was compacted away. This gave the perception of a consumption hang from the client's POV, when in reality there was nothing left to consume after that point. Retaining such batches lets the consumer offset make progress and reach the LSO.

control batches (tx_commit/tx_abort) are raft_data batches too and may
accidentally overwrite previous user raft_data batches with the same key.
Ensure that doesn't happen.

(cherry picked from commit 675ed6e)
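The key-collision hazard this commit guards against can be sketched as follows. This is a hypothetical illustration with made-up names: if the compaction key index ignores the batch type, a tx control batch can shadow a user data batch that happens to serialize to the same key bytes; indexing by (key, batch type) keeps them apart.

```python
# Hypothetical sketch: a compaction key index where later offsets win.
# If the batch type is not part of the index key, a tx_commit control
# batch can shadow a user raft_data batch with the same key bytes,
# causing the user batch to be compacted away. Names are illustrative.

def build_key_index(batches, include_type):
    """Map each key to the latest offset holding it; later offsets win."""
    index = {}
    for offset, key, batch_type in batches:
        index_key = (key, batch_type) if include_type else key
        index[index_key] = offset
    return index

batches = [
    (0, b"k1", "raft_data"),  # user data batch
    (1, b"k1", "tx_commit"),  # control batch sharing the same key bytes
]

buggy = build_key_index(batches, include_type=False)
fixed = build_key_index(batches, include_type=True)
print(buggy)  # {b'k1': 1} -- the user batch at offset 0 would be dropped
print(fixed)  # both offsets survive under distinct index keys
```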
Currently we always discard control batches. This is a problem when the
batch being discarded is the last batch in a segment. In some edge cases
where there is no user data after the control batch (empty new segment),
consumers may be stuck behind the LSO, repeatedly polling for data when
there is none. This may look like an infinite loop from the consumer's
POV, but there is simply no data after that offset, as it was compacted
away. This situation is not possible in regular self-compaction because
we always retain the last batch.

With this commit we use the regular compaction process for control
batches, so the resulting segment may end up with some of them and is
guaranteed to retain the last batch. These are simply filtered out on
the client side, which guarantees the consumed offset makes progress.

(cherry picked from commit 55d2ae2)
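The client-side behavior described above can be sketched as follows. This is a hypothetical illustration, not an actual Kafka/Redpanda client: control batches retained by compaction are filtered out of the results, but they still advance the consumed offset, so the poll loop does not appear stuck behind the LSO.

```python
# Hypothetical consumer-side sketch: control batches that compaction now
# retains are filtered out by the client, but they still advance the
# consumed offset, so a poll loop does not look stuck behind the LSO.

def poll(log, start_offset):
    """Return (user_payloads, next_offset) for batches at/after start_offset."""
    payloads = []
    next_offset = start_offset
    for offset, kind, payload in log:
        if offset < start_offset:
            continue
        next_offset = offset + 1      # control batches still move the offset
        if kind == "data":
            payloads.append(payload)  # control batches are filtered away
    return payloads, next_offset

# A log whose last batch is a control batch (retained by the fix).
log = [(0, "data", "a"), (1, "control", None)]
print(poll(log, 0))  # (['a'], 2): the offset reaches 2 past the control batch
print(poll(log, 1))  # ([], 2): no data to return, but the offset advances
```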
Undoes a check in the verifier that was relaxed in 3e4a479.
Adds a test that ensures consumers can make progress when the last batch
in a compacted segment is a control batch.

(cherry picked from commit 0f0bfdf)
@bharathv bharathv merged commit 2158fb3 into redpanda-data:v23.1.x Sep 28, 2023
7 checks passed
@bharathv bharathv deleted the v231x-compaction branch September 28, 2023 01:36
@BenPope BenPope added this to the v23.1.19 milestone Oct 10, 2023
@BenPope BenPope added the kind/backport PRs targeting a stable branch label Oct 10, 2023
Labels: area/redpanda, kind/backport (PRs targeting a stable branch)
3 participants