[KAFKA-8522] Streamline tombstone and transaction marker removal #10914
Conversation
@junrao @hachikuji Could you help take a review pass? I know Jun has reviewed before, but since we've rebased several times I think it would be helpful to look over again.
@@ -57,7 +57,7 @@
    <suppress checks="ParameterNumber"
              files="DefaultRecordBatch.java"/>
    <suppress checks="ParameterNumber"
              files="Sender.java"/>
the same suppression is on L54
@mattwong949 : Thanks for the PR. Just a few minor comments below.
@@ -522,13 +523,13 @@ private[log] class Cleaner(val id: Int,
    val cleanableHorizonMs = log.logSegments(0, cleanable.firstUncleanableOffset).lastOption.map(_.lastModified).getOrElse(0L)

    // group the segments and clean the groups
-   info("Cleaning log %s (cleaning prior to %s, discarding tombstones prior to %s)...".format(log.name, new Date(cleanableHorizonMs), new Date(deleteHorizonMs)))
+   info("Cleaning log %s (cleaning prior to %s, discarding legacy tombstones prior to %s)...".format(log.name, new Date(cleanableHorizonMs), new Date(legacyDeleteHorizonMs)))
Might not be very clear what a "legacy tombstone" means. Would it be fair to call this an upper bound on the deletion horizon?
// therefore, we should take advantage of this fact and remove tombstones if we can
// under the condition that the log's latest delete horizon is less than the current time
// tracked
ltc.log.latestDeleteHorizon != RecordBatch.NO_TIMESTAMP && ltc.log.latestDeleteHorizon <= time.milliseconds()
When the broker is initialized, log.latestDeleteHorizon will be NO_TIMESTAMP. We need at least one run to trigger before we can initialize the value. Is there another condition we can rely on in order to ensure that the cleaning still occurs?
Related to this, I am a bit concerned about the extra cleaning due to this. If we have just one tombstone record, this can force a round of cleaning on idle partitions. An alternative would be to count the total number of surviving records and tombstone records during cleaning, and only trigger a cleaning if #tombstones/#totalRecords > minCleanableRatio. @hachikuji What do you think?
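As an illustration, a minimal sketch of that ratio-based trigger, assuming hypothetical tombstoneCount/totalRecords stats gathered during a previous cleaning pass (none of these names exist in the current code):

```scala
// Hypothetical sketch: only schedule a tombstone-driven cleaning when
// tombstones make up a large enough fraction of the log, mirroring the
// existing minCleanableRatio semantics. tombstoneCount and totalRecords
// would need to be recorded during an earlier cleaning pass.
def shouldCleanForTombstones(tombstoneCount: Long,
                             totalRecords: Long,
                             minCleanableRatio: Double): Boolean =
  totalRecords > 0 && tombstoneCount.toDouble / totalRecords > minCleanableRatio
```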
@junrao Yeah, that's an interesting idea. Do you think it would be possible to make it a size-based comparison?
Yes, ideally we want to do a size-based estimate. I'm just not sure how accurately we can estimate size given batching and compression.
It seems like, whether we track the delete horizon or the # of tombstones, we will need to checkpoint some state. Otherwise we will be forced to perform a pass after every broker restart. Could we track the delete horizon upon each log append, when we clean the log, and when we have to recover the log?
I'm not sure where a checkpoint should be stored given our current checkpoint file formats and the need to support downgrades.
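A minimal sketch of what tracking at those three points might look like, assuming a hypothetical per-log field (the class and method names here are invented, not from the PR):

```scala
import org.apache.kafka.common.record.RecordBatch

// Hypothetical sketch: keep a per-log latestDeleteHorizon current at the
// three points mentioned above (append, clean, recover), so a full cleaning
// pass would not be required after every broker restart.
class DeleteHorizonTracker {
  @volatile var latestDeleteHorizon: Long = RecordBatch.NO_TIMESTAMP

  // Called with a batch's delete horizon whenever the batch is appended,
  // cleaned, or recovered; NO_TIMESTAMP means the batch carries no horizon.
  def observe(batchDeleteHorizonMs: Long): Unit =
    if (batchDeleteHorizonMs != RecordBatch.NO_TIMESTAMP)
      latestDeleteHorizon = math.max(latestDeleteHorizon, batchDeleteHorizonMs)
}
```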
We could store some additional tombstone-related stats in the log cleaner checkpoint file. It seems that to support downgrade, we can't change the version number, since the existing code expects the version in the file to match that in the code.
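As a rough illustration of that constraint, a hypothetical cleaner-checkpoint entry extended with tombstone stats (the extra fields and names are invented for illustration, not the actual file format):

```scala
// Hypothetical sketch: append tombstone stats to each cleaner-checkpoint
// entry while leaving the version number alone. Older brokers parse entries
// strictly, so extra fields would still be a downgrade hazard, which is the
// open question in this thread.
case class CleanerCheckpointEntry(topic: String,
                                  partition: Int,
                                  firstDirtyOffset: Long,
                                  tombstoneCount: Long, // hypothetical extra field
                                  totalRecords: Long)   // hypothetical extra field

def formatEntry(e: CleanerCheckpointEntry): String =
  s"${e.topic} ${e.partition} ${e.firstDirtyOffset} ${e.tombstoneCount} ${e.totalRecords}"
```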
@junrao @hachikuji @lbradstreet I've removed the logic for tracking the latestDeleteHorizon and the deleteHorizon-triggered cleaning in grabFilthiestCompactedLog, since this part of the PR is not part of KIP-534.
@mattwong949 : Thanks for the updated PR. A couple more comments.
@mattwong949 : Thanks for the updated PR. A few more comments below.
if (batch.isControlBatch)
  discardBatchRecords = canDiscardBatch && batch.deleteHorizonMs().isPresent && batch.deleteHorizonMs().getAsLong <= currentTime
else
  discardBatchRecords = canDiscardBatch
This is an existing issue. The following comment on line 1136 seems out of place, since the code that does that check is inside isBatchLastRecordOfProducer() below.
// We may retain a record from an aborted transaction if it is the last entry
// written by a given producerId.
Makes sense. I've removed that comment on line 1136 since the case is mentioned in isBatchLastRecordOfProducer.
  BatchRetention.DELETE
else
  BatchRetention.DELETE_EMPTY
new RecordFilter.BatchRetentionResult(batchRetention, canDiscardBatch)
It seems that containsMarkerForEmptyTxn should only be set to canDiscardBatch if this batch is a control batch?
hmm yeah I think you are right. I'll change to canDiscardBatch && batch.isControlBatch
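For clarity, a sketch of the resulting code after that change (names as in the snippet above):

```scala
// Only a discardable control batch can contain a marker for an empty
// transaction; a plain data batch should never set this flag.
val containsMarkerForEmptyTxn = canDiscardBatch && batch.isControlBatch
new RecordFilter.BatchRetentionResult(batchRetention, containsMarkerForEmptyTxn)
```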
@mattwong949 : Thanks for the updated PR. Just a minor comment below.
@hachikuji : Do you have any more comments?
@mattwong949 : Thanks for the updated PR. LGTM
…che#10914) This PR aims to remove tombstones that persist indefinitely due to low throughput. Previously, deleteHorizon was calculated from the segment's last modified time. In this PR, the deleteHorizon is tracked in the baseTimestamp of RecordBatches. After the first cleaning pass finds a record batch with tombstones, the record batch is recopied with the deleteHorizon flag set and a new baseTimestamp equal to the deleteHorizonMs. The records in the batch are rebuilt with relative timestamps based on the recorded deleteHorizonMs. Later cleaning passes can then remove tombstones more accurately, since each record batch tracks its own deleteHorizon. KIP-534: https://cwiki.apache.org/confluence/display/KAFKA/KIP-534%3A+Retain+tombstones+and+transaction+markers+for+approximately+delete.retention.ms+milliseconds Co-authored-by: Ted Yu <[email protected]> Co-authored-by: Richard Yu <[email protected]>
…`deleteHorizonMs` in batch format (#11694) This PR updates the documentation and tooling to match #10914, which added support for encoding `deleteHorizonMs` in the record batch schema. The changes include adding the new attribute and updating field names. We have also updated stale references to the old `FirstTimestamp` field in the code and comments. Finally, in the `DumpLogSegments` tool, when record batch information is printed, it will also include the value of `deleteHorizonMs` (e.g. `OptionalLong.empty` or `OptionalLong[123456]`). Reviewers: Vincent Jiang <[email protected]>, Kvicii <[email protected]>, Jason Gustafson <[email protected]>
This is a rebased PR for #7884 and #9915.
This PR aims to remove tombstones that persist indefinitely due to low throughput. Previously, deleteHorizon was calculated from the segment's last modified time.
In this PR, the deleteHorizon is tracked in the baseTimestamp of RecordBatches. After the first cleaning pass finds a record batch with tombstones, the record batch is recopied with the deleteHorizon flag set and a new baseTimestamp equal to the deleteHorizonMs. The records in the batch are rebuilt with relative timestamps based on the recorded deleteHorizonMs. Later cleaning passes can then remove tombstones more accurately, since each record batch tracks its own deleteHorizon.
KIP-534: https://cwiki.apache.org/confluence/display/KAFKA/KIP-534%3A+Retain+tombstones+and+transaction+markers+for+approximately+delete.retention.ms+milliseconds
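For illustration, a sketch of the timestamp arithmetic this implies (the variable names are invented and this is not the actual DefaultRecordBatch code; it assumes, per KIP-534, that the horizon is the cleaning time plus delete.retention.ms):

```scala
val cleaningTimeMs    = 1650000000000L // time of the cleaning pass that first sees the tombstone
val deleteRetentionMs = 86400000L      // delete.retention.ms (1 day)
val recordTimestampMs = 1649999000000L // a record's original timestamp

// With the deleteHorizon flag set, baseTimestamp holds the horizon rather
// than the first record's timestamp, and each record's timestamp is stored
// as a delta relative to it.
val deleteHorizonMs = cleaningTimeMs + deleteRetentionMs
val baseTimestamp   = deleteHorizonMs
val timestampDelta  = recordTimestampMs - baseTimestamp // may be negative

// The original timestamp round-trips exactly.
assert(baseTimestamp + timestampDelta == recordTimestampMs)

// A later cleaning pass can drop the batch's tombstones once the horizon passes.
def tombstonesRemovable(nowMs: Long): Boolean = nowMs >= deleteHorizonMs
```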
Co-author: @ConcurrencyPractitioner