[improve][broker][client] PIP-192 PIP-215 Added TopicCompactionStrategy for StrategicTwoPhaseCompactor and TableView. #18195

Merged
5 commits merged into master on Dec 21, 2022


heesung-sn
Contributor

Master Issue: #18099

Motivation

This PR supports TopicCompactionStrategy in StrategicTwoPhaseCompactor and TableView for the internal system topics, as proposed in PIP-215.

This PR does not expose StrategicTwoPhaseCompactor for customer topics yet. We want to expose this feature to users only after it has proven stable on the new system topic of the new broker load balancer (PIP-192).

Modifications

This PR adds the following to implement StrategicTwoPhaseCompactor:

  • Added StrategicTwoPhaseCompactor that extends TwoPhaseCompactor.
  • Added TopicCompactionStrategy interface.
  • Added TopicCompactionStrategy logic in StrategicTwoPhaseCompactor and TableViewImpl.
  • Added CompactionReaderImpl to scan topic messages in StrategicTwoPhaseCompactor. It cumulatively acknowledges the read messages at the end of StrategicTwoPhaseCompactor.
  • Added SubscriptionMode and SubscriptionInitialPosition parameters in ReaderConfigurationData since the CompactionReaderImpl reader's subscription needs to be durable and needs to read from the earliest position.
  • Added RawBatchMessageContainerImpl to batch and serialize messages in StrategicTwoPhaseCompactor.
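To make the idea of strategy-driven compaction concrete, here is a minimal, self-contained Java sketch of a phase-one style scan. It does not use any Pulsar classes: `Strategy`, its `shouldKeepLeft(prev, cur)` hook, `Msg`, and `compact` are hypothetical names chosen for illustration, and the PR's actual TopicCompactionStrategy and StrategicTwoPhaseCompactor may differ in signatures and structure. The point is only that, per key, the strategy decides whether an incoming value replaces the one retained so far, while a null strategy falls back to plain latest-value-wins compaction.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StrategicCompactionSketch {

    /** Hypothetical strategy hook: return true to keep the existing (left) value. */
    interface Strategy<T> {
        boolean shouldKeepLeft(T prev, T cur);
    }

    /** Hypothetical keyed message; not a Pulsar type. */
    record Msg<T>(String key, T value) {}

    /**
     * Walks the messages in order and, per key, asks the strategy whether the
     * incoming value may replace the retained one. With a null strategy this
     * degenerates to "latest value wins", like plain compaction.
     */
    static <T> Map<String, T> compact(List<Msg<T>> messages, Strategy<T> strategy) {
        Map<String, T> retained = new LinkedHashMap<>();
        for (Msg<T> m : messages) {
            T prev = retained.get(m.key());
            if (prev != null && strategy != null && strategy.shouldKeepLeft(prev, m.value())) {
                continue; // the strategy rejected the newer value
            }
            retained.put(m.key(), m.value());
        }
        return retained;
    }

    public static void main(String[] args) {
        // Example strategy: only accept strictly increasing values per key.
        Strategy<Integer> monotonic = (prev, cur) -> cur <= prev;
        List<Msg<Integer>> log = List.of(
                new Msg<>("k1", 1), new Msg<>("k2", 5),
                new Msg<>("k1", 3), new Msg<>("k2", 4), new Msg<>("k1", 2));
        System.out.println(compact(log, monotonic)); // {k1=3, k2=5}
        System.out.println(compact(log, null));      // {k1=2, k2=4}
    }
}
```

Tombstones (null payloads), message IDs, and the second phase that writes the retained messages to the compacted ledger are omitted for brevity.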

This PR updates TableViewImpl to use TopicCompactionStrategy in the update logic of its key-value (K,V) data map.

  • Added the topicCompactionStrategy member variable in TableViewConfigurationData
  • Added the listen() method to TableView, which provides an option to invoke the listener actions only for the tail messages (messages that arrive after the listener is registered).
  • Updated TableViewImpl and its handleMessage() function to consider TopicCompactionStrategy when updating the data K,V map.
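The TableView changes can be sketched in the same spirit. The class below is a hypothetical, in-memory stand-in, not TableViewImpl: it assumes a `shouldKeepLeft(prev, cur)`-style strategy hook, applies it in `handleMessage()` before updating the key-value map, and contrasts a `forEachAndListen()`-style registration (replay existing entries, then follow updates) with a `listen()`-style registration that only receives tail messages. All names and signatures here are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

public class TableViewSketch<T> {

    /** Hypothetical strategy hook: return true to keep the existing value. */
    interface Strategy<T> {
        boolean shouldKeepLeft(T prev, T cur);
    }

    private final Map<String, T> data = new HashMap<>();
    private final List<BiConsumer<String, T>> listeners = new ArrayList<>();
    private final Strategy<T> strategy; // null means "latest value wins"

    TableViewSketch(Strategy<T> strategy) {
        this.strategy = strategy;
    }

    /** Applies the strategy before touching the map or notifying listeners. */
    synchronized void handleMessage(String key, T value) {
        T prev = data.get(key);
        if (prev != null && strategy != null && strategy.shouldKeepLeft(prev, value)) {
            return; // rejected: neither the map nor the listeners see this value
        }
        data.put(key, value);
        for (BiConsumer<String, T> listener : listeners) {
            listener.accept(key, value);
        }
    }

    /** Replays the existing entries, then keeps receiving new updates. */
    synchronized void forEachAndListen(BiConsumer<String, T> action) {
        data.forEach(action);
        listeners.add(action);
    }

    /** Receives only updates that arrive after registration (tail messages). */
    synchronized void listen(BiConsumer<String, T> action) {
        listeners.add(action);
    }

    synchronized T get(String key) {
        return data.get(key);
    }

    public static void main(String[] args) {
        TableViewSketch<Integer> tv = new TableViewSketch<>((prev, cur) -> cur <= prev);
        tv.handleMessage("k1", 1);
        List<String> tail = new ArrayList<>();
        List<String> all = new ArrayList<>();
        tv.listen((k, v) -> tail.add(k + "=" + v));
        tv.forEachAndListen((k, v) -> all.add(k + "=" + v));
        tv.handleMessage("k1", 0); // rejected by the strategy
        tv.handleMessage("k1", 2);
        System.out.println(tail); // [k1=2]
        System.out.println(all);  // [k1=1, k1=2]
    }
}
```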

This PR updates the compaction test classes so that the existing test cases can be reused.

This PR relaxes the access modifiers in the parent classes of the added classes so that the new subclasses can access their member variables and methods.

Verifying this change

  • Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

  • Extended the compaction tests to verify that the same logic also passes with StrategicTwoPhaseCompactor.
  • Added other unit tests for the added classes.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API (the TableView.listen() addition and the reader and TableView configuration updates)
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

This PR does not enable strategic compaction for customer topics. We will update the documentation when we enable this compaction for customer topics.

Matching PR in forked repository

PR in forked repository: heesung-sn#12

@heesung-sn heesung-sn force-pushed the pip-215 branch 2 times, most recently from 2880ab0 to a53b13c on November 5, 2022 00:49
Comment on lines 373 to 380
addToCompactedLedger(lh, message, topic, outstanding)
        .whenComplete((res, exception2) -> {
            outstanding.release();
            if (exception2 != null) {
                promise.completeExceptionally(exception2);
                return;
            }
        });
phaseTwoLoop(topic, reader, lh, outstanding, promise);
Member


Suggested change
addToCompactedLedger(lh, message, topic, outstanding)
        .whenComplete((res, exception2) -> {
            outstanding.release();
            if (exception2 != null) {
                promise.completeExceptionally(exception2);
                return;
            }
            phaseTwoLoop(topic, reader, lh, outstanding, promise);
        });

If I understand correctly, the logic should be like this, right?

Contributor Author

@heesung-sn heesung-sn Nov 7, 2022


No. For high concurrency, we don't want to wait for each addToCompactedLedger call to complete (the concurrency is bounded by the semaphore, outstanding).

Member


I see.
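The trade-off discussed above (fire each write without waiting, but cap how many are in flight with a semaphore) can be illustrated with a small standalone Java sketch. It is not the compactor's code: `asyncWrite` is a hypothetical stand-in for addToCompactedLedger, and a plain loop replaces the recursive phaseTwoLoop for brevity. Each iteration acquires a permit, starts an asynchronous write, and moves on immediately; the completion callback releases the permit, so at most `maxOutstanding` writes are ever pending.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedPipelineSketch {

    record Result(int written, int maxInFlight) {}

    // Hypothetical stand-in for addToCompactedLedger(): an asynchronous write.
    static CompletableFuture<Void> asyncWrite(ExecutorService io, AtomicInteger written) {
        return CompletableFuture.runAsync(written::incrementAndGet, io);
    }

    static Result writeAll(int count, int maxOutstanding) throws InterruptedException {
        ExecutorService io = Executors.newFixedThreadPool(4);
        Semaphore outstanding = new Semaphore(maxOutstanding);
        AtomicInteger written = new AtomicInteger();
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxInFlight = new AtomicInteger();
        CompletableFuture<Void> failure = new CompletableFuture<>();
        try {
            for (int i = 0; i < count && !failure.isDone(); i++) {
                outstanding.acquire(); // blocks only when maxOutstanding writes are pending
                maxInFlight.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                asyncWrite(io, written).whenComplete((res, ex) -> {
                    if (ex != null) {
                        failure.completeExceptionally(ex);
                    }
                    inFlight.decrementAndGet();
                    outstanding.release(); // frees a slot for the next write
                });
                // No join() here: the loop moves on to the next entry right away.
            }
            outstanding.acquire(maxOutstanding); // wait until every pending write has finished
        } finally {
            io.shutdown();
        }
        failure.getNow(null); // rethrows (wrapped in CompletionException) if any write failed
        return new Result(written.get(), maxInFlight.get());
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(writeAll(1_000, 8));
    }
}
```

Waiting for each write before issuing the next one, as in the suggested change, would limit the pipeline to one write at a time; the semaphore keeps concurrency high while still bounding the number of pending writes.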

@heesung-sn heesung-sn force-pushed the pip-215 branch 2 times, most recently from b4ee527 to b27bdfb on November 7, 2022 19:31
@codecov-commenter

codecov-commenter commented Nov 7, 2022

Codecov Report

Merging #18195 (ab2e743) into master (22866bd) will increase coverage by 0.76%.
The diff coverage is 69.18%.


@@             Coverage Diff              @@
##             master   #18195      +/-   ##
============================================
+ Coverage     45.92%   46.69%   +0.76%     
- Complexity    10104    10533     +429     
============================================
  Files           680      709      +29     
  Lines         66758    69358    +2600     
  Branches       7147     7441     +294     
============================================
+ Hits          30660    32388    +1728     
- Misses        32680    33334     +654     
- Partials       3418     3636     +218     
| Flag | Coverage Δ |
|---|---|
| unittests | 46.69% <69.18%> (+0.76%) ⬆️ |


| Impacted Files | Coverage Δ |
|---|---|
| ...n/java/org/apache/pulsar/compaction/Compactor.java | 80.76% <ø> (ø) |
| ...rg/apache/pulsar/compaction/TwoPhaseCompactor.java | 73.73% <ø> (+0.92%) ⬆️ |
| ...a/org/apache/pulsar/client/impl/TableViewImpl.java | 0.00% <0.00%> (ø) |
| ...pulsar/client/impl/TableViewConfigurationData.java | 35.71% <50.00%> (+2.38%) ⬆️ |
| .../pulsar/client/impl/BatchMessageContainerImpl.java | 55.95% <66.66%> (-1.02%) ⬇️ |
| ...lsar/client/impl/RawBatchMessageContainerImpl.java | 73.13% <73.13%> (ø) |
| .../pulsar/compaction/StrategicTwoPhaseCompactor.java | 76.19% <76.19%> (ø) |
| ...pache/pulsar/client/impl/CompactionReaderImpl.java | 90.00% <90.00%> (ø) |
| ...java/org/apache/pulsar/client/impl/ReaderImpl.java | 38.94% <100.00%> (+0.64%) ⬆️ |
| ...lsar/client/impl/conf/ReaderConfigurationData.java | 81.39% <100.00%> (+0.90%) ⬆️ |

... and 90 more

        this.cryptoKeyReader = cryptoKeyReader;
    }

    public ByteBuf toByteBuf() {
Member


Can we add comments describing which exceptions this method may throw? Since the method is public, other developers might use it.

Contributor Author


Ack.

I will add the comments when resolving comments from others.

Contributor Author


updated

@heesung-sn heesung-sn changed the title [improve][broker] PIP-215 Added TopicCompactionStrategy for StrategicTwoPhaseCompactor and TableView. [improve][broker][client] PIP-215 Added TopicCompactionStrategy for StrategicTwoPhaseCompactor and TableView. Nov 17, 2022
 * batched into a single batch message:
 * [(k1, v1), (k2, v1), (k3, v1), (k1, v2), (k2, v2), (k3, v2), (k1, v3), (k2, v3), (k3, v3)]
 */
public class RawBatchMessageContainerImpl extends BatchMessageContainerImpl {
Member


It seems that all that is needed here is a simple bulk message container bounded by a maximum batch size. I feel that extending BatchMessageContainerImpl is too heavy (the BatchMessageContainer implementation carries various producer-related properties and operations), and it also adds complexity to the original BatchMessageContainer.

Maybe we could implement a simple, standalone batch container instead. What do you think?

Contributor Author


We want to reuse the batching and payload serialization logic from the parent class.

I do not think this PR adds significant complexity to the parent class.
It only adds a check for whether the producer is null.

We could pass a mock producer if adding the producer null-check is not desirable.

Member


Oh, sorry, I missed some of the code.

> We could pass a mock producer if adding the producer null-check is not desirable.

I'm leaning towards this.

Also, I wonder why batching is needed here. Is it because there is a known performance bottleneck?

Contributor Author


However, I am not fully convinced that a mock producer is better than the null check. Can you explain why it would be better? The base class also sets the producer later, so the producer could potentially be null there as well.

We don't want to create a ledger entry per message, especially since these message payloads are very small.
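The motivation for batching (avoiding one ledger entry per small message) can be shown with a deliberately simple, standalone container, similar in spirit to the lighter-weight alternative the reviewer suggested. This is a sketch under stated assumptions: the class name `SimpleBatchSketch` and its length-prefixed `[count][len][bytes]...` layout are hypothetical and are not Pulsar's batch format, which RawBatchMessageContainerImpl reuses from BatchMessageContainerImpl.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SimpleBatchSketch {

    private final int maxMessages;
    private final List<byte[]> pending = new ArrayList<>();

    SimpleBatchSketch(int maxMessages) {
        this.maxMessages = maxMessages;
    }

    /** Adds a payload; returns true when the batch is full and should be flushed. */
    boolean add(byte[] payload) {
        pending.add(payload);
        return pending.size() >= maxMessages;
    }

    boolean isEmpty() {
        return pending.isEmpty();
    }

    /** Serializes the pending payloads as [count][len][bytes]... and clears the batch. */
    byte[] toBytes() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(pending.size());
            for (byte[] p : pending) {
                out.writeInt(p.length);
                out.write(p);
            }
        }
        pending.clear();
        return buf.toByteArray();
    }

    /** Groups payloads into entries: one flush every maxMessages, plus one for the remainder. */
    static List<byte[]> toEntries(List<byte[]> payloads, int maxMessages) throws IOException {
        SimpleBatchSketch batch = new SimpleBatchSketch(maxMessages);
        List<byte[]> entries = new ArrayList<>();
        for (byte[] p : payloads) {
            if (batch.add(p)) {
                entries.add(batch.toBytes());
            }
        }
        if (!batch.isEmpty()) {
            entries.add(batch.toBytes());
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        List<byte[]> payloads = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            payloads.add(("v" + i).getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(toEntries(payloads, 4).size()); // 3
    }
}
```

With ten small payloads and `maxMessages = 4`, this produces three entries instead of ten.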

Member


My main concern is that developers may miss this null check when they modify the code later. Using a mock producer would keep this special-case behavior inside RawBatchMessageContainerImpl.

I revisited the code, and the current implementation does not easily accommodate a mock producer.

For example:

producer.client.getMemoryLimitController().releaseMemory(msg.getUncompressedSize()
        + batchAllocatedSizeBytes);

if (producer != null) {
    ProducerImpl.LAST_SEQ_ID_PUSHED_UPDATER.getAndUpdate(producer, prev -> Math.max(prev, msg.getSequenceId()));
}

BTW: I don't think coupling BatchContainer with the producer, as the original implementation does, is a good design. We should make BatchContainer independent and leave things like releaseMemory to the producer. (This is off topic and does not affect this PR.)
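The reviewer's side suggestion (an independent batch container, with memory release left to its owner) could look roughly like the following. This is a hypothetical design sketch, not Pulsar code: `BatchContainer`, `Owner`, and the `LongConsumer` release callback are invented for illustration, and the real BatchMessageContainerImpl and ProducerImpl are considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

public class DecoupledBatchSketch {

    /** Batch container with no producer reference: it only tracks payloads and their sizes. */
    static final class BatchContainer {
        private final List<byte[]> messages = new ArrayList<>();
        private long allocatedBytes;

        void add(byte[] payload) {
            messages.add(payload);
            allocatedBytes += payload.length;
        }

        /** Clears the batch and reports how many bytes it was holding. */
        long clear() {
            long released = allocatedBytes;
            messages.clear();
            allocatedBytes = 0;
            return released;
        }
    }

    /** Hypothetical owner (producer-like or raw): decides what to do with freed memory. */
    static final class Owner {
        private final BatchContainer batch = new BatchContainer();
        private final LongConsumer releaseMemory;

        Owner(LongConsumer releaseMemory) {
            this.releaseMemory = releaseMemory;
        }

        void send(byte[] payload) {
            batch.add(payload);
        }

        void flush() {
            // Memory accounting stays with the owner; a producer-less user can pass
            // a no-op callback instead of a mock producer or a null check.
            releaseMemory.accept(batch.clear());
        }
    }

    public static void main(String[] args) {
        long[] released = {0};
        Owner producerLike = new Owner(bytes -> released[0] += bytes);
        producerLike.send(new byte[10]);
        producerLike.send(new byte[5]);
        producerLike.flush();
        System.out.println(released[0]); // 15

        Owner raw = new Owner(bytes -> { }); // e.g. a compactor with no memory limit
        raw.send(new byte[3]);
        raw.flush();
    }
}
```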

@heesung-sn
Contributor Author

Rebased the pip-215 branch on top of master.

@codelipenghui codelipenghui merged commit 05e6f5e into apache:master Dec 21, 2022
@heesung-sn heesung-sn changed the title [improve][broker][client] PIP-215 Added TopicCompactionStrategy for StrategicTwoPhaseCompactor and TableView. [improve][broker][client] PIP-192 PIP-215 Added TopicCompactionStrategy for StrategicTwoPhaseCompactor and TableView. Feb 12, 2023
@heesung-sn heesung-sn deleted the pip-215 branch April 2, 2024 17:45
Labels
area/broker, doc-not-needed, ready-to-test, type/feature, type/PIP