PIP-215: Configurable TopicCompactionStrategy for StrategicTwoPhaseCompactor and TableView #18099
Raised a local-fork PR for the implementation reference: heesung-sn#12
Raised a PR to the apache/master branch: #18195
heesung-sn changed the title from "PIP-215: Configurable Topic Compaction Strategy" to "PIP-215: Configurable TopicCompactionStrategy for StrategicTwoPhaseCompactor and TableView" on Nov 3, 2022.
PIP Discussion email: https://lists.apache.org/thread/m721nc0vwzo3wxg0tv3tprfc6z7xs1tj
Demogorgon314 pushed a commit that referenced this issue on Mar 29, 2023:
Master Issue: #16691, #18099
### Motivation
Raising a PR to implement Master Issue: #16691, #18099. We want to reduce unload frequencies from flaky traffic.
### Modifications
This PR:
- Introduced a config `loadBalancerSheddingConditionHitCountThreshold` to further restrict shedding conditions based on the hit count.
- Normalized offload traffic.
- Lowered the default `loadBalanceSheddingDelayInSeconds` value from 600 to 180, as 10 minutes is too long; 3 minutes can be long enough to catch the new load after unloads.
- Changed the config `loadBalancerBundleLoadReportPercentage` to `loadBalancerMaxNumberOfBundlesInBundleLoadReport` to make the top-k bundle count absolute instead of relative.
- Renamed `loadBalancerNamespaceBundleSplitConditionThreshold` to `loadBalancerNamespaceBundleSplitConditionHitCountThreshold` to be consistent with `*ConditionHitCountThreshold`.
- Renamed `loadBalancerMaxNumberOfBrokerTransfersPerCycle` to `loadBalancerMaxNumberOfBrokerSheddingPerCycle`.
- Added LoadDataStore cleanup logic in the BSC monitor.
- Added `msgThroughputEMA` in BrokerLoadData to smooth the broker throughput info.
- Updated top-k bundles to be sorted in ascending order (instead of descending).
- Updated some info logs to only show in debug mode.
- Added load data tombstones upon Own, Releasing, and Splitting.
- Added the bundle ownership (isOwned) check upon split and unload.
- Added swap unload logic.
Motivation
Currently, the Topic compaction logic implemented in `TwoPhaseCompactor` only compacts messages to the last one (with the same key). Here, we want to configure Topic compaction with different strategies. For example, to support the Conflict State Resolution (Race Conditions) in PIP-192 (#16691), we need to compact messages with the first valid states.
Goal
- Create another Topic compactor, `StrategicTwoPhaseCompactor`, where we can configure a compaction strategy, `TopicCompactionStrategy`.
- Update the `TableViewConfigurationData` to load and consider the `TopicCompactionStrategy` when updating the internal key-value map in `TableView`.
- Add `TopicCompactionStrategy` in Topic-level Policy to selectively run `StrategicTwoPhaseCompactor` or `TwoPhaseCompactor` when executing compaction.
- Do not change the default behaviors of topic compaction and table views. Enable this feature only when `TopicCompactionStrategy` is configured.
- Make a conservative release. Initially use this strategic compaction feature only for the internal system topics. Do not expose it until it is proven to be stable and requested by Pulsar users.
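To illustrate the "selectively run" and "do not change the default" goals, here is a minimal sketch of how a compactor could be chosen from a topic-level policy. Everything in it is an assumption made for illustration: `TopicPolicy`, its `compactionStrategy()` accessor, and `selectCompactor` are hypothetical names, not the Pulsar API, and the strategy class name in `main` is made up.

```java
import java.util.Optional;

public class CompactorSelectionSketch {

    /** Hypothetical, simplified view of a topic-level policy (not the real Pulsar class). */
    static final class TopicPolicy {
        // Class name of the configured strategy; null when nothing is configured.
        private final String compactionStrategyClassName;

        TopicPolicy(String compactionStrategyClassName) {
            this.compactionStrategyClassName = compactionStrategyClassName;
        }

        Optional<String> compactionStrategy() {
            return Optional.ofNullable(compactionStrategyClassName);
        }
    }

    /**
     * Returns the name of the compactor to run. The strategic compactor is
     * chosen only when a non-empty strategy is configured; otherwise the
     * existing default is kept.
     */
    static String selectCompactor(TopicPolicy policy) {
        return policy.compactionStrategy()
                .filter(name -> !name.trim().isEmpty())
                .map(name -> "StrategicTwoPhaseCompactor")
                .orElse("TwoPhaseCompactor");
    }

    public static void main(String[] args) {
        // No strategy configured: default behavior is unchanged.
        System.out.println(selectCompactor(new TopicPolicy(null)));
        // Strategy configured (hypothetical class name): strategic compactor is used.
        System.out.println(selectCompactor(new TopicPolicy("com.example.FirstValidStateStrategy")));
    }
}
```

Treating a missing or blank strategy as "not configured" keeps existing topics on `TwoPhaseCompactor` without any opt-out step.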
API Changes
Implementation
StrategicTwoPhaseCompactor will have two phases.
First Phase:
Using the `CompactionReader<T>` instead of `RawReader`, it will iterate over each message and compact messages with the same key by following the `merge()` in `TopicCompactionStrategy`. The `CompactionReader` will be added to pulsar-broker only (not to pulsar-client).
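The first phase can be pictured as a per-key fold driven by the strategy. The sketch below is an illustration under stated assumptions, not the proposed implementation: `Strategy<T>` is a hypothetical stand-in for `TopicCompactionStrategy` (only `merge()` is named in this proposal), `KeyedMessage` is a hypothetical simplification of what the reader yields, and validity checks and tombstones are left out. "First wins" is used here as a simplified stand-in for "first valid state".

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StrategicCompactionSketch {

    /** Hypothetical stand-in for TopicCompactionStrategy. */
    interface Strategy<T> {
        // Given the value currently kept for a key and the next value seen,
        // returns the value to keep.
        T merge(T prev, T cur);
    }

    /** Hypothetical simplified message: a key and a decoded value. */
    static final class KeyedMessage<T> {
        final String key;
        final T value;

        KeyedMessage(String key, T value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Folds the messages in log order, keeping one value per key as decided by the strategy. */
    static <T> Map<String, T> compact(List<KeyedMessage<T>> messages, Strategy<T> strategy) {
        Map<String, T> kept = new LinkedHashMap<>();
        for (KeyedMessage<T> msg : messages) {
            T prev = kept.get(msg.key);
            // The first message for a key is kept as-is; later ones go through merge().
            kept.put(msg.key, prev == null ? msg.value : strategy.merge(prev, msg.value));
        }
        return kept;
    }

    public static void main(String[] args) {
        List<KeyedMessage<String>> log = Arrays.asList(
                new KeyedMessage<>("bundle-1", "broker-A"),
                new KeyedMessage<>("bundle-1", "broker-B"),
                new KeyedMessage<>("bundle-2", "broker-C"));

        Strategy<String> lastWins = (prev, cur) -> cur;   // mirrors last-value compaction
        Strategy<String> firstWins = (prev, cur) -> prev; // simplified "first valid state"

        System.out.println(compact(log, lastWins));  // {bundle-1=broker-B, bundle-2=broker-C}
        System.out.println(compact(log, firstWins)); // {bundle-1=broker-A, bundle-2=broker-C}
    }
}
```

With `firstWins`, the conflicting second assignment of `bundle-1` is dropped, which is the kind of race resolution the Motivation section describes for PIP-192.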
Second Phase:
The compacted messages will be written to a ledger.
When updating its internal key-value map, `TableView` will follow the same compaction logic defined in `TopicCompactionStrategy`.
When running the compaction, it will look up the `TopicCompactionStrategy` in the Topic-level Policy and run `StrategicTwoPhaseCompactor`, if configured. By default, it should run `TwoPhaseCompactor`.
Alternatives
Why not resolve conflicts by a single broker (leader broker) using two topics: un-compacted and compacted (pre-filter)?
In that alternative, the single broker produces compacted messages to the compacted topic (resolving conflicts as the single writer).
In the worst case, when there are many conflicting messages, this PIP can incur more repeated custom compaction than the alternative, as individual consumers need to compact messages (topic compaction and table views). However, one of the advantages of this proposal is that pub/sub is faster since it uses a single topic. For example, in PIP-192, if the "bundle assignment" broadcast is fast enough, conflicting bundle assignment requests can be reduced (broadcast filter effect).