[improve][ml] Filter out deleted entries before read entries from ledger. #21739

Merged
merged 19 commits into from
Jan 19, 2024

Conversation

dao-jun
Member

@dao-jun dao-jun commented Dec 16, 2023

PIP: I'm not sure if the PR needs a PIP

Motivation

In #19035 we introduced a skipCondition that is evaluated before entries are read from the ledger.

It filters out delayed messages to improve performance: skipping unnecessary entries saves memory and network usage and reduces read latency.

For the same purpose, this PR filters out deleted entries before reading entries from the ledger.

This PR doesn't change the semantics of readEntries; it only adds an optimization.

It may bring a large performance improvement in individualDelete mode, especially when bookie client metadata is lost, bookie auto-recovery is disabled, and some bookie nodes are down.
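
The idea in the motivation can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: the class `DeletedEntryFilter`, the method `rangesToRead`, and the `TreeMap` representation of individually deleted ranges are hypothetical names chosen for illustration. Given a requested entry-ID range and a set of already-deleted ranges, it computes only the sub-ranges that would still need to be read from the ledger.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not the ManagedCursorImpl implementation.
class DeletedEntryFilter {

    /**
     * Returns the closed sub-ranges of [first, last] that are not covered by
     * any deleted range. deletedRanges maps start -> end (both inclusive) and
     * is assumed to contain non-overlapping ranges.
     */
    static List<long[]> rangesToRead(long first, long last, TreeMap<Long, Long> deletedRanges) {
        List<long[]> result = new ArrayList<>();
        long next = first; // next entry id that has not been classified yet

        // A deleted range that starts at or before 'first' may still cover it.
        Map.Entry<Long, Long> floor = deletedRanges.floorEntry(first);
        if (floor != null && floor.getValue() >= first) {
            next = floor.getValue() + 1;
        }

        // Walk the deleted ranges that start strictly after 'first'.
        for (Map.Entry<Long, Long> e : deletedRanges.tailMap(first, false).entrySet()) {
            if (e.getKey() > last) {
                break;
            }
            if (e.getKey() > next) {
                // Entries between 'next' and the deleted range must be read.
                result.add(new long[] {next, e.getKey() - 1});
            }
            next = Math.max(next, e.getValue() + 1);
        }

        // Whatever remains after the last deleted range must be read too.
        if (next <= last) {
            result.add(new long[] {next, last});
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<Long, Long> deleted = new TreeMap<>();
        deleted.put(2L, 4L); // entries 2..4 were individually acknowledged
        deleted.put(7L, 7L); // entry 7 was individually acknowledged
        for (long[] r : rangesToRead(0, 9, deleted)) {
            System.out.println(r[0] + ".." + r[1]);
        }
    }
}
```

With deleted ranges 2..4 and 7..7, a request for entries 0..9 reduces to three reads (0..1, 5..6, 8..9), so the four deleted entries are never fetched from the bookies.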

Modifications

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: dao-jun#4


@dao-jun Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

@github-actions github-actions bot added doc-not-needed Your PR changes do not impact docs and removed doc-label-missing labels Dec 16, 2023
@dao-jun dao-jun added category/performance Performance issues fix or improvements ready-to-test area/ML labels Dec 19, 2023
@dao-jun dao-jun changed the title [improve][broker] Filter out deleted entries before read entries from ledger. [improve][ml] Filter out deleted entries before read entries from ledger. Dec 19, 2023
@dao-jun
Member Author

dao-jun commented Dec 19, 2023

/pulsarbot run-failure-checks

@dao-jun dao-jun requested a review from coderzc December 20, 2023 05:57
@Technoboy- Technoboy- added this to the 3.3.0 milestone Dec 22, 2023
@codelipenghui
Contributor

It may bring a large performance improvement in individualDelete mode, especially when bookie client metadata is lost, bookie auto-recovery is disabled, and some bookie nodes are down.

Could you please explain more about this line? I don't understand why it pertains to bookie client metadata loss, disabled auto-recovery, and bookie unavailability.

@dao-jun
Member Author

dao-jun commented Jan 5, 2024

@codelipenghui @coderzc PTAL

@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (22838ea) 36.46% compared to head (276358b) 73.58%.


@@              Coverage Diff              @@
##             master   #21739       +/-   ##
=============================================
+ Coverage     36.46%   73.58%   +37.11%     
- Complexity    12390    32381    +19991     
=============================================
  Files          1725     1861      +136     
  Lines        131701   138596     +6895     
  Branches      14401    15187      +786     
=============================================
+ Hits          48027   101985    +53958     
+ Misses        77254    28700    -48554     
- Partials       6420     7911     +1491     
| Flag | Coverage Δ |
|---|---|
| inttests | 24.14% <0.00%> (+0.02%) ⬆️ |
| systests | 23.71% <0.00%> (+0.04%) ⬆️ |
| unittests | 72.86% <100.00%> (+40.81%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files | Coverage Δ |
|---|---|
| ...che/bookkeeper/mledger/impl/ManagedCursorImpl.java | 79.22% <100.00%> (+30.88%) ⬆️ |
| ...che/bookkeeper/mledger/impl/ManagedLedgerImpl.java | 80.53% <ø> (+32.69%) ⬆️ |
| ...he/bookkeeper/mledger/impl/ReadOnlyCursorImpl.java | 95.00% <100.00%> (+95.00%) ⬆️ |

... and 1435 files with indirect coverage changes

@dao-jun dao-jun merged commit c66167b into apache:master Jan 19, 2024
47 checks passed
@coderzc
Member

coderzc commented Jan 20, 2024

approve

@codelipenghui
Contributor

I have added the cherry-pick labels since this improvement fixes an issue that can impact the bookies if the cursor rewinds to an earlier position (the mark-delete position can't move forward due to individual acks).

codelipenghui pushed a commit that referenced this pull request Feb 8, 2024
codelipenghui pushed a commit that referenced this pull request Feb 8, 2024
codelipenghui pushed a commit that referenced this pull request Feb 8, 2024
@dao-jun dao-jun deleted the dev/skip_deleted_entries branch February 24, 2024 07:00
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Mar 1, 2024
…ger. (apache#21739)

(cherry picked from commit c66167b)
(cherry picked from commit 84ed73e)
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Mar 6, 2024
…ger. (apache#21739)

(cherry picked from commit c66167b)
(cherry picked from commit 84ed73e)
6 participants