[improve][tiered storage] Reduce cpu usage when offloading the ledger #15063

Merged
merged 20 commits into from
Jun 6, 2022


zymap
Member

@zymap zymap commented Apr 7, 2022


Motivation

When offloading a ledger, BlockAwareSegmentInputStreamImpl wraps the
ledger handle and exposes it as an input stream. jclouds then reads
that stream as the payload and uploads it to the storage.
The jclouds implementation reads the stream through a buffer:
https://github.com/apache/jclouds/blob/36f351cd18925d2bb27bf7ad2c5d75e555da377a/core/src/main/java/org/jclouds/io/ByteStreams2.java#L68

In the current offload implementation, read() is called many times
to fill that buffer before the data is returned.
After implementing read(byte[] b, int off, int len), CPU usage dropped by
almost 10%.

Modifications

  • Add read(byte[] b, int off, int len) implementation in the BlockAwareSegmentInputStreamImpl
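
The effect described in the motivation can be illustrated with a minimal sketch. This is not the code from this PR: `ByteAtATimeStream`, `BulkStream`, and `drain` are hypothetical names, and a plain byte array stands in for the ledger entries. It only shows that a stream which implements `read()` alone falls back to `InputStream`'s default per-byte loop for buffered reads, while an override of `read(byte[] b, int off, int len)` can copy a whole chunk per call.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class BulkReadSketch {

    /** Implements only read(); bulk reads use InputStream's default per-byte loop. */
    static class ByteAtATimeStream extends InputStream {
        protected final byte[] data;
        protected int pos = 0;
        int singleByteReads = 0; // counts how often read() is invoked

        ByteAtATimeStream(byte[] data) {
            this.data = data;
        }

        @Override
        public int read() {
            singleByteReads++;
            return pos < data.length ? (data[pos++] & 0xFF) : -1;
        }
    }

    /** Adds a bulk read that copies up to len bytes in a single call. */
    static class BulkStream extends ByteAtATimeStream {
        BulkStream(byte[] data) {
            super(data);
        }

        @Override
        public int read(byte[] b, int off, int len) {
            if (off < 0 || len < 0 || len > b.length - off) {
                throw new IndexOutOfBoundsException();
            }
            if (len == 0) {
                return 0;
            }
            if (pos >= data.length) {
                return -1; // end of stream
            }
            int n = Math.min(len, data.length - pos);
            System.arraycopy(data, pos, b, off, n);
            pos += n;
            return n;
        }
    }

    /** Reads the stream to the end through a fixed-size buffer, as a buffered consumer would. */
    static int drain(InputStream in, int bufferSize) {
        byte[] buf = new byte[bufferSize];
        int total = 0;
        try {
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] payload = new byte[64 * 1024];
        ByteAtATimeStream slow = new ByteAtATimeStream(payload);
        BulkStream fast = new BulkStream(payload);
        System.out.println("default: bytes=" + drain(slow, 8192)
                + ", read() calls=" + slow.singleByteReads);
        System.out.println("bulk:    bytes=" + drain(fast, 8192)
                + ", read() calls=" + fast.singleByteReads);
    }
}
```

With the default path, `read()` is invoked at least once per payload byte; with the override it is not invoked at all while draining. The real stream also has to handle block headers, entry boundaries, and padding, which this sketch leaves out.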

@zymap zymap added type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages area/tieredstorage release/2.9.3 release/2.10.1 labels Apr 7, 2022
@zymap zymap requested review from hangc0276 and codelipenghui April 7, 2022 08:25
@zymap zymap self-assigned this Apr 7, 2022
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Apr 7, 2022
@zymap
Member Author

zymap commented Apr 7, 2022

Before the change:
Screen Shot 2022-04-07 at 15 47 58

After the change:
Screen Shot 2022-04-07 at 15 47 42

@dave2wave dave2wave requested a review from lhotari April 11, 2022 14:50
@zymap
Member Author

zymap commented Apr 13, 2022

@hangc0276 @horizonzy Could you please take another look?

@hangc0276 hangc0276 requested review from eolivelli and merlimat April 13, 2022 08:53
@zymap
Member Author

zymap commented Apr 21, 2022

/pulsarbot run-failure-checks

@horizonzy
Member

horizonzy commented May 6, 2022

@horizonzy All the required tests have passed. Why do I need to rebase to the latest master?

https://github.com/apache/pulsar/pull/15063/checks?check_run_id=6224983508. The first test was fixed recently; I'm not sure whether this is a new failure.

@zymap
Member Author

zymap commented May 6, 2022

@horizonzy rebased

@zymap
Member Author

zymap commented May 9, 2022

ping @eolivelli

}

if (!entriesByteBuf.isEmpty()
&& bytesReadOffset + entriesByteBuf.get(0).readableBytes() <= blockSize) {
Member

@horizonzy horizonzy May 10, 2022


No need to check this on every read; it may only need to be checked when the first element of entriesByteBuf changes.

@horizonzy
Member

Please add a test for the hybrid read path (mixing batch reads and single-byte reads); it should work correctly.
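
A rough sketch of what such a hybrid check could look like is below. It is an assumption, not the test added in this PR: `SharedCursorStream` and `readHybrid` are hypothetical names, and the stream is a simple stand-in backed by a byte array rather than BlockAwareSegmentInputStreamImpl. The idea is that single-byte and batch reads share one cursor, so interleaving them must still return every byte exactly once and in order.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class HybridReadSketch {

    /** Stand-in stream whose single-byte and bulk reads advance the same cursor. */
    static class SharedCursorStream extends InputStream {
        private final byte[] data;
        private int pos = 0;

        SharedCursorStream(byte[] data) {
            this.data = data;
        }

        @Override
        public int read() {
            return pos < data.length ? (data[pos++] & 0xFF) : -1;
        }

        @Override
        public int read(byte[] b, int off, int len) {
            if (off < 0 || len < 0 || len > b.length - off) {
                throw new IndexOutOfBoundsException();
            }
            if (len == 0) {
                return 0;
            }
            if (pos >= data.length) {
                return -1; // end of stream
            }
            int n = Math.min(len, data.length - pos);
            System.arraycopy(data, pos, b, off, n);
            pos += n;
            return n;
        }
    }

    /** Alternates one single-byte read with one batch read until the stream ends. */
    static byte[] readHybrid(InputStream in, int chunkSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[chunkSize];
        try {
            while (true) {
                int one = in.read();
                if (one == -1) {
                    break;
                }
                out.write(one);
                int n = in.read(chunk, 0, chunk.length);
                if (n == -1) {
                    break;
                }
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] payload = new byte[1000];
        for (int i = 0; i < payload.length; i++) {
            payload[i] = (byte) i;
        }
        byte[] got = readHybrid(new SharedCursorStream(payload), 7);
        System.out.println("hybrid read matches payload: " + Arrays.equals(got, payload));
    }
}
```

A test against the real class would compare the hybrid result with the bytes produced by reading the same block with `read()` only.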

Contributor

@eolivelli eolivelli left a comment


great work

Member

@horizonzy horizonzy left a comment


LGTM.

@horizonzy
Member

👍

@mattisonchao
Member

Hi @zymap

Sorry, I made a mistake just now, which is why the PR was re-opened.

@zymap zymap merged commit 938ab7b into apache:master Jun 6, 2022
codelipenghui pushed a commit to codelipenghui/incubator-pulsar that referenced this pull request Jun 7, 2022
…apache#15063)

(cherry picked from commit 938ab7b)
@codelipenghui codelipenghui added this to the 2.11.0 milestone Jun 7, 2022
nicoloboschi pushed a commit to datastax/pulsar that referenced this pull request Jun 7, 2022
…apache#15063)

(cherry picked from commit 938ab7b)
mattisonchao pushed a commit that referenced this pull request Jun 7, 2022
…#15063)

(cherry picked from commit 938ab7b)
@mattisonchao mattisonchao added the cherry-picked/branch-2.9 Archived: 2.9 is end of life label Jun 7, 2022
Labels
area/tieredstorage cherry-picked/branch-2.9 Archived: 2.9 is end of life cherry-picked/branch-2.10 doc-not-needed Your PR changes do not impact docs release/2.9.3 release/2.10.1 type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages