-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve the performance of TextSource by reducing how many byte[]s are copied (fixes #23193) #23196
Conversation
R: @bhisevishal |
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control |
…e copied (fixes apache#23193) This makes TextSource take about 2.3x less CPU resources during decoding. Before this change: ``` TextSourceBenchmark.benchmarkTextSource thrpt 5 0.248 ± 0.029 ops/s ``` After this change: ``` TextSourceBenchmark.benchmarkHadoopLineReader thrpt 5 0.465 ± 0.064 ops/s TextSourceBenchmark.benchmarkTextSource thrpt 5 0.575 ± 0.059 ops/s ```
Run Java PreCommit |
Run Java_Examples_Dataflow PreCommit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Please link the issue by adding "Fixes #23193" in the description.
* <p>We're reading data from inChannel, but the head of the stream may be already buffered in | ||
* buffer, so we have several cases: 1. No newline characters are in the buffer, so we need to | ||
* copy everything and read another buffer from the stream. 2. An unambiguously terminated line | ||
* is in buffer, so we just create currentValue 3. Ambiguously terminated line is in buffer, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could this be an HTML ordered list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Its already part of the description, what am I missing? |
You have it in the title, but not in the description. There's some documentation here: https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue |
That is an annoying limitation. Thanks for pointing it out. |
Run Spotless PreCommit |
Run Java PreCommit |
…te[]s are copied (fixes apache#23193) (apache#23196)" This reverts commit 30a48f0
This makes TextSource take about 2.3x less CPU resources during decoding.
Before this change:
After this change:
Fixes #23193
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.