PARQUET-2357: Modest refactor of CapacityByteArrayOutputStream #1160

Closed
wants to merge 1 commit into from

Conversation

fengjiajie
Contributor

@fengjiajie fengjiajie commented Sep 29, 2023

Optimization for the CapacityByteArrayOutputStream class:

  1. currentSlabIndex tracks the same value as currentSlab.position(), so there is no need to maintain the currentSlabIndex variable.
  2. When writing an array whose length equals the remaining capacity of the current buffer, there is no need to allocate a new buffer.
  3. If addSlab already guards against overflow of bytesAllocated and bytesUsed with Math.addExact, additional overflow checks during the write operation are unnecessary.
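
The three points above can be illustrated with a minimal sketch. This is not the Parquet implementation: SlabSketch, its fields, and its helper methods are hypothetical names chosen for illustration, and the reasoning in the comments (bytesUsed never exceeds bytesAllocated) is an assumption about why a single overflow check in addSlab could be enough.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a slab-based output stream.
// It is NOT CapacityByteArrayOutputStream; names and sizing policy are assumptions.
class SlabSketch {
    private final List<ByteBuffer> slabs = new ArrayList<>();
    private final int slabSize;
    private ByteBuffer currentSlab;
    private int bytesAllocated = 0;
    private int bytesUsed = 0;

    SlabSketch(int slabSize) {
        this.slabSize = slabSize;
        addSlab(slabSize);
    }

    // Point 3: the only overflow guard lives here. Math.addExact throws
    // ArithmeticException if the int sum overflows.
    private void addSlab(int minimumSize) {
        int size = Math.max(slabSize, minimumSize);
        bytesAllocated = Math.addExact(bytesAllocated, size);
        currentSlab = ByteBuffer.allocate(size);
        slabs.add(currentSlab);
    }

    void write(byte[] b, int off, int len) {
        int remaining = currentSlab.remaining();
        // Point 2: strictly greater, so an exact fit stays in the current slab.
        if (len > remaining) {
            currentSlab.put(b, off, remaining);
            addSlab(len - remaining);
            currentSlab.put(b, off + remaining, len - remaining);
        } else {
            currentSlab.put(b, off, len);
        }
        // Assumed invariant: bytesUsed <= bytesAllocated, and bytesAllocated
        // is already checked in addSlab, so a plain addition is used here.
        bytesUsed += len;
    }

    // Point 1: the write offset within the slab is read from the buffer
    // itself, so no separate index field is kept.
    int positionInCurrentSlab() {
        return currentSlab.position();
    }

    int slabCount() {
        return slabs.size();
    }

    int size() {
        return bytesUsed;
    }
}
```

With a slab size of 4, writing exactly 4 bytes leaves the sketch with a single slab; the next slab is only allocated when a later write actually needs more room.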

Make sure you have checked all steps below.

Jira

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain Javadoc that explain what it does

@fengjiajie fengjiajie changed the title [PARQUET-2357] Modest refactor of CapacityByteArrayOutputStream PARQUET-2357: Modest refactor of CapacityByteArrayOutputStream Sep 29, 2023
Change-Id: I3f73a44f2b764c710727c76b06824710058d6b21
@fengjiajie
Contributor Author

Hi, @Fokko

There is a test case failure, the same one seen in another pull request (#1164). I think it may pass if re-run, but I don't have permission to trigger a run. I tried pushing again, but that didn't trigger a re-run.
Could you please take a look? Thank you very much.

2023-10-02T11:22:48.6797210Z [INFO] Results:
2023-10-02T11:22:48.6894266Z [INFO] 
2023-10-02T11:22:48.6895143Z [ERROR] Failures: 
2023-10-02T11:22:48.6895878Z [ERROR]   TestParquetWriter.testParquetFileWithBloomFilterWithFpp:342
2023-10-02T11:22:48.6896575Z [INFO] 
2023-10-02T11:22:48.6897791Z [ERROR] Tests run: 404, Failures: 1, Errors: 0, Skipped: 1

@fengjiajie fengjiajie closed this Oct 3, 2023
@wgtmac
Member

wgtmac commented Oct 12, 2023

> Hi, @Fokko
>
> There is a test case failure, same one in another pull request(#1164). I think it may pass if re-run. However, I don't have the permission to trigger the run. I tried pushing again, but it didn't trigger a re-run. Could you please take a look? Thank you very much.
>
> 2023-10-02T11:22:48.6797210Z [INFO] Results:
> 2023-10-02T11:22:48.6894266Z [INFO] 
> 2023-10-02T11:22:48.6895143Z [ERROR] Failures: 
> 2023-10-02T11:22:48.6895878Z [ERROR]   TestParquetWriter.testParquetFileWithBloomFilterWithFpp:342
> 2023-10-02T11:22:48.6896575Z [INFO] 
> 2023-10-02T11:22:48.6897791Z [ERROR] Tests run: 404, Failures: 1, Errors: 0, Skipped: 1

Unfortunately, this test case uses a random number to generate the bloom filter.

2 participants