PARQUET-2354: Fix race condition in CharsetValidator #1154

findepi
Member

@findepi findepi commented Sep 26, 2023

The CharsetValidator has a static singleton instance at BinaryTruncator.DEFAULT_UTF8_TRUNCATOR.validator, so it can be accessed from multiple threads. Before the change, all threads would operate on a shared "dummy buffer" for decoding.
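
To illustrate the kind of hazard described above, here is a minimal sketch of one way to validate UTF-8 without sharing mutable decoding state between threads. It is not the diff in this PR: the class name `Utf8ValidatorSketch`, the method `isValid`, and the choice of `ThreadLocal` are assumptions made for illustration. The point is only that a `CharsetDecoder` and its output `CharBuffer` are both stateful, so each thread needs its own.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the code merged in this PR.
final class Utf8ValidatorSketch {
  // One decoder per thread: CharsetDecoder keeps internal state and is not thread-safe.
  private static final ThreadLocal<CharsetDecoder> DECODER = ThreadLocal.withInitial(() ->
      StandardCharsets.UTF_8.newDecoder()
          .onMalformedInput(CodingErrorAction.REPORT)
          .onUnmappableCharacter(CodingErrorAction.REPORT));

  // One scratch output buffer per thread, replacing a single shared "dummy buffer".
  private static final ThreadLocal<CharBuffer> SCRATCH =
      ThreadLocal.withInitial(() -> CharBuffer.allocate(1024));

  private Utf8ValidatorSketch() {}

  // Returns true if the bytes form well-formed UTF-8.
  static boolean isValid(byte[] bytes) {
    CharsetDecoder decoder = DECODER.get().reset();
    CharBuffer scratch = SCRATCH.get();
    ByteBuffer in = ByteBuffer.wrap(bytes);
    CoderResult result;
    do {
      // The decoded characters are discarded; only the result code matters.
      scratch.clear();
      result = decoder.decode(in, scratch, true);
    } while (result.isOverflow());
    return !result.isError();
  }
}
```

With a single shared buffer, two threads can interleave `clear()` and `decode(...)` calls on the same `CharBuffer`, which may corrupt its position and limit. Per-thread instances avoid that without adding a lock to the validation path.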

Jira

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason: addresses concurrency issue

Commits

  • My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and classes in the PR contain Javadoc that explains what they do

@findepi
Member Author

findepi commented Sep 26, 2023

cc @raunaqmorarka

@findepi findepi force-pushed the findepi/fix-race-condition-in-charsetvalidator-a3bdcf branch from 4837979 to 3b0b88b Compare September 26, 2023 08:58
@findepi findepi changed the title Fix race condition in CharsetValidator PARQUET-2354 Fix race condition in CharsetValidator Sep 26, 2023
@findepi findepi changed the title PARQUET-2354 Fix race condition in CharsetValidator PARQUET-2354: Fix race condition in CharsetValidator Sep 26, 2023
@findepi findepi force-pushed the findepi/fix-race-condition-in-charsetvalidator-a3bdcf branch from 3b0b88b to 51db447 Compare September 26, 2023 09:00
Contributor

@Fokko Fokko left a comment

Great catch @findepi

@findepi
Member Author

findepi commented Sep 27, 2023

Thank you @wgtmac @Fokko for your review!

BTW, I could not run the failing test locally because brew installs a Thrift version that is too new. Could one of you please help me understand what I need to do to fix it?

@Fokko
Contributor

Fokko commented Sep 27, 2023

Thrift bumped its minimum Java version to 11, so Parquet was unable to upgrade because we still support Java 8. With Thrift 0.19 they backported Java 8 support at my request. Let me fix #1138, and then you should be able to test locally.

@wgtmac wgtmac merged commit c844d26 into apache:master Sep 28, 2023
9 checks passed
@findepi findepi deleted the findepi/fix-race-condition-in-charsetvalidator-a3bdcf branch September 28, 2023 19:53
@findepi
Member Author

findepi commented Sep 28, 2023

@Fokko thank you for explaining the thrift problem!

@wgtmac thank you for the merge!
