
PARQUET-2361: Reduce failure rate of unit test #1170

Merged · 1 commit · Oct 19, 2023

Conversation

@fengjiajie (Contributor) commented Oct 14, 2023:

Reduce failure rate of unit test testParquetFileWithBloomFilterWithFpp

Change-Id: Ic230f197b0996333a082bb05bd201963d05d862e

```
[INFO] Results:
[INFO]
Error:  Failures:
Error:    TestParquetWriter.testParquetFileWithBloomFilterWithFpp:342
```

Multiple different PRs have triggered this failure:

  1. Bump jmh.version from 1.21 to 1.36 #1062
  2. https://github.com/apache/parquet-mr/actions/runs/5420924489/job/14680382407
  3. https://github.com/apache/parquet-mr/actions/runs/6336014897
  4. https://github.com/apache/parquet-mr/actions/runs/6381223319
  5. https://github.com/apache/parquet-mr/actions/runs/6394826826/job/17357106390

I found two issues that make these failures likely:

  1. This may be a bug: 'distinctStrings' is used to generate the test files, but it is cleared after the first round and then refilled for the probe phase. As a result, the next round's 'fpp' file is generated from the previous round's probe data, whose string length (10, chosen so the probe strings are guaranteed absent from the bloom filter) differs from the intended build length (12). See the sketch after this list.
  2. The number of probe iterations is too small, so the measured false positive rate is unstable.
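
For illustration, a minimal sketch of the problematic pattern (the class and helper names such as `buildFileWithBloomFilter` are hypothetical, not the actual test code):

```java
import java.util.HashSet;
import java.util.Set;
import org.apache.commons.lang3.RandomStringUtils;

class DistinctStringsReuseSketch {
  static final int NDV = 10_000; // hypothetical build count

  void run() {
    Set<String> distinctStrings = new HashSet<>();
    for (double fpp : new double[] {0.01, 0.05}) {
      // Intended: fill with fresh length-12 strings and build the filter.
      // After round 1 the set is already full of length-10 probe leftovers,
      // so round 2's build loop adds nothing and the file is built from
      // the previous round's probe data.
      while (distinctStrings.size() < NDV) {
        distinctStrings.add(RandomStringUtils.randomAlphabetic(12));
      }
      buildFileWithBloomFilter(distinctStrings, fpp);

      // The bug: the build set is cleared and reused for probing...
      distinctStrings.clear();
      while (distinctStrings.size() < NDV) {
        // ...with length-10 strings, guaranteed absent from the filter.
        distinctStrings.add(RandomStringUtils.randomAlphabetic(10));
      }
      countFalsePositives(distinctStrings);
    }
  }

  void buildFileWithBloomFilter(Set<String> values, double fpp) { /* hypothetical */ }
  void countFalsePositives(Set<String> probes) { /* hypothetical */ }
}
```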

Make sure you have checked all steps below.

Jira

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain Javadoc that explain what it does

@fengjiajie (Contributor, Author) commented:

After this modification, the unit test still fails occasionally. It seems we need to explore other approaches, such as fixing the random seed or raising the tolerance above 10% (currently fpp * 1.1).

```java
int falsePositive = 0;
Set<String> distinctStringsForProbe = new HashSet<>();
while (distinctStringsForProbe.size() < testBloomFilterCount) {
  String str = RandomStringUtils.randomAlphabetic(randomStrLen - 1);
```
@amousavigourabi (Contributor) commented on this line:

Is there any reason this cannot be randomStrLen by the way?

@fengjiajie (Contributor, Author) replied:

@amousavigourabi The purpose of the unit test is to verify that the bloom filter's false positive rate meets expectations. The current approach builds the filter from many strings of length 12; any string of a different length is guaranteed not to exist in the original data, so whenever the filter returns true for such a string (length != 12) it is a false positive, and the false positive rate is computed from these cases. If we instead wanted to probe with strings of length 12, we would have to randomly generate strings, check whether each exists in the original data, and use only the absent ones to measure the rate.
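
A minimal, self-contained sketch of that measurement scheme (my own illustration using Guava's BloomFilter as a stand-in for Parquet's; the counts are hypothetical):

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import org.apache.commons.lang3.RandomStringUtils;

public class FppMeasurementSketch {
  public static void main(String[] args) {
    int ndv = 10_000;     // hypothetical number of distinct build values
    int probes = 200_000; // hypothetical number of probe values
    double fpp = 0.01;

    BloomFilter<String> filter =
        BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), ndv, fpp);

    // Build phase: distinct length-12 strings.
    Set<String> distinct = new HashSet<>();
    while (distinct.size() < ndv) {
      distinct.add(RandomStringUtils.randomAlphabetic(12));
    }
    distinct.forEach(filter::put);

    // Probe phase: a length-10 string can never equal a length-12 string,
    // so every positive answer here is a false positive by construction.
    Set<String> probeSet = new HashSet<>();
    while (probeSet.size() < probes) {
      probeSet.add(RandomStringUtils.randomAlphabetic(10));
    }
    long falsePositives = probeSet.stream().filter(filter::mightContain).count();
    System.out.printf("measured fpp = %.4f%n", (double) falsePositives / probes);
  }
}
```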

@amousavigourabi (Contributor) commented:

@fengjiajie As the test now evaluates the false positive rate with significantly more samples than are used to build the filter (or are provided as the NDV), might that push the bloom filter's observed false positive rate above the FPP? Perhaps we could try increasing the NDV, or maybe an adaptive bloom filter would be more appropriate? WDYT?

@amousavigourabi (Contributor) replied:

Nevermind, I don't think this should be an issue. I'll be running the test locally in a loop to see if I can reproduce the flake and check how often it might occur.

@amousavigourabi (Contributor) followed up:
I got 4 failures in 10k runs. This would mean 4 failures in 1250 full actions runs. These failures were all very slightly out of the expected range for the 0.01 fpp case. Given that we take 200k samples, this might indicate a flaw in the code, as 200k samples of some i.i.d. random variable ~ Bern(0.01) really should not have over 2200 hits that often, if ever. If we want to just fix this test I suggest raising the tolerance to 15%. That should keep it from failing.
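
A back-of-the-envelope check of that claim (my own arithmetic, assuming the probes really were i.i.d.):

$$X \sim \mathrm{Binomial}(n = 2 \times 10^5,\ p = 0.01), \qquad \mu = np = 2000, \qquad \sigma = \sqrt{np(1-p)} \approx 44.5$$

The 10% tolerance puts the failure threshold at 2200 hits, about $4.5\sigma$ above the mean, so by the normal approximation $P(X > 2200) \approx \Phi(-4.5) \approx 3 \times 10^{-6}$ — roughly 0.03 expected failures per 10k runs, not 4. The observed rate is far too high for independent draws at the nominal fpp, which supports the suspicion of a flaw (or a true fpp above 0.01).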

@fengjiajie (Contributor, Author) replied:

@amousavigourabi I agree with raising the tolerance to 15%. Thank you very much for your review and testing.
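
For reference, the agreed fix amounts to loosening the assertion bound along these lines (a sketch with hypothetical variable names, not the actual diff):

```java
import static org.junit.Assert.assertTrue;

// Inside the test, after counting false positives over the probe set;
// falsePositive, totalCount, and fpp are hypothetical names here.
double measuredFpp = (double) falsePositive / totalCount;
// Allow 15% slack instead of 10%, to absorb sampling noise at ~200k probes.
assertTrue("measured fpp " + measuredFpp + " exceeds " + (fpp * 1.15),
    measuredFpp < fpp * 1.15);
```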

@wgtmac merged commit 354ddeb into apache:master on Oct 19, 2023
9 checks passed