Parquet arrow-writer tests take very long to run (over 60 seconds) #3318
Comments
This appears to have started with the change for bloom filter writing in fa1f611 / #3284.
Runtime: 63 minutes: the parquet test job https://github.com/apache/arrow-rs/actions/runs/3650327791/jobs/6166171296 on fa1f611
Runtime: 6 minutes: the parquet test job https://github.com/apache/arrow-rs/actions/runs/3650326441/jobs/6166167726 on 7d21397 (the previous commit)
I think this is a fairly serious performance regression and should be investigated before releasing 29.0.0.
I think it is due to a test change I made in #3284. It should be limited to test runtime and not be a general performance issue in the parquet writer. I will find some time to refactor the tests and see if that helps.
I tried to run the standard release verification script for 29.0.0, but it completely saturated my CPU. I killed the test after 10 minutes, and it still had not finished. I suspect similar bad things would happen to others who try to verify the release.
Looking at a profile, it looks like the bloom filter writer may need some optimisation. Fortunately, I think we can probably just tweak the tests slightly so that the bloom filter is enabled only for the tests that actually test it. This will unblock the release, although we will want to optimise this in future. I will work on this.
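To illustrate the idea of enabling the bloom filter only where it is under test, here is a minimal std-only sketch. `TestProps` and `test_matrix` are hypothetical names invented for this example; they are not the parquet crate's API and not the actual test code, only a model of a settings matrix in which the expensive option is opt-in.

```rust
// Hypothetical stand-in for writer settings; NOT the parquet crate's API.
#[derive(Debug, Clone, PartialEq)]
struct TestProps {
    row_group_size: usize,
    bloom_filter: bool,
}

/// Build the settings matrix for a roundtrip test. The bloom filter is
/// switched on only when the caller explicitly asks for it.
fn test_matrix(row_group_sizes: &[usize], test_bloom_filter: bool) -> Vec<TestProps> {
    // General tests only see `false`; dedicated tests see both values.
    let bloom_options: &[bool] = if test_bloom_filter {
        &[false, true]
    } else {
        &[false]
    };
    let mut out = Vec::new();
    for &row_group_size in row_group_sizes {
        for &bloom_filter in bloom_options {
            out.push(TestProps { row_group_size, bloom_filter });
        }
    }
    out
}

fn main() {
    let sizes = [1024, 4096];
    let general = test_matrix(&sizes, false);
    let dedicated = test_matrix(&sizes, true);
    println!(
        "general cases: {}, bloom filter cases: {}",
        general.len(),
        dedicated.len()
    );
}
```

With this shape, most tests run half as many combinations and never pay the bloom filter cost, while the tests that target the bloom filter still cover both settings.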
Disable bloom filters for most tests
Describe the bug
Many arrow writer tests take more than 60 seconds to run and are reported as "has been running for over 60 seconds".
To Reproduce
cargo test -p parquet --features=arrow
Expected behavior
The tests should pass quickly
Additional context
I believe it also affects the codecov jobs, which have now started timing out - #3302
https://github.com/apache/arrow-rs/actions/runs/3650327783/jobs/6166169908