[FEAT] Cap parallelism on local parquet reader #3310
Conversation
CodSpeed Performance Report: Merging #3310 will improve performance by ×2.2.
Codecov Report. Attention: Patch coverage is
Additional details and impacted files:
@@ Coverage Diff @@
## main #3310 +/- ##
==========================================
+ Coverage 77.00% 77.34% +0.33%
==========================================
Files 696 696
Lines 86039 84849 -1190
==========================================
- Hits 66256 65628 -628
+ Misses 19783 19221 -562
src/daft-parquet/src/semaphore.rs (outdated)
// Only increase permits if compute time is significantly higher than IO time,
// and waiting time is not too high.
if compute_ratio > Self::COMPUTE_THRESHOLD && wait_ratio < Self::WAIT_THRESHOLD {
Some ideas to consider:
- Maybe we don't need to consider IO time and can just consider the wait time.
- Can this semaphore become generic and be used for other I/O code as well as the local executor? It would be cool if we could dynamically adjust the degree of operator parallelism too.
- Can we decrease the permit count in addition to increasing it? (A sketch follows this list.)
- Can we add memory pressure to the semaphore?
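Riffing on the generic/shrinkable-semaphore ideas: below is a minimal, std-only sketch of a counting semaphore whose permit cap can both grow and shrink at runtime. It is not the daft-parquet implementation; the names (`DynamicSemaphore`, `grow`, `shrink`) and the demo in `main` are made up for illustration.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

struct DynamicSemaphore {
    state: Mutex<State>,
    cv: Condvar,
}

struct State {
    permits_in_use: usize,
    max_permits: usize,
}

impl DynamicSemaphore {
    fn new(initial_permits: usize) -> Self {
        Self {
            state: Mutex::new(State { permits_in_use: 0, max_permits: initial_permits }),
            cv: Condvar::new(),
        }
    }

    /// Block until a permit is available under the current cap.
    fn acquire(&self) {
        let mut state = self.state.lock().unwrap();
        while state.permits_in_use >= state.max_permits {
            state = self.cv.wait(state).unwrap();
        }
        state.permits_in_use += 1;
    }

    /// Return a permit and wake one waiter.
    fn release(&self) {
        let mut state = self.state.lock().unwrap();
        state.permits_in_use -= 1;
        drop(state);
        self.cv.notify_one();
    }

    /// Raise the cap by one (e.g. when compute dominates and waiters are idle).
    fn grow(&self) {
        self.state.lock().unwrap().max_permits += 1;
        self.cv.notify_one();
    }

    /// Lower the cap by one (never below one); in-flight permits simply drain
    /// down to the new cap, so no task is interrupted.
    fn shrink(&self) {
        let mut state = self.state.lock().unwrap();
        state.max_permits = state.max_permits.saturating_sub(1).max(1);
    }
}

fn main() {
    let sem = Arc::new(DynamicSemaphore::new(2));
    let handles: Vec<_> = (0..8)
        .map(|i| {
            let sem = Arc::clone(&sem);
            thread::spawn(move || {
                sem.acquire();
                println!("task {i} holds a permit");
                sem.release();
            })
        })
        .collect();
    // A monitor observing compute/wait ratios could call grow() or shrink() here.
    sem.grow();
    for h in handles {
        h.join().unwrap();
    }
}
```

In this sketch, lowering the cap is done lazily by letting in-flight permits drain down to the new maximum, which sidesteps the question of revoking permits that are already held.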
Forgot to submit my review from yesterday lol
Implement a parallelism cap on remote parquet tasks, and use the compute runtime instead of rayon (swordfish reads only). Follow-on from #3310, which implemented it for local reads. Benchmarks in the comments below.

Co-authored-by: Colin Ho <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: Colin Ho <[email protected]>
Implement a dynamically parallel local streaming parquet reader.
Background
The current streaming local parquet reader, while fast, launches all of its read and deserialization tasks at once. This leads to unnecessarily high memory usage, and it can starve downstream tasks.
Solution
Instead of launching all tasks at once, we can cap the number of parallel tasks based on factors such as how much compute time the tasks spend relative to IO time, and how long they wait for a permit, as sketched below.
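A minimal sketch of how such a cap could be adjusted from measured timings, mirroring the compute_ratio/wait_ratio check in the reviewed snippet above. This is not the actual daft-parquet code; the struct, field names, and threshold values are assumptions for illustration.

```rust
use std::time::Duration;

struct PermitController {
    permits: usize,
    total_compute: Duration,
    total_io: Duration,
    total_wait: Duration,
}

impl PermitController {
    // Assumed values: compute must dominate IO by 2x, and time spent waiting
    // for permits must stay under half of compute time.
    const COMPUTE_THRESHOLD: f64 = 2.0;
    const WAIT_THRESHOLD: f64 = 0.5;

    /// Accumulate the timings reported by a finished task.
    fn record(&mut self, compute: Duration, io: Duration, wait: Duration) {
        self.total_compute += compute;
        self.total_io += io;
        self.total_wait += wait;
    }

    /// Decide whether another concurrent deserialization task is worthwhile.
    fn maybe_grow(&mut self) -> bool {
        let compute = self.total_compute.as_secs_f64();
        let io = self.total_io.as_secs_f64().max(f64::EPSILON);
        let wait = self.total_wait.as_secs_f64();

        let compute_ratio = compute / io;
        let wait_ratio = wait / compute.max(f64::EPSILON);

        // Only increase permits if compute time is significantly higher than
        // IO time, and waiting time is not too high.
        if compute_ratio > Self::COMPUTE_THRESHOLD && wait_ratio < Self::WAIT_THRESHOLD {
            self.permits += 1;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut ctl = PermitController {
        permits: 1,
        total_compute: Duration::ZERO,
        total_io: Duration::ZERO,
        total_wait: Duration::ZERO,
    };
    // Simulate a task whose decoding work dwarfs its IO and permit-wait time.
    ctl.record(
        Duration::from_millis(900),
        Duration::from_millis(100),
        Duration::from_millis(50),
    );
    assert!(ctl.maybe_grow());
    println!("permits now: {}", ctl.permits);
}
```

A reader built this way could consult maybe_grow() after each completed task and add a permit to the underlying semaphore whenever it returns true.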
Results
The clearest benefit is in the memory usage of streaming queries. For example:
The new implementation hits a peak of 300 MB, while the old one goes over 1 GB.
Another example, where we stream the entire file but consumption is slow:
The new implementation hits a peak of 1.2 GB, while the old one goes over 3 GB.
To check for performance parity, I also wrote some benchmarks for parquet files with differing row counts, column counts, and row-group counts. The results show that the new implementation is essentially on par, with only slight differences.
On reading a TPC-H SF-1 lineitem table, the results are essentially the same (~0.2 s).