
[FEAT] Cap parallelism on local parquet reader #3310

Merged · 10 commits · Dec 4, 2024
Conversation

@colin-ho (Contributor) commented Nov 18, 2024

Implement a dynamically parallel local streaming parquet reader.

Background

The current streaming local parquet reader, while fast and streaming, has some problems:

  • It reads and deserializes ALL row groups and ALL columns in parallel.
  • It does not respect downstream back-pressure: the crossbeam channels are bounded only by the maximum number of chunks, so the reader is free to fill them all up.

This leads to unnecessarily high memory usage, and it can starve downstream tasks of CPU.
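For intuition, here is a minimal, generic sketch (not Daft's actual reader) of how a tightly bounded channel produces back-pressure: the producer blocks as soon as the bound is reached, so in-flight memory stays capped no matter how slow the consumer is.

```python
import queue
import threading
import time

# A bounded channel exerts back-pressure only when producers must block
# on it. With a small bound, a fast producer stalls until the consumer
# catches up; with a large bound (e.g. total chunk count), it can fill
# memory long before the consumer starts.
ch = queue.Queue(maxsize=2)

def producer():
    for i in range(6):
        ch.put(i)  # blocks while 2 items are already in flight

t = threading.Thread(target=producer)
t.start()

time.sleep(0.2)  # simulate a consumer that is slow to start
in_flight = ch.qsize()  # capped at 2, regardless of producer speed

results = [ch.get() for _ in range(6)]
t.join()
```

The old reader's effective bound was the total number of chunks, so producers never blocked and memory ballooned.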

Solution

Instead of launching all tasks at once, we can cap the number of parallel deserialization tasks based on:

  • the number of CPUs
  • the number of columns
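As a rough illustration (the function names and the exact formula here are hypothetical, not the PR's actual code), the cap might be derived from CPUs and column count and applied through a fixed-size pool like this:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def deserialization_parallelism(num_columns: int) -> int:
    # Hypothetical heuristic: wide tables already parallelize across
    # columns within a task, so fewer concurrent row-group tasks are
    # needed to keep the CPUs busy.
    num_cpus = os.cpu_count() or 1
    return max(1, num_cpus // max(1, num_columns))

def deserialize(row_group):
    # Stand-in for the real row-group decode step.
    return row_group

def read_row_groups(row_groups, num_columns):
    # Run row-group tasks through a pool sized by the cap, instead of
    # spawning one task per row group up front.
    cap = deserialization_parallelism(num_columns)
    with ThreadPoolExecutor(max_workers=cap) as pool:
        yield from pool.map(deserialize, row_groups)
```

With very many columns the cap bottoms out at 1, so at most one row-group task runs at a time.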

Results

The clearest benefit is the lower memory usage of streaming queries, for example:

```python
import daft

# Read the first partition of the TPC-H SF1 lineitem table.
next(daft.read_parquet("data/tpch-dbgen/1_0/1/parquet/lineitem").iter_partitions())
```

The new implementation peaks at roughly 300 MB, while the old goes over 1 GB.
[Screenshots: memory profiles, new vs. old implementation]

Another example, where we stream the entire file but consume it slowly:

```python
import time

import daft

for _ in daft.read_parquet("/Users/colinho/Desktop/Daft/z/daft_tpch_100g_32part_64RG.parquet").iter_partitions():
    time.sleep(0.1)
```

The new implementation peaks at roughly 1.2 GB, while the old goes over 3 GB.
[Screenshots: memory profiles, new vs. old implementation]

To verify performance parity, I also wrote benchmarks for parquet files with differing numbers of rows, columns, and row groups. The results show the new implementation is essentially on par, with only slight differences.
[Screenshots: benchmark results, new vs. old implementation]

Reading a TPC-H SF1 lineitem table, the results are essentially identical (~0.2 s).

@github-actions github-actions bot added the enhancement New feature or request label Nov 18, 2024

codspeed-hq bot commented Nov 18, 2024

CodSpeed Performance Report

Merging #3310 will improve performance by up to ×2.2

Comparing colin/dynamic-parquet (6fed790) with main (6d30e30)

Summary

⚡ 2 improvements
✅ 15 untouched benchmarks

Benchmarks breakdown

| Benchmark | main | colin/dynamic-parquet | Change |
| --- | --- | --- | --- |
| test_iter_rows_first_row[100 Small Files] | 340.5 ms | 212.2 ms | +60.44% |
| test_show[100 Small Files] | 34.9 ms | 15.6 ms | ×2.2 |


codecov bot commented Nov 18, 2024

Codecov Report

Attention: Patch coverage is 88.18182% with 26 lines in your changes missing coverage. Please review.

Project coverage is 77.34%. Comparing base (6d30e30) to head (6fed790).
Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/daft-parquet/src/stream_reader.rs | 88.58% | 25 Missing ⚠️ |
| src/daft-parquet/src/read.rs | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files

Impacted file tree graph

```
@@            Coverage Diff             @@
##             main    #3310      +/-   ##
==========================================
+ Coverage   77.00%   77.34%   +0.33%
==========================================
  Files         696      696
  Lines       86039    84849    -1190
==========================================
- Hits        66256    65628     -628
+ Misses      19783    19221     -562
```
| Files with missing lines | Coverage Δ | |
| --- | --- | --- |
| src/daft-parquet/src/read.rs | 75.22% <0.00%> | (-0.08%) ⬇️ |
| src/daft-parquet/src/stream_reader.rs | 89.85% <88.58%> | (+1.57%) ⬆️ |

... and 10 files with indirect coverage changes

@colin-ho colin-ho marked this pull request as ready for review November 18, 2024 15:48
Comment on lines 116 to 118
```rust
// Only increase permits if compute time is significantly higher than IO time,
// and waiting time is not too high.
if compute_ratio > Self::COMPUTE_THRESHOLD && wait_ratio < Self::WAIT_THRESHOLD {
```
@colin-ho (Contributor, Author) commented Nov 22, 2024

Some ideas to consider:

  • Maybe we don't need to consider IO time and just consider the wait time.
  • Can this semaphore become generic and used for other I/O code, as well as the local executor? Would be cool if we could dynamically adjust degree of operator parallelism as well.
  • Can we decrease the permit count in addition to increasing it?
  • Can we add memory pressure to the semaphore?
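A minimal Python sketch of the adaptive-permit idea in the quoted Rust snippet (the class name, threshold values, and ratio bookkeeping here are illustrative assumptions, not the PR's actual API):

```python
import threading

class DynamicSemaphore:
    # Illustrative sketch: start with few permits and grant more only
    # when tasks are compute-bound (high compute share of their time)
    # and not already queueing up behind the semaphore (low wait share).
    COMPUTE_THRESHOLD = 0.5  # hypothetical value
    WAIT_THRESHOLD = 0.1     # hypothetical value

    def __init__(self, initial_permits: int, max_permits: int):
        self._sem = threading.Semaphore(initial_permits)
        self._permits = initial_permits
        self._max = max_permits

    def acquire(self):
        self._sem.acquire()

    def release(self, compute_time: float, io_time: float, wait_time: float):
        total = (compute_time + io_time + wait_time) or 1.0
        compute_ratio = compute_time / total
        wait_ratio = wait_time / total
        # Only increase permits if compute time dominates IO time and
        # tasks are not already waiting long for a permit.
        if (compute_ratio > self.COMPUTE_THRESHOLD
                and wait_ratio < self.WAIT_THRESHOLD
                and self._permits < self._max):
            self._permits += 1
            self._sem.release()  # extra release grows the permit pool
        self._sem.release()  # return the permit this task held
```

A compute-dominated release grows the pool by one (up to the max), while an IO-dominated or wait-heavy release leaves it unchanged; shrinking the pool or reacting to memory pressure would extend `release` with a symmetric decrease path.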

@samster25 (Member) left a comment:

Forgot to submit my review from yesterday lol

Resolved review threads:

  • benchmarking/parquet/test_local.py
  • src/common/runtime/src/lib.rs
  • src/daft-parquet/src/semaphore.rs
  • src/daft-parquet/src/stream_reader.rs (×3)
@colin-ho (Contributor, Author) commented Dec 4, 2024

Tested this new implementation on TPCH SF1:
[Screenshots: TPC-H SF1 results]

@colin-ho colin-ho merged commit de4fe50 into main Dec 4, 2024
44 checks passed
@colin-ho colin-ho deleted the colin/dynamic-parquet branch December 4, 2024 17:30
@colin-ho colin-ho changed the title [FEAT] Dynamically parallel local parquet reader [FEAT] Cap parallelism on local parquet reader Dec 5, 2024
colin-ho added a commit that referenced this pull request Dec 6, 2024
Implement a parallelism cap on remote parquet tasks, and use compute
runtime instead of rayon (swordfish reads only). Follow on from
#3310 which implemented it for
local.

Benchmarks in comments below

---------

Co-authored-by: Colin Ho <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: Colin Ho <[email protected]>
Labels: enhancement (New feature or request)