Table Scan Performance Tests #497

sdd · 2024-07-28T21:48:57Z

This PR adds some performance testing capabilities. It includes the following features:

- A docker-compose environment that includes containers for Minio, Spark, HAProxy and the Iceberg REST Catalog
- Uses HAProxy to simulate the real-world latency and bandwidth constraints of connections to services like S3
- Includes scripting to create an Iceberg table in the performance testing environment and populate it with data from the widely used NYC Taxi dataset
- Adds a justfile to simplify creating, initialising, starting, stopping and tearing down the performance testing environment
- Adds Criterion benchmarks that use the performance testing environment to test the performance of `TableScan.plan_files` in four representative scenarios
- Adds Criterion benchmarks that use the performance testing environment to test the performance of `TableScan.to_arrow` in four representative scenarios
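
To illustrate why simulating latency and bandwidth matters for scan benchmarks, here is a rough back-of-the-envelope model of fetch time. It is only a sketch: the function name, the parameters and the numbers are assumptions chosen for illustration, not values taken from this PR's HAProxy configuration.

```python
def estimated_fetch_seconds(size_bytes, latency_ms, bandwidth_mbit_s, round_trips=1):
    """Rough estimate of the time to fetch one object over a constrained link.

    Hypothetical model: total time = round-trip latency + transfer time.
    It ignores TCP slow start, TLS handshakes, and server-side processing.
    """
    latency_s = round_trips * latency_ms / 1000.0
    transfer_s = (size_bytes * 8) / (bandwidth_mbit_s * 1_000_000)
    return latency_s + transfer_s


if __name__ == "__main__":
    # Illustrative numbers only: 50 ms latency, 80 Mbit/s bandwidth.
    small_file = estimated_fetch_seconds(10_000, latency_ms=50, bandwidth_mbit_s=80)
    large_file = estimated_fetch_seconds(100_000_000, latency_ms=50, bandwidth_mbit_s=80)
    print(f"small file: {small_file:.3f}s, large file: {large_file:.3f}s")
```

Under this model, small reads such as metadata and manifest files are dominated by latency, while large data files are dominated by bandwidth, which is why both constraints are worth simulating.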

The performance tests can be set up and run with `just perf-run`. Before running the tests, this triggers the following actions, checking each one first and skipping it if it was already completed on a previous run:

1. Download NYC taxi data parquets
2. Spin up docker containers
3. Create a table
4. Insert test data from the parquets
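
The check-then-skip behaviour described above can be sketched as follows. This is a minimal illustration and not the justfile from this PR: the `Step` type, the step names and the in-memory `state` set (standing in for downloaded files, running containers and existing tables) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    is_done: Callable[[], bool]  # cheap check: has this step already been completed?
    run: Callable[[], None]      # the actual (possibly slow) work


def run_setup(steps: List[Step]) -> List[str]:
    """Run each step in order, skipping those already done.

    Returns the names of the steps that were actually executed.
    """
    executed = []
    for step in steps:
        if step.is_done():
            continue  # completed on a previous run, nothing to do
        step.run()
        executed.append(step.name)
    return executed


# Hypothetical stand-in for real state (files on disk, containers, tables).
state = set()


def make_step(name: str) -> Step:
    return Step(name, lambda: name in state, lambda: state.add(name))


STEP_NAMES = ["download_parquets", "start_containers", "create_table", "insert_data"]
steps = [make_step(n) for n in STEP_NAMES]

if __name__ == "__main__":
    print("first run executed:", run_setup(steps))
    print("second run executed:", run_setup(steps))
```

Keeping each check separate from its action means a partially completed setup resumes from the first unfinished step instead of starting over.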

sdd · 2024-08-13T19:19:03Z

@Xuanwo and @liurenjie1024: This is now passing and ready for review.

Xuanwo

Thanks a lot for driving this work!

liurenjie1024

Thanks @sdd for this PR. I just skimmed through it and got your points here. I have some concerns with this approach; for example, I feel it is difficult to maintain and to extend to other cases. I'm more interested in integrating with DataFusion for this kind of thing, such as integration tests and benchmarks. What do you think?

sdd force-pushed the perf-suite branch from a04a701 to a1b6c9b Compare July 29, 2024 22:33

sdd mentioned this pull request Jul 31, 2024

Scan does not work as expected #495

Closed

sdd force-pushed the perf-suite branch from c25a1a9 to db2c8eb Compare August 2, 2024 06:43

sdd mentioned this pull request Aug 2, 2024

Concurrent table scans #373

Merged

sdd force-pushed the perf-suite branch 5 times, most recently from 6d0a7ee to 56f068e Compare August 9, 2024 23:25

sdd mentioned this pull request Aug 9, 2024

Concurrent data file fetching and parallel RecordBatch processing #515

Merged

sdd force-pushed the perf-suite branch from 56f068e to 0c2a071 Compare August 9, 2024 23:58

sdd changed the title ~~feat: performance testing harness and perf tests for scan file plan~~ feat: performance testing harness and perf tests for scan file plan and execution Aug 9, 2024

sdd force-pushed the perf-suite branch from 0c2a071 to 15303ce Compare August 10, 2024 06:30

sdd changed the title ~~feat: performance testing harness and perf tests for scan file plan and execution~~ Table Scan Performance tests Aug 10, 2024

sdd changed the title ~~Table Scan Performance tests~~ Table Scan Performance Tests Aug 10, 2024

sdd force-pushed the perf-suite branch from 15303ce to ba65345 Compare August 12, 2024 18:58

sdd marked this pull request as ready for review August 13, 2024 19:18

sdd force-pushed the perf-suite branch 2 times, most recently from f90d2d4 to a00b32a Compare August 15, 2024 20:35

Xuanwo reviewed Aug 16, 2024

This was referenced Aug 16, 2024

Table Scan: Add Row Group Skipping #558

Merged

Table Scan: Add Row Selection Filtering #565

Merged

liurenjie1024 reviewed Aug 21, 2024

sdd added 4 commits October 4, 2024 07:15

feat: performance testing harness and perf tests for scan file plan and execute (be4512d)

docs: add comment with link to original NYC taxi dataset file download page (8296f01)

test: update perf test table config and benches to be more useful when measuring performance of row group filtering and row selection (1a5b26a)

deps: update criterion to mitigate https://rustsec.org/advisories/RUSTSEC-2021-0145 (7ac2c0d)

sdd force-pushed the perf-suite branch from 3c80df5 to 7ac2c0d Compare October 4, 2024 06:15
