feat(tpcds-benchmarking): Add basic tpcds benchmarking for local testing #3509

raunakab · 2024-12-06T18:32:39Z

Overview

This PR enables TPC-DS benchmarking on your local computer.

If the TPC-DS Parquet data does not already exist, the script generates it for you (inside benchmarking/tpcds/data). You can change the output location with the --tpcds-gen-folder argument.
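The "generate only if missing" behaviour can be sketched as below. This is a minimal illustration, not the PR's code: `ensure_tpcds_data` is a hypothetical helper, `generate` is a stand-in for whatever benchmarking/tpcds/datagen.py actually exposes, and treating "any .parquet file under the folder" as "data exists" is an assumption.

```python
from pathlib import Path
from typing import Callable


def ensure_tpcds_data(
    gen_folder: Path,
    scale_factor: float,
    generate: Callable[[Path, float], None],
) -> Path:
    """Generate TPC-DS parquet data into gen_folder only if none is there yet.

    `generate` is a hypothetical stand-in for the PR's real generator;
    its signature here is an assumption.
    """
    # Assumption: any .parquet file under the folder means data already exists.
    has_data = gen_folder.is_dir() and any(gen_folder.rglob("*.parquet"))
    if not has_data:
        gen_folder.mkdir(parents=True, exist_ok=True)
        generate(gen_folder, scale_factor)
    return gen_folder
```

Passing the generator in as a callable keeps the check testable without actually running dsdgen.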

Usage

To run with the native runner:

DAFT_RUNNER=native python -m benchmarking.tpcds --questions "3"

To run with the Ray runner:

DAFT_RUNNER=ray python -m benchmarking.tpcds --questions "3"
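The two commands differ only in the DAFT_RUNNER environment variable. A minimal standard-library sketch for driving both from Python; `tpcds_command` and `run_all` are hypothetical helper names, not part of the PR.

```python
import os
import subprocess
import sys


def tpcds_command(runner: str, questions: str) -> tuple[list[str], dict[str, str]]:
    """Build argv and environment for `python -m benchmarking.tpcds`."""
    argv = [sys.executable, "-m", "benchmarking.tpcds", "--questions", questions]
    # Inherit the current environment, then pick the runner via DAFT_RUNNER.
    env = {**os.environ, "DAFT_RUNNER": runner}
    return argv, env


def run_all(questions: str = "3") -> None:
    """Run the same questions under the native and Ray runners in turn."""
    for runner in ("native", "ray"):
        argv, env = tpcds_command(runner, questions)
        # check=True raises CalledProcessError if a run fails.
        subprocess.run(argv, env=env, check=True)
```

Call `run_all()` from the repository root so that the `benchmarking` package is importable.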

You can also set a different scale factor (e.g., --scale-factor 0.5) or enable dry-run mode (--dry-run).
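For reference, the flags mentioned in this thread could be declared with argparse roughly as follows. This is a sketch, not the actual benchmarking/tpcds/__main__.py: the defaults (0.01 and data/tpcds) come from the review comment in this thread, while leaving --questions without a default and the help texts are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags described in this PR."""
    parser = argparse.ArgumentParser(prog="python -m benchmarking.tpcds")
    # Passed as a string, e.g. --questions "3".
    parser.add_argument("--questions", type=str, help="TPC-DS question(s) to run")
    parser.add_argument("--scale-factor", type=float, default=0.01, help="TPC-DS scale factor")
    parser.add_argument("--tpcds-gen-folder", type=str, default="data/tpcds",
                        help="where the generated parquet data lives")
    # store_true: the flag's presence turns dry-run mode on.
    parser.add_argument("--dry-run", action="store_true", help="enable dry-run mode")
    return parser
```

argparse maps dashed flags to underscored attributes, so the values are read back as `args.scale_factor`, `args.tpcds_gen_folder` and `args.dry_run`.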

codspeed-hq · 2024-12-06T18:40:42Z

CodSpeed Performance Report

Merging #3509 will degrade performance by 57.41%

Comparing feat/tpcds (7f44c02) with main (6390afa)

Summary

❌ 1 regression
✅ 16 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

	Benchmark	`main`	`feat/tpcds`	Change
❌	`test_iter_rows_first_row[100 Small Files]`	127.5 ms	299.4 ms	-57.41%
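The Change column appears to measure speed rather than time: the benchmark went from 127.5 ms to 299.4 ms, about 2.35x slower. The sketch below shows both views. That CodSpeed uses exactly `base / head - 1` is an assumption, but that formula reproduces the reported -57.41%.

```python
def codspeed_change(base_ms: float, head_ms: float) -> float:
    """Percent change in speed (base_time / head_time - 1); slower is negative."""
    return (base_ms / head_ms - 1.0) * 100.0


def time_increase(base_ms: float, head_ms: float) -> float:
    """Percent change in wall-clock time (head_time / base_time - 1)."""
    return (head_ms / base_ms - 1.0) * 100.0


print(f"{codspeed_change(127.5, 299.4):.2f}%")  # -57.41%
print(f"{time_increase(127.5, 299.4):.2f}%")    # 134.82%
```

So a "-57.41%" regression corresponds to the benchmark taking roughly 135% longer.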

codecov · 2024-12-06T18:56:15Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.54%. Comparing base (092c354) to head (7f44c02).
Report is 1 commit behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3509      +/-   ##
==========================================
- Coverage   77.55%   77.54%   -0.01%     
==========================================
  Files         709      709              
  Lines       86288    86286       -2     
==========================================
- Hits        66917    66911       -6     
- Misses      19371    19375       +4

see 4 files with indirect coverage changes
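The coverage figures follow from the Hits and Lines rows. Rounding head's 66911 / 86286 = 77.5456% to the nearest hundredth would give 77.55% and a 0.00% delta, so the reported 77.54% and -0.01% suggest the percentages are truncated. The sketch below assumes that rounding-down behaviour (configurable in Codecov, and believed to be its default) and uses integer arithmetic to avoid float edge cases.

```python
def coverage_bp(hits: int, lines: int) -> int:
    """Coverage in basis points (hundredths of a percent), rounded down."""
    return hits * 10_000 // lines


rows = {
    "main": {"hits": 66917, "misses": 19371, "lines": 86288},
    "#3509": {"hits": 66911, "misses": 19375, "lines": 86286},
}

for name, row in rows.items():
    # Every counted line is either a hit or a miss.
    assert row["hits"] + row["misses"] == row["lines"]
    print(name, coverage_bp(row["hits"], row["lines"]) / 100)
# main 77.55
# #3509 77.54
```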

benchmarking/tpcds/__main__.py

benchmarking/tpcds/datagen.py


universalmind303 · 2024-12-06T20:24:33Z

Usage
python -m benchmarking.tpcds \
  --scale-factor 1 \
  --tpcds-gen-folder data/tpcds
The --scale-factor and --tpcds-gen-folder arguments are optional (they default to 0.01 and data/tpcds, respectively).

Not necessary, but IMO it'd be nice to add this to the Makefile, similar to our dsdgen make command: make dsdgen SCALE_FACTOR=1 OUTPUT_DIR=data/tpcds

Something like:

make bench_tpcds SF=1 TPCDS_DIR=data/tpcds
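To make the suggestion concrete, the proposed target would simply forward its variables to the CLI flags. The bench_tpcds target does not exist in this PR; the sketch below only illustrates the intended mapping, and both helper names are hypothetical.

```python
def parse_make_vars(assignments: list[str]) -> dict[str, str]:
    """Split make-style KEY=VALUE assignments into a dict."""
    return dict(a.split("=", 1) for a in assignments)


def bench_tpcds_argv(make_vars: dict[str, str]) -> list[str]:
    """Command a hypothetical `make bench_tpcds` target would run."""
    argv = ["python", "-m", "benchmarking.tpcds"]
    # Omitted variables fall back to the script's own defaults.
    if "SF" in make_vars:
        argv += ["--scale-factor", make_vars["SF"]]
    if "TPCDS_DIR" in make_vars:
        argv += ["--tpcds-gen-folder", make_vars["TPCDS_DIR"]]
    return argv
```

For `make bench_tpcds SF=1 TPCDS_DIR=data/tpcds`, this yields `python -m benchmarking.tpcds --scale-factor 1 --tpcds-gen-folder data/tpcds`.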


benchmarking/tpcds/__main__.py

raunakab requested review from universalmind303 and jaychia December 6, 2024 18:32

github-actions bot added the feat label Dec 6, 2024

jaychia approved these changes Dec 6, 2024


universalmind303 reviewed Dec 6, 2024


benchmarking/tpcds/__main__.py (outdated)

jaychia and others added 2 commits December 9, 2024 16:38

test: Add more size estimation tests from our s3 bucket (#3514)

7595982

This test currently fails as we underestimate by 2x Co-authored-by: Jay Chia <[email protected]@users.noreply.github.com>

Merge branch 'main' into feat/tpcds

1fef660

raunakab force-pushed the feat/tpcds branch from 03136a5 to 1fef660 December 10, 2024 00:39

raunakab enabled auto-merge (squash) December 10, 2024 00:41

raunakab disabled auto-merge December 10, 2024 00:42

raunakab commented Dec 10, 2024


benchmarking/tpcds/__main__.py (outdated)

Raunak Bhagat added 3 commits December 9, 2024 16:43

Remove commented out type

200bad5

Update how dry-run is treated

0852ae8

Merge branch 'main' into feat/tpcds

7f44c02

raunakab enabled auto-merge (squash) December 10, 2024 00:49

raunakab merged commit ba46d07 into main Dec 10, 2024
42 of 43 checks passed

raunakab deleted the feat/tpcds branch December 10, 2024 01:11
