
feat(tpcds-benchmarking): Add basic tpcds benchmarking for local testing #3509

Merged: 5 commits merged on Dec 10, 2024


raunakab commented Dec 6, 2024

Overview

This PR enables TPC-DS benchmarking on your local computer.

If the TPC-DS Parquet data does not exist, it will be generated for you inside benchmarking/tpcds/data. You can change this output location with the script's --tpcds-gen-folder argument.
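A minimal Python sketch of the "generate only if missing" behavior described above. The helper name `ensure_tpcds_data`, the `generate` callback, and the rule that any `.parquet` file under the folder counts as existing data are all hypothetical; the PR's actual generation logic lives in benchmarking/tpcds/datagen.py and may differ.

```python
from pathlib import Path
from typing import Callable


def ensure_tpcds_data(
    gen_folder: Path,
    scale_factor: float,
    generate: Callable[[Path, float], None],
) -> Path:
    """Return gen_folder, calling `generate` first if it holds no Parquet files.

    Hypothetical helper: `generate` stands in for whatever datagen step the
    benchmark really uses.
    """
    gen_folder = Path(gen_folder)
    # Assumption: "data exists" means at least one .parquet file under the folder.
    has_parquet = gen_folder.is_dir() and any(gen_folder.rglob("*.parquet"))
    if not has_parquet:
        gen_folder.mkdir(parents=True, exist_ok=True)
        generate(gen_folder, scale_factor)
    return gen_folder
```

Passing the generator in as a callback keeps the existence check separate from data generation, so a second run against the same folder skips the (slow) generation step.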

Usage

To run with the native runner:

DAFT_RUNNER=native python -m benchmarking.tpcds --questions "3"

To run with the Ray runner:

DAFT_RUNNER=ray python -m benchmarking.tpcds --questions "3"

You can also set a different scale factor (e.g., --scale-factor 0.5) or do a dry run with --dry-run.
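The flags mentioned above could be wired up with argparse roughly as follows. This is a hypothetical sketch, not the script's actual parser: the comma-separated format for --questions, `None` meaning "all questions", and the defaults are assumptions (the 0.01 scale factor comes from the review discussion on this PR, the folder from the overview). The PR does not spell out what --dry-run skips.

```python
from __future__ import annotations

import argparse
from pathlib import Path


def parse_questions(spec: str) -> list[int]:
    """Parse an (assumed) comma-separated question list such as "3" or "3,7,19"."""
    return [int(part) for part in spec.split(",") if part.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="benchmarking.tpcds")
    # Assumption: None means "run every question".
    parser.add_argument("--questions", type=parse_questions, default=None)
    parser.add_argument("--scale-factor", type=float, default=0.01)
    parser.add_argument(
        "--tpcds-gen-folder", type=Path, default=Path("benchmarking/tpcds/data")
    )
    # Boolean switch; its exact behavior in the real script is not specified here.
    parser.add_argument("--dry-run", action="store_true")
    return parser
```

For example, `build_parser().parse_args(["--questions", "3,7", "--dry-run"])` yields `questions == [3, 7]` and `dry_run == True`, with the scale factor left at its default.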


codspeed-hq bot commented Dec 6, 2024

CodSpeed Performance Report

Merging #3509 will degrade performance by 57.41%

Comparing feat/tpcds (7f44c02) with main (6390afa)

Summary

❌ 1 regression
✅ 16 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark                                    main       feat/tpcds   Change
test_iter_rows_first_row[100 Small Files]    127.5 ms   299.4 ms     -57.41%
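As a quick arithmetic check on the table: the reported -57.41% matches the change in speed (base time / head time - 1), not the change in run time. Measured as run time, the benchmark went from 127.5 ms to 299.4 ms, roughly 2.35x slower. The sketch below only reproduces those numbers; the function names are illustrative and it makes no claim about how CodSpeed computes its figure internally.

```python
def speed_change_pct(base_ms: float, head_ms: float) -> float:
    """Percent change in speed (inverse of time): base/head - 1."""
    return (base_ms / head_ms - 1) * 100


def time_change_pct(base_ms: float, head_ms: float) -> float:
    """Percent change in wall-clock time: head/base - 1."""
    return (head_ms / base_ms - 1) * 100


base, head = 127.5, 299.4  # ms, main vs feat/tpcds
print(f"speed change: {speed_change_pct(base, head):.2f}%")  # -57.41%
print(f"time change: {time_change_pct(base, head):+.2f}%")   # +134.82%
print(f"slowdown: {head / base:.2f}x")                       # 2.35x
```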


codecov bot commented Dec 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.54%. Comparing base (092c354) to head (7f44c02).
Report is 1 commit behind head on main.

Additional details and impacted files


@@            Coverage Diff             @@
##             main    #3509      +/-   ##
==========================================
- Coverage   77.55%   77.54%   -0.01%     
==========================================
  Files         709      709              
  Lines       86288    86286       -2     
==========================================
- Hits        66917    66911       -6     
- Misses      19371    19375       +4     

see 4 files with indirect coverage changes

Resolved review threads: benchmarking/tpcds/__main__.py (5, outdated) and benchmarking/tpcds/datagen.py (1).

universalmind303 commented Dec 6, 2024

Usage

python -m benchmarking.tpcds \
  --scale-factor 1 \
  --tpcds-gen-folder data/tpcds

The --scale-factor and --tpcds-gen-folder arguments are optional (they default to 0.01 and data/tpcds, respectively).

Not necessary, but IMO it'd be nice to add this to the Makefile, similar to our dsdgen make command:

make dsdgen SCALE_FACTOR=1 OUTPUT_DIR=data/tpcds

Something like:

make bench_tpcds SF=1 TPCDS_DIR=data/tpcds

jaychia and others added 2 commits December 9, 2024 16:38
This test currently fails as we underestimate by 2x

Co-authored-by: Jay Chia
@raunakab raunakab enabled auto-merge (squash) December 10, 2024 00:41
@raunakab raunakab disabled auto-merge December 10, 2024 00:42
@raunakab raunakab enabled auto-merge (squash) December 10, 2024 00:49
@raunakab raunakab merged commit ba46d07 into main Dec 10, 2024
42 of 43 checks passed
@raunakab raunakab deleted the feat/tpcds branch December 10, 2024 01:11

3 participants