feat: tpch + tpcds GHA launcher #3619

Merged (18 commits) on Jan 2, 2025


@raunakab (Contributor) commented Dec 19, 2024

Overview

This PR adds `tpch` and `tpcds` launchers to the available tools. They let you easily scale up a Ray cluster and run queries against it.

Usage

To run TPC-H, run the following:

```sh
uv run tools/tpch.py --scale-factor=2 --num-partitions=2 --questions='1-10'
```

To run TPC-DS, run the following:

```sh
uv run tools/tpcds.py --scale-factor=100 --questions='1-10'
```

As always, if you want help, run `uv run tools/tpch.py --help` or `uv run tools/tpcds.py --help`.
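The `--questions` argument takes a range (`'1-10'`) here and a comma-separated list (`'1,2'`) in the example runs later in this thread. A minimal sketch of how such a spec could be parsed alongside the flags that appear in this thread; the helper name `parse_questions`, support for mixed specs like `'1-3,7'`, and the flag types are all assumptions, not the actual code in `tools/tpch.py` or `tools/tpcds.py`.

```python
import argparse


def parse_questions(spec: str) -> list[int]:
    """Expand a question spec such as '1-10' or '1,2' into a sorted list of ints.

    Hypothetical helper: accepting mixed specs like '1-3,7' is an assumption.
    """
    questions: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_str, end_str = part.split("-", 1)
            start, end = int(start_str), int(end_str)
            if start > end:
                raise argparse.ArgumentTypeError(f"invalid range: {part!r}")
            questions.update(range(start, end + 1))
        else:
            questions.add(int(part))
    if not questions:
        raise argparse.ArgumentTypeError(f"no questions in {spec!r}")
    return sorted(questions)


def build_parser() -> argparse.ArgumentParser:
    # Flag names come from the commands in this PR; their types are assumptions.
    parser = argparse.ArgumentParser(description="TPC-H/TPC-DS launcher (sketch)")
    parser.add_argument("--scale-factor", type=float, required=True)
    parser.add_argument("--num-partitions", type=int)
    parser.add_argument("--questions", type=parse_questions, required=True)
    parser.add_argument("--cluster-profile")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--scale-factor=2", "--num-partitions=2", "--questions=1-10"]
    )
    print(args.questions)  # prints [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Passing `parse_questions` as the argparse `type` means a malformed spec is reported as a normal usage error before any cluster is launched.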

github-actions bot added the feat label Dec 19, 2024
@raunakab marked this pull request as ready for review December 19, 2024 18:15
@raunakab mentioned this pull request Dec 19, 2024

codspeed-hq bot commented Dec 19, 2024

CodSpeed Performance Report

Merging #3619 will improve performance by 50.73%

Comparing tpcds-wrapper (423367c) with main (e59581c)

Summary

⚡ 1 improvement
✅ 26 untouched benchmarks

Benchmarks breakdown

| Benchmark | main | tpcds-wrapper | Change |
| --- | --- | --- | --- |
| test_show[100 Small Files] | 23.8 ms | 15.8 ms | +50.73% |


codecov bot commented Dec 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.99%. Comparing base (e59581c) to head (30cdb7b).
Report is 1 commits behind head on main.


```diff
@@           Coverage Diff           @@
##             main    #3619   +/-   ##
=======================================
  Coverage   77.99%   77.99%           
=======================================
  Files         720      720           
  Lines       88794    88796    +2     
=======================================
+ Hits        69252    69258    +6     
+ Misses      19542    19538    -4     
```

see 4 files with indirect coverage changes

@universalmind303 (Contributor) commented

@raunakab It looks like the prompt doesn't get printed properly unless you have a really wide terminal:

[screenshot]

@raunakab (Author) commented Dec 19, 2024

@universalmind303 Oh, that's strange. I can push a fix for that soon. In the meantime, you can get past it by typing `y` (yes) or `n` (no).
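One way to keep a yes/no prompt readable in narrow terminals is to wrap it to the current terminal width before asking. A minimal standard-library sketch using `shutil.get_terminal_size` and `textwrap.fill`; the `confirm` helper and its `input_fn`/`width` parameters are hypothetical and are not how the launcher actually renders its prompt.

```python
import shutil
import textwrap
from typing import Callable, Optional


def confirm(
    message: str,
    input_fn: Callable[[str], str] = input,
    width: Optional[int] = None,
) -> bool:
    """Ask a y/n question, wrapping the message to fit the terminal width.

    Hypothetical helper; input_fn is injectable so the prompt can be tested.
    """
    if width is None:
        # Fall back to 80 columns when the terminal size cannot be determined.
        width = shutil.get_terminal_size(fallback=(80, 24)).columns
    wrapped = textwrap.fill(message, width=max(width, 20))
    while True:
        answer = input_fn(f"{wrapped}\n[y/n]: ").strip().lower()
        if answer in ("y", "yes"):
            return True
        if answer in ("n", "no"):
            return False
        # Any other answer re-prompts instead of silently picking a default.
```

Putting `[y/n]` on its own line keeps the accepted answers visible even when the message itself wraps across several lines.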

@universalmind303 (Contributor) left a review

One thing I'm worried about is discoverability for this.

I know I'm not going to remember `uv run tools/tpcds.py --scale-factor=100 --questions='1-10' --cluster-profile='medium-x86'`.

Does uv have any built-in discovery for scripts?

@raunakab (Author) commented

Hmm, that is a good point. @samster25 might know about this. I'll see if something can be put together to help with discoverability.

@universalmind303 (Contributor) commented

One other improvement that could be made: when I run the command, there's no easy-to-use output, and I have to dig through the logs to find out what even happened.

[screenshot]

Is it possible to customize the "build summary" with basic information about the run?
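GitHub Actions lets a step add Markdown to a run's summary page by appending to the file named in the `GITHUB_STEP_SUMMARY` environment variable. A minimal sketch of how per-query results could be surfaced there; the `write_step_summary` helper and the row keys (`query`, `duration_s`, `status`) are hypothetical, and the thread goes on to propose a downloadable CSV instead.

```python
import os
from typing import Optional


def write_step_summary(rows: list[dict], path: Optional[str] = None) -> bool:
    """Append a Markdown table of per-query results to the GitHub job summary.

    Hypothetical helper; row keys are assumptions. Returns False (and writes
    nothing) when not running inside GitHub Actions.
    """
    path = path or os.environ.get("GITHUB_STEP_SUMMARY")
    if not path:
        return False
    lines = [
        "| Query | Duration (s) | Status |",
        "| --- | --- | --- |",
    ]
    for row in rows:
        lines.append(f"| {row['query']} | {row['duration_s']:.2f} | {row['status']} |")
    # Append rather than overwrite, so other steps' summaries are preserved.
    with open(path, "a", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return True
```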

@raunakab (Author) commented Dec 19, 2024

@universalmind303 Yes, I found that annoying too. I'm working on it right now.

My current thought is to produce an output CSV file that can be downloaded and viewed. It would list the queries, how long each one took, and any failures observed.
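The CSV described above needs, per query, a duration and a failure record, with a failing query not aborting the rest. A minimal sketch, assuming each query can be wrapped in a zero-argument callable; `QueryResult`, `run_timed`, `write_results_csv`, and the column names are hypothetical, not the format produced by the follow-up PR.

```python
import csv
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class QueryResult:
    question: int
    duration_s: float            # wall-clock time, including failed attempts
    error: Optional[str] = None  # repr of the exception, None on success


def run_timed(question: int, run_query: Callable[[], object]) -> QueryResult:
    """Run one query, timing it and capturing any exception instead of aborting."""
    start = time.perf_counter()
    try:
        run_query()
    except Exception as exc:
        return QueryResult(question, time.perf_counter() - start, repr(exc))
    return QueryResult(question, time.perf_counter() - start)


def write_results_csv(results: list[QueryResult], path: str) -> None:
    """Write one row per query: number, duration, status, and error (if any)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["question", "duration_s", "status", "error"])
        for r in results:
            writer.writerow([
                r.question,
                f"{r.duration_s:.3f}",
                "failed" if r.error else "ok",
                r.error or "",
            ])
```

`time.perf_counter` is used because it is monotonic, so durations are unaffected by system clock adjustments during long runs.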

@raunakab changed the title from "feat: tpcds GHA launcher" to "feat: tpch + tpcds GHA launcher" on Dec 19, 2024
@raunakab (Author) commented

@universalmind303, here is another PR that aims to make the outputs of runs nicer to visualize: #3625.

The first run is still in progress, but you should be able to see an `output.csv` file uploaded to GitHub that you can download and view.

The run is here:
https://github.com/Eventual-Inc/Daft/actions/runs/12420945783

@jaychia (Contributor) commented Dec 20, 2024

WRT discoverability, once we have more concrete workflows we can start organizing things as uv tools:

https://docs.astral.sh/uv/guides/tools/#running-tools

We could probably have a `daft-bench` tool that is its own CLI and can be invoked from uv. It could include things such as data generation, running benchmarks, etc.
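A `daft-bench` CLI could group the existing launchers under subcommands. A minimal argparse sketch; only the command name comes from the comment above, while the subcommand names (`generate`, `run`) and their options are hypothetical. Exposing it as a uv tool would additionally require a console-script entry point in the package metadata, which is not shown.

```python
import argparse
from typing import Optional, Sequence


def build_cli() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="daft-bench")
    sub = parser.add_subparsers(dest="command", required=True)

    # Hypothetical subcommand: generate benchmark data at a given scale factor.
    gen = sub.add_parser("generate", help="generate benchmark data")
    gen.add_argument("benchmark", choices=["tpch", "tpcds"])
    gen.add_argument("--scale-factor", type=float, required=True)

    # Hypothetical subcommand mirroring tools/tpch.py and tools/tpcds.py.
    run = sub.add_parser("run", help="run benchmark queries on a Ray cluster")
    run.add_argument("benchmark", choices=["tpch", "tpcds"])
    run.add_argument("--scale-factor", type=float, required=True)
    run.add_argument("--questions", required=True)
    run.add_argument("--cluster-profile")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    args = build_cli().parse_args(argv)
    # Dispatch point: a real tool would call the data generator or launcher here.
    print(f"{args.command} {args.benchmark} scale_factor={args.scale_factor}")
    return 0
```

With subcommands, `daft-bench --help` lists every available workflow in one place, which addresses the discoverability concern without users having to remember script paths.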

@jaychia (Contributor) left a review

Mostly LGTM, some comments

Review threads (resolved):
- tools/tpcds.py (outdated)
- tools/tpcds.py (outdated)
- tools/tpcds.py
- tools/utils.py (outdated)
- tools/utils.py (outdated)
- tools/utils.py (outdated)
- tools/utils.py (outdated)
@raunakab (Author) commented Jan 2, 2025

Example of a successful TPC-H run (invoked via `uv run tools/tpch.py --scale-factor=2 --num-partitions=2 --questions='1,2'`):
https://github.com/Eventual-Inc/Daft/actions/runs/12588718493

Example of a successful TPC-DS run (invoked via `uv run tools/tpcds.py --scale-factor=100 --questions='1,2'`):
https://github.com/Eventual-Inc/Daft/actions/runs/12588712812

@raunakab merged commit 39bb62c into main on Jan 2, 2025 (49 of 51 checks passed)
@raunakab added a commit that referenced this pull request on Jan 2, 2025
@raunakab deleted the tpcds-wrapper branch on January 2, 2025 21:12