ci: Add ability to array-ify args and run multiple jobs #3584

Merged (17 commits) on Dec 17, 2024

Conversation

raunakab
Contributor

Overview

Previously, the run-cluster workflow submitted only a single Ray job. This PR lets it run an arbitrary array of job submissions: the entrypoint_args input param now accepts an array, which is split into its individual argument strings, and one job is submitted for each.

Example Usage

gh workflow run run-cluster.yaml \
    --ref $current_branch \
    -f working_dir="." \
    -f daft_wheel_url="https://github-actions-artifacts-bucket.s3.us-west-2.amazonaws.com/builds/54428e3738e96764af60cfdd8a0e4a41717ec9f9/getdaft-0.3.0.dev0-cp38-abi3-manylinux_2_31_x86_64.whl" \
    -f entrypoint_script="benchmarking/tpcds/ray_entrypoint.py" \
    -f entrypoint_args="[\"--tpcds-gen-folder='gendata' --question='1'\", \"--tpcds-gen-folder='gendata' --question='2'\"]"

The above invocation runs TPC-DS queries 1 and 2.
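As a rough illustration of the splitting step described in the overview, here is a minimal sketch. It is not the code in `.github/ci-scripts/job_runner.py`; the function names `parse_entrypoint_args` and `build_entrypoints` are hypothetical, and the fallback that treats a non-array value as a single job is an assumption made for backward compatibility.

```python
import json
import shlex


def parse_entrypoint_args(raw: str) -> list[str]:
    """Return one argument string per job.

    Accepts a JSON array of strings. Anything else is treated as a single
    plain argument string (assumed fallback, not confirmed by the PR).
    """
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return [raw]
    if isinstance(parsed, list) and all(isinstance(item, str) for item in parsed):
        return parsed
    # Valid JSON but not an array of strings: keep the raw text as one job.
    return [raw]


def build_entrypoints(script: str, raw_args: str) -> list[str]:
    """Build one `python <script> <args>` command line per array element."""
    entrypoints = []
    for args in parse_entrypoint_args(raw_args):
        # shlex.split honours the shell-style quoting inside each element.
        tokens = ["python", script, *shlex.split(args)]
        entrypoints.append(shlex.join(tokens))
    return entrypoints


if __name__ == "__main__":
    raw = (
        "[\"--tpcds-gen-folder='gendata' --question='1'\", "
        "\"--tpcds-gen-folder='gendata' --question='2'\"]"
    )
    for entrypoint in build_entrypoints("benchmarking/tpcds/ray_entrypoint.py", raw):
        print(entrypoint)
```

Each resulting command line could then be handed to whatever job-submission call the runner uses, one submission per element.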

@github-actions github-actions bot added the ci label Dec 16, 2024
@raunakab
Contributor Author

Some values are hardcoded in the runner script (which runs on the runner node, not the Ray head node).

However, we don't currently have a mechanism for passing values to the runner script. I'm thinking of creating one named runner_scripts_args. Thoughts @jaychia?

codspeed-hq bot commented Dec 16, 2024

CodSpeed Performance Report

Merging #3584 will degrade performances by 37.6%

Comparing ci/run-cluster (5b5a9f9) with main (e148248)

Summary

❌ 1 regressions
✅ 26 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | main | ci/run-cluster | Change |
|---|---|---|---|
| test_iter_rows_first_row[100 Small Files] | 142.9 ms | 229.1 ms | -37.6% |
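
For readers checking the figure: the reported -37.6% is consistent with computing `main / branch - 1` on the two timings. This formula is inferred from the numbers in the table, not taken from CodSpeed's documentation, and the function name below is hypothetical. Measured the other way around, the same timings mean the benchmark takes about 60% longer on the branch.

```python
def relative_change(base_ms: float, head_ms: float) -> float:
    """Change expressed as base/head - 1; negative means the head is slower.

    Assumed formula, chosen because it reproduces the reported figure.
    """
    return base_ms / head_ms - 1.0


base_ms, head_ms = 142.9, 229.1  # timings from the benchmark table

change = relative_change(base_ms, head_ms)
extra_time = head_ms / base_ms - 1.0  # how much longer the head run takes

print(f"change: {change:.1%}")          # change: -37.6%
print(f"extra time: {extra_time:.1%}")  # extra time: 60.3%
```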

Contributor

@jaychia jaychia left a comment

We should not be generating TPC-DS data for every run-cluster command.

I'm not sure what's going on there. Also, you're generating this on the runner, but don't we need the data to be available in S3?


codecov bot commented Dec 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.82%. Comparing base (6c21917) to head (ff89642).
Report is 3 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3584      +/-   ##
==========================================
+ Coverage   77.79%   77.82%   +0.02%     
==========================================
  Files         716      716              
  Lines       87991    88243     +252     
==========================================
+ Hits        68455    68673     +218     
- Misses      19536    19570      +34     

see 3 files with indirect coverage changes

@raunakab raunakab requested a review from jaychia December 17, 2024 00:36
Contributor

@jaychia jaychia left a comment

Can we point to a successful job run as well?

@raunakab
Contributor Author

Example of a successful run (run on just 1 argument, --question=3 --scale-factor=100):
https://github.com/Eventual-Inc/Daft/actions/runs/12366475502

@raunakab
Contributor Author

Example of a successful run (run on multiple arguments, ["--question=1 --scale-factor=100", "--question=3 --scale-factor=100"]):
https://github.com/Eventual-Inc/Daft/actions/runs/12380369256

@raunakab raunakab enabled auto-merge (squash) December 17, 2024 20:12
@raunakab raunakab merged commit b7ea62b into main Dec 17, 2024
40 of 41 checks passed
@raunakab raunakab deleted the ci/run-cluster branch December 17, 2024 20:16