-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Start setting up tpch planning benchmarks #8665
Conversation
Field::new("l_discount", DataType::Float64, false), | ||
Field::new("l_tax", DataType::Float64, false), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may not be important for this benchmark, but financial amounts should be decimal rather than float. We have these schema definitions already in https://github.com/apache/arrow-datafusion/blob/main/benchmarks/src/tpch/mod.rs#L45 that should match the TPC-H specification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must have messed something up when i was cleaning up and creating PR, i had actually copied from that exact location. Thx for catching, will fix.
let q1_sql = std::fs::read_to_string("../../benchmarks/queries/q1.sql").unwrap(); | ||
c.bench_function("physical_plan_tpch_q1", |b| { | ||
b.iter(|| physical_plan(&ctx, &q1_sql)) | ||
}); | ||
|
||
let q12_sql = std::fs::read_to_string("../../benchmarks/queries/q12.sql").unwrap(); | ||
c.bench_function("physical_plan_tpch_q12", |b| { | ||
b.iter(|| physical_plan(&ctx, &q12_sql)) | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is going to add all tpch queries here, I'm wondering if we should create a separate benchmark for tpch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was my understanding the intent from #8638 was to have all planning benchmarks in a single place and less to be able to isolate benchmark per source (tpch / clickbench / etc). that being said, i can see value in that if we wanted to take our plan benchmarks a step further and perhaps compare to other engines.
i dont have a strong opinion either way - happy to go with the consensus here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about having a benchmark like this that planned them all in a single go. Something like
let q1_sql = std::fs::read_to_string("../../benchmarks/queries/q1.sql").unwrap();
let q12_sql = std::fs::read_to_string("../../benchmarks/queries/q12.sql").unwrap();
c.bench_function("physical_plan_tpch", |b| {
b.iter(|| physical_plan(&ctx, &q1_sql))
b.iter(|| physical_plan(&ctx, &q12_sql))
});
And we can add per-query benchmarks if needed / desired?
I am personally torn between per-query benchmarks which would provide more detail, but require more aggregation to summarize and this single number, with less specificity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, there are several ways we can go about this. If there isnt an immediate need for per query benchmarks then i think aggregating could work and then we can add per query plans over time or as needed.
Also, and probably not for this PR, it looks like criterion has some features that would allow you to establish a baseline benchmark (https://bheisler.github.io/criterion.rs/book/user_guide/command_line_options.html#baselines). I'm wondering if that could be useful for integrating into CI to ensure no regressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also the option of using benchmark_group which allows you to group function benchmarks in final output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @matthewmturner -- this is looking great to me. I think we can address the feedback and then iterate in subsequent PRs
let q1_sql = std::fs::read_to_string("../../benchmarks/queries/q1.sql").unwrap(); | ||
c.bench_function("physical_plan_tpch_q1", |b| { | ||
b.iter(|| physical_plan(&ctx, &q1_sql)) | ||
}); | ||
|
||
let q12_sql = std::fs::read_to_string("../../benchmarks/queries/q12.sql").unwrap(); | ||
c.bench_function("physical_plan_tpch_q12", |b| { | ||
b.iter(|| physical_plan(&ctx, &q12_sql)) | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about having a benchmark like this that planned them all in a single go. Something like
let q1_sql = std::fs::read_to_string("../../benchmarks/queries/q1.sql").unwrap();
let q12_sql = std::fs::read_to_string("../../benchmarks/queries/q12.sql").unwrap();
c.bench_function("physical_plan_tpch", |b| {
b.iter(|| physical_plan(&ctx, &q1_sql))
b.iter(|| physical_plan(&ctx, &q12_sql))
});
And we can add per-query benchmarks if needed / desired?
I am personally torn between per-query benchmarks which would provide more detail, but require more aggregation to summarize and this single number, with less specificity
I believe CI will be fixed by merging up from main -- the clippy issue was fixed in #8662 |
0f0e478
to
707a018
Compare
@alamb i made updates and went with your approach. with this in place it should be pretty easy to iterate and add more targeted benchmarks as needed. let me know if you think anything else needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @matthewmturner
let q12_sql = std::fs::read_to_string("../../benchmarks/queries/q12.sql").unwrap(); | ||
let q13_sql = std::fs::read_to_string("../../benchmarks/queries/q13.sql").unwrap(); | ||
let q14_sql = std::fs::read_to_string("../../benchmarks/queries/q14.sql").unwrap(); | ||
// let q15_sql = std::fs::read_to_string("../../benchmarks/queries/q15.sql").unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might be good in a follow on PR to note why this query is commented out.
I think we should get this in and iterate. |
Which issue does this PR close?
Part of #8638
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?