-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add TPCH-DS planning benchmark #9907
Conversation
let schema = Arc::new(create_schema(column_prefix, num_columns)); | ||
MemTable::try_new(schema, vec![]).map(Arc::new).unwrap() | ||
} | ||
|
||
pub fn create_tpch_schemas() -> [(String, Schema); 8] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just moved this code into test_utils.
@@ -236,40 +144,83 @@ fn criterion_benchmark(c: &mut Criterion) { | |||
}) | |||
}); | |||
|
|||
// --- TPC-H --- | |||
|
|||
let tpch_ctx = register_defs(SessionContext::new(), tpch_schemas()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than using the same context for all tests, I made separate contexts for TPCH and TPC-DS
use arrow::datatypes::{DataType, Field, Schema}; | ||
|
||
/// Schemas for the TPCH tables | ||
pub fn tpch_schemas() -> Vec<TableDef> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved out of sql_planner
// specific language governing permissions and limitations | ||
// under the License. | ||
|
||
use crate::TableDef; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code was moved from tpcds_planning
@@ -1066,577 +1066,3 @@ async fn regression_test(query_no: u8, create_physical: bool) -> Result<()> { | |||
|
|||
Ok(()) | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code was just moved to test_utils
|
||
// --- TPC-DS --- | ||
|
||
let tpcds_ctx = register_defs(SessionContext::new(), tpcds_schemas()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the new code
d43aca4
to
d778faa
Compare
FYI @andygrove and @matthewmturner -- I hope this also helps improve the planning performance of DataFusion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how this benchmark can be started?
It can be run with the existing sql_planner suite, like this for example: cargo bench --bench sql_planner To test with just the new tests, something like cargo bench --bench sql_planner -- tpcds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm thanks @alamb this will give more accurate details on the planner time
I have a (now not) secret plan to run these benchmarks against the 37.0.0 and report how much better planning time is for 38.0.0 with the work in #9595 from @haohuaijin / @matthewmturner and #9824 from you and some of the improvements I have cooking with @peter-toth / @jayzhan211 in #9637 |
(thank you @comphead ) |
* Move TPCH schema definition * Move schema definitions into test_util * Add tpcds planning benchmark
Which issue does this PR close?
Part of faster planning #5637
Rationale for this change
As we work on improving planning speed (e.g #9637)
it would help to have more planner benchmarks to see how we are doing
What changes are included in this PR?
Are these changes tested?
I tested them manually
Are there any user-facing changes?