Substrait insubquery #8363
let haystack_expr = &in_predicate.haystack;
if let Some(haystack_expr) = haystack_expr {
    let haystack_expr =
        from_substrait_rel(ctx, haystack_expr, extensions)
Similar to the producer, I needed to add ctx here; the added ctx argument in the other functions is there to support calling from_substrait_rel recursively.
to_substrait_rex(ctx, expr, schema, col_ref_offset, extension_info)?;

let subquery_plan =
    to_substrait_rel(subquery.subquery.as_ref(), ctx, extension_info)?;
I added ctx here so that I could call to_substrait_rel recursively. All other functions in this file that gained a ctx argument did so to support this.
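For readers following along, here is a minimal sketch of why the context has to be threaded through every level once an expression can contain a whole plan. The types `Rel`, `Expr`, and `Ctx` and the functions `rel_to_string` and `expr_to_string` are hypothetical stand-ins invented for illustration; they are not the DataFusion or Substrait types, and the real signatures in this PR differ.

```rust
// Hypothetical, simplified stand-ins; NOT the DataFusion or Substrait types.
#[derive(Debug)]
enum Expr {
    Column(String),
    // `needle IN (subquery)`: a whole plan nested inside an expression.
    InSubquery { needle: Box<Expr>, subquery: Box<Rel> },
}

#[derive(Debug)]
enum Rel {
    Scan { table: String },
    Filter { input: Box<Rel>, predicate: Expr },
}

// Hypothetical context: stands in for whatever state plan conversion needs.
struct Ctx {
    known_tables: Vec<String>,
}

// Converts a plan node; needs `ctx` to resolve table names.
fn rel_to_string(ctx: &Ctx, rel: &Rel) -> Result<String, String> {
    match rel {
        Rel::Scan { table } => {
            if ctx.known_tables.iter().any(|t| t == table) {
                Ok(format!("Scan({table})"))
            } else {
                Err(format!("unknown table: {table}"))
            }
        }
        Rel::Filter { input, predicate } => {
            let input = rel_to_string(ctx, input)?;
            let predicate = expr_to_string(ctx, predicate)?;
            Ok(format!("Filter({predicate}, {input})"))
        }
    }
}

// Converts an expression; it only needs `ctx` because a subquery
// forces it to call back into the plan converter.
fn expr_to_string(ctx: &Ctx, expr: &Expr) -> Result<String, String> {
    match expr {
        Expr::Column(name) => Ok(name.clone()),
        Expr::InSubquery { needle, subquery } => {
            let needle = expr_to_string(ctx, needle)?;
            let subquery = rel_to_string(ctx, subquery)?;
            Ok(format!("{needle} IN ({subquery})"))
        }
    }
}
```

The expression converter has no use for the context on its own; it takes one only so that it can hand it back to the plan converter when it meets a subquery, which mirrors the reason given above for the extra argument.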
For some reason, on roundtrip there is an additional projection on the TableScan that includes all columns of the table, so I had to use assert_expected_plan here:
TableScan: data2 projection=[a, b, c, d, e, f]
I think that is ok -- maybe you could just add a comment to the test with this information
It looks like this projection is not pushed down to the table scan; here is the optimized logical plan I get:
Filter: data.f = Utf8("a") OR data.f = Utf8("b") OR data.f = Utf8("c") OR data.a IN (<subquery>)
  Subquery:
    Projection: data2.a
      Filter: data2.f IN ([Utf8("b"), Utf8("c"), Utf8("d")])
        TableScan: data2
  TableScan: data projection=[a, f], partial_filters=[data.f = Utf8("a") OR data.f = Utf8("b") OR data.f = Utf8("c") OR data.a IN (<subquery>)]
    Subquery:
      Projection: data2.a
        Filter: data2.f IN ([Utf8("b"), Utf8("c"), Utf8("d")])
          TableScan: data2
I am sorry @tallamjr -- this PR got lost somehow when I was reviewing other ones. I will try and review it tomorrow.
Err(DataFusionError::Substrait(
    "InPredicate Subquery type must have exactly one Needle expression"
        .to_string(),
))
I think you can use https://docs.rs/datafusion/latest/datafusion/common/macro.substrait_err.html to make the code less verbose
Suggested change:
- Err(DataFusionError::Substrait(
-     "InPredicate Subquery type must have exactly one Needle expression"
-         .to_string(),
- ))
+ substrait_err!("InPredicate Subquery type must have exactly one Needle expression")
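To illustrate what the suggestion buys, here is a rough, self-contained sketch of an error-constructing macro in the same spirit. `SketchError`, `sketch_substrait_err!`, and `single_needle` are hypothetical names made up for this example; the sketch assumes the real `substrait_err!` behaves roughly like this (formatting its arguments and wrapping them in an `Err`), and it is not a copy of the DataFusion implementation.

```rust
// Hypothetical error type; stands in for DataFusionError in this sketch.
#[derive(Debug, PartialEq)]
enum SketchError {
    Substrait(String),
}

// Hypothetical macro: formats its arguments and wraps them in
// `Err(SketchError::Substrait(..))`, so call sites stay on one line.
macro_rules! sketch_substrait_err {
    ($($arg:tt)*) => {
        Err(SketchError::Substrait(format!($($arg)*)))
    };
}

// Returns the single needle, or an error if there is not exactly one.
fn single_needle<'a>(needles: &[&'a str]) -> Result<&'a str, SketchError> {
    if needles.len() != 1 {
        return sketch_substrait_err!(
            "InPredicate Subquery type must have exactly one Needle expression"
        );
    }
    Ok(needles[0])
}
```

The call site shrinks from a four-line `Err(...)` expression with an explicit `.to_string()` to a single macro call, which is the point of the suggestion above.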
Others look good to me 👍
Apologies, I hadn't noticed this PR; there are a few API changes that are in conflict 🥲
Thanks for merging my MR! Happy holidays!
Which issue does this PR close?
Closes #8362.
Rationale for this change
Support for TPC-DS query set
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
No