-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow place holders like $1
in more types of queries.
#13632
Conversation
Previously, a query like `SELECT $1;` would fail to generate a LogicalPlan. With this change these queries are now workable. This required creating a new LogicalPlan::validate_parametere_types that is called from Optimizer::optimize to assert that all parameter types have been inferred correctly.
This is a straight copy of the `datafusion/expr` directory in the `apache/datafusion` git repository. This repo exist while my PR to Datafusion isn't on crates.io. apache/datafusion#13632
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @davisp -- very much appreciated.
I have some suggestions on how to make this PR better (specifically by not adding a new check, but using an existing check). However, I think we could improve it as a follow on PR as well and this could be merged as is.
Thank you again
@@ -2014,7 +2014,7 @@ async fn test_dataframe_placeholder_missing_param_values() -> Result<()> { | |||
assert!(optimized_plan | |||
.unwrap_err() | |||
.to_string() | |||
.contains("Placeholder type could not be resolved. Make sure that the placeholder is bound to a concrete type, e.g. by providing parameter values.")); | |||
.contains("Placeholder type for '$0' could not be resolved. Make sure that the placeholder is bound to a concrete type, e.g. by providing parameter values.")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
); | ||
|
||
// The placeholder is not replaced with a value, | ||
// so the filter data type is not know, i.e. a = $0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// so the filter data type is not know, i.e. a = $0. | |
// so the filter data type is not known, i.e. a = $0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed here: e1d66ac
// so the filter data type is not know, i.e. a = $0. | ||
// Therefore, the optimization fails. | ||
let optimized_plan = ctx.state().optimize(logical_plan); | ||
assert!(optimized_plan.is_err()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this line is redundant -- as unwrap_err()
will panic if optimize_plan is not an err
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed here: e1d66ac
); | ||
|
||
// The placeholder is not replaced with a value, | ||
// so the filter data type is not know, i.e. a = $0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// so the filter data type is not know, i.e. a = $0. | |
// so the filter data type is not known, i.e. a = $0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed here: 3d645e0
@@ -357,6 +357,7 @@ impl Optimizer { | |||
{ | |||
let start_time = Instant::now(); | |||
let options = config.options(); | |||
plan.validate_parameter_types()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking parameter types doesn't seem to logically belong to the optimizer which is supposed to preserve the semantics of each plan but potentially make them faster.
https://docs.rs/datafusion/latest/datafusion/optimizer/trait.OptimizerRule.html
OptimizerRules transforms one LogicalPlan into another which computes the same results, but in a potentially more efficient way. If there are no suitable transformations for the input plan, the optimizer should simply return it unmodified.
I think the Analyzer
might be a more appropriate place to put this check:
https://docs.rs/datafusion/latest/datafusion/optimizer/struct.Analyzer.html#method.execute_and_check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although thinking about it some more, if we are going to defer the error anyways, I think the cleanest solution would be to move the error to when it is needed
When I removed the specific check in the Optimizer and tried to run the queries, it failed with this error:
This feature is not implemented: Physical plan does not support logical expression Placeholder(Placeholder { id: "$1", data_type: None })
Maybe rather than a special check we could simply make this error better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(we can do this in a follow on PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought your suggestion was a lot cleaner and have pushed e5325c3 to address it. I've updated the dataframe tests, added a few to the core/tests/sql/select.rs module, and added a PREPARE
test to the sqllogic test showing how things fail there as well.
Let me know if you can think of anywhere else I should add tests.
@@ -571,10 +571,6 @@ Dml: op=[Insert Into] table=[test_decimal] | |||
"INSERT INTO person (id, first_name, last_name) VALUES ($1, $2, $3, $4)", | |||
"Error during planning: Placeholder $4 refers to a non existent column" | |||
)] | |||
#[case::placeholder_type_unresolved( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens now in this case? Does it pass successfully?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, this PR is intentionally about allowing the creation of LogicalPlan
's with placeholders. Previously, LogicalPlan
creation would fail when any placeholder is part of the schema. Given that all this test is doing is creating LogicalPlan's, this example now passes. I thought about moving it to a new test that explicitly shows it failing at the optimization step but then felt that that case was already covered by the other test updates.
This is great ❤️ |
Previously, these errors would occurr during optimization. Now that we allow unbound placeholders through the optimizer they fail when creating the physical plan instead.
53f8e91
to
e5325c3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is beautiful -- thank you so much @davisp
$1
in more types of queries.
* Allow place holders in the column list Previously, a query like `SELECT $1;` would fail to generate a LogicalPlan. With this change these queries are now workable. This required creating a new LogicalPlan::validate_parametere_types that is called from Optimizer::optimize to assert that all parameter types have been inferred correctly. * Remove redundant asserts * Fix typo in comments * Add comment explaining DataType::Null * Move unbound placeholder error to physical-expr Previously, these errors would occurr during optimization. Now that we allow unbound placeholders through the optimizer they fail when creating the physical plan instead. * Fix expected error message
Previously, a query like
SELECT $1;
would fail to generate a LogicalPlan. With this change these queries are now workable. This required creating a new LogicalPlan::validate_parametere_types that is called from Optimizer::optimize to assert that all parameter types have been inferred.Which issue does this PR close?
Closes #5617
This might also close an untracked step from #4539 which is listed as "Support PREPARE statements without explicit parameters". The tests I updated mentioned a
TODO
for supporting placeholders without types which is now shown to be fixed in this PR. But I'm uncertain enough on the listed step in that epic to pre-emptively create an issue to attach to it. If this does cover it, I would be more than happy to open the issue and add a comment to the epic as well as updating this PR description.Rationale for this change
Initially I just wanted to support
SELECT $1;
in a FlightSQL server. It appears fixing that has also fixed a number of other issues.What changes are included in this PR?
This removes the error from
ExprSchemable::get_type
when a Placeholder has no specified type. I believe this error to be a bug based on the fact that its suggesting that the error can be fixed by providing parameter values and the fact that the error is triggered when creating a LogicalPlan.I've instead added a new
LogicalPlan::validate_placeholder_types
method which is invoked at the start of the Optimizer::optimize method.I should note, that this PR may also be related to #8819. While developing this, I noticed that the Optimizer seems perfectly happy to optimize plans where some datatypes are
DataType::Null
. I have absolutely nowhere near enough context to know if that's something that should be allowed or not so I added thevalidate_placeholder_types
so that the old behavior is maintained where optimization fails on untyped placeholders.Are these changes tested?
Yep. I've updated/removed old tests that asserted the broken behavior and added tests that cover both my case of
SELECT $1;
as well asSELECT * FROM foo WHERE col LIKE $1
which covers #5617.Are there any user-facing changes?
Some queries that would have failed planning may now fail at optimization time instead? Not sure if that's important. Also, because I was moving the error message anyway, I included the Placeholder.id in the error text which seemed like it'd help improve users' lives.