Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: allow placeholders to be substituted when coercible #8977

Conversation

erratic-pattern
Copy link
Contributor

@erratic-pattern erratic-pattern commented Jan 24, 2024

Which issue does this PR close?

Closes #8976.

Rationale for this change

It should be possible for the types of parameter values and placeholders to be different, as long as the ScalarValue is coercible to the Expr::Placeholders data type.

What changes are included in this PR?

Removal of the eager == check on data types, which allows type coercion to happen later in planning.

Are these changes tested?

Are there any user-facing changes?

The data_type argument to ParamValues::get_placeholders_with_values is currently unused. If we decide to remove it that would be a user-facing change to the API.

Alternatively, instead of deleting the eager check and leaving the data_type parameter unused, we could improve the eager check with code borrowed from the type coercion module but I'm not sure of the best way to bridge the crate gap there (datafusion-common doesn't depend on datafusion-expr)

@erratic-pattern
Copy link
Contributor Author

I think if ParamValues were moved to datafusion-expr that would solve the dependency loop here. Then it could call can_coerce_from directly.

@erratic-pattern
Copy link
Contributor Author

Actually I think this code from the type coercion optimization rule is the logic we would want to use here

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @kallisti-dev -- this is an important problem to solve

The code in this PR looks ok to me, but needs a test to make sure the problem is solved and to ensure we don't introduce regressions in the future

It would be amazing if you could take a shot at writing some tests, but if you don't have time I can do so as well.

Let me know if this does't make sense

Test Outline

I think the test should probably look something like what we want to do in IOx (which I think is pretty standard). Maybe we can add a test to the core_integration tests in datafusion/core/tests/sql/parameters.rs that does something like:

Step 1: User provides a SQL with placeholder

for example, if id is an i64

SELECT * FROM t1 WHRERE id = $1

DataFusion makes a LogicalPlan (possbly wrapped in a DataFrame)

let plan = ctx.sql(sql_string);

User provides the parameters

(note in this case the user may provide an i32 for id which should be automatically coerced to i64)

let plan = plan
  .with_param_values(vec![
    ("id", ScalarValue::from(3i32)),
  ])

Tests to write;

  1. Simply positive test (described above)
  2. Negative tests (when the parameter value can not be coerced -- e.g. pass a string that is not a valid number)
  3. Use both PREPARE and normal SELECT

@erratic-pattern erratic-pattern force-pushed the fix/allow-placeholder-substition-when-coercible branch from 81f8078 to 56cdb81 Compare January 24, 2024 15:42
@github-actions github-actions bot added the core Core DataFusion crate label Jan 24, 2024
@erratic-pattern
Copy link
Contributor Author

@alamb Added new tests to cover those cases. happy path for named params and prepared statements already existed so I put my new tests next to those.

However I noticed one test that should pass is not. It looks the string "not a number" is being successfully coerced to Int32

.with_param_values(vec![("text", ScalarValue::from("not a number"))])?
.collect()
.await;
assert_eq!(results.unwrap_err().strip_backtrace(), "");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test is failing with

---- sql::select::test_parameter_invalid_types stdout ----
thread 'sql::select::test_parameter_invalid_types' panicked at datafusion/core/tests/sql/select.rs:701:24:
called `Result::unwrap_err()` on an `Ok` value: []

I would expect it to not be able to coerce the string

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I am also surprised this works, but apparently strings can be coerced to numbers (null in this case):

❯ create table test (signed int) as values (-1), (0), (1);
0 rows in set. Query took 0.005 seconds.

❯ select * from test;
+--------+
| signed |
+--------+
| -1     |
| 0      |
| 1      |
+--------+
3 rows in set. Query took 0.001 seconds.

❯ select * from test where signed = 'not a number';
0 rows in set. Query took 0.003 seconds.

Maybe you can try something like this:

❯ create table test3 as values ([1,2,3]);
0 rows in set. Query took 0.002 seconds.

❯ select * from test3;
+-----------+
| column1   |
+-----------+
| [1, 2, 3] |
+-----------+
1 row in set. Query took 0.001 seconds.

❯ select * from test3 where column1 = 4;
Arrow error: Invalid argument error: Invalid comparison operation: List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) == List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })

Here is an example of creating a list array: https://docs.rs/arrow/latest/arrow/array/type.ListArray.html#method.from_iter_primitive

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the test as described

@erratic-pattern erratic-pattern force-pushed the fix/allow-placeholder-substition-when-coercible branch from 56cdb81 to 4b51890 Compare January 24, 2024 16:13
@github-actions github-actions bot added the logical-expr Logical plan and expressions label Jan 24, 2024
alamb
alamb previously approved these changes Jan 24, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking very nice @kallisti-dev 👍

I have a test suggestion on how to trigger an error but I think this PR could also be merged as is (if CI passes) and we could improve the tests as a follow on

.with_param_values(vec![("text", ScalarValue::from("not a number"))])?
.collect()
.await;
assert_eq!(results.unwrap_err().strip_backtrace(), "");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I am also surprised this works, but apparently strings can be coerced to numbers (null in this case):

❯ create table test (signed int) as values (-1), (0), (1);
0 rows in set. Query took 0.005 seconds.

❯ select * from test;
+--------+
| signed |
+--------+
| -1     |
| 0      |
| 1      |
+--------+
3 rows in set. Query took 0.001 seconds.

❯ select * from test where signed = 'not a number';
0 rows in set. Query took 0.003 seconds.

Maybe you can try something like this:

❯ create table test3 as values ([1,2,3]);
0 rows in set. Query took 0.002 seconds.

❯ select * from test3;
+-----------+
| column1   |
+-----------+
| [1, 2, 3] |
+-----------+
1 row in set. Query took 0.001 seconds.

❯ select * from test3 where column1 = 4;
Arrow error: Invalid argument error: Invalid comparison operation: List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) == List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })

Here is an example of creating a list array: https://docs.rs/arrow/latest/arrow/array/type.ListArray.html#method.from_iter_primitive

@alamb alamb dismissed their stale review January 24, 2024 17:38

ci is failing

@erratic-pattern erratic-pattern force-pushed the fix/allow-placeholder-substition-when-coercible branch from 4b51890 to 287594d Compare January 24, 2024 17:52
@erratic-pattern
Copy link
Contributor Author

@alamb I've updated the failing test with the suggestion you made to try coercing integer to a list. it should pass now.

@alamb
Copy link
Contributor

alamb commented Jan 24, 2024

@alamb I've updated the failing test with the suggestion you made to try coercing integer to a list. it should pass now.

Looks like there is still a clippy failure @kallisti-dev : https://github.com/apache/arrow-datafusion/actions/runs/7644208478/job/20828392768?pr=8977

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @kallisti-dev

I took the liberty of pushing a fix for clippy to your branch and I plan to merge this when CI passes

@alamb alamb merged commit 94a6192 into apache:main Jan 24, 2024
22 checks passed
@alamb
Copy link
Contributor

alamb commented Jan 24, 2024

Thanks again @kallisti-dev

@erratic-pattern erratic-pattern changed the title fix: allow placeholders to be substited when coercible fix: allow placeholders to be substituted when coercible Jan 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Placeholders cannot be substituted with values when the values type is coercible but not equivalent
2 participants