Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce Boolean Coercion #8331

Closed
wants to merge 9 commits into from
Closed

Conversation

jayzhan211
Copy link
Contributor

Which issue does this PR close?

Ref #8302

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) labels Nov 27, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jayzhan211 -- this is looking pretty close. I left some suggestions -- let me know what you think

@@ -353,6 +354,20 @@ fn string_temporal_coercion(
}
}

/// Coerce `Boolean` to other larger types, like Numeric as `1` or String as "1"
fn bool_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<DataType> {
match (lhs_type, rhs_type) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this also needs to check when rhs_type is a DataType::Boolean as well.

I would expect both of the following queries to work and return the same thing

sDataFusion CLI v33.0.0
❯ select 1 = true;
Error during planning: Cannot infer common argument type for comparison operation Int64 = Boolean
❯ select true = 1;
+--------------------------+
| Boolean(true) = Int64(1) |
+--------------------------+
| true                     |
+--------------------------+
1 row in set. Query took 0.006 seconds.

select make_array(1, 2, 3), make_array(1.0, 2.0, 3.0), make_array('h', 'e', 'l', 'l', 'o');
----
[1, 2, 3] [1.0, 2.0, 3.0] [h, e, l, l, o]
query ??????
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add some tests for other uses of this logic (not just in make_array) such as comparisons

I notice that postgres doesn't handle boolean coercion

postgres=# select true = 1;
ERROR:  operator does not exist: boolean = integer
LINE 1: select true = 1;
                    ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

However, after this PR datafusion does:

DataFusion CLI v33.0.0select true = 1;
+--------------------------+
| Boolean(true) = Int64(1) |
+--------------------------+
| true                     |
+--------------------------+
1 row in set. Query took 0.006 seconds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duckdb support boolean coercion, we can consider follow it. If it does not break the overall design, I think we can support it too.

@jayzhan211 jayzhan211 marked this pull request as draft November 27, 2023 23:42
@jayzhan211 jayzhan211 marked this pull request as ready for review November 28, 2023 00:34
@github-actions github-actions bot added the optimizer Optimizer rules label Nov 28, 2023
@jayzhan211 jayzhan211 marked this pull request as draft November 28, 2023 13:04
@github-actions github-actions bot added the physical-expr Physical Expressions label Nov 28, 2023
@github-actions github-actions bot added the sql SQL Planner label Nov 28, 2023
@jayzhan211 jayzhan211 marked this pull request as ready for review November 28, 2023 14:25
@@ -828,7 +828,8 @@ mod tests {
None,
schema.as_ref(),
);
assert!(expr.is_err());
// expr is ok since boolean coercion is supported
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we lost the negative coverage here -- can. you please change the test back to checking that this is an error?

Perhaps you could change it to

CASE WHEN a = 'foo' THEN timestamp(500ns) WHEN a = 'bar' THEN true END

As bool --> timestamp coercion isn't supported

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jayzhan211 -- this is looking very close. I think the code looks 👨‍🍳 👌 and we just need a few more changes to the tests to avoid losing negative coverage

Thanks for pushing this


# bool_coercion
query BBBB
select 1 == true, false == 0, 1.0 == true, false == 'false';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

datafusion/sqllogictest/test_files/union.slt Show resolved Hide resolved
@jayzhan211 jayzhan211 marked this pull request as draft November 29, 2023 00:09
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
@@ -799,69 +799,6 @@ mod tests {
Ok(batch)
}

#[test]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to slt

@@ -2190,18 +2190,6 @@ fn union_with_aliases() {
quick_test(sql, expected);
}

#[test]
fn union_with_incompatible_data_types() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to slt

@jayzhan211

This comment was marked as outdated.

@jayzhan211 jayzhan211 marked this pull request as ready for review November 29, 2023 01:25
alamb
alamb previously approved these changes Dec 1, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thank you @jayzhan211

@Dandandan / @andygrove / @viirya any concerns with supporting coercion from bool to other types? It seems like a good feature to me

@jackwener
Copy link
Member

jackwener commented Dec 3, 2023

Boolean Coercion is a dangerous behavior. It will cause some strange behavior.

such as:

If there is some implicit type cast:
1. true = 'true'
2. true = 1

But true = 1 = '1' is conflict with true = 'true'

I suggest to we'd better to keep same behavior with pg.

@jayzhan211
Copy link
Contributor Author

jayzhan211 commented Dec 4, 2023

Boolean Coercion is a dangerous behavior. It will cause some strange behavior.

such as:

If there is some implicit type cast:
1. true = 'true'
2. true = 1

But true = 1 = '1' is conflict with true = 'true'

I suggest to we'd better to keep same behavior with pg.

I agree it causes some confusing behavior. true is coerced to 1 or 'true' depends on other type.

query ?
select make_array(true, 1, 'true');
----
[true, 1, true]

query ?
select make_array(true, 'true', 1);
----
[true, true, 1]

query ?
select make_array(true, 1);
----
[1, 1]

Then, why do DuckDB introduce this 🤔

@alamb alamb dismissed their stale review December 5, 2023 20:47

Per jakevin, let's not merge this until we sort out if it is a good idea

@alamb alamb marked this pull request as draft December 5, 2023 20:47
@jayzhan211 jayzhan211 closed this Mar 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Physical Expressions sql SQL Planner sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants