-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: separate get_result_type from coerce_type
#6221
Conversation
let message = format!( | ||
"Expression {case:?} expected to error due to impossible coercion" | ||
); | ||
assert!(logical_plan.is_err(), "{}", message); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type checking is done in the Type coercion analyzer rule
rather than directly in the build
coerce_type
Next, we should separate |
coerce_type
coerce_type
| Operator::NotEq | ||
| Operator::Lt | ||
| Operator::Gt | ||
| Operator::GtEq | ||
| Operator::LtEq | ||
| Operator::IsDistinctFrom |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Original, when we call get_data_type
, it will return common type instead of result type (Boolean
) in coerce_type
. So original it's wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a great step forward -- thank you @jackwener
I think this also closes #3419
Perhaps we can update the comments here https://github.com/apache/arrow-datafusion/blob/4297547df6dc297d692ca82566cfdf135d4730b5/datafusion/expr/src/type_coercion/binary.rs#L111-L121 and update it to say that 'coerce_types' is returns the type the arguments should be coerced to
I am happy to do so in a follow on PR
Thank you for taking this first step to clean up the coercion logic ❤️
|| is_interval(lhs_type) | ||
|| is_interval(rhs_type) => | ||
{ | ||
temporal_add_sub_coercion(lhs_type, rhs_type, op) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually I think it would make the code easier to understand and maintain if we can disentangle coerce_types
from get_result_types
. To start in this PR, calling into coerce_types is fine.
Eventually I think it would keep the logic much simpler if get_result_type
is only called on expressions that have their inputs properly coerced.
@@ -77,7 +77,7 @@ impl PhysicalExpr for DateTimeIntervalExpr { | |||
} | |||
|
|||
fn data_type(&self, input_schema: &Schema) -> Result<DataType> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also love to remove DateTimeIntervalExpr
out of its own thing and into binary.rs with the other operators, but for now 👍
@@ -42,7 +42,7 @@ pub fn binary_operator_data_type( | |||
let result_type = if !any_decimal(lhs_type, rhs_type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I als find binary_oprator_data_type
to be confusing and redundant with get_result_type
but we can fix that in a follow on PR perhaps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great suggestion, I will do in following PR
Co-authored-by: Andrew Lamb <[email protected]>
Thanks @alamb review carefully. |
I prepare to merge this PR because there are some following PR. Anyone who wants to review this PR can continue, and they they can also review these content in the following PR |
One slightly unfortunate quirk of this separation is it isn't clear if the inputs to get_result_type are before or after type coercion. This has non-trivial implications for decimal arithmetic, in particular for division, as type coercion is not idempotent. I'm still thinking about how to handle this better, but thought I would mention it in case you've already given this some thought |
Agree with it. It may be related with #6261 |
In my opinion, Basically I think once the "Analyzer" pass has run on the expr tree the rest of the code shouldn't have to guess / coerce types anymore -- the types would always line up or the query would be rejected. I think this is the standard approach in other databases (and, eg. the type checking / semantic pass in compilers) |
Which issue does this PR close?
related to #5927
Rationale for this change
coerce_type
current mix many motivation in it.I prepare to split it into
get_result_type
andget_common_type
about check type convertion #6223
What changes are included in this PR?
separate get_result_type from coerce_type.
Are these changes tested?
Are there any user-facing changes?