type coercion: support is/is_not_`bool`/like/unknown expr #3510

liukun4515 · 2022-09-16T08:24:41Z

Which issue does this PR close?

Closes #3509

block the pr #3396

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

… is_not_false

liukun4515 · 2022-09-16T08:27:31Z

We will add the like,ilike and other expr in the type coercion optimizer rule.

I also has the question about that #3396 (comment)
#3396 (comment).

If we can rewrite these logical expr to binary expr, we can reuse type coercion logic for the binary expr and don't need to add much more match branch for each expr which need type coercion.

codecov-commenter · 2022-09-16T08:54:31Z

Codecov Report

Merging #3510 (ace7c67) into master (3319220) will increase coverage by 0.16%.
The diff coverage is 93.28%.

❗ Current head ace7c67 differs from pull request most recent head e3c35a9. Consider uploading reports for the commit e3c35a9 to get more accurate results

@@            Coverage Diff             @@
##           master    #3510      +/-   ##
==========================================
+ Coverage   85.75%   85.91%   +0.16%     
==========================================
  Files         299      301       +2     
  Lines       55311    56218     +907     
==========================================
+ Hits        47431    48299     +868     
- Misses       7880     7919      +39

Impacted Files	Coverage Δ
datafusion/common/src/lib.rs	`0.00% <ø> (ø)`
...usion/core/src/avro_to_arrow/arrow_array_reader.rs	`0.00% <0.00%> (ø)`
.../core/src/physical_plan/file_format/file_stream.rs	`88.88% <ø> (ø)`
...sion/core/src/physical_plan/file_format/parquet.rs	`94.65% <0.00%> (-0.25%)`	⬇️
datafusion/core/tests/sql/explain_analyze.rs	`83.87% <ø> (ø)`
datafusion/core/tests/user_defined_plan.rs	`87.79% <ø> (ø)`
...ion/physical-expr/src/aggregate/approx_distinct.rs	`51.85% <ø> (ø)`
...sical-expr/src/aggregate/approx_percentile_cont.rs	`81.54% <ø> (ø)`
datafusion/physical-expr/src/aggregate/average.rs	`94.96% <ø> (ø)`
datafusion/physical-expr/src/aggregate/count.rs	`98.05% <ø> (ø)`
... and 52 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

datafusion/expr/src/expr.rs

Co-authored-by: Daniël Heres <[email protected]>

liukun4515 · 2022-09-17T15:01:07Z

@alamb @andygrove @Dandandan PTAL

andygrove · 2022-09-19T15:23:17Z

Thanks @liukun4515 I will review this today or tomorrow

alamb

LGTM -- thanks @liukun4515

alamb · 2022-09-19T20:34:15Z

datafusion/optimizer/src/type_coercion.rs

+                    pattern,
+                    escape_char,
+                };
+                expr.rewrite(&mut self.const_evaluator)


🤔 maybe in the long term figuring out how to avoid having to remember to call the const evaluator each struct would be good -- as it too does a expr tree walk

So this algorithm is at least O(n^2) in the number of items in the expr tree.

I don't think it is important to do in this PR, but I think it is worth keeping in mind as we proceed or if we see the planner being a bit slow

maybe in the long term figuring out how to avoid having to remember to call the const evaluator each struct would be good -- as it too does a expr tree walk

agree with that

I think the SQL level with analyse/expr/optimizer should be refactor.
We are missing some formal logic like spark SQL framework or other SQL framework.
This makes it very difficult for us to do some features or optimization.

I agree the current framework for expr rewriting / analysis is reaching the limits of its existing design. Refactoring the code so it is easier to work with sounds like a good idea to me.

I am open to suggestions -- I think the challenge in any such refactors is not only coming up with a good new design but also figuring out how to incrementally get the existing code into that new structure

alamb · 2022-09-19T20:35:27Z

datafusion/optimizer/src/type_coercion.rs

+                let left_type = expr.get_type(&self.schema)?;
+                let right_type = DataType::Boolean;
+                let coerced_type =
+                    coerce_types(&left_type, &Operator::IsNotDistinctFrom, &right_type)?;


Is there a reason this isn't Operator::IsUnknown?

I think you can got the reason from
#3301
#3246
The unknown will be convert the physical binary op with DistinctFrom operation.
cc @alamb

alamb · 2022-09-19T20:35:51Z

datafusion/optimizer/src/type_coercion.rs

+                let left_type = expr.get_type(&self.schema)?;
+                let right_type = DataType::Boolean;
+                let coerced_type =
+                    coerce_types(&left_type, &Operator::IsDistinctFrom, &right_type)?;


Likewise here, it seem strange that this is not Operator::IsNotUnknown

datafusion/optimizer/src/type_coercion.rs

liukun4515 · 2022-09-20T13:54:27Z

Thanks @liukun4515 I will review this today or tomorrow

waiting for your review.

andygrove · 2022-09-21T00:31:18Z

Thanks @liukun4515 I will review this today or tomorrow

waiting for your review.

Apologies. I have not had time. I see that @alamb has approved this so please don't let me hold this up.

Co-authored-by: Andrew Lamb <[email protected]>

liukun4515 · 2022-09-21T00:55:09Z

Thanks @liukun4515 I will review this today or tomorrow

waiting for your review.

Apologies. I have not had time. I see that @alamb has approved this so please don't let me hold this up.

Thanks for your reply.

You can comment it after it merged.

ursabot · 2022-09-21T01:48:17Z

Benchmark runs are scheduled for baseline = aad82fb and contender = 11abfb9. 11abfb9 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

type coercion: support the boolen op, is_true, is_not_true, is_false,…

5e68b31

… is_not_false

github-actions bot added logical-expr Logical plan and expressions optimizer Optimizer rules labels Sep 16, 2022

liukun4515 requested review from alamb and andygrove September 16, 2022 08:25

Dandandan reviewed Sep 16, 2022

View reviewed changes

datafusion/expr/src/expr.rs Outdated Show resolved Hide resolved

Dandandan reviewed Sep 16, 2022

View reviewed changes

datafusion/expr/src/expr.rs Outdated Show resolved Hide resolved

Dandandan reviewed Sep 16, 2022

View reviewed changes

datafusion/expr/src/expr.rs Outdated Show resolved Hide resolved

Dandandan reviewed Sep 16, 2022

View reviewed changes

datafusion/expr/src/expr.rs Outdated Show resolved Hide resolved

Apply suggestions from code review

f6abe08

Co-authored-by: Daniël Heres <[email protected]>

liukun4515 requested a review from Dandandan September 17, 2022 08:05

liukun4515 marked this pull request as draft September 17, 2022 08:33

liukun4515 marked this pull request as ready for review September 17, 2022 11:23

suport like,unknown for type coercion

67696c2

liukun4515 force-pushed the support_bool_type_coercion_#3509 branch from 52de4d2 to 67696c2 Compare September 17, 2022 11:49

liukun4515 changed the title ~~type coercion: support is/is_not_bool expr~~ type coercion: support is/is_not_bool/like/unknown expr Sep 17, 2022

alamb approved these changes Sep 19, 2022

View reviewed changes

Update datafusion/optimizer/src/type_coercion.rs

e3c35a9

Co-authored-by: Andrew Lamb <[email protected]>

liukun4515 merged commit 11abfb9 into apache:master Sep 21, 2022

liukun4515 deleted the support_bool_type_coercion_#3509 branch September 22, 2022 02:49

This was referenced Sep 22, 2022

Prevent over-allocations (and spills) on sorts with a fixed limit #3593

Merged

Prevent over-allocation (and spills) on TopK queries #3596

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

type coercion: support is/is_not_`bool`/like/unknown expr #3510

type coercion: support is/is_not_`bool`/like/unknown expr #3510

liukun4515 commented Sep 16, 2022 •

edited

Loading

liukun4515 commented Sep 16, 2022 •

edited

Loading

codecov-commenter commented Sep 16, 2022 •

edited

Loading

liukun4515 commented Sep 17, 2022

andygrove commented Sep 19, 2022

alamb left a comment

alamb Sep 19, 2022

liukun4515 Sep 20, 2022

liukun4515 Sep 20, 2022

alamb Sep 21, 2022

alamb Sep 19, 2022

liukun4515 Sep 20, 2022

alamb Sep 19, 2022

liukun4515 commented Sep 20, 2022

andygrove commented Sep 21, 2022

liukun4515 commented Sep 21, 2022

ursabot commented Sep 21, 2022

type coercion: support is/is_not_bool/like/unknown expr #3510

type coercion: support is/is_not_bool/like/unknown expr #3510

Conversation

liukun4515 commented Sep 16, 2022 • edited Loading

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

liukun4515 commented Sep 16, 2022 • edited Loading

codecov-commenter commented Sep 16, 2022 • edited Loading

Codecov Report

liukun4515 commented Sep 17, 2022

andygrove commented Sep 19, 2022

alamb left a comment

Choose a reason for hiding this comment

alamb Sep 19, 2022

Choose a reason for hiding this comment

liukun4515 Sep 20, 2022

Choose a reason for hiding this comment

liukun4515 Sep 20, 2022

Choose a reason for hiding this comment

alamb Sep 21, 2022

Choose a reason for hiding this comment

alamb Sep 19, 2022

Choose a reason for hiding this comment

liukun4515 Sep 20, 2022

Choose a reason for hiding this comment

alamb Sep 19, 2022

Choose a reason for hiding this comment

liukun4515 commented Sep 20, 2022

andygrove commented Sep 21, 2022

liukun4515 commented Sep 21, 2022

ursabot commented Sep 21, 2022

type coercion: support is/is_not_`bool`/like/unknown expr #3510

type coercion: support is/is_not_`bool`/like/unknown expr #3510

liukun4515 commented Sep 16, 2022 •

edited

Loading

liukun4515 commented Sep 16, 2022 •

edited

Loading

codecov-commenter commented Sep 16, 2022 •

edited

Loading