Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

type coercion: support is/is_not_bool/like/unknown expr #3510

Merged
merged 4 commits into from
Sep 21, 2022

Conversation

liukun4515
Copy link
Contributor

@liukun4515 liukun4515 commented Sep 16, 2022

Which issue does this PR close?

Closes #3509

block the pr #3396

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions optimizer Optimizer rules labels Sep 16, 2022
@liukun4515
Copy link
Contributor Author

liukun4515 commented Sep 16, 2022

We will add the like,ilike and other expr in the type coercion optimizer rule.

I also has the question about that #3396 (comment)
#3396 (comment).

If we can rewrite these logical expr to binary expr, we can reuse type coercion logic for the binary expr and don't need to add much more match branch for each expr which need type coercion.

@codecov-commenter
Copy link

codecov-commenter commented Sep 16, 2022

Codecov Report

Merging #3510 (ace7c67) into master (3319220) will increase coverage by 0.16%.
The diff coverage is 93.28%.

❗ Current head ace7c67 differs from pull request most recent head e3c35a9. Consider uploading reports for the commit e3c35a9 to get more accurate results

@@            Coverage Diff             @@
##           master    #3510      +/-   ##
==========================================
+ Coverage   85.75%   85.91%   +0.16%     
==========================================
  Files         299      301       +2     
  Lines       55311    56218     +907     
==========================================
+ Hits        47431    48299     +868     
- Misses       7880     7919      +39     
Impacted Files Coverage Δ
datafusion/common/src/lib.rs 0.00% <ø> (ø)
...usion/core/src/avro_to_arrow/arrow_array_reader.rs 0.00% <0.00%> (ø)
.../core/src/physical_plan/file_format/file_stream.rs 88.88% <ø> (ø)
...sion/core/src/physical_plan/file_format/parquet.rs 94.65% <0.00%> (-0.25%) ⬇️
datafusion/core/tests/sql/explain_analyze.rs 83.87% <ø> (ø)
datafusion/core/tests/user_defined_plan.rs 87.79% <ø> (ø)
...ion/physical-expr/src/aggregate/approx_distinct.rs 51.85% <ø> (ø)
...sical-expr/src/aggregate/approx_percentile_cont.rs 81.54% <ø> (ø)
datafusion/physical-expr/src/aggregate/average.rs 94.96% <ø> (ø)
datafusion/physical-expr/src/aggregate/count.rs 98.05% <ø> (ø)
... and 52 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@liukun4515 liukun4515 marked this pull request as draft September 17, 2022 08:33
@liukun4515 liukun4515 marked this pull request as ready for review September 17, 2022 11:23
@liukun4515 liukun4515 force-pushed the support_bool_type_coercion_#3509 branch from 52de4d2 to 67696c2 Compare September 17, 2022 11:49
@liukun4515 liukun4515 changed the title type coercion: support is/is_not_bool expr type coercion: support is/is_not_bool/like/unknown expr Sep 17, 2022
@liukun4515
Copy link
Contributor Author

@alamb @andygrove @Dandandan PTAL

@andygrove
Copy link
Member

Thanks @liukun4515 I will review this today or tomorrow

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM -- thanks @liukun4515

pattern,
escape_char,
};
expr.rewrite(&mut self.const_evaluator)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 maybe in the long term figuring out how to avoid having to remember to call the const evaluator each struct would be good -- as it too does a expr tree walk

So this algorithm is at least O(n^2) in the number of items in the expr tree.

I don't think it is important to do in this PR, but I think it is worth keeping in mind as we proceed or if we see the planner being a bit slow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe in the long term figuring out how to avoid having to remember to call the const evaluator each struct would be good -- as it too does a expr tree walk

agree with that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the SQL level with analyse/expr/optimizer should be refactor.
We are missing some formal logic like spark SQL framework or other SQL framework.
This makes it very difficult for us to do some features or optimization.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the current framework for expr rewriting / analysis is reaching the limits of its existing design. Refactoring the code so it is easier to work with sounds like a good idea to me.

I am open to suggestions -- I think the challenge in any such refactors is not only coming up with a good new design but also figuring out how to incrementally get the existing code into that new structure

let left_type = expr.get_type(&self.schema)?;
let right_type = DataType::Boolean;
let coerced_type =
coerce_types(&left_type, &Operator::IsNotDistinctFrom, &right_type)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this isn't Operator::IsUnknown?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can got the reason from
#3301
#3246
The unknown will be convert the physical binary op with DistinctFrom operation.
cc @alamb

let left_type = expr.get_type(&self.schema)?;
let right_type = DataType::Boolean;
let coerced_type =
coerce_types(&left_type, &Operator::IsDistinctFrom, &right_type)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise here, it seem strange that this is not Operator::IsNotUnknown

datafusion/optimizer/src/type_coercion.rs Outdated Show resolved Hide resolved
@liukun4515
Copy link
Contributor Author

Thanks @liukun4515 I will review this today or tomorrow

waiting for your review.

@andygrove
Copy link
Member

Thanks @liukun4515 I will review this today or tomorrow

waiting for your review.

Apologies. I have not had time. I see that @alamb has approved this so please don't let me hold this up.

@liukun4515
Copy link
Contributor Author

Thanks @liukun4515 I will review this today or tomorrow

waiting for your review.

Apologies. I have not had time. I see that @alamb has approved this so please don't let me hold this up.

Thanks for your reply.

You can comment it after it merged.

@liukun4515 liukun4515 merged commit 11abfb9 into apache:master Sep 21, 2022
@ursabot
Copy link

ursabot commented Sep 21, 2022

Benchmark runs are scheduled for baseline = aad82fb and contender = 11abfb9. 11abfb9 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

support the type coercion for like unlike istrue isfalse isunknown
6 participants