Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement 'StringConcat' operator to support sql like "select 'aa' || 'b' " #2142

Merged
merged 6 commits into from
Apr 7, 2022

Conversation

WinkerDu
Copy link
Contributor

@WinkerDu WinkerDu commented Apr 3, 2022

Which issue does this PR close?

Closes #2141 .

Rationale for this change

df now has not implemented StringConcat operator, like

❯ select 'aa' || 'b';
NotImplemented("Unsupported SQL binary operator StringConcat")

But Postgres SQL will come out results like

select 'aa' || 'b';
 ?column? 
----------
 aab
(1 row)

What changes are included in this PR?

Reuse df build-in concat string expression to do it.

Are there any user-facing changes?

No.

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label Apr 3, 2022
@WinkerDu
Copy link
Contributor Author

WinkerDu commented Apr 3, 2022

cc @Dandandan @alamb @xudong963 @yjshen
Please have a review, thank you.

@jackwener
Copy link
Member

Great job! ❤

@WinkerDu
Copy link
Contributor Author

WinkerDu commented Apr 3, 2022

@jackwener thank you for your comment.
I'm not sure what PG behaves like is reasonable, PG build-in expression concat can take NULL like

select concat('aa', NULL, 'b');
 concat 
--------
 aab
(1 row)

I prefer unify the outputs of || and concat, what other contributors think of?

@WinkerDu
Copy link
Contributor Author

WinkerDu commented Apr 4, 2022

@jackwener thank you for your comment. I'm not sure what PG behaves like is reasonable, PG build-in expression concat can take NULL like

select concat('aa', NULL, 'b');
 concat 
--------
 aab
(1 row)

I prefer unify the outputs of || and concat, what other contributors think of?

@jackwener @renato2099
I've check the behaviors of PG of SparkSQL, they all come out results with NULL

  • Spark 3.2
spark-sql> select 'a' || NULL || 'b';
NULL
  • PG
# select 'a' || NULL || 'b';
 ?column? 
----------
 
(1 row)

Seems here is a convention, I'll comply with it in later commit, thank you both.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @WinkerDu -- this is a nice addition

datafusion/core/src/sql/planner.rs Show resolved Hide resolved
@wesm
Copy link
Member

wesm commented Apr 4, 2022

@alamb do you know why this PR generated e-mails to the dev@ mailing list?

@alamb
Copy link
Contributor

alamb commented Apr 4, 2022

@wesm I am sorry but do not know why there were some emails from this PR to dev e. g. https://lists.apache.org/thread/tqv2x57xmzzrppl3zgox5rt2rhfyg8jz

I vaguely remember getting an email from apache infra about "remailer" but I didn't read the details and now I can't seem to find it :(

There appears to be at least one C++ github conversation that got forwarded too: https://lists.apache.org/thread/vr3yjrcrb01b8zh64dclfl4d2gcxfw0j

@wesm
Copy link
Member

wesm commented Apr 4, 2022

It seems to have been a temporary INFRA hiccup (Iceberg had some issues, too https://lists.apache.org/thread/8f7xdfttk4xhfo7o66lwkpc4xzvx2vjz). If it starts happening again, we should open an INFRA Jira ticket.

@WinkerDu
Copy link
Contributor Author

WinkerDu commented Apr 5, 2022

@alamb @jackwener @renato2099 PTAL, thank you.

  1. support concat string and values that can be cast to string, add some UT, like select 'aa' || 42.
  2. output NULL when encounter NULL in concat pipe.
  3. keep StringConcat operator as BinaryExpr to make column physical name correct.

@WinkerDu
Copy link
Contributor Author

WinkerDu commented Apr 6, 2022

@alamb PTAL, thank you.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM -- thanks @WinkerDu

Comment on lines +501 to +510
let result = (0..ignore_null_array.len())
.into_iter()
.map(|index| {
if left.is_null(index) || right.is_null(index) {
None
} else {
Some(ignore_null_array.value(index))
}
})
.collect::<StringArray>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a future optimization it might be worth using the take kernel https://docs.rs/arrow/11.1.0/arrow/compute/kernels/take/fn.take.html which is fairly well optimized / tries to avoid copying.

This is good for now though

@alamb alamb merged commit ddf29f1 into apache:master Apr 7, 2022
ovr pushed a commit to cube-js/arrow-datafusion that referenced this pull request May 19, 2022
… 'b' " (apache#2142)

* implement stringconcat operator

* snake case fix

* support non-string concat & handle NULL

* value -> array

* string concat internal coercion

* get NULL in right index of vec

Co-authored-by: duripeng <[email protected]>
ovr pushed a commit to cube-js/arrow-datafusion that referenced this pull request May 19, 2022
… 'b' " (apache#2142)

* implement stringconcat operator

* snake case fix

* support non-string concat & handle NULL

* value -> array

* string concat internal coercion

* get NULL in right index of vec

Co-authored-by: duripeng <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add 'StringConcat' operator to df
6 participants