Implement equality = and inequality <> support for StringView #10919

Closed
Tracked by #10918
alamb opened this issue Jun 14, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@alamb
Contributor

alamb commented Jun 14, 2024

Is your feature request related to a problem or challenge?

Part of #10918, [StringViewArray](https://docs.rs/arrow/latest/arrow/array/type.StringViewArray.html) support in DataFusion

There are several queries in the ClickBench suite like the following:

```sql
SELECT "MobilePhone", "MobilePhoneModel", COUNT(DISTINCT "UserID") AS u FROM hits WHERE "MobilePhoneModel" <> '' GROUP BY "MobilePhone", "MobilePhoneModel" ORDER BY u DESC LIMIT 10;
SELECT "SearchPhrase", COUNT(*) AS c FROM hits WHERE "SearchPhrase" <> '' GROUP BY "SearchPhrase" ORDER BY c DESC LIMIT 10;
SELECT "SearchPhrase", COUNT(DISTINCT "UserID") AS u FROM hits WHERE "SearchPhrase" <> '' GROUP BY "SearchPhrase" ORDER BY u DESC LIMIT 10;
SELECT "SearchEngineID", "SearchPhrase", COUNT(*) AS c FROM hits WHERE "SearchPhrase" <> '' GROUP BY "SearchEngineID", "SearchPhrase" ORDER BY c DESC LIMIT 10;
```

where "MobilePhoneModel" and "SearchPhrase" are string columns filtered by a predicate (in these cases, a check against the empty string).

Describe the solution you'd like

To improve the performance of these queries, we need the ability to compare StringViewArrays to constant strings (and likely to each other).

Thus, I would like to be able to run the following (and the corresponding `<>` comparisons):

StringViewColumn = scalar
StringViewColumn = StringViewColumn

(and likewise for BinaryView)
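As a rough illustration of what `=` and `<>` on view arrays involve, here is a minimal, std-only Rust sketch. It assumes the Arrow view layout (each view is 16 bytes: a 4-byte length, then either the value inlined when it is at most 12 bytes, or a 4-byte prefix followed by a buffer index and an offset) and ignores nulls. `ViewArray`, `view_eq`, `eq_scalar`, `neq_scalar`, `eq_columns` and `neq_columns` are hypothetical names for this sketch, not the arrow-rs API; in DataFusion the comparison would go through the arrow-rs kernels such as `arrow::compute::kernels::cmp::eq`.

```rust
// Hypothetical, simplified model of an Arrow view array (StringView / BinaryView).
// Bytes 0..4 of each view: length (little-endian u32).
// len <= 12: value stored inline in bytes 4..16 (zero padded).
// len > 12: bytes 4..8 = prefix, 8..12 = buffer index, 12..16 = offset.
// Nulls are not modeled.
const MAX_INLINE_LEN: usize = 12;

struct ViewArray {
    views: Vec<[u8; 16]>,
    buffers: Vec<Vec<u8>>,
}

fn read_u32(bytes: &[u8]) -> u32 {
    u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
}

impl ViewArray {
    /// Build an array from string slices; long values go into one data buffer.
    fn from_strs(values: &[&str]) -> Self {
        let mut views = Vec::with_capacity(values.len());
        let mut data: Vec<u8> = Vec::new();
        for s in values {
            let bytes = s.as_bytes();
            let mut view = [0u8; 16];
            view[0..4].copy_from_slice(&(bytes.len() as u32).to_le_bytes());
            if bytes.len() <= MAX_INLINE_LEN {
                view[4..4 + bytes.len()].copy_from_slice(bytes);
            } else {
                view[4..8].copy_from_slice(&bytes[0..4]);
                view[8..12].copy_from_slice(&0u32.to_le_bytes());
                view[12..16].copy_from_slice(&(data.len() as u32).to_le_bytes());
                data.extend_from_slice(bytes);
            }
            views.push(view);
        }
        ViewArray { views, buffers: vec![data] }
    }

    fn len(&self) -> usize {
        self.views.len()
    }

    /// Full bytes of value `i`, read inline or from its data buffer.
    fn value(&self, i: usize) -> &[u8] {
        let view = &self.views[i];
        let len = read_u32(&view[0..4]) as usize;
        if len <= MAX_INLINE_LEN {
            &view[4..4 + len]
        } else {
            let buffer = read_u32(&view[8..12]) as usize;
            let offset = read_u32(&view[12..16]) as usize;
            &self.buffers[buffer][offset..offset + len]
        }
    }
}

/// Compare value `i` of `a` with value `j` of `b`.
fn view_eq(a: &ViewArray, i: usize, b: &ViewArray, j: usize) -> bool {
    // Fast path: length plus first 4 bytes of data (bytes 0..8) must match.
    if a.views[i][0..8] != b.views[j][0..8] {
        return false;
    }
    // Slow path: compare the complete values.
    a.value(i) == b.value(j)
}

/// `column = 'literal'`: one boolean per row.
fn eq_scalar(array: &ViewArray, scalar: &str) -> Vec<bool> {
    let scalar = ViewArray::from_strs(&[scalar]);
    (0..array.len()).map(|i| view_eq(array, i, &scalar, 0)).collect()
}

/// `column <> 'literal'`
fn neq_scalar(array: &ViewArray, scalar: &str) -> Vec<bool> {
    eq_scalar(array, scalar).into_iter().map(|eq| !eq).collect()
}

/// `column1 = column2`: row-by-row comparison of two equal-length arrays.
fn eq_columns(left: &ViewArray, right: &ViewArray) -> Vec<bool> {
    assert_eq!(left.len(), right.len(), "arrays must have the same length");
    (0..left.len()).map(|i| view_eq(left, i, right, i)).collect()
}

/// `column1 <> column2`
fn neq_columns(left: &ViewArray, right: &ViewArray) -> Vec<bool> {
    eq_columns(left, right).into_iter().map(|eq| !eq).collect()
}
```

Because the length and 4-byte prefix are checked first, a predicate like `"SearchPhrase" <> ''` can be decided in this model from the views alone, without reading the data buffers.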

Basically, I want to run the following queries (where table `foo` has StringView columns):

```
> create table foo as values ('Andrew', 'X'), ('Xiangpeng', 'Xiangpeng'), ('Raphael', 'R');
0 row(s) fetched.
Elapsed 0.002 seconds.

> select * from foo where column1 = 'Andrew';
+---------+---------+
| column1 | column2 |
+---------+---------+
| Andrew  | X       |
+---------+---------+
1 row(s) fetched.
Elapsed 0.003 seconds.

> select * from foo where column1 <> 'Andrew';
+-----------+-----------+
| column1   | column2   |
+-----------+-----------+
| Xiangpeng | Xiangpeng |
| Raphael   | R         |
+-----------+-----------+
2 row(s) fetched.
Elapsed 0.001 seconds.

> select * from foo where column1 = column2;
+-----------+-----------+
| column1   | column2   |
+-----------+-----------+
| Xiangpeng | Xiangpeng |
+-----------+-----------+
1 row(s) fetched.
Elapsed 0.002 seconds.

> select * from foo where column1 <> column2;
+---------+---------+
| column1 | column2 |
+---------+---------+
| Andrew  | X       |
| Raphael | R       |
+---------+---------+
2 row(s) fetched.
Elapsed 0.001 seconds.
```

Describe alternatives you've considered

I suspect we will need to update the type coercion logic and possibly also the arrow equality kernels, such as https://docs.rs/arrow/latest/arrow/compute/kernels/cmp/fn.eq.html
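On the coercion side, one possible rule is sketched below with a toy enum. `SketchType` and `comparison_type` are hypothetical names for illustration only (not arrow's `DataType` or DataFusion's coercion functions), and DataFusion's actual rules may differ. The idea: when a view column is compared with a plain string value such as the literal `''`, cast the non-view side to the view type so that only a view-vs-view comparison kernel is needed.

```rust
/// Toy stand-ins for a few Arrow string/binary types (hypothetical, not arrow's DataType).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SketchType {
    Utf8,
    LargeUtf8,
    Utf8View,
    Binary,
    BinaryView,
}

/// Hypothetical coercion rule for `lhs = rhs` and `lhs <> rhs`: returns the type
/// both sides are cast to before comparing, or None if unsupported.
fn comparison_type(lhs: SketchType, rhs: SketchType) -> Option<SketchType> {
    use SketchType::*;
    match (lhs, rhs) {
        // Same type on both sides: no cast needed.
        (l, r) if l == r => Some(l),
        // View column vs. plain string (e.g. a literal): cast the non-view side.
        (Utf8View, Utf8 | LargeUtf8) | (Utf8 | LargeUtf8, Utf8View) => Some(Utf8View),
        (BinaryView, Binary) | (Binary, BinaryView) => Some(BinaryView),
        (Utf8, LargeUtf8) | (LargeUtf8, Utf8) => Some(LargeUtf8),
        // Anything else (e.g. string vs. binary) is rejected in this sketch.
        _ => None,
    }
}
```

Casting toward the view type is a design choice: casting the column to `Utf8` instead would also work, but it would generally copy the column's string data into a contiguous buffer, giving up much of the benefit of the view layout.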

Additional context

No response

@Weijun-H
Member

I am glad to pick up this ticket.

@Weijun-H
Member

This issue must wait until #10920 because there is currently no convenient way to create a StringViewArray in DataFusion. If I am mistaken, please correct me.

@alamb
Contributor Author

alamb commented Jun 15, 2024

> This issue must wait until #10920 because there is currently no convenient way to create a StringViewArray in Datafusion. If I am mistaken, please correct me.

I think you are right. Conveniently, @XiangpengHao has one in #10925.

@XiangpengHao
Contributor

Hi @Weijun-H, great to know you are working on this!
I believe implementing this feature will eventually require apache/arrow-rs#5897 to be resolved, so I'm working on that issue so that you won't be blocked.

@alamb
Contributor Author

alamb commented Jun 17, 2024

BTW, I made a branch to work on StringView in DataFusion: #10961

@alamb
Contributor Author

alamb commented Jun 19, 2024

StringView comparison added in #10985

alamb closed this as completed Jun 19, 2024