Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic with queries with multiple COUNT DISTINCT aggregates on dictionary values, and a group by #7938

Closed
alamb opened this issue Oct 26, 2023 · 5 comments · Fixed by #7989
Closed
Assignees
Labels
bug Something isn't working

Comments

@alamb
Copy link
Contributor

alamb commented Oct 26, 2023

Describe the bug

DataFusion panic's when runnning a query like

select count(distinct column1), count(distinct column2) from test group by column1;

Where column1 and column2 are Dictionary

To Reproduce

❯ create table test as values (1, arrow_cast('foo', 'Dictionary(Int32, Utf8)')), (2, arrow_cast('bar', 'Dictionary(Int32, Utf8)'));
0 rows in set. Query took 0.002 seconds.

❯ select * from test;
+---------+---------+
| column1 | column2 |
+---------+---------+
| 1       | foo     |
| 2       | bar     |
+---------+---------+
2 rows in set. Query took 0.001 seconds.

❯ select count(distinct column1), count(distinct column2) from test group by column1;
thread 'tokio-runtime-worker' panicked at /Users/alamb/Software/arrow-datafusion/datafusion/common/src/scalar.rs:1846:18:
Unsupported data type Dictionary(Int32, Utf8) for ScalarValue::list_to_array
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
External error: Join Error
caused by
External error: task 66 panicked

Expected behavior

The test should produce the same as when the values are cast to string:

❯ select count(distinct column1::varchar), count(distinct column2::varchar) from test group by column1;
+------------------------------+------------------------------+
| COUNT(DISTINCT test.column1) | COUNT(DISTINCT test.column2) |
+------------------------------+------------------------------+
| 1                            | 1                            |
| 1                            | 1                            |
+------------------------------+------------------------------+

Additional context

I believe this is a regression introduced in #7629 (not yet released)

We found this downstream in IOx

@alamb alamb added the bug Something isn't working label Oct 26, 2023
@alamb alamb self-assigned this Oct 26, 2023
@jayzhan211
Copy link
Contributor

Other than Dictionary, there are also many types I did not support yet, since I am not sure whether we need them all, so I only supported the type that is used (aka pass all test).

@jayzhan211
Copy link
Contributor

btw I forgot to fix this "Unsupported data type {:?} for ScalarValue::list_to_array" to "Unsupported data type {:?} for ScalarValue::new_list"

@jayzhan211
Copy link
Contributor

@alamb
Copy link
Contributor Author

alamb commented Oct 27, 2023

Other than Dictionary, there are also many types I did not support yet, since I am not sure whether we need them all, so I only supported the type that is used (aka pass all test).

Yeah, there is clearly a test coverage gap -- I'll take a look into what is going on

@alamb
Copy link
Contributor Author

alamb commented Oct 30, 2023

Here is proposed fix: #7989

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants