Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make_array null handling with nested version #7136

Closed
jayzhan211 opened this issue Jul 30, 2023 · 7 comments · Fixed by #7207
Closed

make_array null handling with nested version #7136

jayzhan211 opened this issue Jul 30, 2023 · 7 comments · Fixed by #7207
Labels
enhancement New feature or request

Comments

@jayzhan211
Copy link
Contributor

Is your feature request related to a problem or challenge?

Follow up on #6900. Implement array() nested version.
It seems calling array() for list return flattened list according to the comment

https://github.com/apache/arrow-datafusion/blob/2cf5f5b5bb824598de185d64c541c52c930728cf/datafusion/physical-expr/src/array_expressions.rs#L243-L254

However, I think make_array should wrap the elements into array, the expected behavior
make_array([A,X], []) -> [[A,X], []],
make_array([null, Y], [Q,R,S]) -> [[null,Y],[Q,R,S]] and
make_array([C,Z], null) -> [[C,Z], null]

Main branch:

query ?
select make_array(['A','X'], []);
----
[[A, X], ]

query ?
select make_array(make_array(null,'Y'), make_array('Q','R','S'));
----
[[, Y], [Q, R, S]]

query error DataFusion error: Optimizer rule 'simplify_expressions' failed
caused by
Internal error: failed to downcast\. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
select make_array(['C','Z'], null);

As a result, only the third case fails, the first case is acceptable but should we expected [[A, X], []] to differentiate with null can also be discussed

Describe the solution you'd like

Fix the third case and update the comments.
Optional: Expect the first case to return [[A, X], []]

Describe alternatives you've considered

No response

Additional context

No response

@jayzhan211 jayzhan211 added the enhancement New feature or request label Jul 30, 2023
@jayzhan211
Copy link
Contributor Author

@alamb @izveigor Request your comment on this issue, thank you!

@jayzhan211
Copy link
Contributor Author

Duckdb result

Screenshot 2023-07-30 at 15 35 07

@alamb
Copy link
Contributor

alamb commented Jul 30, 2023

My suggestion is to follow the DuckDB / Clickhouse behavior

We are having a similar discussion about semantics of Unnest here #7088 (comment)

Maybe @vincev has some opinions as well

@vincev
Copy link
Contributor

vincev commented Jul 30, 2023

This makes sense to me too, another thing I have noticed (maybe related to this) is that arrays with a null value generate error:

DataFusion CLI v28.0.0
❯ select make_array([1,2], [3,4]) as a;
+------------------+
| a                |
+------------------+
| [[1, 2], [3, 4]] |
+------------------+
❯ select make_array([1,null], [3,4]) as a;
This feature is not implemented: Arrays with different types are not supported: {Int64, Null}

DuckDB generates:

D select array[[1,null],[3,4]] as a;
┌─────────────────────┐
│          a          │
│      int32[][]      │
├─────────────────────┤
│ [[1, NULL], [3, 4]] │
└─────────────────────┘

@vincev
Copy link
Contributor

vincev commented Jul 30, 2023

Oh I see there is a workaround for that:

❯ select make_array(make_array(1,null), [3,4]) as a;
+-----------------+
| a               |
+-----------------+
| [[1, ], [3, 4]] |
+-----------------+

@izveigor
Copy link
Contributor

Agree with @alamb and @vincev. Thank you for bringing this topic up, @jayzhan211.
I have created a separate ticket for handling NULL and empty arrays: #7143.

@jayzhan211
Copy link
Contributor Author

I will fix the third case first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants