
Failed to cast [] to FixedSizeList(1, Null) #9158

Open
Weijun-H opened this issue Feb 8, 2024 · 10 comments
Labels
bug Something isn't working

Comments

@Weijun-H
Member

Weijun-H commented Feb 8, 2024

Describe the bug

DataFusion CLI v35.0.0
❯ select arrow_cast(make_array(), 'FixedSizeList(1, Null)');
Arrow error: Cast error: Cannot cast to FixedSizeList(1): value at index 0 has length 0

❯ select arrow_cast([], 'FixedSizeList(1, Null)');
Arrow error: Cast error: Cannot cast to FixedSizeList(1): value at index 0 has length 0

To Reproduce

No response

Expected behavior

No response

Additional context

No response

@Weijun-H Weijun-H added the bug Something isn't working label Feb 8, 2024
@r3stl355
Contributor

r3stl355 commented Feb 8, 2024

I wonder if that's meant to work. The following works:

select arrow_cast([null], 'FixedSizeList(1, Null)');

However, if you wanted a zero-sized list, then should it be

select arrow_cast([], 'FixedSizeList(0, Null)');

However, that throws the following error:

thread 'main' panicked at arrow-datafusion/datafusion/common/src/scalar.rs:3184:5:
assertion `left == right` failed
  left: 0
 right: 1
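To make the length rule concrete, here is a simplified, std-only Rust model of the check that the reported error message describes: every list value must contain exactly `size` elements. It is not the arrow-rs cast kernel; the function name `check_fixed_size_cast` and the offsets-only representation are assumptions made for illustration.

```rust
/// Simplified model (NOT the arrow-rs implementation) of the check behind
/// "Cannot cast to FixedSizeList(N): value at index i has length L".
/// `offsets` follows the Arrow list layout: value `i` spans
/// `offsets[i]..offsets[i + 1]` in the child array.
fn check_fixed_size_cast(offsets: &[i32], size: i32) -> Result<(), String> {
    for (idx, window) in offsets.windows(2).enumerate() {
        // Length of list value `idx`.
        let len = window[1] - window[0];
        if len != size {
            return Err(format!(
                "Cannot cast to FixedSizeList({size}): value at index {idx} has length {len}"
            ));
        }
    }
    Ok(())
}
```

With Arrow-style offsets, `[]` as a single row is `[0, 0]` and `[null]` is `[0, 1]`. `check_fixed_size_cast(&[0, 0], 1)` reproduces the reported message, `check_fixed_size_cast(&[0, 1], 1)` succeeds, and `check_fixed_size_cast(&[0, 0], 0)` also passes this check, which is consistent with the FixedSizeList(0, Null) cast getting further and only failing later at the assertion.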

@alamb
Contributor

alamb commented Feb 9, 2024

Panicking is definitely not good.

@r3stl355
Contributor

I'll see what I can do

@r3stl355
Contributor

take

@r3stl355
Contributor

I've done some digging but did not find an easy fix, only a few options, listed below. Happy to follow up, but I need a decision on which fix to attempt.

The query select arrow_cast([null], 'FixedSizeList(1, Null)'); works, so it's logical to use FixedSizeList(0, Null) when casting an empty array (select arrow_cast([], 'FixedSizeList(0, Null)');). However, that doesn't work for the following reasons:

  • The docstring for datafusion_common::ScalarValue::FixedSizeList says "The array must be a FixedSizeListArray with length 1." (the same applies to the other ScalarValue::List* variants), so any length other than 1 is invalid.

https://github.com/r3stl355/arrow-datafusion/blob/3b355c798a3258f118016b33f26c5a55fed36220/datafusion/common/src/scalar/mod.rs#L231

  • During the cast, datafusion_common::ScalarValue::FixedSizeList(0, Null) is converted to arrow_schema::datatype::DataType::FixedSizeList(FieldRef, 0) before being passed to arrow::compute::kernels::cast::cast_with_options for evaluation of the arrow_cast

  • arrow::compute::kernels::cast::cast_with_options returns a FixedSizeListArray<0> of length 0 when called with arrow_schema::datatype::DataType::FixedSizeList(FieldRef, 0). This differs from any size greater than 0, for which the returned array always has length 1: e.g. when called with FixedSizeList(FieldRef, 2) as the cast type, arrow::compute::kernels::cast::cast_with_options returns a FixedSizeListArray<2> of length 1.
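The clash between the length-1 invariant and the zero-length cast result can be sketched with a hypothetical fallible constructor. `FixedSizeListModel` and `try_fixed_size_list_scalar` are illustrative names, not DataFusion APIs; the sketch only shows that checking the row count and returning an `Err` avoids an `assert_eq!`-style panic like the one reported (left: 0, right: 1).

```rust
/// Hypothetical stand-in for a FixedSizeListArray, keeping only what matters here.
#[derive(Debug)]
struct FixedSizeListModel {
    value_length: usize, // the N in FixedSizeList(N)
    len: usize,          // number of rows (list values) in the array
}

/// Hypothetical fallible constructor for a FixedSizeList scalar.
/// The ScalarValue docs require the wrapped array to have exactly one row;
/// here a violation is reported as an Err instead of panicking.
fn try_fixed_size_list_scalar(arr: FixedSizeListModel) -> Result<FixedSizeListModel, String> {
    if arr.len != 1 {
        return Err(format!(
            "FixedSizeList({}) scalar must wrap an array of length 1, got length {}",
            arr.value_length, arr.len
        ));
    }
    Ok(arr)
}
```

Under the behavior described above, a FixedSizeList(2) cast result (one row) would be accepted, while a FixedSizeList(0) cast result (zero rows) would yield an error rather than a panic.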

The possible fix options are:

  1. Raise an error if 0 is used as the size in the cast target type (i.e. FixedSizeList(0, Null))
  2. Try to convert FixedSizeList(FieldRef, 0) to FixedSizeList(FieldRef, 1) before calling cast_with_options, but A. this feels really wrong and B. it may still not work
  3. Raise an issue in Arrow asking for cast_with_options to return a non-empty array when called with FixedSizeList(FieldRef, 0). I'll do some digging there to see if that's possible, e.g. if FixedSizeListArray<0>[NullArray(0),] would be a valid type

Lastly, this error happens when displaying the result but not when applying some other functions to it; e.g. the following works, although it's the only function I tested it with:

select arrow_typeof(arrow_cast([], 'FixedSizeList(0, Null)'));

@jayzhan211
Contributor

jayzhan211 commented Feb 24, 2024

I prefer 1. I think a FixedSizeList with len 0 is the same as an empty list, and I don't think there is any useful case where we need to cast an empty list to FixedSizeList(0, type). Return exec_error if casting to FixedSizeList(0, any type); we just need to avoid a panic for this cast.
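A minimal sketch of option 1, assuming the cast target has already been parsed into a type tree: reject any FixedSizeList whose size is 0 before the cast kernel runs. `TargetType` and `validate_cast_target` are hypothetical stand-ins for arrow's `DataType` and whatever check DataFusion would actually add; in DataFusion the `Err` would become an execution error.

```rust
/// Hypothetical, trimmed-down model of a cast target type.
#[derive(Debug, Clone, PartialEq)]
enum TargetType {
    Null,
    Int64,
    FixedSizeList(Box<TargetType>, i32),
}

/// Sketch of option 1: reject FixedSizeList targets with size 0 up front,
/// returning an error instead of letting the cast produce a zero-length
/// array that later trips the length-1 assertion.
fn validate_cast_target(t: &TargetType) -> Result<(), String> {
    match t {
        TargetType::FixedSizeList(_, 0) => {
            Err("Cannot cast to FixedSizeList with size 0".to_string())
        }
        // Recurse into the child type so nested targets are checked too.
        TargetType::FixedSizeList(child, _) => validate_cast_target(child),
        _ => Ok(()),
    }
}
```

Recursing into the child type means a nested target such as FixedSizeList(FixedSizeList(Int64, 0), 2) is rejected as well; whether nested cases should be rejected is a design choice this thread does not settle.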

@r3stl355
Contributor

@Weijun-H was there any specific reason you were trying to achieve this (i.e. select arrow_cast(make_array(), 'FixedSizeList(1, Null)');)?

@Weijun-H
Member Author

> @Weijun-H was there any specific reason you were trying to achieve this (i.e. select arrow_cast(make_array(), 'FixedSizeList(1, Null)');)?

There are no particular use cases right now; I am working on #9108, which reminded me of this case. I also vote for the first solution, which is more reasonable.

@r3stl355 r3stl355 removed their assignment Mar 22, 2024
@r3stl355
Contributor

I unassigned myself from this issue as I don't have much bandwidth at the moment, so maybe someone else is willing to implement the changes. If nobody does, I'll come back to this in 2-3 weeks.

@r3stl355
Contributor

Looks like this is still open; happy to resume if no one else is working on it.


4 participants