stride is not optional for new array_slice UDF #10424
Comments
Thanks @Michael-J-Ward -- what is your ideal outcome? That the UDF function can take three arguments (and default the fourth value to a constant 1)? |
Looks like these were last modified in #9788 / #9615 from @jayzhan211 |
I would expect that there exists some value for Roughly:
That way EDIT: The frustration encountered with #10425 is that we were unable to trigger the code-path that the |
I agree it is helpful if we support But I can't think of any good solution to this. What you are expecting is something not supported in Rust -- variadic function
fn array_slice(array: Expr, begin: Expr, end: Expr) {
    // with stride 1
}
fn array_slice(array: Expr, begin: Expr, end: Expr, stride: Expr) {
    // with stride
}
I think the issue here should be handled in
@Michael-J-Ward Did you successfully create an expression without stride=1 before 38? It would be surprising if it worked previously. |
Is it possible to make the
fn array_slice(args: Vec<Expr>) {
}
When making its UDF function, we need to do
make_udf_function!(
    ArraySlice,
    array_slice,
    "returns a slice of the array.",
    array_slice_udf
); |
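To make the `Vec<Expr>` idea concrete, here is a minimal sketch. It assumes a stand-in `Expr` enum (not DataFusion's real `Expr`) and a hypothetical helper named `array_slice_from_vec`; neither comes from the DataFusion codebase. It only illustrates that the argument count has to be checked at run time once the named parameters are gone.

```rust
// Stand-in for DataFusion's `Expr`; an assumption so the sketch compiles on its own.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Int64(i64),
    // Models a call to a scalar function with a list of arguments.
    Call { name: &'static str, args: Vec<Expr> },
}

// Hypothetical builder: accepts (array, begin, end) or (array, begin, end, stride).
// The compiler can no longer enforce the arity, so it is validated here.
fn array_slice_from_vec(args: Vec<Expr>) -> Result<Expr, String> {
    match args.len() {
        3 | 4 => Ok(Expr::Call { name: "array_slice", args }),
        n => Err(format!("array_slice expects 3 or 4 arguments, got {n}")),
    }
}
```

A caller would pass `vec![array, begin, end]` or `vec![array, begin, end, stride]`; any other length is rejected with an error instead of being caught at compile time.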
Another potential option is to add a second expr fn with a different signature
fn array_slice_with_stride(array: Expr, begin: Expr, end: Expr, stride: Expr) {
    ...
} |
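A minimal sketch of the two-function option follows. The `Expr` enum is a stand-in written for this example, not DataFusion's type, and the function bodies are assumptions about how such builders might look. The point is only that each function has a fixed, named signature and emits a different number of arguments.

```rust
// Stand-in for DataFusion's `Expr`; an assumption so the sketch compiles on its own.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Int64(i64),
    // Models a call to a scalar function with a list of arguments.
    Call { name: &'static str, args: Vec<Expr> },
}

// Three-argument form: no stride argument is emitted at all.
fn array_slice(array: Expr, begin: Expr, end: Expr) -> Expr {
    Expr::Call { name: "array_slice", args: vec![array, begin, end] }
}

// Four-argument form: the caller supplies the stride explicitly.
fn array_slice_with_stride(array: Expr, begin: Expr, end: Expr, stride: Expr) -> Expr {
    Expr::Call { name: "array_slice", args: vec![array, begin, end, stride] }
}
```

Both builders target the same function name, so the function receiving the arguments would still see either three or four of them.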
The downside of this is that we lose the clear argument names and need to check the args length.
I prefer this |
All I did was expand the `make_udf_function` macro and add the `if let Some(stride) = stride` conditional. To me, making the argument `Option<_>` is the natural way to make it optional in Rust. I don't know if this solution violates other DataFusion constraints, but `cargo test` all passed. Ref: apache#10424
First, I'd like to emphasize that if passing
CREATE TEMPORARY VIEW data3 AS VALUES ([1.0, 2.0, 3.0, 3.0]), ([4.0, 5.0, 3.0]), ([6.0]);
❯ select * from data3;
+----------------------+
| column1              |
+----------------------+
| [1.0, 2.0, 3.0, 3.0] |
| [4.0, 5.0, 3.0]      |
| [6.0]                |
+----------------------+
❯ select array_slice(column1, -1, 2) from data3;
+-----------------------------------------------+
| array_slice(data3.column1,Int64(-1),Int64(2)) |
+-----------------------------------------------+
| []                                            |
| []                                            |
| [6.0]                                         |
+-----------------------------------------------+
❯ select array_slice(column1, -1, 2, 1) from data3;
thread 'main' panicked at /Users/andy/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-51.0.0/src/transform/primitive.rs:31:43:
range end index 9 out of range for slice of length 8
Now to this issue, I agree that I like the explicit arguments versus passing a I created a draft with what I would think is the natural rust solution in #10450. I don't know if this violates some datafusion constraint, but all the tests pass.
#[doc = "returns a slice of the array."]
pub fn array_slice(array: Expr, begin: Expr, end: Expr, stride: Option<Expr>) -> Expr {
    if let Some(stride) = stride {
        Expr::ScalarFunction(ScalarFunction::new_udf(
            array_slice_udf(),
            vec![array, begin, end, stride],
        ))
    } else {
        Expr::ScalarFunction(ScalarFunction::new_udf(
            array_slice_udf(),
            vec![array, begin, end],
        ))
    }
}
If that is not possible, then a second proposal would be to amend
let stride = if args_len == 4 && is_not_null(&args[3]) {
    Some(as_int64_array(&args[3])?)
} else {
    None
};
But again, this is just a nice-to-have if passing |
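The second proposal can be sketched without Arrow types. The `Arg` enum, `is_not_null`, and `resolve_stride` below are hypothetical stand-ins written for this example; the real kernel works on Arrow arrays rather than scalars. The sketch shows the stride being treated as absent both when only three arguments are passed and when the fourth one is null.

```rust
// Stand-in for an evaluated argument; an assumption, not an Arrow or DataFusion type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Arg {
    Null,
    Int64(i64),
}

// Hypothetical null check on the stand-in type.
fn is_not_null(arg: &Arg) -> bool {
    !matches!(arg, Arg::Null)
}

// Returns the stride only when a fourth, non-null argument is present.
fn resolve_stride(args: &[Arg]) -> Option<i64> {
    // `&&` short-circuits, so `args[3]` is never indexed when there are only three arguments.
    if args.len() == 4 && is_not_null(&args[3]) {
        match args[3] {
            Arg::Int64(v) => Some(v),
            Arg::Null => None,
        }
    } else {
        None
    }
}
```

A caller could then fall back to a stride of 1 with `resolve_stride(args).unwrap_or(1)`, which is the default suggested earlier in this thread.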
I believe we encountered the same issue again with |
I recall hitting something like this with the substr/substring function. One would think they would be identical; however, they were not (since Rust doesn't do variadic functions, nor does it allow default values for args à la Scala or Kotlin). For substr, it was expanded to two functions that internally used a Vec<> and handled the logic of the optional 3rd argument there.
|
I am +1 for this solution suggested by @Michael-J-Ward
#[doc = "returns a slice of the array."]
pub fn array_slice(array: Expr, begin: Expr, end: Expr, stride: Option<Expr>) -> Expr {
    if let Some(stride) = stride {
        Expr::ScalarFunction(ScalarFunction::new_udf(
            array_slice_udf(),
            vec![array, begin, end, stride],
        ))
    } else {
        Expr::ScalarFunction(ScalarFunction::new_udf(
            array_slice_udf(),
            vec![array, begin, end],
        ))
    }
} |
I think there are two issues here, one is the panic issue in
The reason for panic in the second command is due to the bug in datafusion/datafusion/functions-array/src/extract.rs Lines 420 to 425 in 933b430
We should fix this. Another issue is that given the current API constraint of
#[doc = "returns a slice of the array."]
pub fn array_slice(array: Expr, begin: Expr, end: Expr, stride: Option<Expr>) -> Expr {
    if let Some(stride) = stride {
        Expr::ScalarFunction(ScalarFunction::new_udf(
            array_slice_udf(),
            vec![array, begin, end, stride],
        ))
    } else {
        Expr::ScalarFunction(ScalarFunction::new_udf(
            array_slice_udf(),
            vec![array, begin, end],
        ))
    }
}
I think this is more of a user experience problem; how we should design it is open to discussion. Given the design here, we still need to provide 4 args, where you provide TLDR: |
Agreed @jayzhan211, these are two separate issues. The panic issue was filed separately as #10425 |
It is more than a user experience issue. The current API is causing test failures in a downstream project (the DataFusion Python bindings) and this cannot be resolved without the API changes being discussed in this issue. |
Since the Solutions discussed:
|
I believe the failure occurred because of the bug I mentioned. Can you provide another example where df-python still experiences a panic?
If it is because of the |
I submitted a formal PR #10469 to address this. |
Describe the bug
The array_slice UDF takes 4 parameters.
datafusion/datafusion/functions-array/src/extract.rs
Lines 55 to 61 in 96487ea
Which means that args.len() is always 4 in array_slice_inner, even when called with stride = Expr::Null or stride = Expr::Int64(None)
datafusion/datafusion/functions-array/src/extract.rs
Lines 289 to 293 in 96487ea
To Reproduce
I encountered this while creating the python wrapper for datafusion-python and had to set stride=1, else the function would panic.
https://github.com/apache/datafusion-python/blob/7ad526caf140f7eb76ec541c903027d49693b58f/src/functions.rs#L104-L111
Expected behavior
The stride behavior in the UDF should match what's available in the CLI
Additional context
No response