Support IS NULL and IS NOT NULL on Unions
#11321
    }
}

fn dense_union_is_null(
It's worth someone else stepping through this logic and checking it's right. I'm not 100% sure it's all correct.
I did and it looks good to me
Thanks @samuelcolvin -- This PR is very well tested and coded so I think it would be ok to merge as is
I left some suggestions that might help readability / maintainability but I don't think they are required
I think the right long-term fix is to update the arrow-rs `compute::is_null` to handle UnionArray properly (though it is fine to put a workaround into DataFusion until that is available) -- I filed apache/arrow-rs#6017 to track that
        let bool_array = if let Some(union_array) =
            array.as_any().downcast_ref::<UnionArray>()
        {
            union_is_null(union_array)?
        } else {
            compute::is_null(array.as_ref())?
        };
        Ok(ColumnarValue::Array(Arc::new(bool_array)))
    }
Instead of replicating the special case for UnionArray here and in is_not_null, what do you think about making a wrapper for compute::is_null in DataFusion:
/// wrapper around arrow::compute::is_null
fn is_null(array: &dyn Array) -> Result<BooleanArray> {
    if let Some(union_array) = array.as_any().downcast_ref::<UnionArray>() {
        union_is_null(union_array)
    } else {
        Ok(compute::is_null(array)?)
    }
}
The idea being that when the fix is available in arrow-rs we can simply remove the wrapper
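To make the wrapper idea concrete, here is a minimal std-only sketch. It does not use the arrow crate: `ToyArray`, `generic_is_null` and the sparse-style union layout are hypothetical stand-ins invented for illustration, and only the overall shape (special-case unions in one place, delegate everything else) follows the suggestion above.

```rust
/// Toy stand-in for an Arrow array (hypothetical, not the arrow-rs API).
enum ToyArray {
    /// `valid[i]` is true when slot `i` holds a value.
    Plain { valid: Vec<bool> },
    /// Sparse-style union: `type_ids[i]` picks the child for slot `i`,
    /// and every child has the same length as the union itself.
    Union { type_ids: Vec<usize>, children: Vec<ToyArray> },
}

/// Stand-in for the generic kernel: it only inspects the array's own
/// validity flags. A union has none, so it reports "never null".
fn generic_is_null(array: &ToyArray) -> Vec<bool> {
    match array {
        ToyArray::Plain { valid } => valid.iter().map(|v| !v).collect(),
        ToyArray::Union { type_ids, .. } => vec![false; type_ids.len()],
    }
}

/// Union-aware check: a slot is null when the selected child is null there.
fn union_is_null(type_ids: &[usize], children: &[ToyArray]) -> Vec<bool> {
    let child_nulls: Vec<Vec<bool>> = children.iter().map(is_null).collect();
    type_ids
        .iter()
        .enumerate()
        .map(|(slot, &type_id)| child_nulls[type_id][slot])
        .collect()
}

/// The wrapper: the only place that knows unions need special handling.
fn is_null(array: &ToyArray) -> Vec<bool> {
    match array {
        ToyArray::Union { type_ids, children } => union_is_null(type_ids, children),
        other => generic_is_null(other),
    }
}
```

With this shape, both the "is null" and "is not null" paths can call the wrapper, and removing the workaround later means deleting one match arm.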
I've done what I think you mean here.
The only material change is that for the "is not null" case we now effectively use compute::not(compute::is_null(...)) instead of compute::is_not_null. I'm not sure if those will compile to the same thing, or if you care about any resultant differences.
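As a rough illustration of why the two forms should agree on results, here is a std-only sketch over plain validity flags. The three functions are hypothetical stand-ins, not the arrow-rs kernels, and the sketch says nothing about what the real kernels compile to; it only shows that negating an "is null" mask yields the same values as a direct "is not null" check, at the cost of an extra pass.

```rust
/// `valid[i]` is true when slot `i` holds a value (toy model).
fn is_null(valid: &[bool]) -> Vec<bool> {
    valid.iter().map(|v| !v).collect()
}

/// Direct "is not null": in this model it is just the validity flags.
fn is_not_null(valid: &[bool]) -> Vec<bool> {
    valid.to_vec()
}

/// Element-wise negation of a mask that itself contains no nulls.
fn not(mask: &[bool]) -> Vec<bool> {
    mask.iter().map(|b| !b).collect()
}
```

Because the output of an "is null" check never contains nulls itself, negating it loses no information.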
        BooleanArray::new(BooleanBuffer::new_unset(union_array.len()), None);
    for type_id in 0..union_array.type_names().len() {
        let type_id = type_id as i8;
        let union_is_child = cmp::eq(&type_ids, &Int8Array::new_scalar(type_id))?;
I double checked this and it looks good to me
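For readers stepping through the dense-union logic, the per-type-id loop in the diff can be modelled in plain Rust. This is only a sketch under assumptions: `ToyDenseUnion` is a hypothetical stand-in, children are reduced to validity flags, and type ids are assumed to be the child indices `0..n`, as the loop in the diff also assumes.

```rust
/// Toy dense union (hypothetical): slot `i` is stored in
/// `children[type_ids[i]]` at index `offsets[i]`.
struct ToyDenseUnion {
    type_ids: Vec<i8>,
    offsets: Vec<usize>,
    /// Per-child validity flags; `true` means the child slot holds a value.
    children: Vec<Vec<bool>>,
}

fn dense_union_is_null(union: &ToyDenseUnion) -> Vec<bool> {
    // Start with "nothing is null", like the unset boolean buffer in the diff.
    let mut result = vec![false; union.type_ids.len()];
    for (type_id, child_valid) in union.children.iter().enumerate() {
        // Assumption: type ids are exactly the child indices.
        let type_id = type_id as i8;
        for (slot, &slot_type) in union.type_ids.iter().enumerate() {
            // Only slots that select this child consult its validity,
            // looked up through the dense offset.
            if slot_type == type_id && !child_valid[union.offsets[slot]] {
                result[slot] = true;
            }
        }
    }
    result
}
```

Each union slot belongs to exactly one child, so every slot is decided by exactly one iteration of the outer loop.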
@alamb, I agree with your comments, I'll get those things fixed tomorrow.
Thanks again @samuelcolvin
* Demonstrate unions can't be null
* add scalar test cases
* support "IS NULL" and "IS NOT NULL" on unions
* formatting
* fix comments from @alamb
* fix docstring
Changed to use upstream'd code added by @gstvg in apache/arrow-rs#6303; see #12724
Which issue does this PR close?
Closes #11162, replaces #11314
Rationale for this change
See #11162.
What changes are included in this PR?
* is_null to correctly support unions
* is_not_null to correctly support unions
* ScalarValue::is_null
Are these changes tested?
Yes
Are there any user-facing changes?
Should just be the intended fix.