Union columns can never be NULL
#11162
Comments
@alamb any idea on where I would start looking to try and fix this?

I would suggest writing a standalone test case / reproducer as the first step. Then I suspect we can either help you find the code that needs to be fixed (or maybe someone would even be interested in fixing it themselves).
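See #11314 as a demonstration of the problem for both dense and sparse unions. After a bit of investigation, the issue lies in the first instance with `datafusion/datafusion/physical-expr/src/expressions/is_null.rs` (lines 74 to 84 in 08c5345).

Then with this code in arrow-rs:

```rust
use arrow_array::{Array, BooleanArray};
use arrow_buffer::BooleanBuffer;
use arrow_schema::ArrowError;

/// Returns a non-null [BooleanArray] with whether each value of the array is null.
/// # Error
/// This function never errors.
/// # Example
/// ...
pub fn is_null(input: &dyn Array) -> Result<BooleanArray, ArrowError> {
    let values = match input.logical_nulls() {
        // No null information: every slot is reported as non-null.
        None => BooleanBuffer::new_unset(input.len()),
        // Invert the validity bits so that null slots become `true`.
        Some(nulls) => !nulls.inner(),
    };
    Ok(BooleanArray::new(values, None))
}
```

And then with this code:

```rust
// In `impl Array for UnionArray` (other methods omitted):
/// Union types always return non null as there is no validity buffer.
/// To check validity correctly you must check the underlying vector.
fn is_null(&self, _index: usize) -> bool {
    false
}
```

Ultimately, with the Arrow spec, which says that unions do not have a validity bitmap of their own; the nullness of each slot is determined by the child arrays.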
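To make that chain concrete, here is a minimal std-only Rust model of it. `Column`, `Int32Column`, `NaiveUnionColumn` and `is_null_kernel` are hypothetical stand-ins, not arrow-rs types: the kernel mirrors the quoted `is_null` (no null information means every slot is reported as non-null), and the union, like the quoted `UnionArray::is_null`, never reports nulls even when its child value is null.

```rust
/// Hypothetical stand-in for arrow's `Array` trait (not the real API).
/// `logical_nulls` returns a validity mask (`true` = valid),
/// or `None` when the column reports no null information at all.
trait Column {
    fn len(&self) -> usize;
    fn logical_nulls(&self) -> Option<Vec<bool>>;
}

/// Hypothetical primitive column that carries its own validity.
struct Int32Column {
    values: Vec<Option<i32>>,
}

impl Column for Int32Column {
    fn len(&self) -> usize {
        self.values.len()
    }
    fn logical_nulls(&self) -> Option<Vec<bool>> {
        Some(self.values.iter().map(|v| v.is_some()).collect())
    }
}

/// Hypothetical single-child sparse union that, like the quoted
/// `UnionArray::is_null`, has no validity buffer and ignores its child's nulls.
struct NaiveUnionColumn {
    child: Int32Column,
}

impl Column for NaiveUnionColumn {
    fn len(&self) -> usize {
        self.child.len()
    }
    fn logical_nulls(&self) -> Option<Vec<bool>> {
        None // no validity buffer, so "no nulls" is reported
    }
}

/// Mirrors the shape of the quoted `is_null` kernel.
fn is_null_kernel(input: &dyn Column) -> Vec<bool> {
    match input.logical_nulls() {
        // No null information: every slot is reported as non-null.
        None => vec![false; input.len()],
        // Otherwise invert the validity mask.
        Some(valid) => valid.iter().map(|v| !v).collect(),
    }
}

fn main() {
    let ints = Int32Column { values: vec![Some(1), None] };
    let union = NaiveUnionColumn {
        child: Int32Column { values: vec![Some(1), None] },
    };
    println!("{:?}", is_null_kernel(&ints)); // [false, true]
    println!("{:?}", is_null_kernel(&union)); // [false, false]: the null child value is missed
}
```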
Basically, arrow is saying "we're not going to tell you if a union is null, you need to look in the child arrays", but datafusion isn't listening and is just asking the union whether it's null in the naive way. As far as I can tell, there are two options to move forward:
If (as I hope) we go for the second option, there's also the issue (as demonstrated by #11314) that the representation of "null" union items doesn't match other types; it shows

+1 for the second option. I think we should check the children's nullability.
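Checking the children's nullability could look roughly like the sketch below. It is a std-only model under stated assumptions, not arrow-rs's `UnionArray` API: `UnionColumn` is hypothetical, each child is reduced to a validity mask, and a type id is assumed to equal its child's index (real unions map type ids to children through their field list). A sparse union reads the same row from the selected child; a dense union follows its offsets buffer.

```rust
/// Hypothetical, simplified union layout (not arrow-rs's `UnionArray`):
/// - `type_ids[i]` selects the child for slot `i` (assumed to equal the child index),
/// - `offsets` is `Some` for a dense union and `None` for a sparse one,
/// - `child_validity[c][j]` is `true` when row `j` of child `c` is valid.
struct UnionColumn {
    type_ids: Vec<i8>,
    offsets: Option<Vec<i32>>,
    child_validity: Vec<Vec<bool>>,
}

impl UnionColumn {
    /// A slot is null exactly when the child value it selects is null,
    /// because the union itself carries no validity buffer.
    fn is_null(&self, i: usize) -> bool {
        let child = self.type_ids[i] as usize;
        let row = match &self.offsets {
            Some(offsets) => offsets[i] as usize, // dense: follow the offset
            None => i,                            // sparse: same row in every child
        };
        !self.child_validity[child][row]
    }

    /// Column-wide `IS NULL` that consults the children instead of the union.
    fn is_null_mask(&self) -> Vec<bool> {
        (0..self.type_ids.len()).map(|i| self.is_null(i)).collect()
    }
}

fn main() {
    // Sparse: every child has one entry per union slot.
    let sparse = UnionColumn {
        type_ids: vec![0, 1, 0],
        offsets: None,
        child_validity: vec![vec![true, true, false], vec![false, true, true]],
    };
    println!("{:?}", sparse.is_null_mask()); // [false, false, true]

    // Dense: children only hold the values actually referenced.
    let dense = UnionColumn {
        type_ids: vec![0, 1, 1],
        offsets: Some(vec![0, 0, 1]),
        child_validity: vec![vec![true], vec![false, true]],
    };
    println!("{:?}", dense.is_null_mask()); // [false, true, false]
}
```

In both layouts the answer depends only on the selected child value, which is why a check that asks the union itself can never return true.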
I suppose there's a third option of updating arrow-rs to correctly calculate if a

I've proposed a fix in #11321.

It would likely take longer.

Note there is a method that takes into account child nullability that perhaps we could use instead of

Update: it turns out
Describe the bug
Maybe I'm doing something wrong with my union in datafusion-functions-json, but `is null` expressions never evaluate to true with the `JsonUnion` column. They behave as expected when the function in question returns a scalar `JsonUnion`.

To Reproduce
See datafusion-contrib/datafusion-functions-json#24
Expected behavior
When the value in every column of the union is null, `<union column> is null` should evaluate to null `{<first column_name>=}` (which is what I think it is now).

Additional context
No response