Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support IS NULL and IS NOT NULL on Unions #11321

Merged
merged 6 commits into from
Jul 8, 2024
Merged

Conversation

samuelcolvin
Copy link
Contributor

Which issue does this PR close?

Closes #11162, replaces #11314

Rationale for this change

See #11162.

What changes are included in this PR?

  • Changes to is_null to support correctly support unions
  • Changes to is_not_null to support correctly support unions
  • change to ScalarValue::is_null
  • tests

Are these changes tested?

Yes

Are there any user-facing changes?

Should just be the intended fix.

@github-actions github-actions bot added physical-expr Physical Expressions core Core DataFusion crate labels Jul 7, 2024
}
}

fn dense_union_is_null(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth someone else stepping through this logic and checking it's right. I'm not 100% sure it's all correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did and it looks good to me

datafusion/physical-expr/src/expressions/is_null.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @samuelcolvin -- This PR is very well tested and coded so I think it would be ok to merge as is

I left some suggestions that might help readability / maintainability but I don't think they are required

I think the right long term fix is to update the arrow-rs compute::is_null` to handle UnionArray properly (though it is fine to put a workaround into DataFusion until that is available) -- I filed apache/arrow-rs#6017 to track that

datafusion/physical-expr/src/expressions/is_not_null.rs Outdated Show resolved Hide resolved
Comment on lines 81 to 89
let bool_array = if let Some(union_array) =
array.as_any().downcast_ref::<UnionArray>()
{
union_is_null(union_array)?
} else {
compute::is_null(array.as_ref())?
};
Ok(ColumnarValue::Array(Arc::new(bool_array)))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of replicating the special case for UnionArray here and in is_not_null what do you think about making a wrapper for compute::is_null in DataFusion:

/// wraper around arrow::compute::is_null
fn is_null(datum: &Datum) -> Result<BooleanArray> {
  if let Some(union_array) =
                    array.as_any().downcast_ref::<UnionArray>()
                {
                    union_is_null(union_array)
                } else {
                    compute::is_null(array.as_ref())
                }
  }

The idea being that then when the fix is available in arrow-rs then we can simple remove the wrapper

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done what I think you mean here.

The only material change is that now for the "is not null" case we use effectively use compute::not(compute::is_null(...)) instead of compute::is_not_null, I'm not sure if those will compile to the same thing, or if you care about any resultant differences.

}
}

fn dense_union_is_null(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did and it looks good to me

datafusion/physical-expr/src/expressions/is_null.rs Outdated Show resolved Hide resolved
BooleanArray::new(BooleanBuffer::new_unset(union_array.len()), None);
for type_id in 0..union_array.type_names().len() {
let type_id = type_id as i8;
let union_is_child = cmp::eq(&type_ids, &Int8Array::new_scalar(type_id))?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double checked this and it looks good to me

datafusion/physical-expr/src/expressions/is_null.rs Outdated Show resolved Hide resolved
datafusion/physical-expr/src/expressions/is_null.rs Outdated Show resolved Hide resolved
@samuelcolvin
Copy link
Contributor Author

@alamb, I agree on your comments, I'll get those things fixed tomorrow.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @samuelcolvin

@alamb alamb merged commit 0c02cad into apache:main Jul 8, 2024
23 checks passed
@samuelcolvin samuelcolvin deleted the union-null branch July 8, 2024 20:06
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
* Demonstrate unions can't be null

* add scalar test cases

* support "IS NULL" and "IS NOT NULL" on unions

* formatting

* fix comments from @alamb

* fix docstring
xinlifoobar pushed a commit to xinlifoobar/datafusion that referenced this pull request Jul 18, 2024
* Demonstrate unions can't be null

* add scalar test cases

* support "IS NULL" and "IS NOT NULL" on unions

* formatting

* fix comments from @alamb

* fix docstring
@alamb
Copy link
Contributor

alamb commented Oct 2, 2024

Changed to use upstream'd code added by @gstvg in apache/arrow-rs#6303

see #12724

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Union columns can never be NULL
2 participants