Safe API to replace NullBuffers for Arrays #6528
Comments
I think this makes sense. The only thing to be careful with is arrays where null values may have undefined contents, e.g. dictionaries. In such cases, allowing users to go from null to not null could have safety implications.
FWIW the nullif kernel is very similar to this, but with the caveat that nulls can only remain null, which avoids the above issue. Edit: in fact, I think the operation you describe is the nullif kernel...
I think the nullif kernel provides this, so perhaps this can be closed?
Sounds good. The current documentation on nullif is pretty sparse, so perhaps better docs would make it easier to discover: https://docs.rs/arrow/latest/arrow/compute/kernels/nullif/fn.nullif.html I'll try to find some time.
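To make the nullif behavior discussed above concrete, here is a minimal plain-Rust model of its semantics as I understand them. It is not the arrow-rs implementation: `Vec<Option<T>>` stands in for an Arrow array, and the name `nullif_model` is hypothetical. The assumption, stated in the comments, is that an element becomes null only where the boolean mask is true and valid, and that existing nulls stay null.

```rust
/// Plain-Rust model of `nullif` semantics (NOT the arrow-rs kernel itself).
/// Assumption: an output slot is null where the mask is `Some(true)`;
/// a null mask entry is treated like `false`.
fn nullif_model<T: Clone>(values: &[Option<T>], mask: &[Option<bool>]) -> Vec<Option<T>> {
    assert_eq!(values.len(), mask.len(), "inputs must have equal length");
    values
        .iter()
        .zip(mask)
        .map(|(v, m)| match m {
            // Mask is true: the element is nulled out.
            Some(true) => None,
            // Mask is false or null: the element is kept as-is,
            // so an existing null can only remain null.
            _ => v.clone(),
        })
        .collect()
}
```

Note that nothing in this model can turn a null slot back into a non-null one, which is the property that sidesteps the undefined-contents concern raised earlier in the thread.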
PR to improve the docs: #6658. After doing that it is somewhat of the inverse of what I was looking for (it sets the element to I will attempt a PR with a proposed API as well for consideration
I made #6659 to add The actual use case I have is not to turn elements non-null, but instead to make them null if a boolean array is not true (the inverse of what Maybe a better API would be something like I will ponder
You could either negate the boolean array, or construct a BooleanArray with a null buffer of the buffer you want to be null if false, and a values buffer of entirely false. Neither is exactly ideal, but the performance, especially of the latter, is likely going to be hard to beat.
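As a rough illustration of the first suggestion (negating the boolean array), the sketch below models it in plain Rust with `Vec<Option<_>>` standing in for Arrow arrays. All three function names are hypothetical, and the null handling is an assumption: the negation keeps null entries null, and the nullif model treats a null mask entry as "not true". Under those assumptions, negate-then-nullif agrees with "null unless the filter is true" only when the filter itself has no nulls.

```rust
/// Model of nullif: null where the mask is `Some(true)`, otherwise unchanged.
fn nullif_sketch<T: Clone>(values: &[Option<T>], mask: &[Option<bool>]) -> Vec<Option<T>> {
    assert_eq!(values.len(), mask.len());
    values
        .iter()
        .zip(mask)
        .map(|(v, m)| if *m == Some(true) { None } else { v.clone() })
        .collect()
}

/// Model of boolean negation that propagates nulls (assumed behavior).
fn not_sketch(mask: &[Option<bool>]) -> Vec<Option<bool>> {
    mask.iter().map(|m| m.map(|b| !b)).collect()
}

/// The operation the issue asks for: keep a value only where the filter
/// is `Some(true)`; a false or null filter entry yields null.
fn null_unless_true<T: Clone>(values: &[Option<T>], filter: &[Option<bool>]) -> Vec<Option<T>> {
    assert_eq!(values.len(), filter.len());
    values
        .iter()
        .zip(filter)
        .map(|(v, f)| if *f == Some(true) { v.clone() } else { None })
        .collect()
}
```

If the filter can contain nulls, those entries would need to be mapped to false before negating, otherwise the corresponding values survive.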
🤔 this is how we do it today in DataFusion: https://github.com/apache/datafusion/blob/f23360f1e0f90abcf92a067de55a9053da545ed2/datafusion/functions-aggregate-common/src/aggregate/groups_accumulator/nulls.rs#L53-L102 (basically call
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
While implementing apache/datafusion#12792 and various other things in DataFusion, I often find myself wanting to combine a filter and a null mask.
The relevant code is like
Describe the solution you'd like
I would like an API like with_nulls that returns a new array with the same data but a new null mask so my code would look like
Describe alternatives you've considered
I can keep using the unsafe APIs
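The requested with_nulls shape can be sketched on a toy type. This is only a model under stated assumptions: `ToyArray`, its fields, and the `with_nulls` signature below are all hypothetical and are not an arrow-rs API. The point is the contract described above: same data, new null mask, with a length check instead of an unsafe constructor.

```rust
/// Hypothetical stand-in for an Arrow array: a values buffer plus a
/// validity mask (`true` means the slot is non-null).
#[derive(Debug, Clone, PartialEq)]
struct ToyArray {
    values: Vec<i32>,
    validity: Vec<bool>,
}

impl ToyArray {
    /// Hypothetical `with_nulls`: returns a new array with the same values
    /// and the supplied validity mask, rejecting a mask of the wrong length.
    fn with_nulls(&self, validity: Vec<bool>) -> Result<ToyArray, String> {
        if validity.len() != self.values.len() {
            return Err(format!(
                "validity length {} does not match array length {}",
                validity.len(),
                self.values.len()
            ));
        }
        Ok(ToyArray { values: self.values.clone(), validity })
    }

    /// Reads slot `i`, honoring the validity mask.
    fn get(&self, i: usize) -> Option<i32> {
        if self.validity[i] { Some(self.values[i]) } else { None }
    }
}
```

This toy keeps a defined value in every slot, so flipping a slot from null to non-null is harmless here. Real arrays (the comments mention dictionaries) may hold undefined contents behind a null, which is why a safe version of this API would need more care than the sketch shows.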
Additional context