cast kernel support for StringViewArray and BinaryViewArray #5508
@RinChanNOWWW has a PR to add initial support here: #5686. I updated this ticket with other potential subtasks.
I want to discuss casting from ViewArray to ByteArray. As we know, the views in a ViewArray can reference arbitrary locations in its byte buffers, so when converting a ViewArray to a ByteArray, a memory copy is unavoidable; I can't come up with a zero-copy way. If I implement this operation, I will allocate brand-new buffers for the target ByteArray. Is there a better way?
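To make the copying approach concrete, here is a minimal std-only Rust sketch. It does not use the arrow crate: `View`, `ViewArrayModel`, `OffsetArrayModel`, `inline_view` and `view_to_offsets` are hypothetical names for a simplified model that ignores nulls and represents each view as an enum rather than the packed 16-byte layout. Because a view may point anywhere in any buffer, the target values buffer is filled value by value:

```rust
/// Hypothetical, simplified model of one view entry (not the arrow-rs type).
/// Real Arrow views are packed into 16 bytes; values of <= 12 bytes are inline.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum View {
    Inline { len: u32, data: [u8; 12] },
    Ref { len: u32, prefix: [u8; 4], buffer_index: u32, offset: u32 },
}

/// Hypothetical view array: views plus the data buffers they point into.
struct ViewArrayModel {
    views: Vec<View>,
    buffers: Vec<Vec<u8>>,
}

/// Hypothetical offsets + values layout, as used by StringArray/BinaryArray.
struct OffsetArrayModel {
    offsets: Vec<i32>,
    values: Vec<u8>,
}

/// Build an inline view for a value of at most 12 bytes.
fn inline_view(value: &[u8]) -> View {
    assert!(value.len() <= 12, "inline values must be at most 12 bytes");
    let mut data = [0u8; 12];
    data[..value.len()].copy_from_slice(value);
    View::Inline { len: value.len() as u32, data }
}

impl ViewArrayModel {
    /// Resolve view `i` to its bytes, either inline or inside a data buffer.
    fn value(&self, i: usize) -> &[u8] {
        match &self.views[i] {
            View::Inline { len, data } => &data[..*len as usize],
            View::Ref { len, buffer_index, offset, .. } => {
                let start = *offset as usize;
                &self.buffers[*buffer_index as usize][start..start + *len as usize]
            }
        }
    }
}

/// Copy every value into one new, contiguous values buffer with i32 offsets.
fn view_to_offsets(array: &ViewArrayModel) -> OffsetArrayModel {
    let n = array.views.len();
    // Sum the lengths first so the values buffer is allocated exactly once.
    let total: usize = (0..n).map(|i| array.value(i).len()).sum();
    let mut values = Vec::with_capacity(total);
    let mut offsets = Vec::with_capacity(n + 1);
    offsets.push(0i32);
    for i in 0..n {
        values.extend_from_slice(array.value(i));
        offsets.push(i32::try_from(values.len()).expect("values exceed i32 offsets"));
    }
    OffsetArrayModel { offsets, values }
}
```

For the Large* targets the same loop would produce i64 offsets instead of i32.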
I agree there is unlikely to be a zero-copy way. The only potential exception I can imagine is if the underlying buffer was already pre-packed (though this would require all strings to be longer than 12 bytes, in order, and contiguous). I suspect the time necessary to detect whether this was the case would be substantial. Thus I suggest we start with the simple case (copy to a new buffer) and optimize later if it turns out that is an important use case. I think some part of #5513 might be relevant here (namely the operation to compact the strings) 🤔
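The "pre-packed" exception can be expressed as one pass over the views. This is again a hypothetical, simplified model (`ViewMeta`, `prepacked_offsets`), not arrow-rs code: if every value is longer than 12 bytes and each view starts exactly where the previous one ended in the same buffer, that buffer could serve as the values buffer and only the offsets would need to be built.

```rust
/// Hypothetical, simplified view metadata (not the arrow-rs type).
/// `buffer_index` and `offset` only matter when `len > 12`, because shorter
/// values are stored inline in the view itself.
#[derive(Clone, Copy)]
struct ViewMeta {
    len: u32,
    buffer_index: u32,
    offset: u32,
}

const MAX_INLINE_LEN: u32 = 12;

/// If all values are out-of-line, live in one buffer, and are laid out in
/// order with no gaps, return that buffer's index and the offsets describing
/// it (starting at the first view's offset). Otherwise return None, meaning
/// the caller falls back to copying.
fn prepacked_offsets(views: &[ViewMeta]) -> Option<(u32, Vec<u32>)> {
    let first = views.first()?;
    let mut offsets = Vec::with_capacity(views.len() + 1);
    let mut expected = first.offset;
    offsets.push(expected);
    for v in views {
        let contiguous = v.buffer_index == first.buffer_index && v.offset == expected;
        if v.len <= MAX_INLINE_LEN || !contiguous {
            return None;
        }
        expected = expected.checked_add(v.len)?;
        offsets.push(expected);
    }
    Some((first.buffer_index, offsets))
}
```

The scan has to inspect every view before it can decide, so whether it pays off compared with simply copying is the trade-off weighed in the comment.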
I think what is remaining for this ticket is support for
As @RinChanNOWWW says in #5508 (comment), these casts are going to require copying the string data into the compacted form required of
Sorry, @XiangpengHao corrected me: it seems that @RinChanNOWWW actually implemented both directions, so I think we can claim this ticket is done. #5861 tracks adding support for dictionary
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
This is part of the larger project to implement StringViewArray; see #5374. In #5481 we added support for StringViewArray and BinaryViewArray. This ticket tracks supporting StringViewArray and BinaryViewArray in the cast kernel: https://docs.rs/arrow/latest/arrow/compute/kernels/cast/index.html

Describe the solution you'd like
Specifically, the following conversions should be supported in the cast kernels:
- StringViewArray <--> StringArray
- StringViewArray <--> LargeStringArray

And similarly for Binary:
- BinaryViewArray <--> BinaryArray
- BinaryViewArray <--> LargeBinaryArray
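The direction from StringArray/BinaryArray to a view array can avoid copying the string bytes, because each view can point straight into the existing values buffer. Below is a minimal std-only sketch, assuming the view layout from the Arrow columnar format specification: bytes 0..4 hold the little-endian length; values of at most 12 bytes are stored inline in bytes 4..16, while longer values store a 4-byte prefix, a buffer index and an offset. `make_view_bytes` and `offsets_to_views` are hypothetical helper names, not the arrow-rs API, and nulls are ignored:

```rust
const MAX_INLINE_LEN: usize = 12;

/// Encode one value as a 16-byte view (Arrow view layout, little-endian):
/// bytes 0..4 = length; if len <= 12, the data follows inline (zero-padded),
/// otherwise bytes 4..8 = prefix, 8..12 = buffer index, 12..16 = offset.
fn make_view_bytes(value: &[u8], buffer_index: u32, offset: u32) -> [u8; 16] {
    let mut view = [0u8; 16];
    view[0..4].copy_from_slice(&(value.len() as u32).to_le_bytes());
    if value.len() <= MAX_INLINE_LEN {
        view[4..4 + value.len()].copy_from_slice(value);
    } else {
        view[4..8].copy_from_slice(&value[..4]);
        view[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        view[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    view
}

/// Build views for an i32 offsets + values layout. The original `values`
/// buffer is reused unchanged as data buffer 0, so long values are not copied.
fn offsets_to_views(offsets: &[i32], values: &[u8]) -> Vec<[u8; 16]> {
    offsets
        .windows(2)
        .map(|w| {
            let (start, end) = (w[0] as usize, w[1] as usize);
            // Long values point back into `values` at their original offset.
            make_view_bytes(&values[start..end], 0, w[0] as u32)
        })
        .collect()
}
```

With i64 offsets (LargeStringArray/LargeBinaryArray) this reuse only works while every offset fits in the u32 offset field of a view; larger data would have to be split across buffers.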
Notes:
Subtasks
- StringArray/BinaryArray --> StringView/BinaryView #5686
- StringView/BinaryView --> StringArray/BinaryArray #5704

Describe alternatives you've considered
Additional context