cast kernel support for StringViewArray and BinaryViewArray <--> DictionaryArray #5861
Comments
I'm working on this, can you assign me @alamb?
Done!
StringViewArray --> DictionaryArray<IndexType, LargeUtf8> will copy strings twice.
Yes, I agree this is likely. I suggest using GenericByteDictionaryBuilder (https://docs.rs/arrow/latest/arrow/array/builder/struct.GenericByteDictionaryBuilder.html) directly. That will also deduplicate the values and result in the smallest dictionary possible. The downside is that inserting each value requires hashing a string. I could imagine a faster implementation that keeps a map of
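To make the trade-off in the comment above concrete, here is a minimal standard-library sketch of hash-based dictionary encoding. It is not the arrow-rs implementation and does not use `GenericByteDictionaryBuilder`; the function name `dictionary_encode`, the `u32` key type, and the use of `Option<&str>` to model nulls are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical helper: dictionary-encode `input`.
/// Returns `(keys, values)` where `values` holds each distinct string once
/// (in first-seen order) and `keys[i]` is the index of `input[i]` in `values`.
/// `None` models a null slot and produces a null key.
fn dictionary_encode(input: &[Option<&str>]) -> (Vec<Option<u32>>, Vec<String>) {
    let mut index_of: HashMap<&str, u32> = HashMap::new();
    let mut values: Vec<String> = Vec::new();
    let mut keys: Vec<Option<u32>> = Vec::with_capacity(input.len());

    for &item in input {
        match item {
            None => keys.push(None),
            Some(s) => {
                // One hash lookup per input row: this is the per-value cost
                // mentioned in the comment above.
                let key = *index_of.entry(s).or_insert_with(|| {
                    // First time this string is seen: copy it into the
                    // dictionary once and hand out the next index.
                    values.push(s.to_string());
                    (values.len() - 1) as u32
                });
                keys.push(Some(key));
            }
        }
    }
    (keys, values)
}
```

Calling `dictionary_encode(&[Some("a"), Some("b"), None, Some("a")])` yields keys `[Some(0), Some(1), None, Some(0)]` and values `["a", "b"]`: each distinct string is copied once, at the price of hashing every row.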
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
This is part of the larger project to implement StringViewArray -- see #5374
In #5508, @RinChanNOWWW tracked adding casting to/from StringArray 🙏 ❤️
This ticket tracks adding additional data type support for StringViewArray and ByteViewArray in the cast kernel: https://docs.rs/arrow/latest/arrow/compute/kernels/cast/index.html
Many systems (e.g. InfluxDB 3.0, Apache DataFusion Comet, and I think Coralogix) use DictionaryArrays. Thus supporting casting to/from DictionaryArray will be important to permit easy integration into downstream consumers.
Describe the solution you'd like
Specifically the following conversions should be supported in the cast kernels:
- StringViewArray <--> DictionaryArray<IndexType, Utf8>
- StringViewArray <--> DictionaryArray<IndexType, LargeUtf8>
And similarly for Binary:
- BinaryViewArray <--> DictionaryArray<IndexType, Binary>
- BinaryViewArray <--> DictionaryArray<IndexType, LargeBinary>
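For the Dictionary to view direction, the string bytes can stay where they are because each view is a fixed 16-byte record that either inlines a short value or points into an existing buffer. The sketch below follows the Arrow view layout (4-byte length, then either up to 12 inline bytes or a 4-byte prefix, buffer index, and offset). It uses only the standard library; `encode_view` and `dictionary_to_views` are hypothetical names, nulls are omitted, and it assumes the dictionary's value buffer is small enough for `u32` offsets.

```rust
/// Hypothetical helper: encode one 16-byte view for `value`, which lives at
/// byte `offset` inside data buffer number `buffer_index`.
fn encode_view(value: &[u8], buffer_index: u32, offset: u32) -> [u8; 16] {
    let mut view = [0u8; 16];
    view[0..4].copy_from_slice(&(value.len() as u32).to_le_bytes());
    if value.len() <= 12 {
        // Short value: stored inline in the view, zero-padded.
        view[4..4 + value.len()].copy_from_slice(value);
    } else {
        // Long value: keep a 4-byte prefix plus a pointer into the buffer.
        view[4..8].copy_from_slice(&value[0..4]);
        view[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        view[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    view
}

/// Hypothetical helper: build one view per dictionary key.
/// `offsets` delimits the dictionary values inside `values_buffer`
/// (i64 offsets, as in LargeUtf8). The buffer itself is reused as data
/// buffer 0 of the view array; only the 16-byte views are newly written.
/// Assumption: every offset fits in a u32.
fn dictionary_to_views(
    keys: &[usize],
    offsets: &[i64],
    values_buffer: &[u8],
) -> Vec<[u8; 16]> {
    keys.iter()
        .map(|&k| {
            let start = offsets[k] as usize;
            let end = offsets[k + 1] as usize;
            encode_view(&values_buffer[start..end], 0, start as u32)
        })
        .collect()
}
```

Repeated keys produce identical views that point at the same bytes, so a value that occurs many times is still stored only once.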
Notes:
- DictionaryArray<IndexType, LargeUtf8> --> StringViewArray can be implemented without copying strings
- StringViewArray --> DictionaryArray<IndexType, LargeUtf8> will likely require copying the strings
Describe alternatives you've considered
I think casting from Dictionary
Additional context