[Epic] Native StringView support for string functions #11790
Comments
One thing I have noticed during implementation is that, for some functions, it is probably a good idea to always generate StringView as output (rather than StringArray), as that could avoid a copy; see, for example, #11920 (comment) from @Kev1n8. I am thinking that once the string functions support StringView as input, we can do a second pass and optimize some functions so they produce StringView as output.
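The copy that a StringView output can avoid may be easier to see in a small sketch. What follows is a simplified, hypothetical model, not the arrow-rs or DataFusion API: `View`, `ltrim_copy`, and `ltrim_view` are made-up names, a view is reduced to a byte offset and length into one shared buffer, and a left-trim stands in for whichever function is being optimized. The real Arrow view layout is more involved (for example, short strings are stored inline in the view).

```rust
// Simplified, hypothetical model of a string view; NOT the arrow-rs API.
#[derive(Debug, Clone, Copy, PartialEq)]
struct View {
    offset: usize, // byte offset into the shared data buffer
    len: usize,    // length in bytes
}

/// Utf8-style output: every trimmed string is copied into a fresh,
/// contiguous value buffer, and n + 1 offsets mark the boundaries.
fn ltrim_copy(data: &str, views: &[View]) -> (String, Vec<usize>) {
    let mut values = String::new();
    let mut offsets = vec![0];
    for v in views {
        let s = &data[v.offset..v.offset + v.len];
        values.push_str(s.trim_start_matches(' '));
        offsets.push(values.len());
    }
    (values, offsets)
}

/// View-style output: only (offset, len) is adjusted; the data buffer
/// is reused as-is, so no string bytes are copied.
fn ltrim_view(data: &str, views: &[View]) -> Vec<View> {
    views
        .iter()
        .map(|v| {
            let s = &data[v.offset..v.offset + v.len];
            let trimmed = s.trim_start_matches(' ');
            let skipped = s.len() - trimmed.len();
            View { offset: v.offset + skipped, len: trimmed.len() }
        })
        .collect()
}
```

For the buffer `"  foo bar"` with views (0, 5) and (5, 4), `ltrim_view` returns the views (2, 3) and (6, 3) and leaves the buffer untouched, whereas `ltrim_copy` has to build a new `"foobar"` buffer.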
Inspired by @Omega359's great PR #11941, I have a suggestion on testing. Although most implementations are adapted from existing ones, execution takes a different path, so I think comprehensive end-to-end tests are still needed. Here are the examples on how to adapt existing test cases for
We are making pretty good progress here -- just a few more functions left 🚀
Is your feature request related to a problem or challenge?
We are working to add complete StringView support in DataFusion, which permits potentially much faster processing of string data. See #10918 for more background.
Today, most DataFusion string functions support DataType::Utf8 and DataType::LargeUtf8, and when called with a StringView argument DataFusion will cast the argument back to DataType::Utf8, which is expensive.
To realize the full speed of StringView, we need to ensure that all string functions support DataType::Utf8View directly.
Describe the solution you'd like
Port all string functions
StringViewArray #11556
starts_with for Utf8View #11786
ASCII scalar function to support Utf8View #11834
BTRIM scalar function to support Utf8View #11835
CONCAT scalar function to support Utf8View #11836
concat_ws scalar function to support Utf8View #11837
CONTAINS scalar function to support Utf8View #11838
ENDS_WITH scalar function to support Utf8View #11852
INITCAP scalar function to support Utf8View #11853
levenshtein scalar function to support Utf8View #11854
LOWER scalar function to support Utf8View #11855
LTRIM scalar function to support Utf8View #11856
LPAD scalar function to support Utf8View #11857
OCTET_LENGTH scalar function to support Utf8View #11858
SPLIT_PART scalar function to support Utf8View #11950
STRPOS scalar function to support Utf8View #11951
SUBSTR scalar function to support Utf8View #11952
TRANSLATE scalar function to support Utf8View #11953
FIND_IN_SET scalar function to support Utf8View #11954
REPEAT #11962
bit_length #13195
Describe alternatives you've considered
No response
Additional context
See coordination plan with @tshauck and myself here: #11787 (comment)
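To illustrate why the cast back to Utf8 described in the problem statement is costly, here is a minimal sketch under stated assumptions. It is a hypothetical model and not the arrow-rs or DataFusion API: `Utf8Column`, `Utf8ViewColumn`, `cast_to_utf8`, `octet_length_utf8`, and `octet_length_view` are invented names, and views are reduced to (offset, len) pairs into one shared buffer. An OCTET_LENGTH-like function is used only because it makes the difference easy to see.

```rust
// Simplified, hypothetical model of two string layouts; NOT the arrow-rs API.

/// Utf8-style column: one contiguous value buffer plus n + 1 offsets.
struct Utf8Column {
    values: String,
    offsets: Vec<usize>,
}

/// Utf8View-style column: (offset, len) views into a shared buffer.
struct Utf8ViewColumn {
    data: String,
    views: Vec<(usize, usize)>,
}

/// What a cast back to Utf8 has to do: copy every referenced byte
/// into a new contiguous buffer and rebuild the offsets.
fn cast_to_utf8(col: &Utf8ViewColumn) -> Utf8Column {
    let mut values = String::new();
    let mut offsets = vec![0];
    for &(offset, len) in &col.views {
        values.push_str(&col.data[offset..offset + len]);
        offsets.push(values.len());
    }
    Utf8Column { values, offsets }
}

/// Existing Utf8 kernel: lengths are differences of adjacent offsets.
fn octet_length_utf8(col: &Utf8Column) -> Vec<usize> {
    col.offsets.windows(2).map(|w| w[1] - w[0]).collect()
}

/// Native view kernel: lengths are read straight from the views,
/// so neither a cast nor a copy of the string data is needed.
fn octet_length_view(col: &Utf8ViewColumn) -> Vec<usize> {
    col.views.iter().map(|&(_, len)| len).collect()
}
```

Both paths return the same lengths, but `octet_length_utf8(&cast_to_utf8(&col))` copies all string bytes first, while `octet_length_view(&col)` only walks the views.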