Currently, a call to RPAD with a Utf8View argument induces a cast. After the change that fixes this issue, it should not.
query TT
EXPLAIN SELECT
RPAD(column1_utf8view, 1) as c1,
RPAD(column1_utf8view, 2, column2_utf8view) as c2
FROM test;
----
logical_plan
01)Projection: rpad(__common_expr_1, Int64(1)) AS c1, rpad(__common_expr_1, Int64(2), CAST(test.column2_utf8view AS Utf8)) AS c2
02)--Projection: CAST(test.column1_utf8view AS Utf8) AS __common_expr_1, test.column2_utf8view
03)----TableScan: test projection=[column1_utf8view, column2_utf8view]
Is your feature request related to a problem or challenge?
We are working to add complete StringView support in DataFusion, which permits potentially much faster processing of string data. See #10918 for more background.
Today, most DataFusion string functions support DataType::Utf8 and DataType::LargeUtf8. When called with a StringView argument, DataFusion casts the argument back to DataType::Utf8, which is expensive.
To realize the full speed of StringView, we need to ensure that all string functions support the DataType::Utf8View directly.
Describe the solution you'd like
Update the function to support DataType::Utf8View directly
Describe alternatives you've considered
The typical steps are:
Write some tests showing the function doesn't support Utf8View (see the tests in string_view.slt) to ensure the arguments are not being cast
Change the Signature of the function to accept Utf8View in addition to Utf8/LargeUtf8
Update the implementation of the function to operate on Utf8View
To see if it is using utf8 view, use EXPLAIN to see the plan and verify there is no CAST. In this example the CAST(column1@0 AS Utf8) indicates that the function is not using Utf8View natively
Part of #11752 and #11790
rpad is defined here: https://github.com/apache/datafusion/blob/e088945c38b74bb1d86dcbb88a69dfc21d59e375/datafusion/functions/src/unicode/rpad.rs
Casting tests are in: https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/string_view.slt
Example PRs
Utf8View type in starts_with function #11787
StringViewArray #11556
Additional context
The documentation of string functions can be found here: https://datafusion.apache.org/user-guide/sql/scalar_functions.html#string-functions
To test a function with StringView with datafusion-cli you can use an example like this (replacing starts_with with the relevant function)
To see if it is using utf8 view, use EXPLAIN to see the plan and verify there is no CAST. In this example the CAST(column1@0 AS Utf8) indicates that the function is not using Utf8View natively
It is also often good to test with a constant as well (likewise there should be no cast):
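The checks described above could look roughly like the following in datafusion-cli, adapted to rpad. This is a sketch rather than the issue author's exact example: the table name foo, the literal values, and the padding length are assumptions chosen for illustration, and the default column names column1 and column2 come from CREATE TABLE ... AS VALUES. No expected plan output is shown, since it depends on the DataFusion version.

```sql
-- Hypothetical table `foo` with two Utf8View columns (column1, column2).
-- arrow_cast is used here to force the Utf8View type on string literals.
CREATE TABLE foo AS VALUES
  (arrow_cast('foo', 'Utf8View'), arrow_cast('bar', 'Utf8View'));

-- Run the function on Utf8View columns.
SELECT rpad(column1, 5, column2) FROM foo;

-- Inspect the plan: with native Utf8View support there should be no
-- CAST(... AS Utf8) wrapped around column1 or column2.
EXPLAIN SELECT rpad(column1, 5, column2) FROM foo;

-- Same check with a constant fill argument; likewise, no CAST is expected
-- on the Utf8View column.
EXPLAIN SELECT rpad(column1, 5, 'x') FROM foo;
```

If either EXPLAIN output still contains a CAST of a Utf8View column to Utf8, the function is not yet handling Utf8View natively.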