[Epic] Complete Initial StringView in DataFusion #11752
Comments
An update here is that @XiangpengHao has a PR with various changes in #11862. We still need to check that PR and figure out what else in it needs to be enabled "for real" (with tests, etc)
My ideal resolution here is that we end up in the state where the only change we need to enable string view by default is to switch the config setting. I will do some more ticket triage later today to outline other items I know of
Do we have tickets for regexp binary operators? I noticed StringView is not supported on them yet, and they have a separate implementation from the regexp functions
Not that I know of -- it would be great to add them
Filed #12180
I am going to try to polish up the PR to enable string view by default (with the arrow upgrade and various recent improvements) and see how close we are: #12092
StringView by default is finally merged into DataFusion: #13101 so I am claiming success and completion of this issue
Is your feature request related to a problem or challenge?
This ticket is a follow on to #10918 where we implemented enough initial support for `StringView`/`BinaryView` that we can show some pretty sweet ClickBench results
Describe the solution you'd like
This epic tracks remaining work to complete the "initial" work which I would like to define as "enable using StringView when reading Strings from Parquet by default"
I am sure there will be additional work / support to add StringView to various other features of DataFusion that we can maybe track with another follow on ticket
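For anyone new to the type this epic is about, here is a minimal sketch of the 16-byte "view" encoding that Arrow's `Utf8View` (StringView) layout uses, as I understand the Arrow columnar format: strings of at most 12 bytes live inline in the view, and longer ones keep a 4-byte prefix plus a buffer index and offset. The `View` enum and the `encode_view` / `decode_view` helpers are hypothetical names made up for illustration; they are not arrow-rs or DataFusion APIs, and the sketch uses only a single data buffer.

```rust
/// Illustrative model of one 16-byte view (hypothetical type, not the arrow-rs API).
#[derive(Debug, PartialEq)]
enum View {
    /// Strings of at most 12 bytes are stored entirely inside the view.
    Inline { len: u32, data: [u8; 12] },
    /// Longer strings keep a 4-byte prefix and point into a shared data buffer.
    Ref { len: u32, prefix: [u8; 4], buffer_index: u32, offset: u32 },
}

/// Encode `s` as a view, appending long strings to `buffer` (assumed to be buffer 0).
fn encode_view(s: &str, buffer: &mut Vec<u8>) -> View {
    let bytes = s.as_bytes();
    let len = bytes.len() as u32;
    if bytes.len() <= 12 {
        // Short string: copy the bytes into the view, zero-padded.
        let mut data = [0u8; 12];
        data[..bytes.len()].copy_from_slice(bytes);
        View::Inline { len, data }
    } else {
        // Long string: store the bytes once in the buffer and reference them.
        let offset = buffer.len() as u32;
        buffer.extend_from_slice(bytes);
        let mut prefix = [0u8; 4];
        prefix.copy_from_slice(&bytes[..4]);
        View::Ref { len, prefix, buffer_index: 0, offset }
    }
}

/// Read the string back; only long strings touch the data buffer.
fn decode_view<'a>(view: &'a View, buffer: &'a [u8]) -> &'a str {
    match view {
        View::Inline { len, data } => std::str::from_utf8(&data[..*len as usize]).unwrap(),
        View::Ref { len, offset, .. } => {
            let start = *offset as usize;
            std::str::from_utf8(&buffer[start..start + *len as usize]).unwrap()
        }
    }
}

fn main() {
    let mut buffer = Vec::new();
    let short = encode_view("ClickBench", &mut buffer);
    let long = encode_view("a string longer than twelve bytes", &mut buffer);
    println!("{:?}", short);
    println!("{:?}", long);
    println!("{}", decode_view(&long, &buffer));
}
```

Because every view has the same fixed size, operations such as slicing or filtering can in principle copy views and leave the string bytes where they are, which is roughly why the `slice()` and `CoalesceBatchesExec` items in the list below matter.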
Required for enabling StringView by default:
- (`schema_force_string_view`) by default #11682
- for `StringViewArray` #11766
- unreachable code: Utf8/Binary should use ArrowBytesSet #11767
- `ScalarValue::Utf8View` and `ScalarValue::BinaryView` #12117
- `ScalarValue::Utf8View` and `ScalarValue::BinaryView` #12118
- `Utf8View`/`BinaryView` --> `Utf8`/`Binary` at output #12119
- `Utf8` as `Utf8View` #12123
- `~`, `!~`, etc #12180
- `LIKE` #12500
- `LIKE` slows down some ClickBench queries #12509
Could work around but really should be fixed upstream
- `BinaryView` --> `Utf8` and `LargeUtf8` arrow-rs#6162
- `StringView` and `BinaryView` statistics in `StatisticsConverter` arrow-rs#6164
- `StringViewArray::slice()` and `BinaryViewArray::slice()` faster / non allocating arrow-rs#6408
Additional "Nice to have" Features
- `StringView` support for string functions #11790
- `CoalesceBatchesExec` for StringViews #11628
- `StringView` in DataFusion #11752