Let the StringBuilder use BinaryBuilder #2181
Signed-off-by: remzi <[email protected]>
@@ -385,12 +406,6 @@ pub type StringArray = GenericStringArray<i32>;
/// ```
pub type LargeStringArray = GenericStringArray<i64>;

impl<T: OffsetSizeTrait> From<GenericListArray<T>> for GenericStringArray<T> {
    fn from(v: GenericListArray<T>) -> Self {
Move, not remove
Codecov Report
@@ Coverage Diff @@
## master #2181 +/- ##
==========================================
- Coverage 82.87% 82.38% -0.50%
==========================================
Files 237 239 +2
Lines 61465 62096 +631
==========================================
+ Hits 50940 51155 +215
- Misses 10525 10941 +416
impl<OffsetSize: OffsetSizeTrait> From<GenericBinaryArray<OffsetSize>>
    for GenericStringArray<OffsetSize>
{
    fn from(v: GenericBinaryArray<OffsetSize>) -> Self {
This would allow non-utf8 data within a StringArray using safe APIs, which would break our safety guarantees
You are right, but this should not be done in this PR, because it follows the style of the current StringArray::from_list
(https://github.com/apache/arrow-rs/blob/master/arrow/src/array/array_string.rs#L122-L143), which is also unsafe.
We could file a follow-up issue to track this. Implementing both safe and unsafe styles may be a good choice:
impl StringArray {
    unsafe fn from_list_unchecked(...) {...}
    unsafe fn from_binary_unchecked(...) {...}
}
impl From<ListArray> for StringArray {
    /// safe method with utf-8 checking
    fn from(...) {...}
}
impl From<BinaryArray> for StringArray {
    /// safe method with utf-8 checking
    fn from(...) {...}
}
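To make the safe/unsafe split concrete, here is a minimal sketch of the idea. It does not use the arrow-rs types: `BinaryColumn` and `StringColumn` are hypothetical stand-ins that only mimic the offsets-plus-values layout, and the checked path is written as `TryFrom` (an assumption on my part, so that invalid data surfaces as an error rather than a panic). Offsets are assumed to be well-formed (monotonic and in bounds).

```rust
use std::convert::TryFrom;
use std::str::Utf8Error;

/// Hypothetical stand-in for a variable-length binary array:
/// `offsets[i]..offsets[i + 1]` delimits slot `i` inside `values`.
#[derive(Debug, Clone, PartialEq)]
pub struct BinaryColumn {
    pub offsets: Vec<usize>,
    pub values: Vec<u8>,
}

/// Same layout, with the invariant that every slot is valid UTF-8.
#[derive(Debug, Clone, PartialEq)]
pub struct StringColumn {
    offsets: Vec<usize>,
    values: Vec<u8>,
}

impl StringColumn {
    /// Unchecked conversion that reuses the buffers without validation.
    ///
    /// # Safety
    /// The caller must guarantee that every slot of `b` is valid UTF-8.
    pub unsafe fn from_binary_unchecked(b: BinaryColumn) -> Self {
        StringColumn { offsets: b.offsets, values: b.values }
    }

    /// Returns slot `i` as a `&str`.
    pub fn value(&self, i: usize) -> &str {
        let bytes = &self.values[self.offsets[i]..self.offsets[i + 1]];
        // SAFETY: relies on the type invariant established at construction.
        unsafe { std::str::from_utf8_unchecked(bytes) }
    }
}

impl TryFrom<BinaryColumn> for StringColumn {
    type Error = Utf8Error;

    /// Checked conversion: validates each slot separately, so an offset that
    /// splits a multi-byte character is rejected as well.
    fn try_from(b: BinaryColumn) -> Result<Self, Self::Error> {
        for w in b.offsets.windows(2) {
            std::str::from_utf8(&b.values[w[0]..w[1]])?;
        }
        Ok(StringColumn { offsets: b.offsets, values: b.values })
    }
}
```

With this split, safe callers can only obtain a `StringColumn` through the validating path, and the zero-cost path is available behind `unsafe` for callers that already know the bytes are valid.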
Filed #2205 to track this.
Benchmark runs are scheduled for baseline = 82e0512 and contender = 9e47779. 9e47779 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Which issue does this PR close?
Closes #2156.
Rationale for this change
The new implementation uses less memory and is faster.
Benchmark
Tested on Intel Ubuntu
What changes are included in this PR?
Are there any user-facing changes?
Deletes the pub method StringBuilder::append(bool).