Add StringArray::num_chars for calculating number of characters #1503
Signed-off-by: remzi <[email protected]>
delete unchecked fn
update doc
Signed-off-by: remzi <[email protected]>
/// This function has `O(n)` time complexity where `n` is the string length.
/// If you can make sure that all chars in the string are in the range `U+0000` ~ `U+007F`,
/// please use the function [`value_length`](#method.value_length) which has O(1) time complexity.
pub fn num_chars(&self, i: usize) -> usize {
I couldn't find an elegant way to make the return type OffsetSize.
I think returning the length as a usize is a good API and is what would be expected by Rust programmers 👍
/// # Performance
/// This function has `O(n)` time complexity where `n` is the string length.
/// If you can make sure that all chars in the string are in the range `U+0000` ~ `U+007F`,
/// please use the function [`value_length`](#method.value_length) which has O(1) time complexity.
The byte length of the string equals the number of chars when all chars are in the range U+0000 ~ U+007F.
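To make the point above concrete, here is a minimal std-only sketch (it does not use the arrow crate; the helper names `byte_len_and_char_count` and `lengths_agree` are made up for illustration). It compares the byte length of a `&str` with its char count and shows that the two agree exactly when the string is ASCII.

```rust
/// Returns `(byte_len, char_count)` for `s`.
/// `byte_len` is O(1); `char_count` walks the whole string, so it is O(n).
/// (Hypothetical helper, not part of the PR.)
fn byte_len_and_char_count(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// True when every char is in U+0000..=U+007F.
/// In that case each char occupies one byte, so byte length == char count.
/// (Hypothetical helper, not part of the PR.)
fn lengths_agree(s: &str) -> bool {
    s.is_ascii()
}
```

For an ASCII value such as "hello" both numbers are 5, while for "A£" the byte length is 3 and the char count is 2, because £ (U+00A3) takes two bytes in UTF-8.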
Have you considered using array.value(i).chars().count() to count UTF-8 code points? It is described in
https://stackoverflow.com/questions/46290655/get-the-string-length-in-characters-in-rust
Thank you for the helpful suggestion, I will give it a try!
Done!
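The trade-off discussed in this thread can be sketched without the arrow crate. The type below, `MiniStringArray`, is a hypothetical stand-in and not the real `StringArray`: it only assumes an offsets-plus-values layout, where the byte length of an element falls out of two offsets in O(1), while the char count has to decode the element with `chars().count()` in O(n).

```rust
/// Hypothetical stand-in for a string array: one contiguous UTF-8 buffer
/// plus offsets. This is an illustration, not the arrow implementation.
struct MiniStringArray {
    values: String,
    /// `offsets[i]..offsets[i + 1]` is the byte range of element `i`.
    offsets: Vec<usize>,
}

impl MiniStringArray {
    fn from_strs(items: &[&str]) -> Self {
        let mut values = String::new();
        let mut offsets = vec![0];
        for item in items {
            values.push_str(item);
            offsets.push(values.len());
        }
        Self { values, offsets }
    }

    /// Borrows element `i` as a string slice.
    fn value(&self, i: usize) -> &str {
        &self.values[self.offsets[i]..self.offsets[i + 1]]
    }

    /// Byte length of element `i`: O(1), a subtraction of two offsets.
    fn value_length(&self, i: usize) -> usize {
        self.offsets[i + 1] - self.offsets[i]
    }

    /// Number of Unicode scalar values in element `i`: O(n) in its byte length.
    fn num_chars(&self, i: usize) -> usize {
        self.value(i).chars().count()
    }
}
```

With elements "hello", "" and "café", `value_length` and `num_chars` agree for the first two and differ for the third (5 bytes, 4 chars). Both methods return `usize`, matching the return type settled on earlier in the review.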
update doc and test
Signed-off-by: remzi <[email protected]>
Codecov Report
@@            Coverage Diff             @@
##           master    #1503      +/-   ##
==========================================
+ Coverage   82.70%   82.73%   +0.03%
==========================================
  Files         188      188
  Lines       54403    54359      -44
==========================================
- Hits        44993    44974      -19
+ Misses       9410     9385      -25
@@ -377,9 +386,9 @@ mod tests {

     #[test]
     fn test_string_array_from_u8_slice() {
-        let values: Vec<&str> = vec!["hello", "", "parquet"];
+        let values: Vec<&str> = vec!["hello", "", "A£ऀ𖼚𝌆৩ƐZ"];
👍
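A likely reason the new test value is useful is that it mixes characters of every UTF-8 width, so byte length and char count can no longer coincide. The sketch below is std-only, and the helper name `utf8_widths` is made up for illustration; it lists the encoded width of each char.

```rust
/// UTF-8 width in bytes of each char in `s`, in order.
/// (Hypothetical helper for illustration, not part of the PR.)
fn utf8_widths(s: &str) -> Vec<usize> {
    s.chars().map(char::len_utf8).collect()
}
```

Applied to the string from the updated test, the widths range from 1 byte (A, Z) to 4 bytes (𖼚, 𝌆); their sum is the byte length and their count is the number of chars.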
LGTM
Thanks @HaoYang670 and @liukun4515 for the review
Which issue does this PR close?
Closes #1493.
Rationale for this change
We need a method to calculate the number of chars for StringArray.
What changes are included in this PR?
Provide a pub method num_chars.