use unicode-width instead of len() or grapheme cluster #7. #71
The reason this PR is failing is #![feature(rustc_private)]. Please suggest an alternative to get past this issue. Thanks,
Cargo.toml
@@ -15,3 +15,4 @@ categories = ["command-line-interface"]

[dev-dependencies]
log = "0.4"
unicode-width = "0.1.5"
[dev-dependencies] are not available for the "main" crate. You want [dependencies].
My bad. +1
Thanks for taking this on @prataprc! I've left a few comments. Some of them are just little nitpicks.
src/lib.rs
use std::error::Error;
use std::ffi::OsStr;
use std::fmt;
use std::iter::{repeat, IntoIterator};
use std::result;

extern crate unicode_width;
nit: let's put this extern crate up with the extern crate log statement.
👍
} else {
    row.push_str("--");
}
row.push_str(if self.long_only { "-" } else { "--" });
👍
src/lib.rs
/// Note: Function was moved here from `std::str` because this module
/// is the only place that uses it, and because it was too specific for
/// a general string function.
fn each_split_within(desc: &String, lim: usize) -> Vec<String> {
nit: This could be desc: &str
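To illustrate the nit above, here is a minimal sketch of why a `&str` parameter is the more flexible choice. The helper name `char_count` is hypothetical and not part of the PR; it only stands in for a function that reads its argument.

```rust
/// Hypothetical helper (not from the PR) that only needs to read the text.
fn char_count(desc: &str) -> usize {
    desc.chars().count()
}

fn main() {
    let owned = String::from("wide: 日本");
    // A `&String` argument coerces to `&str` through deref coercion.
    println!("{}", char_count(&owned));
    // A string literal is already a `&'static str`, so no allocation is needed.
    println!("{}", char_count("literal"));
}
```

With `desc: &String`, the second call would not compile without first allocating a `String`.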
👍
src/lib.rs
// A single word has gone over the limit. In this
// case we just accept that the word will be too long.
B
/// Note: Function was moved here from `std::str` because this module
This comment probably isn't accurate anymore.
👍
src/lib.rs
let mut rows = Vec::new();
for line in desc.trim().lines() {
    let mut words = Vec::new();
    let mut word = String::new();
We've got a fair amount of allocation going on in this method that would be good to avoid if we can. It seems like we're processing the line multiple times to clear out excess whitespace. We could do this without the temporary strings by maintaining an index into the line that we're processing:
// Add an additional whitespace to flush the last word
let line_chars = line.chars().chain(Some(' '));
let words = line_chars.fold((Vec::new(), 0, 0), |(mut words, word_start_idx, last_idx), c| {
    // Get the byte offset just past the current char
    let idx = last_idx + c.len_utf8();
    // If the char is whitespace, advance the word start and maybe push a word
    if c.is_whitespace() {
        if word_start_idx != last_idx {
            words.push(&line[word_start_idx..last_idx]);
        }
        (words, idx, idx)
    }
    // If the char is not whitespace, continue, retaining the current word start
    else {
        (words, word_start_idx, idx)
    }
}).0;
The example uses the Iterator::fold method to let us thread state through our chars, so we can find the point at which a word starts, then keep that index until we hit the end. Here's a runnable version you can check out.
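For readers without access to the linked version, the fold-based splitting described above can be sketched as a standalone program. The function name `split_words` is an assumption made for this sketch; the body follows the snippet in the comment.

```rust
/// Split `line` into whitespace-separated words without allocating new strings.
/// `split_words` is a hypothetical name used for this sketch.
fn split_words(line: &str) -> Vec<&str> {
    // Append one trailing space so the final word is flushed by the fold.
    let line_chars = line.chars().chain(Some(' '));
    line_chars
        .fold(
            (Vec::new(), 0, 0),
            |(mut words, word_start_idx, last_idx), c| {
                // Byte offset just past the current char.
                let idx = last_idx + c.len_utf8();
                if c.is_whitespace() {
                    // A word ends here only if non-whitespace chars preceded it.
                    if word_start_idx != last_idx {
                        words.push(&line[word_start_idx..last_idx]);
                    }
                    // The next word can start no earlier than after this char.
                    (words, idx, idx)
                } else {
                    // Keep the word start, extend the end of the current word.
                    (words, word_start_idx, idx)
                }
            },
        )
        .0
}

fn main() {
    let words = split_words("  wrap   this\ttext  ");
    println!("{:?}", words);
}
```

The state tuple carries the collected words, the byte offset where the current word began, and the byte offset reached so far. Because the offsets advance by `len_utf8()`, the slices always fall on char boundaries.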
Looks neat! 👍
src/lib.rs
C
}).0;

let mut row = String::new();
We could cut down some more allocations in this part of the function too. We don't need the filter anymore because the words we get from above are all greater than 0 in length. So we could do something like this:
// `.width()` comes from the `unicode_width::UnicodeWidthStr` trait, which must be in scope.
let mut current_row = String::new();
for word in words.iter() {
    // Only add a separator when the row already holds a word.
    let sep = if !current_row.is_empty() { Some(" ") } else { None };
    let width =
        current_row.width() + word.width() + sep.map(UnicodeWidthStr::width).unwrap_or(0);
    if width <= lim {
        if let Some(sep) = sep {
            current_row.push_str(sep);
        }
        current_row.push_str(word);
        continue;
    }
    // The word does not fit: flush the current row and start a new one.
    if !current_row.is_empty() {
        rows.push(current_row.clone());
        current_row.clear();
    }
    current_row.push_str(word);
}
if !current_row.is_empty() {
    rows.push(current_row);
}
So we re-use the same current_row with its capacity already set somewhere up around lim instead of creating a new string buffer each time.
We also don't need to filter and copy rows, we can just return it as-is at the end of the method because it's only got valid rows in it.
What do you think?
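As a rough, self-contained sketch of the row-packing idea in this comment: the version below takes the width measure as a parameter so it runs without the `unicode-width` crate. The names `wrap_words` and `width` are assumptions for illustration, and the char-counting closure in `main` is only a stand-in for `UnicodeWidthStr::width`.

```rust
/// Greedily pack `words` into rows whose measured width stays within `lim`.
/// `wrap_words` and the `width` parameter are hypothetical names for this sketch.
fn wrap_words<F>(words: &[&str], lim: usize, width: F) -> Vec<String>
where
    F: Fn(&str) -> usize,
{
    let mut rows = Vec::new();
    let mut current_row = String::new();
    for &word in words {
        // A separator is only needed when the row already holds a word.
        let sep = if current_row.is_empty() { "" } else { " " };
        if width(current_row.as_str()) + width(sep) + width(word) <= lim {
            current_row.push_str(sep);
            current_row.push_str(word);
            continue;
        }
        // The word does not fit: flush the row, then reuse the same buffer.
        if !current_row.is_empty() {
            rows.push(current_row.clone());
            current_row.clear();
        }
        // A single over-long word is accepted as-is on its own row.
        current_row.push_str(word);
    }
    if !current_row.is_empty() {
        rows.push(current_row);
    }
    rows
}

fn main() {
    let words = ["the", "quick", "brown", "fox"];
    // Stand-in measure (assumption): counts chars instead of display columns.
    let rows = wrap_words(&words, 10, |s| s.chars().count());
    println!("{:?}", rows);
}
```

Passing the measure as a closure keeps the sketch independent of any crate; in the PR itself the measure is the display width from `unicode-width`.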
Thanks for the suggestion. 👍 Amended the PR.
This looks good to me! I think the new version of each_split_within is definitely easier to follow.
Thanks for working on this @prataprc!
The AppVeyor failure is transient.
I have refactored each_split_within() to follow what is, hopefully, simpler logic. The test cases are passing, and I have added a new test case for multi-width characters.
Let me know whether this PR is useful for this issue or needs modifications.
Thanks,
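
To show why multi-width characters need their own test case, here is a small sketch comparing byte length, char count, and an approximate column count. The functions `approx_char_width` and `approx_width` are hypothetical stand-ins written for this example; they cover only a few East Asian ranges and are not the lookup table that `unicode-width` uses.

```rust
/// Rough stand-in for a display-width lookup: treats a few East Asian
/// wide/fullwidth ranges as two columns and everything else as one.
/// This is an illustrative assumption, not the table used by `unicode-width`.
fn approx_char_width(c: char) -> usize {
    match c as u32 {
        0x1100..=0x115F      // Hangul Jamo initial consonants
        | 0x2E80..=0xA4CF    // CJK radicals through Yi (approximate span)
        | 0xAC00..=0xD7A3    // Hangul syllables
        | 0xF900..=0xFAFF    // CJK compatibility ideographs
        | 0xFF00..=0xFF60    // Fullwidth forms
        | 0xFFE0..=0xFFE6 => 2,
        _ => 1,
    }
}

/// Sum the approximate column width of every char in `s`.
fn approx_width(s: &str) -> usize {
    s.chars().map(approx_char_width).sum()
}

fn main() {
    let s = "日本語";
    // Three chars, three UTF-8 bytes each, two terminal columns each.
    println!(
        "bytes={} chars={} columns={}",
        s.len(),
        s.chars().count(),
        approx_width(s)
    );
}
```

For this string, `len()` reports 9 bytes and `chars().count()` reports 3, while the text occupies 6 columns, so wrapping by either of the first two measures would misjudge the row width.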