-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preproc improvements #48
Preproc improvements #48
Conversation
This makes code use HashMap with string slices as keys. Also eliminate `Peekable` iterators.
Test failure on Travis seems to be unrelated to this changes. EDIT: It turned out to be not true |
Before this change the code effectively discarded the last common character
Previously the code would make "prefixprefixret" into "prefix"
`std::str::Chars` is slow since it have to construct codepoints and thus should be avoided. With a small adjustments finding a common byte prefix is sufficient and compiles into more efficient code (compiler vectorized the comparison loop)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome changes, thank you! I'm reading through some bits (particularly the UTF-8 portions) to make sure I understand them well, but if the tests are passing I'm pretty confident this should be okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great changes, thanks again. I'm pretty OK accepting this as-is, but I left a couple comments for you to look at. I'll be back in tomorrow to merge this unless you have any other changes you want to make.
Merged! Thanks again. I'll get things ready for a new release soon, likely right after the holidays. |
Description of changes:
This series of commits simplifies implementations of
lcs_substr
andremove_common_tokens
as well as greatly reducing amount of memory allocations. Functionality is preserved, all tests pass.Note: in
lcs_substr
the common prefix is.trim()
ed. However, if this common prefix happens to start with whitespace characters, this will give wrong results later inremove_common_tokens
. It seems like it actually should be.trim_end()
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.