Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preproc improvements #48

Merged
merged 12 commits into from
Dec 20, 2019
Merged

Preproc improvements #48

merged 12 commits into from
Dec 20, 2019

Conversation

AnthonyMikh
Copy link
Contributor

Description of changes:
This series of commits simplifies implementations of lcs_substr and remove_common_tokens as well as greatly reducing amount of memory allocations. Functionality is preserved, all tests pass.

Note: in lcs_substr the common prefix is .trim()ed. However, if this common prefix happens to start with whitespace characters, this will give wrong results later in remove_common_tokens. It seems like it actually should be .trim_end().

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@AnthonyMikh
Copy link
Contributor Author

AnthonyMikh commented Dec 12, 2019

Test failure on Travis seems to be unrelated to this changes.

EDIT: It turned out to be not true

Before this change the code effectively discarded the last common character
Previously the code would make "prefixprefixret" into "prefix"
`std::str::Chars` is slow since it have to construct codepoints and thus should be avoided. 
With a small adjustments finding a common byte prefix is sufficient
and compiles into more efficient code (compiler vectorized the comparison loop)
Copy link
Owner

@jpeddicord jpeddicord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome changes, thank you! I'm reading through some bits (particularly the UTF-8 portions) to make sure I understand them well, but if the tests are passing I'm pretty confident this should be okay.

Copy link
Owner

@jpeddicord jpeddicord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great changes, thanks again. I'm pretty OK accepting this as-is, but I left a couple comments for you to look at. I'll be back in tomorrow to merge this unless you have any other changes you want to make.

src/preproc.rs Show resolved Hide resolved
src/preproc.rs Show resolved Hide resolved
@jpeddicord jpeddicord merged commit 916be47 into jpeddicord:master Dec 20, 2019
@jpeddicord
Copy link
Owner

Merged! Thanks again. I'll get things ready for a new release soon, likely right after the holidays.

@AnthonyMikh AnthonyMikh deleted the preproc_improvements branch December 20, 2019 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants