-
-
Notifications
You must be signed in to change notification settings - Fork 756
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ICU-21710 Remove BOYER_MOORE dead code from usearch.cpp #1800
ICU-21710 Remove BOYER_MOORE dead code from usearch.cpp #1800
Conversation
3d1f058
to
6f61c31
Compare
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice getting rid of the non-functional half of the .cpp file!
docs/userguide/icu4j/faq.md
Outdated
@@ -195,7 +195,9 @@ determine whether case and accents are ignored during a search. | |||
|
|||
#### What algorithm are you using to perform the search? | |||
|
|||
StringSearch uses a version of the Boyer-Moore search algorithm that has been | |||
As of ICU 53, StringSearch uses a simple linear search algorithm which |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does it say ICU 53 here, but ICU 4.0 in usearch.h?
I suspect that ICU 4.0 is closer to when B-M was turned off-by-default for hard-to-fix problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated this to have a different date for ICU4J, as it looks like the J changes occurred after the C changes.
Based on looking at the tickets and code history, it appears as though B-M was turned off in ICU4C first, and later ported to ICU4J a few releases later.
The ICU4C changes to turn off B-M were done using ticket ICU-5420, which was marked for 4.0. This also aligns with existing comments in the User Guide:
In ICU4C 4.0, the string search service was updated with the simple linear search algorithm, which locates a match by shifting a cursor in the target text one by one, and these issues were fixed. In ICU4C 4.0.1, the Boyer-Moore search code was reintroduced as a separated API set as a technology preview. In a later release, this code was deleted.
The removal of the additional B-M implementation was done in ICU 51.1 using ticket ICU-9573.
And then later in ICU 53.1 (using ticket ICU-6288), the ICU4C StringSearch changes were ported to ICU4J.
This also was noted on the download page for ICU 53 http://site.icu-project.org/download/53
ICU4J Specific Changes
- StringSearch code re-implemented
- Ported ICU4C implementation to ICU4J, replacing the previous implementation
So I thought it might be good to update the documentation to call out that the J changes occurred later on (in version 53).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for checking!
This is confusing if we just say "ICU 53 StringSearch", even in the Java FAQ.
Please change this here to say "As of ICU4J 53 / ICU4C 4.0, ..."
and then usearch.h to say "As of ICU4C 4.0 / ICU4J 53, ..."
and something similar in the User Guide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
I'll force push an update with the changes in a moment.
6f61c31
to
a9af611
Compare
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
a9af611
to
e860e1b
Compare
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm tnx
This is related to the changes in PR #1473.
In ICU 4.0 (see ticket ICU-5420), the ICU4C StringSearch implementation was changed to use a linear search algorithm, and no longer used the Boyer-Moore search algorithm.
In ICU 51.1 (see ticket ICU-9573), the Boyer-Moore string search implementation and test cases were retracted.
In ICU 53.1 (see ticket ICU-6288), the ICU4C StringSearch changes were ported to ICU4J, meaning that ICU4J now also used a linear search algorithm.
The Boyer-Moore search algorithm code in
usearch.cpp
hasn't actually compiled since the Collation V2 rewrite in ~2013/2014.(The code behind the
BOYER_MOORE
#ifdef
refers to non-existent functions and methods that have been changed, making it no longer compile if set to1
.)This change completely removes the Boyer-Moore search algorithm code from the file
usearch.cpp
.Note: The diff shown on GitHub is slightly confused. I didn't actually edit or change the code in the file -- I only made deletions/removals.
Checklist