ICU-21710 Remove BOYER_MOORE dead code from usearch.cpp
jefgen committed Aug 19, 2021
1 parent b03b8be commit a9af611
Showing 5 changed files with 63 additions and 2,447 deletions.
4 changes: 2 additions & 2 deletions docs/userguide/collation/examples.md
@@ -268,8 +268,8 @@ Werner's text searching article for more details
(<http://icu-project.org/docs/papers/efficient_text_searching_in_java.html>).

However, implementing collation-based search with the Boyer-Moore method
- while getting correct results is very tricky,
- and ICU no longer uses this method.
+ while getting correct results is very tricky, and ICU no longer uses this method
+ (as of ICU4C 4.0 and ICU4J 53).

Please see the [String Search Service](./string-search) chapter.

11 changes: 6 additions & 5 deletions docs/userguide/collation/string-search.md
@@ -276,11 +276,12 @@ search service. There were some known issues in these previous releases.
[ICU-5382](https://unicode-org.atlassian.net/browse/ICU-5382),
[ICU-5420](https://unicode-org.atlassian.net/browse/ICU-5420))

- In ICU4C 4.0, the string
- search service was updated with the simple linear search algorithm, which
- locates a match by shifting a cursor in the target text one by one, and these
- issues were fixed. In ICU4C 4.0.1, the Boyer-Moore search code was reintroduced
- as a separated API set as a technology preview. In a later release, this code was deleted.
+ In ICU4C 4.0, the string search service was updated to use a simple linear search
+ algorithm, which locates a match by shifting a cursor in the target text one by one,
+ and these issues were fixed.
+
+ In ICU4C 4.0.1, the Boyer-Moore search code was reintroduced as a separate API with
+ technology preview status. In ICU 51.1, this code was deleted.

The Boyer-Moore searching
algorithm is based on automata or combinatorial properties of strings and
4 changes: 3 additions & 1 deletion docs/userguide/icu4j/faq.md
@@ -195,7 +195,9 @@ determine whether case and accents are ignored during a search.

#### What algorithm are you using to perform the search?

- StringSearch uses a version of the Boyer-Moore search algorithm that has been
+ As of ICU4J 53 / ICU4C 4.0, StringSearch uses a simple linear search algorithm which
+ locates a match by shifting a cursor in the target text one by one. Previous
+ versions of ICU used a version of the Boyer-Moore search algorithm which was
modified for use with Unicode. Rather than using raw Unicode character values in
its comparisons and shift tables, the algorithm uses collation elements that
have been "hashed" down to a smaller range to make the tables a reasonable size.
11 changes: 6 additions & 5 deletions icu4c/source/i18n/unicode/usearch.h
@@ -35,8 +35,9 @@
* See the <a href="http://source.icu-project.org/repos/icu/icuhtml/trunk/design/collation/ICU_collation_design.htm">
* "ICU Collation Design Document"</a> for more information.
* <p>
- * The implementation may use a linear search or a modified form of the Boyer-Moore
- * search; for more information on the latter see
+ * As of ICU4C 4.0 / ICU4J 53, the implementation uses a linear search. In previous versions,
+ * a modified form of the Boyer-Moore searching algorithm was used. For more information
+ * on the modified Boyer-Moore algorithm see
* <a href="http://icu-project.org/docs/papers/efficient_text_searching_in_java.html">
* "Efficient Text Searching in Java"</a>, published in <i>Java Report</i>
* in February, 1999.
@@ -595,8 +596,8 @@ U_CAPI UCollator * U_EXPORT2 usearch_getCollator(
/**
* Sets the collator used for the language rules. User retains the ownership
* of this collator, thus the responsibility of deletion lies with the user.
- * This method causes internal data such as Boyer-Moore shift tables to
- * be recalculated, but the iterator's position is unchanged.
+ * This method causes internal data such as the pattern collation elements
+ * and shift tables to be recalculated, but the iterator's position is unchanged.
* @param strsrch search iterator data struct
* @param collator to be used
* @param status for errors if it occurs
@@ -608,7 +609,7 @@ U_CAPI void U_EXPORT2 usearch_setCollator( UStringSearch *strsrch,

/**
* Sets the pattern used for matching.
- * Internal data like the Boyer Moore table will be recalculated, but the
+ * Internal data like the pattern collation elements will be recalculated, but the
* iterator's position is unchanged.
*
* The UStringSearch retains a pointer to the pattern string. The caller must not