Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ICU-21710 Additional clean up after removing BOYER_MOORE code from usearch.cpp #1830

Merged
merged 1 commit into from
Aug 30, 2021

Conversation

jefgen
Copy link
Member

@jefgen jefgen commented Aug 27, 2021

This PR is a follow-up to PR #1800 (which removed BOYER_MOORE dead code from usearch.cpp).

This PR contains additional clean-up.

Thanks to Andy for pointing out on the other PR that initialize and setShiftTable were still setting up Boyer-Moore related data. It turns out that we can we can completely remove the shift tables and related fields from the data structs, as well as remove the setShiftTable method.

The creation of UStringSearch objects should be slightly faster now, as we no longer waste time computing the unused shift tables (which hashed the pattern collation elements).

The sizeof(UStringSearch) is decreased from 5240 bytes to 3192 bytes (on x64), so this should help to reduce memory for applications that create many string search objects.

Also added a test case for a pattern with only ignoreable CEs. The comments on initialize said: "If pattern has no non-ignorable ce, we return a illegal argument error". However, it actually does not set illegal argument error.

Checklist
  • Required: Issue filed: https://unicode-org.atlassian.net/browse/ICU-21710
  • Required: The PR title must be prefixed with a JIRA Issue number.
  • Required: The PR description must include the link to the Jira Issue, for example by completing the URL in the first checklist item
  • Required: Each commit message must be prefixed with a JIRA Issue number.
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable

markusicu
markusicu previously approved these changes Aug 27, 2021
Copy link
Member

@markusicu markusicu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks!
lgtm -- although I have some optional suggestions (ok to ignore, or to use --amend)

icu4c/source/test/cintltst/usrchtst.c Outdated Show resolved Hide resolved
icu4c/source/test/cintltst/usrchtst.c Outdated Show resolved Hide resolved
…earch.cpp

Changes:
- We can completely remove the shift tables and related fields from
  data structs.

- Creation of UStringSearch objects should be faster now,
  as it doesn't waste time computing the unused shift tables.

- The sizeof(UStringSearch) is decreased from 5240 to 3192, so
  this should help to reduce memory for applications that create many string search objects.

Note regarding the comments on initialize(). It actually does not set illegal argument error if pattern is all ignoreables. Added a test case for it.
@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4c/source/test/cintltst/usrchtst.c is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

Copy link
Member

@markusicu markusicu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm tnx

@jefgen jefgen merged commit 5b96c2e into unicode-org:main Aug 30, 2021
@jefgen jefgen deleted the user/jefgen/usearch-clean-up branch August 30, 2021 22:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants