MultiVolnitsky added with tests and some benchmark, many multiFunctions are added to support multistring search #4053
Some details I should mention. MultiVolnitsky stores in the hash table the index of a needle and a position within it; after that, the search works like the simple (single-needle) version. Three functions were added: multiPosition, multiSearch, and firstMatch.

multiPosition() becomes faster than several separate position() calls only when there are at least 3-4 strings. The multi version has some additional checks that cannot be removed (whether there are enough bytes left to compare, and whether subtracting the position taken from the hash table stays in range), plus two additional variables to store the answer.

multiSearch is much faster even with 1-2 strings to search, because it has fewer variables to maintain. firstMatch is a bit slower than multiSearch but faster than multiPosition, since it only modifies the multiSearch algorithm to take the minimum index; cmovle instructions keep the number of branches low.

All the functions become extremely fast when the minimum size of the strings is large. The smallest string size for which MultiVolnitsky is used is 4; with a minimum size of, say, 10, the search is much faster. This is because the step through the haystack is based on the minimum needle size, and as a consequence less CPU is used.
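To make the idea above easier to follow, here is a minimal Python sketch of a multi-needle Volnitsky-style search. It is an illustration and not the code from this PR: the names `build_table`, `candidates`, `multi_position`, `multi_search` and `first_match` are hypothetical, a `dict` stands in for the hash table, positions are 0-based with -1 for "not found" (like `str.find`), needles of length 2 are accepted although the PR requires at least 4, and `first_match` reflects one reading of "take the minimum index" (the smallest index among needles that occur).

```python
from collections import defaultdict


def build_table(needles):
    """Map every bigram of every needle to (needle_index, offset) pairs."""
    if not needles or any(len(n) < 2 for n in needles):
        raise ValueError("this sketch needs needles of length >= 2")
    table = defaultdict(list)
    for idx, needle in enumerate(needles):
        for off in range(len(needle) - 1):
            table[needle[off:off + 2]].append((idx, off))
    # With bigrams, probing every (min_len - 1) positions cannot skip a match.
    step = min(len(n) for n in needles) - 1
    return table, step


def candidates(haystack, needles):
    """Yield (needle_index, start) for each verified match found by the scan."""
    table, step = build_table(needles)
    for pos in range(0, len(haystack) - 1, step):
        for idx, off in table.get(haystack[pos:pos + 2], ()):
            start = pos - off
            # The two checks mentioned above: stay in range after the
            # subtraction, and have enough characters left to compare.
            if start < 0 or start + len(needles[idx]) > len(haystack):
                continue
            if haystack.startswith(needles[idx], start):
                yield idx, start


def multi_position(haystack, needles):
    """Leftmost position of every needle, or -1 if it does not occur."""
    result = [-1] * len(needles)
    for idx, start in candidates(haystack, needles):
        if result[idx] == -1 or start < result[idx]:
            result[idx] = start
    return result


def multi_search(haystack, needles):
    """True if at least one needle occurs; stops at the first verified match."""
    return next(candidates(haystack, needles), None) is not None


def first_match(haystack, needles):
    """Smallest index among needles that occur, or -1 (assumed semantics)."""
    return min((idx for idx, _ in candidates(haystack, needles)), default=-1)
```

In this sketch the scan step is the minimum needle length minus one, so a larger minimum length means fewer probes of the table, which matches the observation that the functions get faster as the shortest needle grows. `multi_search` can stop early, while `multi_position` has to track a result per needle.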
After some optimizations, benchmarks are like this:
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
For changelog.
Category:
Short description (up to few sentences):
Added a multi-searcher that looks for multiple constant strings in a big haystack. Added functions {multiPosition,multiSearch,firstMatch}{'', UTF8, CaseInsensitive, CaseInsensitiveUTF8}
Detailed description:
Here are the benchmarks. Although searching for a small number of strings is slightly slower, the improvements in the other cases are really great.
This was done as the first part of my diploma at the Faculty of Computer Science.