Making connector handling much faster #198
These seem to be reasonable assumptions.
This looks like a bug ... upper-case letters in t might not be matched.
I don't think we use hash-marks anywhere in the dicts, right?
I'm not sure I understand what you say: If s=AA and t=AAAA, there should be no match.
There are indeed no "#" marks in the dict.
and got this:
Meanwhile I found another method to perform connector comparisons in negligible CPU time compared to the current code. This method has the problem (which may not actually be a real problem) that it assumes the connectors consist of ASCII only, and that each contains a sequence of no more than 12 capital letters followed by a sequence of no more than 9 additional letters (in addition to the optional h/d at the start). But that seems to be enough. Here is my new easy_match():
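(A minimal sketch of the scheme just described; the condesc_t struct and all field names here are illustrative assumptions, not necessarily the actual ones.)

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative precomputed connector descriptor, filled in once at
 * dictionary-read time (names are assumptions, not the actual ones). */
typedef struct
{
    uint64_t uc;         /* uppercase part, packed 5 bits per letter (<= 12) */
    uint64_t lc_letters; /* trailing part, 7 bits per char (<= 9); 0 at '*' */
    uint64_t lc_mask;    /* 0x7F at concrete chars, 0 at '*' and past the end */
} condesc_t;

/* Fill a descriptor from a connector string; assumes ASCII, an optional
 * leading 'h'/'d', at most 12 capitals, at most 9 trailing characters. */
static void condesc_init(condesc_t *d, const char *s)
{
    int i;
    d->uc = d->lc_letters = d->lc_mask = 0;
    if ((*s == 'h') || (*s == 'd')) s++;   /* optional head/dependent mark */
    for (i = 0; (*s >= 'A') && (*s <= 'Z'); s++, i++)
        d->uc |= (uint64_t)(*s - 'A' + 1) << (5 * i);
    for (i = 0; *s != '\0'; s++, i++)
    {
        if (*s == '*') continue;           /* wildcard: leave its bits 0 */
        d->lc_letters |= (uint64_t)(*s & 0x7F) << (7 * i);
        d->lc_mask    |= (uint64_t)0x7F        << (7 * i);
    }
}

/* Connectors match iff their uppercase parts are identical and their
 * trailing parts agree wherever neither side has a '*' wildcard. */
static inline bool easy_match(const condesc_t *a, const condesc_t *b)
{
    /* For most comparisons, this first test already returns false. */
    if (a->uc != b->uc) return false;
    return 0 == ((a->lc_letters ^ b->lc_letters) & a->lc_mask & b->lc_mask);
}
```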
The struct members are computed once, at dict-read time. For most of the comparisons, the only work of this function is to return false in its first statement. If the idea of pre-computation for connectors (at dict-read time) seems fine to you, I will test it.
OH, OK, right. I misread the code.
Yes, right, sorry. I forgot that this was the post-process match and not the easy match.
Precompiling for easy-match seems like an excellent idea. Limiting to ASCII is OK, and so is limiting to 12 cap letters. I understand where the 12 comes from; I don't see why lower-case is only 9, though. Can't the mask handle it all? Whatever. 9 is an OK limit too. These are OK limits for the hand-built dicts, and are OK for the machine-learned dicts, too. Go for it!
For the "lowercase letters" part, I packed each of the letters as 7 bits. However, it can be done as 6 bits too, still allowing all the useful characters besides lowercase letters (e.g. numbers). This allows for an additional character. Meanwhile I found a better encoding for the uppercase part (than 5-bit packing):
This allows packing up to 64K different, unlimited-length uppercase parts into a uint16_t, which is also much more cache-friendly than a uint64_t. Pre-handling all the connectors at the dict-creation step may have yet another major benefit: their hash can be pre-computed, instead of being computed per sentence as is done now (which is one of the major overhead areas). In theory, one of the many free perfect-hash functions could be used. They can easily handle millions of keys, so I hope handling several thousand keys will be light enough for a one-time dict-handling step. This of course needs a check. In addition, each connector could be pre-marked as to whether it belongs to the UNLIMITED-CONNECTORS set, another thing that is now checked (using a hash) per sentence (no. 11 in the CPU profiling).
Yes, these are all excellent ideas. The only observation I have to offer is that it was "hard" to hash the connectors: they are so short that there is little variety. Perhaps enumerating them can help. I tried several different hashes, and the shortest, simplest ones seemed to work best.
Quoting from my opening post:

It still consumes a considerable percentage.
BTW, the post-processing is also a major bottleneck in the SAT parser code, and I once tried investigating how to encode it in SAT (only the post-processing pruning is currently encoded).
By now it is far down in the list.
However, I couldn't prove that it makes the total run faster (still a mystery), so I will use it only if it turns out to be the last obstacle to the said trimming to 16 bytes. So these two issues are still open, and I am leaving this issue open. BTW, the major CPU hog is now the
I would prefer that over TLS. Note that passing extra arguments through functions adds overhead as well, especially if the order of the arguments has to be altered. But this overhead is minor compared to cache misses.
Updates:
So I'm closing this issue.
Hi Linas,
I did profiling and found several "hot spots" in the regular parser (I will discuss the profiling of the SAT parser in another post).
Among them are connector comparisons.
post_process_match() is now the no. 1 CPU consumer. easy_match() is no. 2. (I neglect for now malloc/realloc/free, which take about 35% of the CPU...)
It turns out that it is possible to speed up these functions considerably.
With these modifications, the batch benchmarks run much faster (~15% speed-up on fixes.batch).
The only drawback is that the new code is maybe less straightforward (but adding appropriate comments may mitigate that).
The speed-up is based on these ideas:
Here is how the faster post_process_match() looks (still subject to change):
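(A rough sketch along these lines, using offsets precomputed at dict-read time; the pp_conn_t struct and its fields are illustrative assumptions, not the actual code.)

```c
#include <stdbool.h>
#include <string.h>

/* Illustrative connector info with offsets precomputed at dict-read
 * time (names are assumptions, not from the actual patch). */
typedef struct
{
    const char *str;   /* the connector / label string */
    unsigned uc_start; /* offset of the first uppercase letter */
    unsigned uc_end;   /* offset just past the last uppercase letter */
} pp_conn_t;

/* s is a post-processing rule label ('#' is a wildcard), t is a link
 * label.  The uppercase parts must be identical; in the trailing part,
 * '#' in s matches any character of t. */
static bool post_process_match(const pp_conn_t *s, const pp_conn_t *t)
{
    const char *sp, *tp;
    unsigned uc_len = s->uc_end - s->uc_start;

    /* Cheapest test first: different uppercase lengths never match. */
    if (uc_len != t->uc_end - t->uc_start) return false;
    if (0 != memcmp(s->str + s->uc_start, t->str + t->uc_start, uc_len))
        return false;

    /* Trailing parts: '#' in s matches anything in t. */
    sp = s->str + s->uc_end;
    tp = t->str + t->uc_end;
    while (*sp != '\0')
    {
        if (*sp != '#')
        {
            char c = (*tp != '\0') ? *tp : ' ';
            if (*sp != c) return false;
        }
        sp++;
        if (*tp != '\0') tp++;
    }
    return true;
}
```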
Here is some benchmark:
Average of 25 runs, each 5G iterations (interleaving runs of current and modified code).
(The times include the C program invocation and the loop overhead.)
Here is a similar benchmark for the new easy_match():
Average of 12 runs, each 5G iterations (interleaving runs of current and modified code).
(The times include the C program invocation and the loop overhead.)
I have an idea for how to further speed up these functions, and also speed up the connector hash calculation: when reading the dict, calculate the following for each connector: uppercase start offset, uppercase end offset, and maybe also the total length and whether there is a wildcard. I intend to check whether this can provide an additional significant speed-up.
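For instance, a per-connector descriptor along these lines (a sketch; connector_desc_t and its field names are assumptions):

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative per-connector data, computed once at dict-read time
 * instead of per sentence (names are assumptions). */
typedef struct
{
    const char *string;  /* the connector string itself */
    uint8_t uc_start;    /* offset of the first uppercase letter */
    uint8_t uc_end;      /* offset just past the last uppercase letter */
    uint8_t length;      /* total string length */
    bool has_wildcard;   /* whether the string contains a wildcard */
    uint32_t hash;       /* hash, computed once instead of per sentence */
} connector_desc_t;
```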
Meanwhile, if the idea of speeding up easy_match() and post_process_match() this way seems fine to you, I can send a pull request.
I also did numerous other speed-up modifications, whose accumulated effect is still unknown. Maybe I will include them in the same pull request (as separate commits, of course).