Prune speedup #845

Merged: 17 commits merged from the prune-speedup branch into opencog:master on Nov 6, 2018

Conversation

ampli commented Nov 5, 2018

Corpus batch benchmarks (10x10 runs):
en basic: 2+%
en fixes: 3+%
ru: ~0.5%

Individual runs of a small sample of very long sentences show a ~5% speedup.
Runs with -verbosity=2 show a big power_prune() speedup. However, power_prune() currently takes only a small fraction of the total CPU, so the total saving is not as big.

The largest speedup is due to the shallow/deep connector ordering that allows early termination of the loop in *_table_search(); it is implemented in the following commit:
prune.c: Matching shallow connectors: Stop earlier

Some other changes didn't show a noticeable saving in their benchmarks, so they are not included here:

  1. Using memory pools for pp_prune().
  2. Consolidating all the memory allocations and initializations in power_table_new().

But now I may know better and can improve their implementation, so I may include them in a future PR.

Some other changes in this PR:

  • Reformat touched code and localize some of its variables.
  • Add some code for ALT_*_CONSISTENCY while pruning (still commented out; more changes are needed to make it more efficient).
  • Make power_prune() static.
  • Define lc_easy_match() as inline (I guess it got inlined anyway).

(I also have major exprune.c changes with a greater speedup impact, to be sent soon.)

BTW, here is a power_prune() speedup idea for a future check:
Eliminate bool shallow in struct c_list_s, so that the struct is exactly half a CPU cache line in size.

  • If we had Connector_struct in the expressions, the flag could be added there. It is not clear whether using Connector_struct in expressions would be worth the memory-handling overhead, but they have other advantages that may offset this overhead.
  • Alternatively, the hashing can be done separately for shallow/deep connectors (at the expense of using some more memory), and this would save the "shallow" checks in the *_table_search() loop (see the sketch below).
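
To make the second alternative concrete, here is a minimal sketch of hashing shallow and deep connectors into separate tables. All the names below are illustrative stand-ins, not the actual prune.c structures; the point is only that a deep connector then searches just the shallow table, and no per-entry "shallow" check is left in the loop.

#include <stdbool.h>
#include <string.h>

/* Stand-in table entry; the real code stores a Connector pointer. */
typedef struct centry_s
{
    struct centry_s *next;
    const char *string;
} C_entry;

/* One pair of hash tables: shallow-only and deep-only buckets. */
typedef struct
{
    C_entry **shallow_table;
    C_entry **deep_table;
    unsigned size;               /* number of buckets, a power of 2 */
} Split_table;

/* Stand-in for the real connector match test. */
static bool easy_match(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

static bool list_has_match(const C_entry *e, const char *c)
{
    for (; e != NULL; e = e->next)
        if (easy_match(e->string, c)) return true;
    return false;
}

/* A deep connector can only match a shallow one, so it searches just the
 * shallow table; a shallow connector must search both tables. */
static bool table_has_match(const Split_table *t, unsigned h,
                            const char *c, bool c_is_shallow)
{
    unsigned b = h & (t->size - 1);
    if (list_has_match(t->shallow_table[b], c)) return true;
    if (!c_is_shallow) return false;
    return list_has_match(t->deep_table[b], c);
}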

ampli added 17 commits November 4, 2018 01:26
Instead of copying the list (in reverse order) while making the
deletions, just delete in-place, preserving the list order.
(Preserving the order is important for the next change, which depends on
the shallow connectors always being first in the list.)
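
For illustration, here is a generic sketch of the in-place deletion pattern, on a toy list of ints rather than the actual power-table entries; the key property is that the surviving nodes keep their original relative order.

#include <stdbool.h>
#include <stdlib.h>

typedef struct node_s
{
    struct node_s *next;
    int value;
} Node;

/* Example predicate; in the real code this would be "this connector can
 * no longer match anything". */
static bool must_delete(const Node *n)
{
    return n->value < 0;
}

/* Delete matching nodes in place.  Unlike rebuilding the list by
 * prepending the survivors to a new head (which reverses them), this
 * keeps the survivors in their original order. */
static void delete_in_place(Node **head)
{
    Node **prev = head;                /* the link that may be rewritten */
    for (Node *n = *head; n != NULL; )
    {
        Node *next = n->next;
        if (must_delete(n))
        {
            *prev = next;              /* unlink... */
            free(n);                   /* ...and free the node */
        }
        else
        {
            prev = &n->next;
        }
        n = next;
    }
}
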
Connectors cannot match when both are non-shallow. Currently the list of
connectors mixes shallow and non-shallow entries, so the whole list gets
scanned even though, for a non-shallow pair, False is returned just before
the match attempt.

Implement the following for stopping earlier in the
left/right_table_search() matching (possible_connection()) loop:

1. On power_table_new(), arrange for the shallow connectors to be first
in the table, by inserting them last (they are inserted in reverse
order).

2. On left/right_table_search(), if we are matching a non-shallow
connector, stop when we don't have any more shallow ones to try.

Note that in a previous commit cleanup_table() was changed so that it
does not alter the table order, which is critical for this change.
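
A minimal sketch of the resulting search loop, with stand-in names (the real code lives in power_table_new(), left/right_table_search() and possible_connection()): because each bucket is built shallow-first, a deep connector can stop scanning at the first deep entry.

#include <stdbool.h>
#include <string.h>

/* Stand-in bucket entry; the real c_list_s stores a Connector pointer. */
typedef struct c_list_s
{
    struct c_list_s *next;
    const char *string;
    bool shallow;
} C_list;

/* Stand-in for the real connector match test. */
static bool easy_match(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Head insertion: since the shallow connectors are inserted last, they
 * end up first in the bucket. */
static void bucket_insert(C_list **bucket, C_list *e)
{
    e->next = *bucket;
    *bucket = e;
}

/* Can connector 'c' match anything in this bucket?  Two non-shallow
 * connectors can never match, so when 'c' is non-shallow the scan stops
 * at the first non-shallow entry: everything after it is non-shallow too. */
static bool bucket_has_match(const C_list *bucket,
                             const char *c, bool c_is_shallow)
{
    for (const C_list *e = bucket; e != NULL; e = e->next)
    {
        if (!c_is_shallow && !e->shallow) return false;  /* early stop */
        if (easy_match(e->string, c)) return true;
    }
    return false;
}
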
This allows the code to compile if defined.
It still causes slowness so it remains undefined.
Also delete disjuncts in-place instead of copying the remaining ones to
a new list.
Validate that all BAD_WORDS have been removed.
This saves some CPU by avoiding a long zero-byte padding.
(GCC would most probably inline it anyway, but I don't know about other
compilers.)
While at it:
- Reformat to the new code style.
- Localize variables.
linas merged commit e51657a into opencog:master on Nov 6, 2018

linas commented Nov 6, 2018

looks good.


linas commented Nov 6, 2018

So. Stupid computer tricks. Just to see how things are going, I thought to measure the speed of the ten-year-old version 4.2.5 ... had to do some hacking to get past a broken file-search-path design. I was shocked to find that it is 3x faster.

Let that sink in. Basically, 4.0.dict is much, much more complex today than it was back then, so that it's at least three times slower. Of course, back then, corpus-fixes.batch had 2654 errors vs. 385 errors today. So the complex dictionary has a huge impact on performance.

Let's try a different experiment: run today's parser with the old dictionary. Easy to say. Hard to do. For multiple reasons. Conclusion: today's parser is almost 2x faster than the old parser.

What to conclude? The "almost 2x faster" result is ... great! But that also means that the more complex dict is 6x slower (since 6 = 2x3). So what does this mean? Well, part of me wants to say that this is like the fractal "length of the coastline of England" -- the more complex the dictionary, the more of the fractal language coastline it can cover, and more accurately. But that's just hand-wavy, meaningless blather.

It's still interesting that a more accurate dictionary might even result in exponentially slower parsing. Is that possible?


ampli commented Nov 6, 2018

Yes. It seems it is more subtle than it appears at first glance.
Among other things, the 4.2.5 dict is not "so supportive" relative to the current dict, so after the prune stage many sentences have at least one word with zero disjuncts, which causes a very fast parse failure.

I made the following tests:

Version 4.2.5:

sed 's/verbosity=1/verbosity=9/' ../master/data/en/corpus-fixes.batch | env LD_LIBRARY_PATH=link-grammar/.libs link-grammar/.libs/grammar-parse | grep -A1 'After pp_pruning' | grep -cF '(0)'

2654 errors.
Freeing dictionary en/4.0.dict
Freeing dictionary en/4.0.affix
2772
4.522u 0.536s 0:04.98 101.4%	0+0k 0+8io 0pf+0w

So 2772 sentences fail immediately due to at least one word with 0 disjuncts.

For the current version (5.5.1):

sed '/verbosity=/d' data/en/corpus-fixes.batch | link-parser -v=9 -debug=prune.c | grep -F -A1 'After pp_prune' | grep -cF '(0)'

link-grammar: Info: Freeing dictionary en/4.0.dict
link-grammar: Info: Freeing dictionary en/4.0.affix
+ rc=0
master: 13.12u 0.22s 0:13.38e 189300Kb
190

But as you can see for yourself (if you change grep -cF to just grep -F), most of the time this is due to unit alternatives (e.g. did gets broken to d.u(8) id.u(0)), and the sentence still has to go through a costly parse because there are actually no words with 0 disjuncts.

Another thing to consider is parsing with FAT links. The complex (and parse-costly) conjunction links didn't exist in 4.2.5. I don't know much about FAT links, but it may be that the effective sentence length was smaller. It would be better to compare to 4.8.6 (it is slower than 5.5.1 on the batch benchmarks even when each version uses its own dict).

As I noted in #250, the current aggressive unit split, which didn't exist in 4.2.5 and not even in 4.8.6, causes much slowness. As I also pointed out there, this aggressive split is actually mostly useless because it is mostly not supported by the dict. The said aggressive split (which you once added) adds unit alternatives for many words, e.g. mall -> ma l l, did -> d id, TNT -> T N T, As -> A s, All -> A l l. But all of that adds much parsing overhead for nothing (as multi-unit splits, besides 1 or 2 exceptions, are not supported by the dict).

Proposal: Turn off multi-unit splits (or add support for them, e.g. handle them as a single unit).

more accurate dictionary might even result in exponentially slower parsing? Is that possible?

Several times I have noticed a large slowdown due to what seemed like small dict fixes.

ALSO: No regexes in 4.2.5!

Last thing: 5.5.1 using the 4.2.5 dict (after adding an empty 4.0.regex and <UNKNOWN>) is slightly faster on both the old and current batches. However, something about the handling of capitalized words seems totally different, so I don't know if this comparison is valid.


ampli commented Nov 6, 2018

I fixed some typos in my previous post, so you may want to read it on the web (and not in email).


linas commented Nov 7, 2018

Using the current parser with the 4.2.5 dicts will not work with an empty regex; to get approximately backwards-compatible behavior, you need at least this:

NUMBERS: /^[0-9][0-9]?(:[0-9][0-9]?(:[0-9][0-9]?)?)?$/
NUMBERS: /^[0-9.,-]*[0-9](\+\/-[0-9.,-]*[0-9])?$/
NUMBERS: /^[0-9,.]*[0-9]$/

PL-CAPITALIZED-WORDS:  /^[[:upper:]].*[^iuoys'’]s$/

CAPITALIZED-WORDS:     /^[[:upper:]][^'’]*[^[:punct:]]$/

ING-WORDS:        /^\w.+ing$/
S-WORDS:          /^\w.+[^iuoys'’]s$/
ED-WORDS:         /^\w.+ed$/
LY-WORDS:         /^\w.+ly$/

UNKNOWN-WORD: /^[.,-]{4}[.,-]*$/

grepping reveals this:

link-grammar/structures.h:#define ING_WORD         ("ING-WORDS")
link-grammar/tokenize.c:	    if (!guessed_string(sent, i, s, ING_WORD)) return FALSE;

and

static int is_ing_word(char * s) {
    int i=0;
    for (; *s != '\0'; s++) i++;
    if(i<5) return FALSE;
    if(strncmp("ing", s-3, 3)==0) return TRUE;
    return FALSE;
}


linas commented Nov 7, 2018

Yeah, maybe turning off aggressive unit stripping is a good idea. Or maybe making it into a flag. At one point, I was trying to parse biochem papers, and at that time, it seemed like a good idea.


linas commented Nov 7, 2018

More generally, if a sentence does not parse quickly and easily, what is the fallback strategy? Currently it's ad hoc:

  • Try skipping one or more words
  • Use panic-options on timeout.

Should we have unit stripping turned off by default, and then try aggressive unit stripping before trying to ignore some words? Maybe not a bad idea...
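
For illustration only, a hypothetical sketch of such a fallback ladder; none of these strategy functions exist in the real API (each stub would wrap a parse call with the corresponding options), and the strategies are tried from cheapest to most permissive, with aggressive unit stripping attempted before ignoring words.

#include <stdio.h>

/* Hypothetical strategy stubs, ordered from cheapest to most permissive.
 * A real implementation would call the parser with different options;
 * here each just returns a fake linkage count. */
static int parse_plain(const char *s)          { (void)s; return 0; }
static int parse_unit_stripping(const char *s) { (void)s; return 0; }
static int parse_skip_words(const char *s)     { (void)s; return 1; }

typedef int (*try_parse_fn)(const char *sentence);

int main(void)
{
    const char *sentence = "Stocks fall on weak earnings";
    try_parse_fn ladder[] = { parse_plain, parse_unit_stripping, parse_skip_words };
    const char *names[]   = { "plain", "unit-stripping", "skip-words" };

    for (unsigned i = 0; i < sizeof ladder / sizeof ladder[0]; i++)
    {
        if (ladder[i](sentence) > 0)
        {
            printf("parsed with strategy: %s\n", names[i]);
            break;
        }
    }
    return 0;
}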

Then an oldie-but-a-goodie: what about newspaper headlines, which are short of determiners?


ampli commented Nov 7, 2018

I checked the overhead of multi-unit stripping again, and at least in the fixes batch it is now only a few percent (maybe due to the automatic subscript addition that was recently added). I guess that for very long sentences it can be more.

But I'm in favor of leaving the multi-split as is, and instead fixing the dict to use it.
For example, Nm gets split to N m, but the dict doesn't recognize it (there is currently no Nm.u, so it can be used as a test case).
(BTW, Kgm gets split wrongly to K g m instead of Kg m; this seems like a bug that can be fixed.)

I thought of using it as a test case for "general idiom definitions".
Something like:

<length-unit>: m.u OR ft.u OR ...;
<force-unit>: N.u OR Kg.u OR ...;
<torque-unit>: <force-unit>_<length-unit>;
<UNITS>: <units-suffix> OR <torque-unit>;
LENGTH-LIMIT-1: ID*+;

And maybe also something like this, if the split is done even more aggressively:

<torque-unit>: <force-unit>_<length-unit> OR <force-unit>_*_<length-unit>;

(The * above can also be a macro of possible unit-separating tokens.)

Then an oldie-but-a-goodie: what about newspaper headlines, which are short of determiners?

#402 is in a WIP stage. I based it on an implementation of sub-expression tags, where in that case the tags are used for per-dialect costs. (They can also be used for other expression manipulations, like the dict-based capitalization handling.)

We will also need to find out how to set the dialect if a sentence is not parsed. For example, how can we know if the reason is that it is short of determiners? An easy case may be if there are no determiners at all.

But:

We can also always parse with a high disjunct_cost (say disjunct_cost=10) and implement a cost cutoff as a postprocessing step. The cost_max for this cutoff will usually still be the normal one (2.7), and the higher-cost parses will be considered only if there are none with cost <= cost_max. Such a strategy may also automatically identify the dialects (by looking back at which sub-expressions the chosen_disjuncts were generated from).
I tested running the fixes batch with -cost-max=10, and the overhead is "only" 25-30%, with no more combinatorial explosions than with the default cost-max (only one in both cases). (Some more optimizations may reduce this performance gap.)
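
A minimal sketch of that post-processing cutoff, assuming the parse itself was done with a high cost-max (e.g. 10) and that the total disjunct cost of each candidate linkage is available (the costs array below is a stand-in for whatever the library reports per linkage): a linkage within the normal budget (2.7) wins; otherwise the cheapest over-budget one is used.

#include <stdbool.h>
#include <stddef.h>

/* Pick a linkage index given the per-linkage total disjunct costs.
 * Prefer the cheapest linkage whose cost is within the normal budget;
 * if none qualifies, fall back to the cheapest one overall.
 * Assumes n >= 1. */
static size_t pick_linkage(const double *costs, size_t n, double normal_cost_max)
{
    size_t best = 0;
    bool best_in_budget = (costs[0] <= normal_cost_max);

    for (size_t i = 1; i < n; i++)
    {
        bool in_budget = (costs[i] <= normal_cost_max);
        if ((in_budget && !best_in_budget) ||
            (in_budget == best_in_budget && costs[i] < costs[best]))
        {
            best = i;
            best_in_budget = in_budget;
        }
    }
    return best;
}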


linas commented Nov 7, 2018

I like all these ideas. However, my intent is to never hand-write dictionaries again, and just automate everything. The automated data will have costs attached to everything, including morphology splits. Sorry I don't have dicts for you to play with; I keep getting distracted by other things.

ampli deleted the prune-speedup branch on November 17, 2018 at 08:50