Prune speedup #845
Conversation
Instead of copying the list (in reverse order) while making the deletions, just delete in place, preserving the list order. (Preserving the order is important for the next change, which depends on the shallow connectors always being first in the list.)
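A minimal sketch of the in-place deletion pattern described above, using illustrative types rather than the actual prune.c structures:

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct Node_s Node;
struct Node_s
{
	Node *next;
	int payload;             /* stand-in for the real connector data */
};

static bool keep(const Node *n)
{
	return n->payload != 0;  /* stand-in for the real pruning test */
}

/* Delete unwanted nodes in place, keeping the survivors in their
 * original order (no reversed copy of the list is built). */
static Node *prune_in_place(Node *head)
{
	Node **link = &head;
	while (*link != NULL)
	{
		Node *n = *link;
		if (keep(n))
			link = &n->next;   /* keep: advance past this node */
		else
		{
			*link = n->next;   /* delete: unlink and free */
			free(n);
		}
	}
	return head;
}
```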
Connectors cannot match when both are non-shallow. Until now the list of connectors contained mixed shallow/non-shallow entries, so the whole list had to be scanned, even though each match try against another non-shallow connector just returns False. Implement the following to stop earlier in the left/right_table_search() matching (possible_connection()) loop:
1. In power_table_new(), arrange for the shallow connectors to be first in the table, by inserting them last (they are inserted in reverse order).
2. In left/right_table_search(), if we are matching a non-shallow connector, stop when there are no more shallow ones to try.
Note that in a previous commit cleanup_table() was changed not to alter the table order, which is critical for this change.
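A simplified sketch of this idea with stand-in types (the real code lives in prune.c's power table and left/right_table_search()):

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct Connector_s
{
	bool shallow;
	/* ... label, word position, etc. ... */
} Connector;

typedef struct c_entry_s c_entry;
struct c_entry_s
{
	c_entry *next;
	Connector *c;
};

/* Insert at the head of a table bucket.  If the shallow connectors
 * are inserted LAST (after all the deep ones), they end up FIRST in
 * the bucket list, which the early stop below relies on. */
static void bucket_insert(c_entry **bucket, c_entry *e)
{
	e->next = *bucket;
	*bucket = e;
}

/* Stand-in for the real match test (label/word checks, etc.).
 * The rule used here: two non-shallow connectors never match. */
static bool possible_connection(const Connector *a, const Connector *b)
{
	return a->shallow || b->shallow;
}

/* Return true if some connector in the bucket can match `c`. */
static bool table_search(const c_entry *bucket, const Connector *c)
{
	for (const c_entry *e = bucket; e != NULL; e = e->next)
	{
		/* Early stop: a non-shallow `c` cannot match another
		 * non-shallow connector, and all shallow entries come
		 * first, so nothing further down the list can match. */
		if (!c->shallow && !e->c->shallow) return false;

		if (possible_connection(e->c, c)) return true;
	}
	return false;
}
```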
This allows the code to compile if defined. It still causes slowness so it remains undefined.
Also delete disjuncts in-place instead of copying the remaining ones to a new list.
Validate that all BAD_WORDS have been removed.
This saves some CPU by avoiding a long zero-byte padding.
(GCC would most probably inline it anyway, but I don't know about other compilers.)
While at it:
- Reformat to the new code style.
- Localize variables.
looks good.
So. Stupid computer tricks. Just to see how things are going, I thought to measure the speed of the ten-year-old version 4.2.5 ... I had to do some hacking to get past a broken file-search-path design. I was shocked to find that it is 3x faster. Let that sink in.

Basically, let's try a different experiment: run today's parser with the old dictionary. Easy to say. Hard to do. For multiple reasons. Conclusion: today's parser is almost 2x faster than the old parser.

What to conclude? The "almost 2x faster" result is ... great! But that also means that the more complex dict is 6x slower (since 6 = 2x3). So what does this mean? Well, part of me wants to say that this is like the fractal "length of the coastline of England" -- the more complex the dictionary, the more of the fractal language coastline it can cover, more accurately. But that's just hand-wavey meaningless blather. It's still interesting that a more accurate dictionary might even result in exponentially slower parsing? Is that possible?
Yes. It is more subtle than it seems at first glance. I made the following tests.

Version 4.2.5:

So 2772 sentences fail immediately due to at least one word with 0 disjuncts. For the current version (5.2.1):

sed '/verbosity=/d' data/en/corpus-fixes.batch | link-parser -v=9 -debug=prune.c | grep -F -A1 'After pp_prune' | grep -cF '(0)'
But as you can see for yourself (if you change ...).

Another thing to consider is parsing with FAT links. All the complex (and parse-costly) conjunction links didn't exist in 4.2.5. I don't know much about FAT links, but it may be that the effective sentence length was smaller. It is better to compare to 4.8.6 (it is slower than 5.5.1 on the batch benchmarks even when each version uses its own dict).

As I noted in #250, the current aggressive unit split, which didn't exist in 4.2.5 and not even in 4.8.6, causes much slowness. As I also pointed out there, this aggressive split is actually mostly useless, because it is mostly not supported by the dict. The said aggressive split (that you once added) adds unit alternatives for many words, e.g. ...

Proposal: Turn off the multi-unit split (or add support for it, e.g. so that multiple units are handled as a single unit).
Several times I noticed a large slowdown due to what seemed to be small dict fixes.

ALSO: No regexes in 4.2.5!

Last thing: 5.5.1 using the 4.2.5 dict (after adding an empty ...
I fixed some typos in my previous post, so you may want to read it on the web (and not in email).
Using the current parser with the 4.2.5 dicts will not work with an empty regex; to get approximately backwards-compatible behavior, you need at least this:
grepping reveals this:
and
Yeah, maybe turning off aggressive unit stripping is a good idea. Or maybe making it into a flag. At one point, I was trying to parse biochem papers, and at that time, it seemed like a good idea.
More generally, if a sentence does not parse quickly and easily, what is the fallback strategy? Currently it's ad hoc:
Should we have unit stripping turned off by default, and then try aggressive unit stripping before trying to ignore some words? Maybe not a bad idea... Then an oldie but a goodie: what about newspaper headlines, which are short on determiners?
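For illustration only, a hypothetical sketch of that fallback order; none of these function names exist in link-grammar, they merely name the steps being proposed:

```c
#include <stdbool.h>

/* Hypothetical hooks -- these do not exist in link-grammar; they
 * only label the stages discussed above. */
typedef struct Sentence_s Sentence;                  /* opaque stand-in */
extern bool parse_plain(Sentence *);                 /* unit stripping off */
extern bool parse_with_unit_stripping(Sentence *);   /* aggressive unit split */
extern bool parse_with_null_words(Sentence *);       /* ignore some words */

/* Proposed order: cheap parse first, aggressive unit stripping
 * second, ignoring words only as the last resort. */
static bool parse_with_fallbacks(Sentence *s)
{
	if (parse_plain(s)) return true;
	if (parse_with_unit_stripping(s)) return true;
	return parse_with_null_words(s);
}
```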
I checked the overhead of multi-unit stripping again, and at least in the fixes batch it is now only a few percent (maybe due to the automatic subscript addition that was recently added). I guess that for very long sentences it can be more. But I'm for leaving the multi-split as is, and instead fixing the dict to use it. I thought of using it as a test case of "general idiom definitions".
And maybe also something like this, if the split is done even more aggressively:
(The
#402 is in a WIP stage. I based it on an implementation of sub-expression tags, where the tags in that case are used for per-dialect costs. (They can also be used for other expression manipulations, like the dict-capitalization.) We will also need to find out how to set the dialect if a sentence is not parsed. For example, how can we know that the reason is that it is short on determiners? An easy case may be if there are no determiners at all. But: ...

We can also always parse with a high disjunct_cost (say disjunct_cost=10) and implement a cost cutoff as postprocessing. The cost_max for this cutoff will usually still be the normal one (2.7), and the higher-scored sentences will be considered only if there are none <= cost_max. Such a strategy may also automatically identify the dialects (by looking back at which sub-expressions the chosen_disjuncts got generated from).
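A rough sketch of the cost-cutoff-as-postprocessing idea, assuming the linkages and their total disjunct costs are already available; the Linkage struct and function name here are illustrative, not the library API:

```c
#include <stddef.h>

typedef struct Linkage_s
{
	double cost;            /* total disjunct cost of this linkage */
	/* ... */
} Linkage;

#define NORMAL_COST_MAX 2.7  /* the usual cutoff; parsing itself would
                                run with a high cutoff, e.g. 10 */

/* Keep only linkages with cost <= NORMAL_COST_MAX; if none qualify,
 * keep everything (higher-cost parses are better than no parse).
 * Filtering is done in place; returns the new count. */
static size_t cost_cutoff_postprocess(Linkage *lk, size_t n)
{
	size_t kept = 0;
	for (size_t i = 0; i < n; i++)
	{
		if (lk[i].cost <= NORMAL_COST_MAX)
			lk[kept++] = lk[i];
	}
	return (kept > 0) ? kept : n;
}
```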
I like all these ideas. However, my intent is to never hand-write dictionaries again, and just automate everything. The automated data will have costs attached to everything, including morphology splits. Sorry I don't have dicts for you to play with; I keep getting distracted by other things.
Corpus batch benchmarks (10x10 runs):
en basic: 2+%
en fixes: 3+%
ru: ~0.5%
Individual runs of a small sample of very long sentences show a ~5% speedup.
Runs with -verbosity=2 show a big power_prune() speedup. However, power_prune() currently takes only a small fraction of the total CPU, so the total saving is not as big. The largest speedup is due to the shallow/deep connector ordering that allows an early termination of the loop in *_table_search(), which is implemented in the following commit: "prune.c: Matching shallow connectors: Stop earlier".
Some other changes didn't show a noticeable saving in their benchmarks, so they are not included here:
- pp_prune().
- power_table_new().
But now I may know better and can improve their implementation, so I may include them in a future PR.
Some other changes in this PR:
- power_prune() is now static.
- lc_easy_match() is defined as inline (I guess it got inlined anyway).

(I also have major exprune.c changes with a greater speedup impact, to be sent soon.)
BTW, here is a power_prune() speedup idea for a future check: Eliminate bool shallow in struct c_list_s, so that it will be exactly half a CPU cache line in size. This should benefit the *_table_search() loop.
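A tiny illustration of checking the entry size against the cache line; the field layout of c_list_s here is an assumption made for the sketch, not the actual prune.c definition:

```c
#include <stdio.h>

#define CACHE_LINE_SIZE 64   /* typical on x86-64; verify per target */

typedef struct Connector_s Connector;   /* opaque for this sketch */

/* Hypothetical slimmed-down table entry: the bool shallow flag is
 * gone, since shallow entries are known to come first in each list. */
typedef struct c_list_s C_list;
struct c_list_s
{
	Connector *c;
	C_list *next;
};

int main(void)
{
	/* Report how entries pack into cache lines; the goal stated
	 * above is an entry size that divides the line evenly, so no
	 * entry straddles a line boundary in the *_table_search() loop. */
	printf("sizeof(C_list) = %zu, entries per %d-byte line = %zu\n",
	       sizeof(C_list), CACHE_LINE_SIZE,
	       CACHE_LINE_SIZE / sizeof(C_list));
	return 0;
}
```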