Setting the ZZZ connector length_limit to 1 #214

Closed
ampli opened this issue Nov 26, 2015 · 7 comments

ampli commented Nov 26, 2015

In en/corpus-fixes.batch there are about 22K actual matchings of ZZZ connectors to other ZZZ connectors with distance > 1.
In ru/corpus-basic.batch there are about 7K such comparisons.

These comparisons, which naturally succeed and only add to the parsing time, could be eliminated.
However, for short sentences (< short_len) it is currently better to skip the length_limit setup in the standard parser altogether, because it is done in a loop (set_connector_length_limits()) that takes more CPU time than it saves.

But for long sentences, in which that loop is done anyway, we can also set the ZZZ length_limit to 1 with negligible overhead (in the SAT solver - even for short sentences).

The comparison to ZZZ can now be done using string_set_cmp(). However, string_set_add(EMPTY_CONNECTOR, sent->dict->string_set) needs to be called (once per sentence) in order to get its string-set address.
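To make it concrete, here is a rough sketch of the kind of change I have in mind. The struct and field names below are from memory and only illustrative; this is not the actual patch:

    /* Illustrative sketch, not the actual patch. The field names
     * (word[].d, d->left/right, c->string, c->length_limit) are
     * assumptions from memory, and string_set_cmp() is assumed to
     * compare two strings of the same string-set by address.
     * To be called only when set_connector_length_limits() runs anyway,
     * i.e. for long sentences (or always, in the SAT solver). */
    static void set_zzz_length_limit(Sentence sent)
    {
        /* One string_set_add() per sentence, to get the string-set address. */
        const char *zzz = string_set_add(EMPTY_CONNECTOR, sent->dict->string_set);

        for (size_t w = 0; w < sent->length; w++)
        {
            for (Disjunct *d = sent->word[w].d; d != NULL; d = d->next)
            {
                for (Connector *c = d->left; c != NULL; c = c->next)
                    if (string_set_cmp(c->string, zzz)) c->length_limit = 1;
                for (Connector *c = d->right; c != NULL; c = c->next)
                    if (string_set_cmp(c->string, zzz)) c->length_limit = 1;
            }
        }
    }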

If all of that seems like a good idea, I can send a pull request.


linas commented Nov 26, 2015

Sure, seems plausible to me. I don't understand what EMPTY-WORD does, given that the only ZZZ+ is at the start of the sentence ... do we really need EMPTY-WORD?


ampli commented Nov 26, 2015

I'm not sure I understand what you don't understand...

Here is a summary of EMPTY-WORD - it surely contains nothing new for you:
EMPTY-WORD was originally added to support Russian stems with a null suffix. It is still needed (in all the languages now) to handle alternatives with a different number of words in the word array that the parser uses. If the parser could work on the word-graph directly, we would not need the EMPTY-WORD device.

The ZZZ+ on the LEFT-WALL was added in order to support the quotation-mark tokens, which for now have ZZZ- (as if they were EMPTY-WORDs).

So we still need it. Since EMPTY-WORD and quotation-mark tokens have ZZZ- and every word has ZZZ+, the parser may try to match the ZZZ- at a word distance of more than 1, which is a useless try. (I don't know of any example sentence in which such ZZZ- to ZZZ+ links with a word distance > 1 are actually created, but if that happened it would of course be a bogus linkage.)

So for now I will (eventually) send a pull request that sets the ZZZ distance to 1 for sentences in which short_len gets set, for the slight speedup this makes possible. Later I would like to try a more efficient solution: doing the whole length_limit setup once, in O(1) complexity in the sentence size (there are 2 ways to do that; I will have to test both of them).


linas commented Nov 27, 2015

I think I forgot that every word automatically gets an optional ZZZ+. I wonder if this is documented on the website.


ampli commented Dec 5, 2015

I think it is not documented there at all.

BTW, I found an example of ZZZ links that get created with a distance > 1 when there are null words in the linkage:

We should distinguish between "who" and "whom" but who fucking cares?

Found 12 linkages (8 had no P.P. violations) at null count 4
    Linkage 1, cost vector = (UNUSED=4 DIS= 2.00 LEN=43)

    +------------------------------------------------Xp-----------------------------------------------+
    +-------------------------------------------->WV-------------------------------------------->+    |
    |      +-----------------------CC----------------------+----------Ws---------+------Ss*w-----+    |
    +--Wd--+--Sp-+-----I----+-------ZZZ-------+--ZZZ--+    +ZZZ+---ZZZ--+        +--EL--+        |    |
    |      |     |          |                 |       |    |   |        |        |      |        |    |
LEFT-WALL we should.v distinguish.v [between] " [who] " and.ij " [whom] " [but] who fucking.e cares.v ? 

(Not related to this issue: Such sentences could parse correctly if we allowed "parallel regexes" (a regex like /"[^"]*"/), whose experimental support has already been removed. However, additional infrastructure is needed in order to prevent the spurious parses this may cause. It seems to me it has something in common with the zero-word support, and with the problem of splitting words when their component morphemes are in the dictionary but the linkage between them is not supported by the dictionary. I guess it is better to open a new issue about that.)


linas commented Dec 5, 2015

There is now documentation for ZZZ on the website. It's minimal.

I did a lot of work last night on a QU link type, which handles simpler paraphrasing quotations correctly, although it still messes up on more complex cases. That means that ZZZ is no longer used for those cases.

The place where ZZZ does get used is for these "random quotes" where people just "randomly" use "quotes" for "no reason".

It's OK to limit the distance to 1 if the underlying sentence isn't parsing anyway. Such a limit does not make things worse.


linas commented Feb 27, 2018

Issue #632 caused work in this area to happen. Is this issue still alive, or can it be closed?


ampli commented Feb 27, 2018

> It's OK to limit the distance to 1 if the underlying sentence isn't parsing anyway. Such a limit does not make things worse.

If this is not considered a problem, then yes, it can be closed.

Anyway, I am still investigating the handling of stray quotes as totally unlinked tokens, without even null links.
In that case this last usage of ZZZ will not be needed.

ampli closed this as completed Feb 27, 2018