Add !!word functionality #1083

ampli · 2020-01-19T08:29:11Z

This PR makes extensive changes in order to help with debugging dictionary expressions.
This is important for the upcoming changes, which may need such debugging.

The added functionality is documented in !help !.
Also, I added the following in command-line.c:

A short description in the command table (displayed as first line of !help !).
An additional line in !help.
(Changes may be needed to make it clearer.)

I looked at the macro annotation of test.n and test.v, and several macros appear more than once.
It may be intentional, but it looks suspicious.
The disjunct list also has many entries that seem strange to me (especially those with duplicate disjuncts).
Note also some very strange high costs. This happens due to the "strange" algo used for cost cutoff. I checked in 5.0.8 and the generated disjunct lists are the same (for the same expressions) for a given !cost-max (i.e. the same high costs). If such disjuncts are not useful, they just slow down the parsing.

Main changes:

Disjunct display with optional regex filtering. It depends on !cost-max.
An option for macro annotation in expression display.
An option for low level expression display, including macro and dialect indications.
Expressions are displayed after applying dialect info (and thus affected by !dialect).
The expression stringifying code has been rewritten. Now extra (redundant) parens at the same level are always shown. I chose to still show macros inside parens (that are an artifact of the expression construction because they must be wrapped in a unary AND).
Help text for !.
!<macro> clean display.

Forced-push to update the help text.

ampli · 2020-01-19T16:03:58Z

Here is more info on the disjunct display (that is not yet documented in the help file).

linkparser> !!test.n//
Token "test.n" matches:
    test.n                        8509  disjuncts <en/words/words.n.1-const>


Token "test.n" disjuncts:
    test.n 4273/4501 disjuncts

          test.n: [0]2.000= @AN @A <--> AN
          test.n: [1]2.000= @AN @A <--> AN NMa
...

The displayed disjuncts (4273 ones) are after duplicate elimination.
The number before duplicate elimination is 4501. It is less than 8509 due to the cost cutoff.
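The relation between the three counts can be pictured with a toy model. This is only a Python sketch, not the library code: the sample data, the `summarize` name and the `cost_max` value are made up, a plain `cost <= cost_max` test stands in for the real cutoff algorithm, and duplicates are assumed to be identified by their connector string alone.

```python
# Toy model of the counts shown by !!word//: total, after cost cutoff,
# and after duplicate elimination. All data here is invented.

def summarize(disjuncts, cost_max):
    """Return (total, after_cutoff, after_dedup) for (cost, connectors) pairs."""
    # Assumption: a plain comparison stands in for the real cost cutoff.
    kept = [d for d in disjuncts if d[0] <= cost_max]
    # Assumption: duplicates are detected by connector string only;
    # the first occurrence is kept.
    seen = set()
    unique = []
    for cost, connectors in kept:
        if connectors not in seen:
            seen.add(connectors)
            unique.append((cost, connectors))
    return len(disjuncts), len(kept), len(unique)

sample = [
    (2.0, "@AN @A <--> AN"),
    (2.0, "@AN @A <--> AN NMa"),
    (2.0, "@AN @A <--> AN"),      # duplicate of the first entry
    (9.0, "@AN @AN @A <--> AN"),  # dropped by the cutoff
]
total, after_cutoff, after_dedup = summarize(sample, cost_max=2.7)
print(f"{after_dedup}/{after_cutoff} disjuncts (of {total})")
```

With this sample the printed line has the same shape as the `4273/4501` header: unique disjuncts first, then the number that survived the cutoff.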

Flags may come after the last / but there are no useful ones for now.
It is possible to add a flag to display without duplicate elimination but I didn't find it useful.
It is also possible to add a flag to sort according to cost, or to always sort according to cost.
It is possible (and simple) to add a paginator in link-parser (that can serve for other things too).
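To make the command shape concrete, here is a hypothetical Python sketch of how `!!word/regex/flags` could be split. It is not the link-parser implementation (which is in C and may behave differently); the function name `parse_bangbang` is invented, and it assumes that the regex starts after the first `/` and that the flags are whatever follows the last `/`.

```python
def parse_bangbang(command):
    """Split '!!word/regex/flags' into (word, regex, flags).

    regex is None when the command has no regex part
    (plain '!!word', or '!!word/flags' with a single slash).
    """
    if not command.startswith("!!"):
        raise ValueError("not a !! command")
    body = command[2:]
    first = body.find("/")
    if first == -1:
        return body.strip(), None, ""
    last = body.rfind("/")
    word = body[:first].strip()
    if last == first:
        # Assumption: with a single '/', everything after it is flags.
        return word, None, body[first + 1:]
    return word, body[first + 1:last], body[last + 1:]

print(parse_bangbang("!!test.n//"))
print(parse_bangbang("!!test.n/@A Ds .* <-/m"))
```

Splitting on the last `/` rather than the second one means a regex that itself contains a slash would not need escaping, at the price of flags never being allowed to contain one.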

I can add a flag to show the source macro for each connector. However, I don't know in which format to display that. One possibility is to use the rest of the line (after the connectors). This will cause the lines to be very long and to fold, but it will make filtering by /regex/ easier.

For expression display, there are 2 flags (documented in the help files). I can add more, but I don't know how much it is useful. For example:

Marking shallow and deep connectors. In case a word has several links, the link from a shallow connector is longer than ones from deep connectors.
Simplify the expressions.

ampli · 2020-01-19T16:53:46Z

I said above:

It is possible (and simple) to add a paginator in link-parser (that can serve for other things too).

The code for wordgraph display can be reused in link-parser to invoke a pager.

BTW, I now think it could be a design error to invoke the wordgraph display process from within the library. Instead, the API call could just return a string in the DOT language and leave it for the UI (link-parser in that case) to invoke the display process.
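The split suggested here could look roughly like the following Python sketch. Everything in it is hypothetical (the function names, the edge-list input, the output file name); it only illustrates the idea that the library side returns DOT text and the UI side decides how, or whether, to display it.

```python
import shutil
import subprocess

def wordgraph_to_dot(edges):
    """Library side: return the graph as DOT text, without displaying it."""
    lines = ["digraph wordgraph {"]
    for src, dst in edges:
        # Note: names containing a double quote would need escaping.
        lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)

def display(dot_text, viewer="dot"):
    """UI side: hand the DOT text to an external program, if one is installed."""
    if shutil.which(viewer) is None:
        print(dot_text)  # fall back to showing the raw DOT text
        return False
    # Graphviz 'dot' reads the graph from stdin when no input file is given.
    subprocess.run([viewer, "-Tsvg", "-o", "wordgraph.svg"],
                   input=dot_text, text=True, check=True)
    return True

dot_text = wordgraph_to_dot([("LEFT-WALL", "this"), ("this", "is")])
print(dot_text)
```

With this layout the library never spawns a process; a caller that only wants the text (for a test, or for another renderer) just uses the returned string.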

ampli · 2020-01-19T17:54:24Z

Forced-push with 2 cleanups.
However, I noted a strange regression in the non-debug compilation: The dialect tags are not printed in the expressions. I still cannot understand how this may happen (in debug mode all is fine).
So you may want to wait with the application of this PR (but you can apply it nevertheless and I will submit a fix when I find the problem).

ampli · 2020-01-19T18:48:59Z

The created expressions are different in DEBUG and non-DEBUG modes, so something bad happens.
I am investigating that.
Please don't apply this PR.

linas · 2020-01-19T20:00:23Z

ok

ampli · 2020-01-20T15:50:50Z

I fixed the problem (introduced in PR #1079) in commit "make_expression(): Fix comparing to the wrong Exp field".
I also introduced several improvements, including a full description in !help ! (please check if it can be made clearer).

BTW, in the forced-push commit display, the commits are not in the same order as in my git branch. I hope this doesn't have bad implications.

This PR can be applied now.

ampli · 2020-01-20T15:52:18Z

For these I still need your input:

I can add a flag to show the source macro for each connector. However, I don't know in which format to display that. One possibility is to use the rest of the line (after the connectors). This will cause the lines to be very long and to fold, but it will make filtering by /regex/ easier.

For expression display, there are 2 flags (documented in the help files). I can add more, but I don't know how much it is useful. For example:

Marking shallow and deep connectors. In case a word has several links, the link from a shallow connector is longer than ones from deep connectors.
Simplify the expressions.


linas · 2020-01-21T04:05:26Z

So, !!test.n// shows a numbered list, but what is the numbering? There's no apparent sort order.

linas · 2020-01-21T04:13:21Z

Marking shallow and deep connectors.

? What would that marking look like? Isn't shallow/deep already more-or-less explicit with !!test.n// ?

linas · 2020-01-21T04:40:36Z

I can add a flag to show the source macro for each connector.

Yes, this would be nice. One could show the deepest macro only, or show the whole macro chain. So, for example,

linkparser> this is a big test
Found 8 linkages (8 had no P.P. violations)
	Linkage 1, cost vector = (UNUSED=0 DIS=-0.10 LEN=9)

                   +-----Ost-----+
    +----->WV----->+  +---Ds**x--+
    +-->Wd---+-Ss*b+  +PHc+---A--+
    |        |     |  |   |      |
LEFT-WALL this.p is.v a big.a test.n

Let's pretend this is wrong. Why is it wrong?

linkparser> !dis
Display of disjuncts used turned on.
linkparser> this is a big test
Found 8 linkages (8 had no P.P. violations)
	Linkage 1, cost vector = (UNUSED=0 DIS=-0.10 LEN=9)

                   +-----Ost-----+
    +----->WV----->+  +---Ds**x--+
    +-->Wd---+-Ss*b+  +PHc+---A--+
    |        |     |  |   |      |
LEFT-WALL this.p is.v a big.a test.n

            LEFT-WALL     0.000  hWd+ hWV+ RW+
               this.p     0.000  Wd- Ss*b+
                 is.v     0.000  Ss- dWV- O*t+
                    a     0.000  PHc+ Ds**x+
                big.a    -0.100  PHc- A+
               test.n     0.000  @A- Ds**x- Os-
           RIGHT-WALL     0.000  RW-

Gee, I think that test.n 0.000 @A- Ds**x- Os- is wrong, or strange, I'm not sure. So I want to know: where did @A- Ds**x- Os- come from? Let's try

!!test.n/@A- Ds**x- Os-/m

Well, that doesn't work because I cannot cut-n-paste from the disjunct display to the regex search ... But let's pretend that this worked ... ideally, I would see something similar to this:

<common-const-noun>: (
    <common-phonetic>: (
        <noun-modifiers>: @A-) &
     <nn-modifiers>: Ds**x  &
     <noun-main-s>: (
            <CLAUSE>: Os- ))

I used both indentation and parentheses above, but I think only one or the other is needed (the indentation is the same as the open-paren count). It's probably easier to read without the parens. With the above printout, I can immediately jump to the correct locations in the dictionary, and study how or why they might be right/wrong.
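For what it's worth, the indentation-only variant is easy to picture. Below is a small Python sketch, assuming a hypothetical tree of `(macro_name, children)` tuples with connector strings as leaves; the tree is transcribed from the mock-up above (with the `-` sign on `Ds**x-` taken from the disjunct), and nothing here reflects how the library stores macro information.

```python
def render(node, depth=0, indent="    "):
    """Return display lines for a (macro_name, children) tree.

    Leaves are plain connector strings. Nesting depth is shown by
    indentation only, with no parentheses.
    """
    if isinstance(node, str):
        return [indent * depth + node]
    name, children = node
    # A macro that holds a single connector is printed on one line.
    if len(children) == 1 and isinstance(children[0], str):
        return [f"{indent * depth}{name}: {children[0]}"]
    lines = [f"{indent * depth}{name}:"]
    for child in children:
        lines.extend(render(child, depth + 1, indent))
    return lines

tree = ("<common-const-noun>", [
    ("<common-phonetic>", [("<noun-modifiers>", ["@A-"])]),
    ("<nn-modifiers>", ["Ds**x-"]),
    ("<noun-main-s>", [("<CLAUSE>", ["Os-"])]),
])
print("\n".join(render(tree)))
```

Showing only the deepest macro instead of the whole chain would amount to printing just the lines that carry a connector.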

The biggest problem with this suggestion is that !!test.n/@A- Ds**x- Os-/m is not a regex. And I would hate to have to insert escape-backslashes to turn it into a valid regex. So maybe it should be ... !!test.n{@A- Ds**x- Os-}m or !!test.n[@A- Ds**x- Os-]m or !!test.n#@A- Ds**x- Os-#m or !!test.n;@A- Ds**x- Os-;m .. I dunno. There is also the question: how can I search for !!test.n/@A- Ds**x- (wildcard)/m ? Maybe that is not important, maybe the existing !!test.n/@A Ds .*<---/m is enough? Except it doesn't work:

linkparser> !!test.n/@A Ds .* <-/m
Token "test.n" matches:
    test.n                        8509  disjuncts <en/words/words.n.1-const>


Token "test.n" disjuncts:
    test.n 4273/4501 disjuncts

(0 disjuncts matched)

so I'm confused about that (let's pretend I'm a naive user and never really learned how to use a regex, except for very simple ones...)

linkparser> !!test.n/@A Ds/m
Token "test.n" matches:
    test.n                        8509  disjuncts <en/words/words.n.1-const>


Token "test.n" disjuncts:
    test.n 4273/4501 disjuncts

(0 disjuncts matched)

linkparser> !!test.n/A Ds/m
Token "test.n" matches:
    test.n                        8509  disjuncts <en/words/words.n.1-const>


Token "test.n" disjuncts:
    test.n 4273/4501 disjuncts

(0 disjuncts matched)

oh wait, is deep-shallow ordering reversed?

linkparser> !!test.n/D @A/
Token "test.n" matches:
    test.n                        8509  disjuncts <en/words/words.n.1-const>


Token "test.n" disjuncts:
    test.n 4273/4501 disjuncts
          test.n: [3500]2.000= DD @AN @A <--> GN
          test.n: [3513]0.000= DD @A <--> GN
          test.n: [3526]0.100= DD @AN <--> GN
          test.n: [3539]3.100= DD @AN @A @AN <--> GN
          test.n: [3552]1.100= DD @A @AN <--> GN

(5 disjuncts matched)

linkparser> !!test.n/Ds**x @A/
Token "test.n" matches:
    test.n                        8509  disjuncts <en/words/words.n.1-const>


Token "test.n" disjuncts:
    test.n 4273/4501 disjuncts

(0 disjuncts matched)

Argh! regex is confusing.

BTW, I think the typical use case will be to search for one disjunct only. I think that wild-card searches will be uncommon, and I cannot imagine using regex, even if it worked well... (but it's late at night and I'm sleepy so my imagination is impaired).

ampli · 2020-01-21T09:08:48Z

So, !!test.n// shows a numbered list, but what is the numbering? There's no apparent sort order.

This is the order as produced by build_disjunct(). The numbering is just sequential.
Flags can be added to sort it according to:

Cost.
LHS connectors and RHS connectors (with a selection which is the main sort key).
Number of connectors.

When you change max_cost or apply a dialect, the numbering is redone.
Edit:
I first wrote that filtering by regex also redoes the numbering, and that code could be added to preserve it in that case. No need - this is already done by default: if you see disjunct number 1234, it will always be the exact same disjunct regardless of the regex filter you use (this may ease debugging disjuncts).
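The sort keys listed above, and the point about numbers surviving a regex filter, can be sketched in Python. The data is copied from the `!!test.n/D @A/` output earlier in this thread; the function names are invented, and the sketch assumes the regex is matched against the connector part of the line only, which may not be what the real code does.

```python
import re

# (number, cost, connectors), as shown by !!test.n/D @A/ above.
disjuncts = [
    (3500, 2.0, "DD @AN @A <--> GN"),
    (3513, 0.0, "DD @A <--> GN"),
    (3526, 0.1, "DD @AN <--> GN"),
    (3539, 3.1, "DD @AN @A @AN <--> GN"),
    (3552, 1.1, "DD @A @AN <--> GN"),
]

def filter_keep_numbers(items, pattern):
    """Regex filter that keeps each disjunct's original sequence number."""
    rx = re.compile(pattern)
    return [d for d in items if rx.search(d[2])]

def by_cost(items):
    """Sort by cost, lowest first."""
    return sorted(items, key=lambda d: d[1])

def by_connector_count(items):
    """Sort by number of connectors on both sides of '<-->'."""
    return sorted(items, key=lambda d: len(d[2].replace("<-->", " ").split()))

print([n for n, _, _ in filter_keep_numbers(disjuncts, r"@A ")])
print([n for n, _, _ in by_cost(disjuncts)])
print([n for n, _, _ in by_connector_count(disjuncts)])
```

Because the number travels with the disjunct, any of these views can be combined without losing the ability to refer to "disjunct 3513".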

ampli · 2020-01-21T09:35:23Z

? What would that marking look like? Isn't shallow/deep already more-or-less explicit with !!test.n//

In !!test.n//, the leftmost connector on each jet is the shallow one.
However, in the expression display it is usually unclear which ones are shallow or deep. Many of the connectors can be both shallow and deep in the generated disjuncts, but some of them are only shallow or only deep. If you know that a connector is only shallow, then when there are several connectors on a jet, it must connect to the farthest position (the deeper one must connect to the nearest position).
However:

I still don't know if this is useful when authoring a dict.
I hoped that, using this info, I would be able to make expression_prune() faster, because it would be able to drop connectors without even trying to match them, even if they would otherwise match (as done in power_prune()), and then power_prune() would have less work to do. But for some reason I didn't get a net speedup (marking takes time too).
My current demo code only detects shallow connectors and not the deepest ones. Of course it can be extended. It also marks as shallow all the connectors that may be shallow on at least one disjunct. This is not useful. It would be more useful to have these marks instead: surely shallow, maybe shallow, surely deepest, maybe deepest.

Example from the demo output (the ones ending with the d mark cannot be shallow, all the rest may be):
(XXXGIVEN+s) or ((((((((@A-s & {[[@AN-s]]}) or [@AN-s]0.100 or ([[@AN-d]0.100 & @A-s] & {[[@AN-s]]}) or ())) & (((({@M+d} & dSJls+s) or ({[@M+s]} & dSJrs-s))) or (GN+s & (DD-s or [()])) or Us-s or ({Ds-d} & [Wa-s]0.050 & ({Mf+s} or {NM+s})))) or ((((@A-s & {[[@AN-s]]}) or [@AN-s]0.100 or ([[@AN-d]0.100 & @A-s] & {[[@AN-s]]}))) & (({NMa+d} & AN+s)
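The four marks proposed above can be illustrated on generated jets rather than on expressions. The Python sketch below is not the demo code referred to in this comment: it assumes each jet is a list of connector names ordered shallow-first, uses made-up connector names, and derives the marks from positions alone.

```python
from collections import defaultdict

def classify(jets):
    """Map each connector to its set of marks, given jets ordered shallow-first."""
    positions = defaultdict(list)  # connector -> [(is_shallow, is_deepest), ...]
    for jet in jets:
        for i, conn in enumerate(jet):
            positions[conn].append((i == 0, i == len(jet) - 1))
    marks = {}
    for conn, seen in positions.items():
        shallow = [s for s, _ in seen]
        deepest = [d for _, d in seen]
        m = set()
        if all(shallow):
            m.add("surely shallow")   # shallow in every jet it appears in
        elif any(shallow):
            m.add("maybe shallow")    # shallow in some jets only
        if all(deepest):
            m.add("surely deepest")
        elif any(deepest):
            m.add("maybe deepest")
        marks[conn] = m
    return marks

# Hypothetical jets: X is always first, Z is always last, Y varies.
jets = [["X", "Y"], ["X"], ["Y", "Z"]]
for conn, m in sorted(classify(jets).items()):
    print(conn, sorted(m))
```

A connector with no "shallow" mark at all corresponds to the `d` suffix in the demo output: it cannot be shallow in any disjunct.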

ampli · 2020-01-21T09:38:44Z

It is also possible to show the expressions with cost cutoff (same as done with disjuncts), in which case they are more compact. This may be useful because it will show only what is actually used.

ampli · 2020-01-21T10:59:22Z

!!test.n/@A- Ds**x- Os-/m
Well, that doesn't work because I cannot cut-n-paste from the disjunct display to the regex search

In any case the m flag is still unsupported for disjunct list because I didn't know what the desired output is. Now I know, and will try to implement that.

And I would hate to have to insert escape-backslashes to turn it into a valid regex.

Because the default regex engine is PCRE (I guess it is installed in your system, actually PCRE2 if installed), you can precede the regex with \Q in order to make it literal. E.g., this would work:
!!test.n/\QDs**x/
Moreover, you can use both literal mode and regex mode by ending the literal part with \E.
But see also the r flag suggestion below.
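The effect of literal quoting can be tried outside link-parser. The sketch below uses Python's `re` module, which is not PCRE and has no `\Q...\E`; `re.escape()` is used as a stand-in that has the same effect of making every metacharacter literal. The sample line is the disjunct quoted later in this comment.

```python
import re

line = "test.n: [3487]2.600= dRJrc @hCOd Ds**x @A @AN <--> Ss*s Bs R NM"

# Unquoted, "Ds**x" is rejected by Python's re ("multiple repeat"),
# because each '*' is read as a repetition operator.
try:
    re.compile("Ds**x")
    unquoted_ok = True
except re.error:
    unquoted_ok = False

# re.escape() plays the role of \Q: the whole string is matched literally.
literal = re.compile(re.escape("Ds**x"))

# A literal part followed by a real regex part, in the spirit of \QDs**x\E @A
mixed = re.compile(re.escape("Ds**x") + r" @A\b")

print(unquoted_ok, bool(literal.search(line)), bool(mixed.search(line)))
```

The mixed form is what makes quoting attractive here: the connector names can be pasted as-is, and only the part that needs wildcards is written as a regex.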

because I cannot cut-n-paste from the disjunct display to the regex search

The problem is that the disjunct display doesn't include the connector signs. Possible solutions:

1. Always add the connector signs, e.g. instead of displaying
   test.n: [3487]2.600= dRJrc @hCOd Ds**x @A @AN <--> Ss*s Bs R NM
   display it as:
   test.n: [3487]2.600= dRJrc- @hCOd- Ds**x- @A- @AN- <--> Ss*s+ Bs+ R+ NM+
2. Like 1, but only when a flag for that is used.
3. Like 1, but only when the pattern is quoted and includes connector signs (but it looks too crazy).

Using the flag-based solution, one could do (using flag s for "use connector signs"):
!!test.n/@A- Ds**x- Os-/ms
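The transformation behind the first solution is mechanical, as this Python sketch shows. It is only an illustration on the displayed text (the function name `add_signs` is invented); the real change would be made where the library prints disjuncts, not by post-processing strings.

```python
def add_signs(disjunct):
    """Append '-' to connectors left of '<-->' and '+' to those right of it."""
    left, sep, right = disjunct.partition("<-->")
    if not sep:
        raise ValueError("no '<-->' separator found")
    signed_left = " ".join(c + "-" for c in left.split())
    signed_right = " ".join(c + "+" for c in right.split())
    return f"{signed_left} <--> {signed_right}".strip()

print(add_signs("dRJrc @hCOd Ds**x @A @AN <--> Ss*s Bs R NM"))
```

Once the signs are displayed, a connector sequence copied from the `!dis` output differs from the `!!word//` output only in connector order.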

oh wait, is deep-shallow ordering reversed?

The deep-shallow ordering of expressions and disjuncts is indeed different... It was always so in the library disjunct and connector-list display code. This creates a problem for cut&paste, unless, for example, the s flag above auto-reverses the order. This is indeed a problematic complication.

BTW, I think the typical use case will be to search for one disjunct only. I think that wild-card searches will be uncommon, and I cannot imagine using regex, even if it worked well...

Here are some useful regex searches:

Show duplicate sequential connectors:
!!test.v/( @?\w+\b)\1\b/
(note the subtlety of the need for \b ...)
You get something like:

...
          test.v: [66112]0.000= dIV B*m dIV I @E <--> VC O VC VC
...
          test.v: [70679]0.000= B*j @E @E VJrpi I @E <--> O
...

Show very long connector lists (6 or more):
!!test.v/^\s*([\w.-]+):? ([^>]*> )?\S+(:? @?[a-z]?[A-Z]+[a-z*]*){6,}/
(You will find here lists of up to 9 connectors. In the dictionary there are words with lists of up to 13 connectors.)
Search connector sequences with slight variations:
!!test.n/ J[sk] D[\w*]+c/
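The need for `\b` in the duplicate-connector search can be checked on a few lines taken from this thread. The sketch uses Python's `re` rather than PCRE; for the constructs in this pattern (group, backreference, `\w`, `\b`) the two are expected to behave the same, but that is an assumption about the engines, not something verified against link-parser.

```python
import re

DUP = re.compile(r"( @?\w+\b)\1\b")
DUP_NO_B = re.compile(r"( @?\w+)\1")  # same pattern without the \b anchors

lines = [
    "test.v: [66112]0.000= dIV B*m dIV I @E <--> VC O VC VC",
    "test.v: [70679]0.000= B*j @E @E VJrpi I @E <--> O",
    "test.n: [3552]1.100= DD @A @AN <--> GN",
]
for line in lines:
    print(bool(DUP.search(line)), bool(DUP_NO_B.search(line)), line)
```

The third line has no repeated connector, yet the pattern without `\b` reports one, because ` @A` followed by ` @A` also matches the start of ` @A @AN`. The word boundaries are what restrict the match to whole connector names.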

However, if the regular searches are expected to be of literals, a flag r can be added for regex search.

linas · 2020-01-21T19:25:31Z

OK, clarifications:

deep-shallow ordering

So, this is a bug: the printout should match the same order as what is in 4.0.dict. I thought I fixed that bug, once, but maybe not. Connector ordering is already confusing; to have to mentally reorder it, some of the time, for some cases .. that's just bad.

regex

I'm fairly certain that 90% of all dictionary debugging flow will work like in the "this is a big test" example. So this is the flow that must be natural, easy-to-use, and obvious. I don't mind using \Q to quote -- this is a good idea. But what I do mind is having to open a browser tab, to search for regex documentation, to read the regex documentation, and then go back to what I was doing. It's a complete waste of time and mental energy. So complex regex patterns are almost useless to me; I will probably never-ever use them. (Basically, my brain is already fully occupied trying to solve a linguistic problem; I don't want to also, at the same time, solve a regex problem.)

By contrast, these two I like:

!!test.n/\QDs**x/

and

!!test.n/ J[sk] D[\w*]+c/

and both should be mentioned in !help !

disjunct display doesn't include the connector signs

I think I like solution 1 the best, mostly because it takes the least amount of work/effort (least amount of reading the docs).

ampli · 2020-01-22T07:33:32Z

So, this is a bug: the printout should match the same order as what is in 4.0.dict. I thought I fixed that bug, once, but maybe not. Connector ordering is already confusing; to have to mentally reorder it, some of the time, for some cases .. that's just bad.

I will make the printout of disjuncts and connector lists consistent with expression order. Only debug output is involved - not anything with official API output.

I will also implement the rest according to your above post.

ampli force-pushed the bangbang branch from c7a4bd9 to 6c83887 Compare January 19, 2020 17:51

ampli added 6 commits January 20, 2020 16:26

make_expression(): Fix comparing to the wrong Exp field

0b096aa

link-parser: Fix command truncation after white space

c021b62

link-parser: Allow white space in !! command

227677f

build_disjunct(): Reduce indentation

fff7178

print_sentence_word_alternatives: Make it more robust

3aed015

disjunct-utils.[ch]: Use dyn_str for printing disjuncts/connectors

dbf503a

ampli force-pushed the bangbang branch from 6c83887 to e3fc23f Compare January 20, 2020 15:34

ampli added 11 commits January 20, 2020 19:26

Implement !!word/regex/ for disjunct display

6c21c75

print_expression_parens(): Rewrite

c5a75cd

!!word: Display expressions with macro tags

61faa63

Load dict macros by default; use !!/word/m to show expression macros

420c155

prt_exp_mem(): Move it and related functions to print-dict.c

c4e1247

prt_exp_mem(): Convert to use dyn_str

49ab28c

!!word/l: Print low-level expression memory

88531a9

ChangeLog: Update on adding !!word/

e02be1e

command-help-en.txt: Add help info for the !! command

f30b5ca

command-line.c: Add '!' as a command and add !help text

b9de7bd

!!<macro>: A hack for a clean display

2e78a98


ampli force-pushed the bangbang branch from e3fc23f to 2e78a98 Compare January 20, 2020 17:27

linas merged commit 9da1368 into opencog:master Jan 21, 2020

ampli mentioned this pull request Jan 23, 2020

More !!word/ additions #1085

Merged

ampli mentioned this pull request Mar 2, 2021

Recent 5x multiplication of verb disjuncts #1072

Open
