
More !!word/ additions #1085

Merged
merged 16 commits into from
Jan 23, 2020

ampli (Member) commented Jan 23, 2020

Add features and fixes per the discussion in PR #1083.
Main changes:

  • Print connector expression source macros for disjuncts.
  • Reverse the debug printout of connector lists.
    Use expression connector order.
  • Disjunct printout: Add connector sign.
  • !!word/: Validate flags.
  • !!word/number/: A hack to select a disjunct by number.
    To be used with the m flag (connector macros).
  • Update the help text for the recent changes.
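
A minimal Python sketch of how the !!word/number/ hack could sit alongside ordinary regex searches. It is an illustration only, not the link-grammar C code: the `DISJUNCTS` table and the `select_disjuncts` function are hypothetical, and the numbers and connector strings are taken from the test.n printout further down this page.

```python
import re

# Hypothetical disjunct table: number -> connector string, in the same
# shape as the "[3813]0.000= Ds**c- Os- <--> @M+ @MXs+" printout.
DISJUNCTS = {
    3813: "Ds**c- Os- <--> @M+ @MXs+",
    3946: "Ds**c- Os- <-->",
    4178: "Ds**c- Os- <--> NM+",
}


def select_disjuncts(pattern):
    """Return (number, disjunct) pairs selected by `pattern`.

    A purely numeric pattern selects a single disjunct by its number;
    any other pattern is used as a regex searched in each disjunct.
    """
    if pattern.isdigit():
        n = int(pattern)
        return [(n, DISJUNCTS[n])] if n in DISJUNCTS else []
    rx = re.compile(pattern)
    return [(n, d) for n, d in sorted(DISJUNCTS.items()) if rx.search(d)]


print(select_disjuncts("3813"))   # [(3813, 'Ds**c- Os- <--> @M+ @MXs+')]
print(select_disjuncts(r"NM\+"))  # [(4178, 'Ds**c- Os- <--> NM+')]
```

The numeric check is what makes this a "hack": a pattern consisting only of digits can no longer be used as a regex.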

In principle, a macro can occur more than once in a word, and a connector name can occur more than once in a macro. In that case the output of !!word//m may be confusing.
If desired, I can add something like !!word//mc to show the expression with the disjunct connectors marked in it.

If you have any more ideas for debugging disjuncts/connectors/expressions, please tell me now: I have the low-level details in my head, so I can implement them quickly.

EDIT: Force-pushed to clean up the display formatting.

ampli (Member Author) commented Jan 23, 2020

An improvement may be needed in the output format and/or flags.
Here are some cases that demonstrate this.
First, a test sentence:

linkparser> This is a test.
Found 4 linkages (4 had no P.P. violations)
	Linkage 1, cost vector = (UNUSED=0 DIS= 0.00 LEN=6)

    +-------------Xp------------+
    +----->WV----->+---Ost--+   |
    +-->Wd---+-Ss*b+  +Ds**c+   |
    |        |     |  |     |   |
LEFT-WALL this.p is.v a  test.n .

            LEFT-WALL     0.000  hWd+ hWV+ Xp+
               this.p     0.000  Wd- Ss*b+
                 is.v     0.000  Ss- dWV- O*t+
                    a     0.000  Ds**c+
               test.n     0.000  Ds**c- Os-
                    .     0.000  Xp- RW+
           RIGHT-WALL     0.000  RW-

With this PR, you can copy and paste a disjunct into a search:

linkparser> !!test.n/\QDs**c- Os-/
Token "test.n" matches:
    test.n                        8509  disjuncts <en/words/words.n.1-const>


Token "test.n" disjuncts:
    test.n                        4273/4501 disjuncts

          test.n: [3813]0.000= Ds**c- Os- <--> @M+ @MXs+
          test.n: [3832]0.000= Ds**c- Os- <--> @M+
          test.n: [3851]2.000= Ds**c- Os- <--> @M+ R+ Bs+ @M+ @MXs+
          test.n: [3870]2.000= Ds**c- Os- <--> @M+ R+ Bs+ @M+
          test.n: [3889]0.000= Ds**c- Os- <--> @M+ R+ Bs+ @MXs+
          test.n: [3908]0.000= Ds**c- Os- <--> @M+ R+ Bs+
          test.n: [3927]0.000= Ds**c- Os- <--> @MXs+
          test.n: [3946]0.000= Ds**c- Os- <--> 
          test.n: [3965]2.000= Ds**c- Os- <--> R+ Bs+ @M+ @MXs+
          test.n: [3984]2.000= Ds**c- Os- <--> R+ Bs+ @M+
          test.n: [4003]0.000= Ds**c- Os- <--> R+ Bs+ @MXs+
          test.n: [4022]0.000= Ds**c- Os- <--> R+ Bs+
          test.n: [4045]1.500= Ds**c- Os- <--> NM+ @M+ @MXs+
          test.n: [4064]1.500= Ds**c- Os- <--> NM+ @M+
          test.n: [4083]3.500= Ds**c- Os- <--> NM+ @M+ R+ Bs+ @M+ @MXs+
          test.n: [4102]3.500= Ds**c- Os- <--> NM+ @M+ R+ Bs+ @M+
          test.n: [4121]1.500= Ds**c- Os- <--> NM+ @M+ R+ Bs+ @MXs+
          test.n: [4140]1.500= Ds**c- Os- <--> NM+ @M+ R+ Bs+
          test.n: [4159]1.500= Ds**c- Os- <--> NM+ @MXs+
          test.n: [4178]1.500= Ds**c- Os- <--> NM+
          test.n: [4197]3.500= Ds**c- Os- <--> NM+ R+ Bs+ @M+ @MXs+
          test.n: [4216]3.500= Ds**c- Os- <--> NM+ R+ Bs+ @M+
          test.n: [4235]1.500= Ds**c- Os- <--> NM+ R+ Bs+ @MXs+
          test.n: [4254]1.500= Ds**c- Os- <--> NM+ R+ Bs+

(24 disjuncts matched)

However, you cannot copy and paste this one (marked by <--------):

            LEFT-WALL     0.000  hWd+ hWV+ Xp+
               this.p     0.000  Wd- Ss*b+
                 is.v     0.000  Ss- dWV- O*t+   <--------
                    a     0.000  Ds**c+
               test.n     0.000  Ds**c- Os-
                    .     0.000  Xp- RW+
           RIGHT-WALL     0.000  RW-

The separator <--> prevents a match.
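
To make the failure concrete, here is a small Python check. Python's `re` has no `\Q...\E`, so `re.escape()` stands in for it; the is.v display string `"Ss- dWV- <--> O*t+"` is an assumed rendering of that disjunct in the new printout format, not output copied from the parser.

```python
import re

# Assumed disjunct display for is.v, with "<-->" between the "-"
# connectors and the "+" connectors.
displayed = "Ss- dWV- <--> O*t+"

# The same connectors as copied from the linkage printout above.
copied = "Ss- dWV- O*t+"

# re.escape() makes every character literal, like \Q...\E.
literal = re.compile(re.escape(copied))

print(literal.search(displayed))  # None: "<-->" sits between dWV- and O*t+
print(literal.search(displayed.replace(" <-->", "")) is not None)  # True
```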

Possible solutions:

  1. Remove the separator. However, due to the potentially high number of disjuncts, the output may then be less clear. You would also lose an anchor for searching for a deep LHS connector, e.g.:
    !!test.n/> R\+/
  2. Add a flag (e.g. s for "simple") that suppresses the <--> output, so you can do:
    !!is.v/\QSs- dWV- O*t+/rm
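
The trade-off in option 1 can be seen in a few lines of Python. The helper `without_separator` is hypothetical; it simply deletes `<-->` from a display string, which is roughly what not emitting the separator amounts to.

```python
import re


def without_separator(display):
    """Drop the "<-->" separator (and its surrounding blanks)."""
    return re.sub(r"\s*<-->\s*", " ", display).strip()


d = "Ds**c- Os- <--> R+ Bs+"

# With the separator, "> R\+" finds R+ as the first "+" connector.
print(bool(re.search(r"> R\+", d)))                     # True
# Without it, that anchor is gone...
print(bool(re.search(r"> R\+", without_separator(d))))  # False
# ...but a connector list copied from a linkage now matches literally.
print(bool(re.search(re.escape("Os- R+"), without_separator(d))))  # True
```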

When using \Q there is a subtle problem that can be demonstrated by:

linkparser> !!test.v/\Q= @E- I- VJrpi- @E- B*m- dCV- <--> VC+ VC+/
Token "test.v" matches:
    test.v                       93621  disjuncts


Token "test.v" disjuncts:
    test.v                       70700/93621 disjuncts

          test.v: [70367]0.000= @E- I- VJrpi- @E- B*m- dCV- <--> VC+ VC+
          test.v: [70368]0.000= @E- I- VJrpi- @E- B*m- dCV- <--> VC+ VC+ QI+
          test.v: [70369]0.000= @E- I- VJrpi- @E- B*m- dCV- <--> VC+ VC+ Xc+ QI+

(3 disjuncts matched)

The intention was to search for an exact disjunct, but disjuncts with extra "junk" at the LHS were matched as well.
If the 'm' flag were used, the result could be far too much output (say, 1000 matched disjuncts).
Instead, this could be used:
!!test.v/\Q= @E- I- VJrpi- @E- B*m- dCV- <--> VC+ VC+\E$/m
Note the use of \E to terminate the \Q, followed by $ to match the end of the line.
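
The same effect can be reproduced with Python's `re`, where `re.escape()` plays the role of `\Q...\E`. The three strings are the test.v disjuncts above with the `test.v: [NNNNN]` prefix dropped; appending `$` after the escaped text is the analogue of `\E$`.

```python
import re

lines = [
    "0.000= @E- I- VJrpi- @E- B*m- dCV- <--> VC+ VC+",
    "0.000= @E- I- VJrpi- @E- B*m- dCV- <--> VC+ VC+ QI+",
    "0.000= @E- I- VJrpi- @E- B*m- dCV- <--> VC+ VC+ Xc+ QI+",
]
target = "= @E- I- VJrpi- @E- B*m- dCV- <--> VC+ VC+"

loose = re.compile(re.escape(target))        # like /\Q.../
exact = re.compile(re.escape(target) + "$")  # like /\Q...\E$/

print(sum(1 for s in lines if loose.search(s)))  # 3
print(sum(1 for s in lines if exact.search(s)))  # 1
```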

A solution for that, which also solves the problem of regex engines that don't support \Q, is to have a flag v for verbatim searches. Or, if regex searches are the exception, have a flag r for regex searches.
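
A Python sketch of the flag idea. The flag letters v and r come from the proposal above; the `make_matcher` function, its `default` parameter, and the choice that a verbatim search compares the whole disjunct string are illustrative assumptions. The verbatim path needs no regex engine at all, which is why it sidesteps missing `\Q` support.

```python
import re


def make_matcher(pattern, flags="", default="regex"):
    """Return a predicate for disjunct strings.

    'v' forces a verbatim search and 'r' forces a regex search ('r' wins
    if both are given); without either flag, `default` decides
    ("regex" or "verbatim").
    """
    mode = default
    if "v" in flags:
        mode = "verbatim"
    if "r" in flags:
        mode = "regex"
    if mode == "verbatim":
        # Exact comparison of the whole disjunct: no regex metacharacters.
        return lambda disjunct: disjunct == pattern
    rx = re.compile(pattern)
    return lambda disjunct: rx.search(disjunct) is not None


disjuncts = [
    "@E- I- VJrpi- @E- B*m- dCV- <--> VC+ VC+",
    "@E- I- VJrpi- @E- B*m- dCV- <--> VC+ VC+ QI+",
    "@E- I- VJrpi- @E- B*m- dCV- <--> VC+ VC+ Xc+ QI+",
]
exact = make_matcher("@E- I- VJrpi- @E- B*m- dCV- <--> VC+ VC+", "v")
loose = make_matcher(r"VC\+ VC\+")
print([exact(d) for d in disjuncts])  # [True, False, False]
print([loose(d) for d in disjuncts])  # [True, True, True]
```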

linas merged commit 37e493c into opencog:master Jan 23, 2020
linas (Member) commented Jan 23, 2020

Thanks.

Another possibility is to just remove the <--> from the display. It's not really needed. It's an eye-catcher, certainly, but one can tell where it is by looking at the + and - signs. There are other problems: using <--> suggests that there is a link somewhere, and there isn't. Nothing is linked. A different problem shows up in the Lithuanian dictionary, where there is no strict left-right ordering. The word order is more-or-less free, and the only real distinction is the head-dependent distinction.
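
The point that the separator position can be recovered from the + and - signs can be illustrated with a short Python sketch. `split_by_direction` is hypothetical; it assumes every displayed connector ends in its direction sign, as in the printouts above, and it ignores the h/d markup.

```python
def split_by_direction(connectors):
    """Split a separator-free connector list by each connector's sign.

    Returns (left, right): connectors ending in '-' point left,
    connectors ending in '+' point right. Order within each side is kept.
    """
    left, right = [], []
    for c in connectors.split():
        if c.endswith("-"):
            left.append(c)
        elif c.endswith("+"):
            right.append(c)
        else:
            raise ValueError(f"connector without a direction sign: {c!r}")
    return left, right


print(split_by_direction("Ds**c- Os- @M+ R+ Bs+"))
# (['Ds**c-', 'Os-'], ['@M+', 'R+', 'Bs+'])
```

For dictionaries with free word order, such as the Lithuanian one, a split like this would have to be based on the h/d markers instead, which this sketch does not attempt.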

Note that this is a generic issue. Word order is somewhat free in Russian, although the current Russian dicts do not use that markup at all. If I understand correctly, word order is also free in Turkish and Finnish. There, the h and d markup becomes extremely important, and the +/- markup mostly not at all. I am sorry, but I completely forgot to talk about this during the jets conversations. Some of your optimizations might not work or might be pointless in this setting. Insofar as the Lithuanian dict is at best a proof of concept, and the Russian dict doesn't use this, it doesn't matter right now. But it's a potential future stumbling block.

ampli (Member Author) commented Jan 24, 2020

> Another possibility is to just remove the <--> from the display.

In principle, the displayed form can differ from the internal form (only logically, for efficiency).
That is, by default the search code can simply insert the separator where needed.

Regarding the form of the separator, it can be replaced by just <>, which doesn't resemble a link.
| could also be used, but it is also a regex metacharacter.

My proposal:

  1. Replace the display separator by <>.
  2. For searches that don't include it, pretend it doesn't exist.
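
Point 2 of the proposal could look roughly like this in Python. The rule that a pattern "includes" the separator when it contains the literal text `<>` is an assumption made for the sketch, as is the function name `search_disjunct`.

```python
import re

SEP = "<>"


def search_disjunct(pattern, display):
    """Search `display` with the regex `pattern`.

    If the pattern mentions the separator, match against the display
    form as is; otherwise pretend the separator is not there.
    """
    if SEP in pattern:
        target = display
    else:
        target = " ".join(t for t in display.split() if t != SEP)
    return re.search(pattern, target) is not None


display = "Ss- dWV- <> O*t+"
print(search_disjunct(re.escape("Ss- dWV- O*t+"), display))  # True
print(search_disjunct(r"<> O\*t\+", display))                # True
print(search_disjunct(r"<> dWV-", display))                  # False
```

With this rule, a connector list copied from a linkage matches directly, while searches that rely on the separator as an anchor keep working.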

ampli (Member Author) commented Jan 24, 2020

> A different problem shows up in the Lithuanian dictionary, where there is no strict left-right ordering. The word order is more-or-less free, and the only real distinction is the head-dependent distinction.

A flag can be added to mean "search the given connectors in any order".
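
One way such an any-order flag could behave, sketched in Python: treat the query and the disjunct as multisets of connector tokens. The function name is hypothetical, and the sketch compares connector names literally, with no regex or wildcard matching.

```python
from collections import Counter


def contains_in_any_order(query, disjunct):
    """True if every connector of `query` occurs in `disjunct`, in any
    order; a connector given twice must occur at least twice."""
    have = Counter(disjunct.split())
    want = Counter(query.split())
    return all(have[c] >= n for c, n in want.items())


d = "Ds**c- Os- <--> @M+ R+ Bs+ @M+ @MXs+"
print(contains_in_any_order("Os- Ds**c-", d))  # True
print(contains_in_any_order("@M+ @M+", d))     # True
print(contains_in_any_order("NM+", d))         # False
```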

ampli (Member Author) commented Jan 24, 2020

> I am sorry, but I completely forgot to talk about this during the jets conversations. Some of your optimizations might not work or might be pointless in this setting.

Note that there was actually no conversion to "jet" optimization (which is really "tracon" optimization).
The actual implementation is an optional addition that can be disabled by specifying
-test=min-len-encoding:254. By default, the optimizing encoding is applied for sentence length >= 6.

> Some of your optimizations might not work or might be pointless in this setting.

Why? It looks like this optimization depends only on the no-crossing rule. In any case, the code infrastructure allows it to be disabled selectively.

linas (Member) commented Jan 24, 2020

> My proposal: <>

OK, sure, that works.

> A flag can be added to mean "search the given connectors in any order".

That's nice, even for English, because connector order can be confusing.

> "jet" optimization

OK. I was just pointing out a topic that I had forgotten about, and I suspect you had as well. Also, I do not recall the details, but I think arbitrary-direction connectors are implemented by having two exps, one for each direction. So that is a potential source of inefficiency. I do not recall how commuting connectors are handled.

ampli added a commit to ampli/link-grammar that referenced this pull request Jan 25, 2020
According to the discussion at PR opencog#1085.