Dict idioms #747

ampli · 2018-04-21T19:19:40Z

Currently, subscripted idioms are forbidden, so a dict entry like
a_b.c: something;
is considered to be a definition for the word a_b (a word which includes an underbar).
This can be useful if one wants to introduce a word which includes an underbar (there is no other way to do so right now).

But it seems to me that it is more useful to allow subscripted idioms.
I ran into this when I tried to check whether it is possible to "correct" an idiom usage with the .# device (I still don't have a better name for this idea).
For example (an illustration only; this particular correction may not be a good idea):
in_to.#into: [into]0.65;
An example of a possible subscripted idiom:
take_away.p: ...;
(BTW it is not currently in the dict.)

I found that only a minor fix is needed in order to allow subscripted idioms, and I can send a PR if removing this restriction looks like a good idea.
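
To make the distinction concrete, here is a minimal sketch in Python (not the library's C code). The names `split_subscript` and `classify_entry` are hypothetical, and the rule that the subscript starts at the last dot is an assumption made for illustration, not something taken from the link-grammar source:

```python
def split_subscript(token):
    """Split 'word.sub' into (word, sub); sub is '' when there is no subscript.

    Assumption (illustration only): the subscript starts at the last dot.
    """
    word, dot, sub = token.rpartition(".")
    if not dot or not word:
        return token, ""
    return word, sub


def classify_entry(token, allow_subscripted_idioms):
    """Return ('idiom', [words], sub) or ('word', word, sub) for a dict token.

    Underbars are looked for in the word part only, never in the subscript.
    """
    word, sub = split_subscript(token)
    if "_" in word and (allow_subscripted_idioms or not sub):
        return ("idiom", word.split("_"), sub)
    return ("word", word, sub)


# Behaviour described above: a subscripted entry is a plain word "a_b".
print(classify_entry("a_b.c", allow_subscripted_idioms=False))
# Proposed behaviour: the same kind of entry becomes a subscripted idiom.
print(classify_entry("take_away.p", allow_subscripted_idioms=True))
```

With the flag off, `a_b.c` stays a single word with an underbar; with it on, `take_away.p` is split into the words `take` and `away` and keeps the subscript `p`.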


ampli · 2018-04-23T11:19:16Z

After a few more changes, the following works fine:

rather_then.#rather_than: rather_than;

(It is an expanded form of the existing commented out entry % rather_then: rather_than;.)
The needed changes were:

1. Insert idioms into the dict also in their original form.
2. Don't look at subscripts for underbars.

As a bonus, this now works too:

linkparser> !!a_lot
Token "a_lot" matches:
    a_lot                            10  disjuncts
Token "a_lot" expressions:
    a_lot                      [[(({[@M+]0.400 or Mp+} & SJlp+) or ({[@M+]1.400 or [Mp+]} & SJrp-))]] or EC+ or MVa- or ((MVw- & OFw+)) or Wa-

linkparser> !!a_*
(All idioms starting with the word "a" are listed.)
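
Why storing idioms in their original form makes this lookup possible can be sketched with a toy dictionary in Python. This is an illustration only: `toy_dict`, its placeholder expressions, and the `lookup` helper are hypothetical, and the wildcard matching here (via `fnmatch`) is not claimed to be how the `!!` command is implemented:

```python
from fnmatch import fnmatchcase

# Toy dictionary: idioms are stored under their original underbar form.
# Keys and expressions are placeholders, not real dict content.
toy_dict = {
    "a_lot": "EXPR_1",
    "a_few": "EXPR_2",
    "rather_than": "EXPR_3",
    "cat": "EXPR_4",
}


def lookup(pattern, dictionary):
    """Return the sorted keys that match pattern; '*' acts as a wildcard."""
    return sorted(key for key in dictionary if fnmatchcase(key, pattern))


print(lookup("a_lot", toy_dict))  # exact idiom lookup
print(lookup("a_*", toy_dict))    # all idioms starting with the word "a"
```

If the idioms were stored only as their component words, neither query would have a key to match against.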

When I made change (1) above, I got numerous errors about duplicate idioms.
My guess is that many of them accumulated over time because there was no check for that.
Until they are fixed (if needed), I just allowed them by default. They can be listed using:

link-parser --test=dup-idioms

Duplicate examples (total 43):

Ignoring word "and_yet", which has been multiply defined:
	 Line 12142, next tokens: ";" "..y" "*.j" "•" "⁂" 
link-grammar: Error: While parsing dictionary en/4.0.dict:
Ignoring word "but_not", which has been multiply defined:
	 Line 12142, next tokens: ";" "..y" "*.j" "•" "⁂"

BTW, a change can be introduced to automatically report the line numbers in the dict m4 source (when applicable) if you feel this is more useful.
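
The kind of check involved can be sketched as follows, in Python and independent of the actual dictionary reader. `find_duplicate_idioms` is a hypothetical helper, and the line numbers in the sample are made up for illustration:

```python
from collections import defaultdict


def find_duplicate_idioms(entries):
    """entries: iterable of (line_number, token) pairs.

    Returns {token: [line numbers]} for idioms defined more than once.
    """
    seen = defaultdict(list)
    for line_number, token in entries:
        if "_" in token:  # only idioms are of interest here
            seen[token].append(line_number)
    return {tok: lines for tok, lines in seen.items() if len(lines) > 1}


# Made-up entries and line numbers, for illustration only.
sample = [(10, "and_yet"), (20, "but_not"), (35, "and_yet"), (40, "cat")]
print(find_duplicate_idioms(sample))
```

Keeping every line number per token, rather than only a count, is what would allow a report to point at each definition site.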

ampli · 2018-04-23T11:43:16Z

Not supported yet (but of course can be):

1. Dict words which contain underbars.
   For example, the following word cannot currently be supported directly: snake_case.
   (I guess it can still be supported for now through a regex.)
   It is not a trivial change, but it is not hard to implement either (the dict definition would use snake\_case).
2. Correction definitions like these (currently commented out):

   % all.#all_of: [all_of]0.65;
   It does not work for now because there is no all_of idiom definition. However, it can be made to work nevertheless (I will try that).
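
For the underbar case (a dict definition written as snake\_case), one possible way to separate idiom words while honoring an escaped underbar is sketched below in Python. `split_idiom_words` is a hypothetical name, and the escape handling is an assumption about how such a feature could behave, not a description of any existing code:

```python
import re

# Split only on underbars that are not preceded by a backslash.
# (A simplification: an escaped backslash before "_" is not handled.)
_UNESCAPED_UNDERBAR = re.compile(r"(?<!\\)_")


def split_idiom_words(token):
    """Split a dict token into idiom words, turning '\\_' into a literal '_'."""
    parts = _UNESCAPED_UNDERBAR.split(token)
    return [part.replace("\\_", "_") for part in parts]


print(split_idiom_words(r"snake\_case"))   # one word containing an underbar
print(split_idiom_words("rather_than"))    # a two-word idiom
```

A token with a single resulting part would then be treated as an ordinary word rather than an idiom.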

BTW, I started to investigate the idiom-related stuff after a long pause because I started to actually implement capitalization using the dict (issue #690). While thinking about that, it occurred to me that capitalized words can be seen as a special kind of idiom, and this hints at an implementation possibility.

Since my current idiom-related changes seem to me useful, I will send a PR for them.

linas · 2018-04-23T18:38:47Z

I like the idiom-printing extension.

I don't understand why idiom subscripts are useful. Subscripting in general does not seem to be all that useful, except that it helps with the authoring of the dictionary, and some of the debugging of the dictionary; I don't think it's useful to end users.

Duplicate entries for idioms seem OK to me.

ampli · 2018-04-24T03:46:21Z

> I don't understand why idiom subscripts are useful. Subscripting in general does not seem to be all that useful, except that it helps with the authoring of the dictionary, and some of the debugging of the dictionary; I don't think it's useful to end users.

I see several pros for it, and don't see cons:

At least, they are useful for "correction" entries.
It may be useful for idioms that can serve as several POS, like take away.
It removes an exception to the possibility of adding a subscript.
It is a trivial change that doesn't introduce any problem, and can just be left unused most of the time.
It may be useful in cases that we didn't think of just now.

> Duplicate entries for idioms seem OK to me.

It is not clear to me that when a definition of an idiom gets fixed, all its other entries (there may be more than 2) are checked for the need of a similar fix. In addition, an idiom can appear both in a word list and directly in the dict, and this doesn't seem intentional to me.

EDIT: Fix a typo.

ampli · 2018-04-24T18:15:22Z

I just sent PR #751.
I couldn't add a ChangeLog line due to a possible conflict.
Here is the line:

Add idiom lookup possibility in link-parser's dict lookup command (!!idiom_here).

linas · 2018-04-24T20:50:36Z

I guess we could add subscripts to all the 43 duplicate idioms. Could you provide that list, or show me how to get it?

ampli · 2018-04-24T21:04:30Z

link-parser -test=dup-idioms

ampli changed the title ~~Why idioms cannot have a subscript?~~ Dict idioms Apr 23, 2018

ampli mentioned this issue Apr 25, 2018

Misc fixes/updates #753

Merged
