Dialect support! #402

linas · 2016-09-16T18:04:22Z

Below is a sketch of how to add dialect support, and why it's a good idea.

Currently, {} is used to indicate optional connectors: for example: A+ & {B- & C+} indicates that (B- & C+) is optional.

Let's give options names! These names will be the names of dialects! So, for example: A+ & {B- & C+}{irish} means that (B- & C+) is optional, but only if the "irish" dialect is enabled; otherwise, it is never allowed.

In my imagination, this solves zillions of problems. These include:

A) the bad-spelling problem: create a kant-spel dialect that merges together the disjuncts for they're, there and their (and throws in thier, for good measure)

B) enhanced support for ... irish-english, black-american-english, australian-english, hillbilly-basilect, archaic 19th-century English, twitterese, newspaper-headlines

C) Automatic detection of dialects! So, for example, if a sentence does not parse normally, but does parse after enabling some dialect, we can guess that it must be that dialect.

D) post-parse parse-ranking. That is, parse a sentence with all dialects enabled, but then fiddle with the costs associated with each particular dialect. Thus, to turn off the kant-spel dialect, one simply gives those connectors a very high cost, and they would be ranked last.
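
Point C above can be sketched in C roughly as follows. This is only an illustration of the idea: `parse_fn`, `detect_dialect` and `toy_parse` are hypothetical names invented here, not link-grammar functions, and the toy parser merely stands in for a real one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: returns nonzero if `sentence` parses with the
 * given dialect enabled (NULL means "no dialect enabled"). */
typedef int (*parse_fn)(const char *sentence, const char *dialect);

/* Returns "" if the sentence parses without any dialect, the first
 * dialect name that makes it parse, or NULL if nothing works. */
const char *detect_dialect(parse_fn parse, const char *sentence,
                           const char *const *dialects, size_t n)
{
    assert(parse != NULL && sentence != NULL);
    if (parse(sentence, NULL)) return "";
    for (size_t i = 0; i < n; i++)
        if (parse(sentence, dialects[i])) return dialects[i];
    return NULL;
}

/* Toy stand-in for a real parser, for demonstration only: a sentence
 * containing "thier" parses only with the "kant-spel" dialect. */
int toy_parse(const char *sentence, const char *dialect)
{
    if (strstr(sentence, "thier") == NULL) return 1;
    return dialect != NULL && strcmp(dialect, "kant-spel") == 0;
}
```

Calling `detect_dialect(toy_parse, "It is thier dog.", names, count)` with a list that contains "kant-spel" returns that name; a sentence that parses normally yields the empty string.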

linas · 2016-09-16T18:10:15Z

BTW, this is very similar to the "named disjuncts" idea in https://groups.google.com/forum/#!msg/link-grammar/fFUhgSO0oL4/6RPgcRfpBAAJ just using a different syntax.

To really get named disjuncts, one should allow the syntax (A+ & B+){some-name} which means that (A+ & B+) is mandatory, and is given "some-name". We can still pretend that "some-name" is a dialect, and we can also give it a cost outside of the dictionary. By default, all names have a cost of zero.

ampli · 2016-09-16T20:19:23Z

You propose

{DISJUNCT}{DIALECT}

to select a disjunct by a dialect.
But what if I want to deselect disjuncts? It can be very cumbersome if only including disjuncts by dialect is possible.
Maybe:

{DISJUNCT}{-DIALECT}

Similarly, a syntax for different costs by dialect seems to me desired.
Maybe:

{DISJUNCT}[default_cost][cost_for_DIALECT1]{DIALECT1}

I.e., {DIALECT} / {-DIALECT} just select/deselect the item before them.

There may be several other shortcuts needed, like letting one symbol represent several similar dialects that only slightly differ.

linas · 2016-09-17T02:05:11Z

Since DIALECT is just a string, then -DIALECT would just be a naming convention. So one could say, for example, {F- & G+}{not-irish} and then just always disable not-irish whenever irish is enabled.

I would rather not make any changes to the cost system. To specify two different costs, just repeat the disjuncts twice.

linas · 2016-09-17T02:09:16Z

So for example:

([A+ & B+]0.5){irish} or ([A+ & B+]1.75){polish}

or even

<a-and-b>: A+ & B+;
([<a-and-b>]0.5){irish} or ([<a-and-b>]1.75){polish}

gives one cost for irish, and another for polish. This uses the currently-implemented square-bracket system for costs -- i.e. no changes are needed there.

ampli · 2016-09-17T07:02:52Z

I had a format error in my proposal - I didn't intend to propose any change in the cost system...
I can repeat it with a proper format, but it seems to me it is best to see if such shortcuts are needed after actually using dialects widely.

Since DIALECT is just a string, then -DIALECT would just be a naming convention. So one could say, for example, {F- & G+}{not-irish} and then just always disable not-irish whenever irish is enabled.

But if you refer to dialects as "just strings", how do you know to pre-enable all "not-*" when, for example, no dialect is enabled?

ampli · 2016-09-17T09:24:11Z

A) the bad-spelling problem: create a kant-spel dialect, that merges together the disjuncts for they're there and their (and throws in thier, for good measure)

Note that you cannot just merge together disjuncts of different words (without an additional mechanism to find that there was a problem, and what the fix is).
If you define the word "their" also with the disjuncts of "there", you will get a "strange" linkage (e.g. the word "their" will appear with the disjuncts of "there").
How can you then find out whether a correction is needed, and what the exact correction is?

For example, the bad sentence is:
*It is hot their.
What does the dictionary (for fixing "there" and "their" only) look like?
What does the linkage then look like?

I ask that because I don't understand yet the fine details of your idea.
I have my own proposal, as mentioned in my group posts, but I would like to understand yours.

linas · 2016-09-27T01:04:00Z

{DIALECT} and {-DIALECT} work; I was only using {not-DIALECT} as a more verbose version.

linas · 2016-09-27T01:05:55Z

For their/there, we would need a mechanism for indicating alternatives, such as that in issue #404

ampli · 2018-05-19T22:27:55Z

I made an initial implementation.
In this implementation {tag} can name any sub-expression.
I called it "expression tag" and not "expression name" because "expression name" is already used by <name>.

I am still not sure how to enable dialects by default. What I mean is that it seems useful to me that in a given dict, some dialects will be enabled by default and some not. For example, suppose that {irish} is enabled by default, but {headline} is not. This means that there should be a definition in the dict that declares which of the dialects are enabled and which are disabled by default. I don't have an idea how to provide such a list, since there is currently no good way to define strings (using connectors for that is too cumbersome). Maybe we can add a #define directive (that will be used for version etc. too),
or use something like:
<default-dialect>: (){irish};
<default-dialect>: (){-headline}; % News writing style.

Another question is how to implement the API for enabling/disabling dialects.
One way is to make it a string like "dialect1,dialect2" to enable these dialects, or "-dialect3" to disable dialect3 if it is enabled by default. Another way is to use a NULL terminated char ** argument.
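
The string form of the API could be handled along these lines. This is a minimal sketch under assumptions: `dialect_switch` and `parse_dialect_switches` are hypothetical names, and the rules for rejecting malformed input (empty names, a trailing comma) are choices made for the example, not something decided in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAME 32

typedef struct {
    char name[MAX_NAME];
    bool enable;          /* false when the name was prefixed with '-' */
} dialect_switch;

/* Split `spec` on commas. Returns the number of entries stored in `out`,
 * or -1 on malformed input (empty name, name too long, too many entries,
 * trailing comma). */
int parse_dialect_switches(const char *spec, dialect_switch *out, int max)
{
    assert(spec != NULL && out != NULL);
    int n = 0;
    while (*spec != '\0') {
        size_t len = strcspn(spec, ",");   /* length of this item */
        const char *name = spec;
        size_t nlen = len;
        bool enable = true;
        if (nlen > 0 && name[0] == '-') { enable = false; name++; nlen--; }
        if (nlen == 0 || nlen >= MAX_NAME || n >= max) return -1;
        memcpy(out[n].name, name, nlen);
        out[n].name[nlen] = '\0';
        out[n].enable = enable;
        n++;
        spec += len;
        if (*spec == ',') {
            spec++;
            if (*spec == '\0') return -1;  /* trailing comma */
        }
    }
    return n;
}
```

For "dialect1,dialect2,-dialect3" this yields three entries, the last one marked as disabled. The NULL-terminated char ** alternative would avoid the parsing step at the price of a more awkward call site.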

Also, it seems useful to have an API to fetch the list of all dialects and their defaults.

Examples:

<a-and-b>: A+ & B+;
tt: (XXX+{test} & ({YYY+}{test1} or X1-){test2}) or
([<a-and-b>]0.5){irish} or ([<a-and-b>]1.75){polish} or {F- & G+}{testit};

<no-det-null>: [[[[()]]]]{headline} or (){-headline};

ampli · 2018-05-20T17:08:13Z

Here are some implementation details:

I added char *tag field to the Exp struct.
It is set when reading the expression from the dict.
(I also modified accordingly print_expression_parens() and fixed a bug in it regarding printing a costly null.)
At the start of expression_prune() a modified purge_Exp() is called to purge the expressions with disabled tags. This is done by making them a null expression. I hope this is the required semantics....
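
To make the purge step concrete, here is a small sketch of the same idea on a simplified tree. It is not the actual code: `exp_node` is a hypothetical, much reduced stand-in for the real Exp struct, and `purge_disabled_tags` and `is_null_exp` are names made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { EXP_CONNECTOR, EXP_AND, EXP_OR } exp_type;

/* Simplified, hypothetical expression node; the real Exp struct differs. */
typedef struct exp_node {
    exp_type type;
    const char *connector;     /* for EXP_CONNECTOR, e.g. "A+" */
    const char *tag;           /* dialect tag, or NULL if untagged */
    struct exp_node *child;    /* first operand of AND/OR */
    struct exp_node *next;     /* next sibling operand */
} exp_node;

static bool tag_enabled(const char *tag, const char *const *enabled, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tag, enabled[i]) == 0) return true;
    return false;
}

/* Turn every sub-expression whose tag is not enabled into the null
 * expression "()", represented here as an AND with no operands. */
void purge_disabled_tags(exp_node *e, const char *const *enabled, size_t n)
{
    assert(e != NULL);
    if (e->tag != NULL && !tag_enabled(e->tag, enabled, n)) {
        e->type = EXP_AND;
        e->connector = NULL;
        e->child = NULL;
        e->tag = NULL;
        return;
    }
    for (exp_node *c = e->child; c != NULL; c = c->next)
        purge_disabled_tags(c, enabled, n);
}

bool is_null_exp(const exp_node *e)
{
    return e->type == EXP_AND && e->child == NULL;
}
```

With A+ & (B- & C+){irish} and no dialect enabled, the tagged operand collapses to () and the expression behaves like plain A+.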

I still need to write dictionary_get_dialects() and parse_options_*_dialects(), but I need some input regarding my previous post.

In the post above I wrote:

<no-det-null>: [[[[()]]]]{headline} or (){-headline};

It actually should be <no-det-null>: [[[[()]]]]{-headline} or (){headline};

linas · 2018-05-21T01:43:52Z

Rather than treating dialects as boolean on/off options, perhaps they should be treated as variable costs? So, to fully enable the Irish-English grammar, set the cost to 0.0. To disable it, set it to 3.0 or higher. So, if I think a text is Irish-American tinged, I could set it to maybe 1.0, thus preferring standard grammar, and falling back on an Irish interpretation if standard grammar is not possible.
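
A rough sketch of how such a variable cost would shift the ranking, assuming a single dialect whose cost is simply added once per tagged sub-expression used in a parse. The types and function names are hypothetical; nothing here is taken from the library.

```c
#include <assert.h>

/* Hypothetical summary of one parse of a sentence. */
typedef struct {
    double base_cost;   /* cost from ordinary [..] brackets */
    int dialect_uses;   /* number of dialect-tagged sub-expressions used */
} parse_summary;

/* Total cost of a parse once the external dialect cost is applied. */
double total_cost(parse_summary p, double dialect_cost)
{
    assert(p.dialect_uses >= 0);
    return p.base_cost + p.dialect_uses * dialect_cost;
}

/* Index (0 or 1) of the cheaper parse; ties go to the first one. */
int cheaper_parse(parse_summary a, parse_summary b, double dialect_cost)
{
    return total_cost(b, dialect_cost) < total_cost(a, dialect_cost) ? 1 : 0;
}
```

With a standard parse of cost 1.5 and an Irish parse of base cost 0.2 that uses one tagged expression, a dialect cost of 0.0 or 1.0 prefers the Irish reading, while 3.0 pushes it behind the standard one.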

The costs would not be stored in the Exp struct, they would be external.

Should not store the dialect-enable flags (and/or dialect cost) in 4.0.dict -- that would be confusing. A distinct file would be better. For an API to over-ride this file contents, I think that something like lg_dialect(const char*, bool); or lg_dialect(const char*, double); would suffice, where the char string is just a single dialect name; no comma separators, no +/- characters in the string.

ampli · 2018-05-21T11:16:19Z

If dialect labels are used as variable costs, do we still need the {} syntax?
I.e., can we use something like [()]headline instead of [()]{headline}. Or maybe the [] are also not needed?
I started to look at the direct implementation when I was thinking of a possible pseudo-morphology implementation. In order to test some ideas I needed a way to identify sub-expressions in order to manipulate (or avoid manipulating) them. I guess it will still be possible to use labels also for expression identification, something like (A+ & (B+ or C+))label (or (A+ & (B+ or C+)){label} if you think it is better to keep the {}).

A distinct file would be better.

4.0.dialect?

What should be the format? (I would rather not use a dict format; this seems awkward to me.)
A simple one can be something like:
headline: 4.0
But maybe we can have names for preset values:

[no-det]
default: 4
headline: 0.2

ampli · 2018-05-21T14:35:57Z

Maybe the {} has a benefit in that a default cost can be used, like [A+]0.5{something}.

linas · 2018-05-21T19:41:54Z

Yes, either format for 4.0.dialect would be fine.

Yes, I guess that the curly braces are not needed.

For backwards-compatibility, square brackets without any number at all have a default cost of 1.0 -- therefore, we need to keep using square brackets, anyway. For the moment, this seems harmless. That suggests that ....

We should support expressions like [[A+]0.5]0.3 which would mean that A+ has a total cost of 0.8. Likewise, both [[A+]0.5]something and [[A+]something]0.5 should have a cost of something+0.5
I expect that these last two might get used a lot.
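
The additive rule can be written down as a small sketch. The names are hypothetical, and treating an unknown name as cost zero is an assumption that follows the earlier remark that all names default to a cost of zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Externally supplied cost of one dialect name. */
typedef struct { const char *name; double cost; } dialect_cost;

/* One bracket layer: a literal cost when tag == NULL, otherwise a
 * named cost that is looked up in the table. */
typedef struct { const char *tag; double literal; } cost_layer;

/* Cost of `tag`; unknown names cost 0 (an assumption, see above). */
double lookup_cost(const dialect_cost *table, size_t n, const char *tag)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].name, tag) == 0) return table[i].cost;
    return 0.0;
}

/* Sum of all bracket layers wrapped around one sub-expression. */
double total_bracket_cost(const cost_layer *layers, size_t nlayers,
                          const dialect_cost *table, size_t n)
{
    assert(layers != NULL || nlayers == 0);
    double total = 0.0;
    for (size_t i = 0; i < nlayers; i++)
        total += layers[i].tag ? lookup_cost(table, n, layers[i].tag)
                               : layers[i].literal;
    return total;
}
```

Since the layers are just summed, [[A+]0.5]something and [[A+]something]0.5 come out the same, as described above.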

ampli · 2018-06-06T08:59:12Z

lg_dialect(const char*, bool); or lg_dialect(const char*, double);

According to the current API style, the dialect setup functions need a library object to store their setting.
In any case, their setting is dict-related, so they would need to get the dict as an argument.
However, I think it may not be a good idea to store the current dialect setup in the dict struct (e.g. to allow using the same dict from different threads).

The current way of changing parsing parameters is by using parse options. So we can add a parse option for it.

Another thing to consider in the dialect API is that it may be a good idea to use preset settings that are defined in 4.0.dialect, but there is also a need for programmatically set dialect cost values (for development and debug - it is very cumbersome to require that any tweaks will be done only by changing the dict/dialect files).

I first thought that a "dialect object" could be used (something like dialect = dialect_create()), that this object would then be used by the dialect setting functions, and finally be provided to parse_options_set_dialect(). The parse_options_set_*() API for now takes only simple types and not arbitrary objects, but I don't see any reason why we cannot supply it with a "complex" opaque type.

In any case, it is better to discuss this in details before I continue my implementation (even though I don't mind to experiment and later change it). Any initial API will be declared as "experimental" so we will be free to totally reimplement it.

ampli · 2018-06-12T19:18:05Z

My basic implementation now supports expression definitions (and handling/displaying) such as [[A+]0.5]0.3, [[A+]0.5]something , [[A+]something]0.5 and even [[A+]something]more.

Now I have to finish the implementation of 4.0.dialect and of the API for setting dialects.
I have specific proposals for the file format and the API, based on how I expect dialects will actually be defined and used. It may be that my expectations are not correct, or changes/ refinements are needed.
Hence we have to discuss it before my initial implementation (even though I will not have a problem to change it later as needed).

([A+ & B+]0.5){irish} or ([A+ & B+]1.75){polish}

(Disregarding the old notation.)
Suppose a support of "irish" dialect is added.
Of course it would not end in modifying costs of a single expression.
So we would have many expressions in the dict with costs which depend on this "irish" dialect.
It doesn't seem a good idea to force using a single cost addition in all of these expressions.
Instead, I propose to provide an infrastructure for a vector of costs, as follows (example):

4.0.dict:

<a>: X- & [[A+ & B+]0.5]irish_a or Y+;
<b>: Z- & ([A+]irish_b & B+);

4.0.dialect:

[default]
% default costs
no_headline

[irish]
irish_a: 0.8
irish_b: 2

[no_headline]
headline: 4

For the parse-options, I propose to add a cost vector:

struct Parse_Options_s
{
...
/* Options governing the parser internals operation */
...
	double *dialect_cost;  /* Cost associated with dialect tags (NULL=default costs). */
...
}

(My implementation enumerates the dialect tags with internal numbers.)
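
A sketch of what "enumerating the tags" and indexing a cost vector might look like. `tag_table`, `tag_intern` and `tag_cost` are hypothetical names, and returning 0 for a NULL vector is a simplification made here: in the proposal NULL means "use the default costs", which would come from 4.0.dialect.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TAGS 16

/* Hypothetical tag table: maps each dialect tag name to a small integer. */
typedef struct {
    const char *names[MAX_TAGS];
    size_t count;
} tag_table;

/* Return the index of `name`, adding it if it is new; -1 if the table
 * is full. Only the pointer is stored, so the string must outlive it. */
int tag_intern(tag_table *t, const char *name)
{
    assert(t != NULL && name != NULL);
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->names[i], name) == 0) return (int)i;
    if (t->count == MAX_TAGS) return -1;
    t->names[t->count] = name;
    return (int)t->count++;
}

/* Cost of tag number `id`. A NULL vector is treated as all-zero here,
 * which is a simplification of "NULL = default costs". */
double tag_cost(const double *dialect_cost, int id)
{
    assert(id >= 0);
    return dialect_cost == NULL ? 0.0 : dialect_cost[id];
}
```

The dict reader would intern each tag once, after which applying a dialect is a plain array lookup per tagged expression.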

For manipulating dialects I propose the following API:

void *dialect; // dialect object
void *lg_dialect_create(Dictionary); // return a dialect object
lg_dialect_delete(void *dialect);

lg_dialect_set(void *dialect, const  char *dialect_name, bool);
lg_dialect_cost(void *dialect, const  char *dialect_name, double);

parse_options_set_dialect(Parse_Options, const void *dialect);
void *parse_options_get_dialect(Parse_Options);

ampli · 2018-06-23T01:43:38Z

@linas, I need your input on the above proposal, so I can finish this implementation.

ampli · 2019-12-04T12:29:34Z

I implemented LG-dialect as a proof-of-concept at the time of writing the last post here (June 2018).
It didn't include API at all.

Now I would like to convert it to a production code.
There is no "natural" API to use, so I intend to develop something that is both easy to use and to program. There is also a need for link-parser UI.

Some other decisions should also be made. For example, when to convert the dialect costs to actual expression costs. I think the best place is in expression pruning, since by default most of the dialects are expected to be neutralized (represented by a high cost constant).

I'm preparing a PR which is a complete implementation, to be regarded as a proposal. I will then be able to make changes and update this PR as needed before it is applied.

ampli · 2019-12-04T12:50:52Z

I need to add a pointer field in Exp_struct, which is now 32 bytes, but I would not like to make it bigger.
This is possible if I change cost from double to float.
Is there a special reason that costs are represented by double and not float?

linas · 2019-12-04T17:14:48Z

Looking; there are apparently comments from June that I missed. Sorry!

linas · 2019-12-04T21:56:30Z

float

Yes, float should be enough.

linas · 2019-12-04T21:58:37Z

This seems like overkill:

[irish]
irish_a: 0.8
irish_b: 2

I cannot think of a good reason to have this.

linas · 2019-12-04T22:08:20Z

This API:

void *dialect;

I assume this will store a vector of names+costs (or a map, name->cost)

lg_dialect_set(void *dialect, const  char *dialect_name, bool);

what's the bool for? to enable/disable that specific dialect? In that case, something like this would work better:

void lg_dialect_add(void *dialect, const  char *dialect_name, double cost);
void lg_dialect_remove(void *dialect, const  char *dialect_name);

so that removing it is the same as disabling it.
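
As a sketch of the name->cost map behind such an add/remove pair (hypothetical names and a fixed-size table, chosen only to keep the example short; the proposed lg_dialect_* functions would wrap something like this behind the opaque pointer):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_DIALECTS 8
#define MAX_NAME 32

typedef struct {
    char name[MAX_DIALECTS][MAX_NAME];
    double cost[MAX_DIALECTS];
    size_t count;
} dialect_set;

static int dialect_set_find(const dialect_set *d, const char *name)
{
    for (size_t i = 0; i < d->count; i++)
        if (strcmp(d->name[i], name) == 0) return (int)i;
    return -1;
}

/* Add a dialect, or update its cost if it is already present. */
bool dialect_set_add(dialect_set *d, const char *name, double cost)
{
    assert(d != NULL && name != NULL);
    int i = dialect_set_find(d, name);
    if (i < 0) {
        if (d->count == MAX_DIALECTS || strlen(name) >= MAX_NAME) return false;
        i = (int)d->count++;
        strcpy(d->name[i], name);
    }
    d->cost[i] = cost;
    return true;
}

/* Remove a dialect (same as disabling it); false if it was not present. */
bool dialect_set_remove(dialect_set *d, const char *name)
{
    assert(d != NULL && name != NULL);
    int i = dialect_set_find(d, name);
    if (i < 0) return false;
    size_t last = --d->count;
    if ((size_t)i != last) {             /* move the last entry into the hole */
        strcpy(d->name[i], d->name[last]);
        d->cost[i] = d->cost[last];
    }
    return true;
}

/* Look up the cost of an enabled dialect. */
bool dialect_set_lookup(const dialect_set *d, const char *name, double *cost)
{
    int i = dialect_set_find(d, name);
    if (i < 0) return false;
    *cost = d->cost[i];
    return true;
}
```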

ampli · 2019-12-04T22:12:48Z

[irish]
irish_a: 0.8
irish_b: 2
I cannot think of a good reason to have this.

Is it expected that all the disjuncts of a specific dialect will have the same cost?

linas · 2019-12-04T22:21:01Z

Is it expected that all the disjuncts of a specific dialect will have the same cost?

Oh! Ah! ... I see what you are doing:

[named-vector]
vector-component-a: 0.2
vector-component-b: 0.8

...Is that the intent?

ampli · 2019-12-04T22:23:27Z

lg_dialect_set(void *dialect, const  char *dialect_name, bool);

The idea is to be able to select a dialect without specifying its cost at all, as the normal means of selecting a dialect - the cost(s) will be as defined in 4.0.dialect. The other function, lg_dialect_cost(), is for cases in which it is desired to play with the costs.

Oh! Ah! ... I see what you are doing:
...
...Is that the intent?

Yes!!!

linas · 2019-12-04T22:31:22Z

OK. What's the bool for?

ampli · 2019-12-04T23:11:40Z

Using your example:

[named-vector]
vector-component-a: 0.2
vector-component-b: 0.8

Then to enable the named-vector dialect:
lg_dialect_set(dialect, "named-vector", true);

To tune or enable a specific component:
lg_dialect_cost(dialect, "vector-component-b", 1.2);

Or to turn it off completely:
lg_dialect_cost(dialect, "vector-component-b", 9999.0);
(I defined a constant DIALECT_DISABLED for that, maybe it should be renamed to DIALECT_COMPONENT_DISABLED.)

ampli · 2019-12-07T00:27:49Z

I made about 2/3 progress in this project, and found that my proposed interface (and your original one) is very cumbersome to use, especially to support reasonable link-parser UI (which may be a problem of any user program that would like to use it).

For now (after I have implemented the cumbersome functions) it seems to me my original proposal (to use a string API) is superior.

My current API implementation is:

typedef struct Dialect_Option_s *Dialect_Option;  /* Opaque handle. */

Dialect_Option parse_options_get_dialect(Parse_Options opts);
void parse_options_set_dialect(Parse_Options opts, Dialect_Option dopt);
bool lg_dialect_set(Dialect_Option dopt, const char *dialect_vector, bool useit);

#define DIALECT_COST_DISABLE    10000.0   /* A high cost setting to disable disjuncts */
#define DIALECT_COST_REMOVE     10001.0   /* Use the preset cost */
bool lg_dialect_cost(Dialect_Option dopt, const char *dialect_component, double cost);

What is missing but needed is API to fetch the current settings, like:

const char *lg_dialect_get(Dialect_Option dopt, int i); /* Get the i'th name (NULL if no more) */
char *name, *cost;
void lg_dialect_cost_get(Dialect_Option dopt, int i, name, cost); /* Get a dialect component (NULL if no more) */

The link-parser dialect implementation (not done yet) is supposed to use these lg_dialect_*_get() API to display the current setting (so it doesn't need to just specially remember the dialect setting). This is very cumbersome but still maybe reasonable.
However I couldn't find a reasonable link-parser UI that matches this API model - by reasonable I mean simple and easy to use. (A UI that doesn't directly use the underlying API will need much ad-hoc code.)

What we need is UI to:

Add and delete a dialect name (vector name). The component costs then are taken from 4.0.dialect.
Define a cost for a dialect component (vector component name and cost), including a way to define "infinite cost" to disable it.
Delete a cost for a previously so defined dialect component (so its cost will be taken from 4.0.dialect).

Instead of all of that, I propose this simple (both for use and implementation) way:
!dialect=vector1,vector2,component1:0.2,component2:,component3:0.8

This will enable the dialect names vector1 and vector2 (i.e. their component costs will be used as defined in 4.0.dialect) and in addition set (or override) the components component1, component2 and component3, where component2 is set to "infinite cost" in order to disable it (say it is defined in 4.0.dialect as a component of vector1).
(BTW, this example is to illustrate the idea - of course the user is not expected to issue such a complex setting - a complex setting is mainly for debug, development and testing).
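
Parsing such a setting string could look roughly like this. It is a sketch under assumptions: `dialect_item` and `parse_dialect_setting` are hypothetical names, `COST_DISABLE` just mirrors the DIALECT_COST_DISABLE value quoted above, and the error handling is a guess at reasonable behavior rather than a description of the implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define COST_DISABLE 10000.0  /* mirrors DIALECT_COST_DISABLE above */
#define MAX_NAME 32

typedef struct {
    char name[MAX_NAME];
    bool is_component;   /* true for "name:cost", false for a bare vector name */
    double cost;         /* meaningful only when is_component is true */
} dialect_item;

/* Parse "vector1,component1:0.2,component2:" style settings.
 * An empty cost after ':' means "infinite cost" (disabled).
 * Returns the number of items stored, or -1 on malformed input. */
int parse_dialect_setting(const char *s, dialect_item *out, int max)
{
    assert(s != NULL && out != NULL);
    int n = 0;
    while (*s != '\0') {
        size_t len = strcspn(s, ",");              /* length of this item */
        const char *colon = memchr(s, ':', len);
        size_t nlen = colon ? (size_t)(colon - s) : len;
        if (nlen == 0 || nlen >= MAX_NAME || n >= max) return -1;
        memcpy(out[n].name, s, nlen);
        out[n].name[nlen] = '\0';
        out[n].is_component = (colon != NULL);
        out[n].cost = 0.0;
        if (colon != NULL) {
            const char *val = colon + 1;
            const char *item_end = s + len;
            if (val == item_end) {
                out[n].cost = COST_DISABLE;        /* "component2:" */
            } else {
                char *end;
                out[n].cost = strtod(val, &end);
                if (end != item_end) return -1;    /* junk in the number */
            }
        }
        n++;
        s += len;
        if (*s == ',') s++;
    }
    return n;
}
```

For the example string above this produces five items: two bare vector names, then three components, the second of which carries the disable cost.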

Benefits of this proposal:

The link-parser dialect API implementation is then absolutely trivial and minimal: The same code that now implements !debug and !test is just used with !dialect and that's all.
Trivial and minimal library implementation: The above setup string is just like a [] section in 4.0.dialect (when newlines are replaced by commas) so the same code that parses the file can be used and the resulted data structure is also the one that is needed for actually applying the costs.
Also, only 2 API functions are needed (parse_options_set_dialect() and parse_options_get_dialect()), and in the library there is no need to the extra data structures and internal API functions that are needed to support the many-user-API-functions approach.
No Dialect_Option object is needed.
No additional calls to retrieve the current setting (vector/component names) are needed.

ampli · 2019-12-07T04:03:00Z

I forgot to add 2 additional user API calls that are used in the "many-user-API-functions" approach:

/* Create and delete a dialect object. */
Dialect_Option lg_dialect_create(void);
void lg_dialect_delete(Dialect_Option dopt);

I will just implement it in both approaches at once so we can decide which is better.

ampli · 2019-12-11T12:48:16Z

The dialect-supporting version has passed my initial tests.
However:

The API I finished implementing consists only of the parse_options_set/get_dialect() calls and nothing more. The parse_options_set_dialect() function, like all the other parse_options_set_*() functions, gets only a Parse_Options object and one argument (char * in this case).
The UI of link-parser is only !dialect=string_config.

This approach has a slight problem: Errors in setting the dialect user variable (like specifying a nonexistent dialect) are not detected upon issuing the parse_options_set_dialect() call, because this function has no access to the dict.
The dialect setting (cached on success) is done in sentence_split(), and if it fails due to bad setting in the dialect variable, an appropriate error message is issued and sentence_split() fails. If this is a batch run, the number of errors is then meaningless, which may be considered a problem.

This situation can be somewhat improved by making syntax checks (in parse_options_set_dialect()) on the !dialect variable content, but the parse_options_set_*() calls don't return an error indication (BTW, I think they should, and this would be a mostly compatible change). However, in order to make a perfect validation, the dictionary handle should somehow be provided to parse_options_set_dialect() (e.g. by an added argument), or alternatively by an additional API like check_dialect() (I guess trying to split a dummy sentence to that end is not a good option...).

This problem doesn't exist with the "many-user-API-functions" approach. However, it is complex. I already have a partial implementation of that under #ifdef DIALECT_OBJECT (undebugged) but it is extremely cumbersome to use. I'm not sure more efforts are needed in that direction, please advise. If my current implementation is fine with you, I will just remove this alternative implementation.

My test implementation includes only a minimal English "headline" dialect (dict and 4.0.dialect definitions), and it doesn't include test sentences or a test suite (to be added later).

What I can do is to submit a PR of my current work as a request for review, so that you will be able to actually test it and tell me about needed changes before it is applied. I would like to first merge with an updated master branch, so I will need PR #1058 to be applied first.

ampli · 2019-12-11T12:58:44Z

Another problem to be solved is !!word. It currently shows the expressions after dialect resolution of the 4.0.dialect definitions, and not as they are in the dict.

Possible solutions:

Add a dialect - to denote "don't apply 4.0.dialect", to be issued as !dialect=-. After that, !!word will show the expressions exactly as they are in the dict.
Make !!word always show the exact expressions in the dict.
Use !!word!flags, e.g. !!word!o (o for original), to show the exact expressions in the dict. (BTW other useful flags can be added, like d for listing the disjuncts).

linas · 2019-12-12T19:35:05Z

I think that having !!word show what is currently in effect makes the most sense. Recall, !!word is a debugging utility, and if it shows something other than what is currently being done, it would be confusing. On the other hand, it is a fairly useless debug tool -- most words have hundreds if not tens of thousands of disjuncts, and picking over these is .. too hard. It's easier to work directly with the dictionary.

ampli · 2019-12-12T19:49:58Z

I also added a bad-spelling dialect, as you suggested in issue #404:

well, instead of sub-dictionaries, the problem would be solved by "dialect support" - #402 : turn off the "bad speling" dialect, and then these rules no longer apply.

I used the component name bad-spelling, and the dialect name no-bad-spelling in order to turn it off (by cost 4). If you like to use other names I can change that before submitting the PR (which is ready, BTW).

Example from the dict:
then.#than: [[than.e]0.65]bad-spelling;

The current 4.0.dialect definitions:

[default]
no-headline
bad-spelling: 0

[no-headline]
headline: 4

[headline]
headline: 0

[no-bad-spelling]
bad-spelling: 4
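
One way to read these definitions is that a bare line such as no-headline pulls in the named section, and that later settings override earlier ones. The sketch below resolves a section into a flat component->cost list under that reading; it is an assumption about the semantics, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;    /* component name, or referenced section name */
    int is_section_ref;  /* nonzero for a bare line such as "no-headline" */
    double cost;         /* used only when is_section_ref == 0 */
} entry;

typedef struct { const char *name; const entry *entries; size_t n; } section;

typedef struct { const char *component; double cost; } resolved;

static const section *find_section(const section *secs, size_t nsecs,
                                   const char *name)
{
    for (size_t i = 0; i < nsecs; i++)
        if (strcmp(secs[i].name, name) == 0) return &secs[i];
    return NULL;
}

/* Apply section `name` on top of out[0..count). Returns the new count,
 * or -1 on an unknown section, overflow, or too deep nesting. */
int resolve_section(const section *secs, size_t nsecs, const char *name,
                    resolved *out, int count, int max, int depth)
{
    assert(secs != NULL && name != NULL && out != NULL);
    const section *s = find_section(secs, nsecs, name);
    if (s == NULL || depth > 8) return -1;
    for (size_t i = 0; i < s->n; i++) {
        const entry *e = &s->entries[i];
        if (e->is_section_ref) {
            count = resolve_section(secs, nsecs, e->name, out, count, max,
                                    depth + 1);
            if (count < 0) return -1;
        } else {
            int j;
            for (j = 0; j < count; j++)
                if (strcmp(out[j].component, e->name) == 0) break;
            if (j == count) {                /* first time we see it */
                if (count == max) return -1;
                out[count++].component = e->name;
            }
            out[j].cost = e->cost;           /* later settings override */
        }
    }
    return count;
}
```

Resolving default with the definitions above gives headline: 4 and bad-spelling: 0; applying headline afterwards resets the headline cost to 0.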

ampli · 2019-12-12T19:55:52Z

On the other hand, it is a fairly useless debug tool -- most words have hundreds if not tens of thousands of disjuncts, and picking over these is .. too hard. It's easier to work directly with the dictionary.

I found it useful to work with list of disjuncts. But it may be big so I filter the needed ones.
So maybe something like that may be useful:
!!word!d/filter_string/

linas · 2019-12-13T03:12:18Z

PR

sure. It'll take me a few days to stew over it.

!!word!d/filter_string/

Sure, why not. I assume filter_string is a regex, and so maybe that, but without the d. or something. since d normally means 'digits'.

ampli · 2019-12-13T11:25:46Z

On the other hand, it is a fairly useless debug tool -- most words have hundreds if not tens of thousands of disjuncts, and picking over these is .. too hard.

I have a branch in which I applied the classic pruning results back into the expression pruning code.
The original intention was to speed up the SAT parser by supplying it with much smaller expressions, and also to prevent the duplicate linkages it has due to duplicate disjuncts in the raw expressions it currently uses. It works perfectly (fast and providing the exact same linkages).

However, this code may serve to display compact expressions that don't produce disjuncts that got removed by eliminate_duplicate_disjuncts() , power_prune() and pp_prune() (say using !!word!c).

BTW, a similar idea may also serve to display expressions with the connectors that participate in the linkage marked in them.

linas · 2019-12-13T17:49:18Z

However, this code may serve to display compact expressions that don't produce disjuncts that got removed

Not sure I like that. One of the common dictionary debugging problems is answering the question "why didn't connector X attach to connector Y because it seems they should" and if these are already "silently" pruned away, that would be .. confusing. On the other hand, having much shorter disjunct lists would be nice. Perhaps two displays?

ampli · 2019-12-13T19:22:14Z

I guess it is possible to show the disjuncts before and after pp_and_power_prune(), and even to follow the bad option so pp_prune() is not done if specified. The question is what syntax to use for the link-parser command. You suggested !!word/regex/ without a flag, but then we cannot differentiate between the cases and have to show all possibilities while one of them would suffice.

ampli · 2019-12-25T00:12:51Z

"why didn't connector X attach to connector Y because it seems they should"

If the expression is displayed after pruning you may easily note the case in which the answer is "because it (X or Y) got pruned away".

!!word!d/filter_string/

Sure, why not. I assume filter_string is a regex, and so maybe that, but without the d. or something. since d normally means 'digits'.

There is a need for some way in the !!word command to differentiate between:

Original dict expression/disjuncts.
After expression_prune().
After pp_and_power_prune() (applying pp_prune() can be controlled by !bad, and/or another flag).

The place for the flags may be before a leading / or after a trailing one.
Maybe instead of letters use numbers?
!!word/re/1
!!word/re/2
etc.?

linas · 2019-12-25T01:10:02Z

After a trailing slash seems better, from a usability standpoint: one tries !!word, then consults the docs, then presses up-arrow and adds the slash -- less arrow-key usage to look at the multiple versions. Or even print all versions at once...

ampli · 2019-12-25T01:27:35Z

Or even print all versions at once...

The disjunct list is typically tens of thousands of lines... That is not so friendly if you only want the expression.

ampli · 2020-01-06T16:24:17Z

I finished implementing !!word/regex. An empty regex means all disjuncts. Flags are supported too, as !!word/regex/flags, but none of them are useful for now.
Since there may be overlapping changes with the current pending PRs, I will send a rebased PR after they are applied.
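
For illustration, filtering a list of disjunct strings by a regex, with an empty pattern matching everything, could be sketched as below. This uses POSIX regcomp()/regexec(); which regex engine and syntax link-parser actually uses for !!word/regex is not stated here, `filter_disjuncts` is a hypothetical name, and the disjunct strings in the usage note are made up.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Count the disjunct strings that match `pattern`, storing pointers to
 * them in `out` when it is not NULL. An empty pattern matches all.
 * Returns -1 if the pattern does not compile. */
int filter_disjuncts(const char *const *disjuncts, size_t n,
                     const char *pattern, const char **out)
{
    assert(disjuncts != NULL && pattern != NULL);
    regex_t re;
    int use_re = (pattern[0] != '\0');
    if (use_re && regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        if (!use_re || regexec(&re, disjuncts[i], 0, NULL, 0) == 0) {
            if (out != NULL) out[count] = disjuncts[i];
            count++;
        }
    }
    if (use_re) regfree(&re);
    return count;
}
```

Given the strings "Ss- & O+", "Sp- & O+" and "Wd- & Ss+", the pattern ^Ss- keeps only the first, while an empty pattern keeps all three.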

ampli · 2021-01-05T00:49:33Z

This discussion includes:

Adding dialect support - done.
Adding !!/word/ debug options. All done but one thing: Show disjunct after power pruning.
Unless this seems to be useless, I will move that to another issue.

linas · 2021-01-05T03:30:26Z

Show disjunct after power pruning.

I do not anticipate needing that.

linas · 2021-01-05T03:31:29Z

So, I guess this issue can be closed!?

linas · 2021-01-05T03:32:33Z

Oh, wait: it needs to be documented on the website ...

ampli · 2021-01-08T22:24:25Z

Eventually, I need to add it to the man page (on the next man page overhaul).

ampli · 2021-03-23T01:05:15Z

Eventually, I need to add it to the man page (on the next man page overhaul).

Added in 9a62ddc.

ampli · 2021-03-23T01:10:03Z

Issue #1172 got opened for completing the dialect API.
Closing this issue as "implemented".

linas added enhancement infrastructure labels Sep 16, 2016

linas mentioned this issue Sep 28, 2016

spell-guessing mis-handles capitalized words #404

Open

ampli self-assigned this May 20, 2018

ampli mentioned this issue Jun 3, 2018

Cost cutoff #783

Closed

ampli mentioned this issue Nov 7, 2018

Prune speedup #845

Merged

ampli mentioned this issue Jul 20, 2019

Exp cleanup #978

Merged

ampli mentioned this issue Dec 13, 2019

Dialect support #1060

Merged

ampli closed this as completed Mar 23, 2021
