Support multi token inflection and copying #46

melisa-writer · 2020-06-19T09:28:40Z

Currently replaCy assumes token matches and text suggestions are single tokens:

1. PATTERN_REF copies only the first matched token:
   https://github.com/Qordobacode/replaCy/blob/bd7df64e2e5b875aaa9fecff31b1d898650219b4/replacy/__init__.py#L166
   Suggested fix: change the above line, and enforce INFLECTION only if a single token is matched.
2. TEXT consists of a single word only:
   https://github.com/Qordobacode/replaCy/blob/bd7df64e2e5b875aaa9fecff31b1d898650219b4/replacy/__init__.py#L163
   Suggested fix: JSON validation to enforce single tokens as TEXT,
   e.g. instead of {"TEXT": "blah blah"} use {"TEXT": "blah"}, {"TEXT": "blah"}.
3. FROM_TEMPLATE_ID assumes a single-token match, from which we transfer the inflection:
   https://github.com/Qordobacode/replaCy/blob/bd7df64e2e5b875aaa9fecff31b1d898650219b4/replacy/__init__.py#L178
   Suggested fix: enforce that the TEMPLATE_ID match is a single token.
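
A minimal sketch of what the TEXT check could look like. It assumes suggestions are a list of lists of dicts, and it approximates "single token" with whitespace splitting, which is not the same as spaCy tokenization. The function name `find_multi_token_text` is hypothetical and not part of replaCy.

```python
def find_multi_token_text(suggestions):
    """Return (suggestion_index, item_index, text) for each TEXT value
    that is not exactly one whitespace-separated token."""
    problems = []
    for s_idx, suggestion in enumerate(suggestions):
        for i_idx, item in enumerate(suggestion):
            text = item.get("TEXT")
            # Whitespace splitting is only an approximation of tokenization.
            if isinstance(text, str) and len(text.split()) != 1:
                problems.append((s_idx, i_idx, text))
    return problems


suggestions = [
    [{"TEXT": "blah blah"}],               # should be flagged
    [{"TEXT": "blah"}, {"TEXT": "blah"}],  # fine: one token per item
]
problems = find_multi_token_text(suggestions)
```

A validator could then fail if `problems` is non-empty and report the offending entries.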

Idea:
Single-token match enforcement: fail JSON validation if FROM_TEMPLATE_ID or PATTERN_REF points to a pattern token with an OP?
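
One way the idea above might be sketched, assuming `patterns` is a flat list of token dicts and `suggestions` is a list of lists of dicts. The function name `refs_to_op_tokens` is hypothetical, and the sketch flags any OP at all rather than trying to decide which operators can match more than one token.

```python
def refs_to_op_tokens(patterns, suggestions):
    """Return (suggestion_index, key, value) for each PATTERN_REF or
    FROM_TEMPLATE_ID that points at a pattern token carrying an OP."""
    # TEMPLATE_IDs that sit on a pattern token with an OP.
    op_template_ids = {
        tok["TEMPLATE_ID"]
        for tok in patterns
        if "TEMPLATE_ID" in tok and "OP" in tok
    }
    errors = []
    for s_idx, suggestion in enumerate(suggestions):
        for item in suggestion:
            ref = item.get("PATTERN_REF")
            if isinstance(ref, int) and 0 <= ref < len(patterns):
                if "OP" in patterns[ref]:
                    errors.append((s_idx, "PATTERN_REF", ref))
            tid = item.get("FROM_TEMPLATE_ID")
            if isinstance(tid, int) and tid in op_template_ids:
                errors.append((s_idx, "FROM_TEMPLATE_ID", tid))
    return errors


patterns = [
    {"LOWER": "to"},
    {"POS": "ADV", "OP": "*"},          # may match zero or more tokens
    {"POS": "VERB", "TEMPLATE_ID": 1},
]
suggestions = [[{"PATTERN_REF": 1}, {"PATTERN_REF": 2, "INFLECTION": "VBG"}]]
errors = refs_to_op_tokens(patterns, suggestions)
```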


sam-writer · 2020-06-19T18:16:17Z

I agree with the second two, but is it bad that

> PATTERN_REF copies only the first matched token

? That was intentional. I guess I can see use cases for removing this restriction though... what would the API be, also support a dict with start and end? Something like

{
  "PATTERN_REF": {"start": 1, "end": 4}
}

or a list?

{
  "PATTERN_REF": [0, 3, 7]
}

?
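
To make the three candidate shapes concrete, here is a rough sketch of how they could be normalized into a list of positions. Everything here is an assumption for discussion: the helper name `resolve_pattern_ref` is hypothetical, and treating "end" as exclusive (like a Python slice) is a choice this thread has not settled.

```python
def resolve_pattern_ref(ref, n_tokens):
    """Normalize a PATTERN_REF value into a list of token positions.

    Accepts an int (current behavior), a {"start", "end"} dict, or a list.
    Assumption: "end" is exclusive, as in Python slices.
    """
    if isinstance(ref, int):
        indices = [ref]
    elif isinstance(ref, dict):
        indices = list(range(ref["start"], ref["end"]))
    elif isinstance(ref, list):
        indices = list(ref)
    else:
        raise TypeError(f"unsupported PATTERN_REF: {ref!r}")
    for i in indices:
        if not 0 <= i < n_tokens:
            raise IndexError(f"PATTERN_REF position {i} is out of range")
    return indices


single = resolve_pattern_ref(2, 8)
span = resolve_pattern_ref({"start": 1, "end": 4}, 8)
picked = resolve_pattern_ref([0, 3, 7], 8)
```

The dict form covers contiguous copying, while the list form also allows reordering or skipping tokens.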

sam-writer · 2020-06-19T18:17:07Z

Also, related to #41 and #33

melisa-writer · 2020-06-21T03:40:34Z

> I agree with the second two, but is it bad that
>
> > PATTERN_REF copies only the first matched token
>
> ? That was intentional. I guess I can see use cases for removing this restriction though... what would the API be, also support a dict with start and end? Something like
> {
>   "PATTERN_REF": {"start": 1, "end": 4}
> }
> or a list?
> {
>   "PATTERN_REF": [0, 3, 7]
> }
> ?

Haha, actually I don't need the second and third; I just need the first one, i.e. PATTERN_REF.

The use case:

Imagine you want to match: to (very) quickly go
and turn it into: (very) quickly going

So you would want to copy everything between "to" and the infinitive, which in general can be several words long.

Of course, one can do this manually and play with indices after finding the match (remove the first and second tokens, move the middle part), but then you need to separately use pyinflect/lemminflect/the replaCy wrapper to make the gerund. At that point you don't need replaCy at all: since you are doing everything manually, you could just use the spaCy matcher.

(A few days ago I spent 2h trying to confirm it was a spaCy matcher bug, then discovered this in replaCy. 😞 I added the second and third points for design consistency.)

Copying everything you matched seems more natural than copying only the first token of the matched phrase.
By default I would assume we always copy everything.
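
For illustration, the manual version of this use case might look roughly like the sketch below. It works on plain token strings rather than spaCy tokens, and the tiny `GERUNDS` lookup is a hypothetical stand-in for a real inflection library such as pyinflect or lemminflect.

```python
# Hypothetical stand-in for a real inflection library.
GERUNDS = {"go": "going", "run": "running"}


def infinitive_to_gerund(tokens):
    """Turn ["to", <modifiers...>, <verb>] into [<modifiers...>, <gerund>]."""
    if len(tokens) < 2 or tokens[0].lower() != "to":
        raise ValueError("expected a span starting with 'to' and ending in a verb")
    # Everything between "to" and the verb is copied unchanged.
    *middle, verb = tokens[1:]
    return middle + [GERUNDS[verb]]


result = infinitive_to_gerund(["to", "very", "quickly", "go"])
```

The copied middle part is exactly what a single-token PATTERN_REF cannot express when it spans more than one token.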

melisa-writer · 2020-06-21T03:47:39Z

Another use case:

Matching a token and moving it to the end of the sentence. If you can't copy more than one token, this won't work.

IMO it's harder to imagine a use case where the pattern is multi-token but we want to copy just the first token.

sam-writer · 2020-06-22T15:28:55Z

Some of the notes in #47 might also be relevant

melisa-writer added the bug label on Jun 19, 2020
