Support multi token inflection and copying #46
I agree with the second two, but is it bad that ? That was intentional. I guess I can see use cases for removing this restriction, though... What would the API be? Should it also support a dict, `{"PATTERN_REF": {"start": 1, "end": 4}}`, or a list, `{"PATTERN_REF": [0, 3, 7]}`?
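Here is a minimal Python sketch of how the three `PATTERN_REF` forms proposed above could be resolved once copying takes every matched token instead of only the first. The helper `resolve_pattern_ref`, the per-element `matched` token lists, and treating `end` as exclusive are assumptions made for illustration. None of this is replaCy's actual API.

```python
from typing import Dict, List, Union

# Hypothetical PATTERN_REF forms from the proposal above:
#   2                       -> one pattern element
#   {"start": 1, "end": 3}  -> a range of pattern elements ("end" exclusive: an assumption)
#   [0, 3]                  -> an explicit list of pattern elements
PatternRef = Union[int, Dict[str, int], List[int]]


def resolve_pattern_ref(ref: PatternRef, matched: List[List[str]]) -> List[str]:
    """Return every token copied by `ref` (hypothetical helper, not replaCy's API).

    matched[i] holds the tokens matched by pattern element i; with an OP such
    as "*" or "+" that can be zero, one, or several tokens.
    """
    if isinstance(ref, bool):
        # bool is a subclass of int, so reject it explicitly
        raise TypeError("PATTERN_REF must be an int, a dict, or a list")
    if isinstance(ref, int):
        indices = [ref]
    elif isinstance(ref, dict):
        indices = list(range(ref["start"], ref["end"]))
    elif isinstance(ref, list):
        indices = ref
    else:
        raise TypeError("PATTERN_REF must be an int, a dict, or a list")

    copied: List[str] = []
    for i in indices:
        # Copy all tokens matched by the element, not only the first one
        copied.extend(matched[i])
    return copied


# "to quickly and quietly leave": element 1 used OP "*" and matched three tokens
matched = [["to"], ["quickly", "and", "quietly"], ["leave"]]
print(resolve_pattern_ref(1, matched))                       # ['quickly', 'and', 'quietly']
print(resolve_pattern_ref({"start": 0, "end": 2}, matched))  # ['to', 'quickly', 'and', 'quietly']
print(resolve_pattern_ref([1, 2], matched))                  # ['quickly', 'and', 'quietly', 'leave']
```

The plain-integer form stays backward compatible with today's single-element references. Only its behavior changes, because it now copies the whole multi-token span.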
Haha, actually I don't need the second and third; I just need the first one. The use case: imagine you want to match: So you would want to copy everything between TO and the infinitive, which in general can be several words long. Of course, you can do this manually and play with indices after finding the match (remove the first and second tokens, move the middle part). But then you need to separately use a pyinflect/lemminflect/replaCy wrapper to make the gerund, and at that point you don't need replaCy at all, since you are doing everything manually; you could just use the spaCy matcher. (A few days ago I spent 2h trying to confirm it's a spaCy matcher bug, then discovered this in replaCy. 😞 The second and third were added for design consistency.)
Another use case:
It's more difficult to imagine a use case where the pattern would be multi-token but we would want to copy just the first token, imo.
Some of the notes in #47 might also be relevant.
Currently replaCy assumes token matches and text suggestions are single tokens:

- `PATTERN_REF` copies only the first matched token: https://github.com/Qordobacode/replaCy/blob/bd7df64e2e5b875aaa9fecff31b1d898650219b4/replacy/__init__.py#L166
  Suggested fix: change the above line, and enforce `INFLECTION` only if one token is matched.
- `TEXT` consists of a single word only: https://github.com/Qordobacode/replaCy/blob/bd7df64e2e5b875aaa9fecff31b1d898650219b4/replacy/__init__.py#L163
  Suggested fix: JSON validation to enforce single tokens as `TEXT`, e.g. instead of `{"TEXT": "blah blah"}` use `{"TEXT": "blah"}, {"TEXT": "blah"}`.
- `FROM_TEMPLATE_ID` assumes a single-token match, from which we transfer the inflection: https://github.com/Qordobacode/replaCy/blob/bd7df64e2e5b875aaa9fecff31b1d898650219b4/replacy/__init__.py#L178
  Suggested fix: enforce that the `TEMPLATE_ID` match is a single token.

Idea: single-token match enforcement. Fail JSON validation if patterns include `FROM_TEMPLATE_ID` or `PATTERN_REF` pointing to a token with `OP`?
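Here is a minimal Python sketch of the validation proposed above. It assumes a match dict shaped like replaCy's JSON, where each rule name maps to a `patterns` list of spaCy token patterns and a `suggestions` list of lists of items. The function `validate_match_dict` and its error messages are hypothetical and not part of replaCy.

```python
from typing import Any, Dict, List


def validate_match_dict(match_dict: Dict[str, Any]) -> List[str]:
    """Return a list of single-token violations (hypothetical validator).

    Assumed shape: {rule_name: {"patterns": [token_pattern, ...],
                                "suggestions": [[item, ...], ...]}}
    """
    errors: List[str] = []
    for name, rule in match_dict.items():
        patterns = rule.get("patterns", [])
        # Map each TEMPLATE_ID to the pattern element that declares it
        by_template = {p["TEMPLATE_ID"]: p for p in patterns if "TEMPLATE_ID" in p}

        for s_idx, suggestion in enumerate(rule.get("suggestions", [])):
            where = f"{name}: suggestion {s_idx}"
            for item in suggestion:
                # TEXT must be exactly one whitespace-free word
                text = item.get("TEXT")
                if isinstance(text, str) and len(text.split()) != 1:
                    errors.append(f"{where}: TEXT {text!r} must be a single token")

                # PATTERN_REF must point at an element without OP. Following the
                # idea above, any OP is rejected; a looser rule could reject only
                # "+" and "*", the operators that can match several tokens.
                ref = item.get("PATTERN_REF")
                if isinstance(ref, int):
                    if not 0 <= ref < len(patterns):
                        errors.append(f"{where}: PATTERN_REF {ref} is out of range")
                    elif "OP" in patterns[ref]:
                        errors.append(f"{where}: PATTERN_REF {ref} points to a token with OP")

                # FROM_TEMPLATE_ID must come from an element without OP
                tid = item.get("FROM_TEMPLATE_ID")
                if tid is not None and "OP" in by_template.get(tid, {}):
                    errors.append(f"{where}: FROM_TEMPLATE_ID {tid} points to a token with OP")
    return errors


# Hypothetical rule for the "to + adverbs + infinitive" case from this thread
match_dict = {
    "to-infinitive": {
        "patterns": [
            {"LOWER": "to"},
            {"IS_ALPHA": True, "OP": "*"},
            {"POS": "VERB", "TEMPLATE_ID": 1},
        ],
        "suggestions": [
            [{"PATTERN_REF": 1}, {"TEXT": "doing it", "FROM_TEMPLATE_ID": 1}],
        ],
    }
}
for error in validate_match_dict(match_dict):
    print(error)
```

The example rule is flagged twice: once because `PATTERN_REF` 1 points at the `"*"` element, and once because `TEXT` contains two words. The `FROM_TEMPLATE_ID` reference passes because the verb element has no `OP`. Collecting all errors instead of raising on the first makes it easier to fix a large match dict in one pass.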