Add edit tree lemmatizer #10231
@danieldk: In particular, can you double-check all the types?
I'm not sure that I'm also not sure that My original intention was to have this kind of component as a |
82c9e50 to 773e1f5
Maybe |
So what I mean is the default component name / factory name and not the class name, where I think that Nothing else is |
Really excited to have this as part of the main library! During review, I mainly had some small comments about typing.
I was thinking that too. But having For the |
Maybe |
I don't think that'll work, because the internal parser classes are so generically named that it might then become more difficult to see what all the parser code is...
Co-authored-by: Sofie Van Landeghem <[email protected]>
This change also changes the serialized representation. Rather than mirroring the deep C structure, we use a simple flat union of the match and substitution node types.
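To illustrate what a "flat union" of match and substitution nodes could look like, here is a minimal sketch. The field names (`prefix_len`, `suffix_len`, `prefix_tree`, `suffix_tree`, `orig`, `subst`), the `NULL_TREE` sentinel, and the `apply_tree` helper are assumptions made for illustration and are not necessarily the names used in this PR. Each tree is one flat record, and subtrees are referenced by index into a shared list rather than nested.

```python
import json
from typing import List, TypedDict, Union


class MatchNode(TypedDict):
    """Hypothetical match node: keeps the middle, rewrites prefix/suffix."""
    prefix_len: int
    suffix_len: int
    prefix_tree: int  # index of the subtree for the prefix, or NULL_TREE
    suffix_tree: int  # index of the subtree for the suffix, or NULL_TREE


class SubstNode(TypedDict):
    """Hypothetical substitution node: replaces `orig` with `subst`."""
    orig: str
    subst: str


Node = Union[MatchNode, SubstNode]
NULL_TREE = -1  # assumed sentinel meaning "no subtree"


def apply_tree(trees: List[Node], tree_id: int, form: str) -> str:
    """Apply the tree at index `tree_id` to `form` and return the result."""
    node = trees[tree_id]
    if "orig" in node:  # substitution node
        if form != node["orig"]:
            raise ValueError(f"tree {tree_id} does not apply to {form!r}")
        return node["subst"]
    # Match node: split the form into prefix, middle, and suffix.
    prefix_len, suffix_len = node["prefix_len"], node["suffix_len"]
    if prefix_len + suffix_len > len(form):
        raise ValueError(f"tree {tree_id} does not apply to {form!r}")
    end = len(form) - suffix_len
    prefix, middle, suffix = form[:prefix_len], form[prefix_len:end], form[end:]
    if node["prefix_tree"] != NULL_TREE:
        prefix = apply_tree(trees, node["prefix_tree"], prefix)
    if node["suffix_tree"] != NULL_TREE:
        suffix = apply_tree(trees, node["suffix_tree"], suffix)
    return prefix + middle + suffix


trees: List[Node] = [
    {"orig": "ge", "subst": ""},    # 0: drop the "ge" prefix
    {"orig": "t", "subst": "en"},   # 1: rewrite the "t" suffix to "en"
    {"prefix_len": 2, "suffix_len": 1, "prefix_tree": 0, "suffix_tree": 1},  # 2
]

lemma = apply_tree(trees, 2, "gemacht")
roundtrip = json.loads(json.dumps(trees))
```

Because every node is a flat dictionary of integers and strings, the whole list survives a JSON round trip unchanged, which is the kind of property a flat representation makes easy to validate.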
Tested German with the validation/serialization changes; there doesn't seem to be a regression.
Looks like this is ready to merge?
We can revise the cast thing later; it'd require an update of Thinc, and I don't want to hold up this PR over something like that.
No, not yet. I would wait a bit until the potential integrations with |
Oh, ok. I thought we didn't want to go there initially, but I'll leave that up to you, Adriane. I'll put this in draft to signal that we shouldn't merge yet!
…/adrianeboyd/spacy into feature/add-edit-tree-lemmatizer
We decided that an integration with
Now this needs docs...
… List[int], List[str]]] for thinc v8.0.14
Added, hope I didn't miss anything.
Description
Add edit tree lemmatizer, converted from spacy_experimental.edit_tree_lemmatizer
Types of change
Enhancement
Checklist