Let RegexEntityExtractor work without the dummy entity annotation #9439
Comments
Exalate commented: samsucik commented: @wochinge, based on this thread, what do you say to including this in the 3.0 milestone?
Exalate commented: samsucik commented: Just to make the definition of done clearer for this one: As part of updating the relevant docs, we should change the bits here that talk about having to include at least two training examples in order for the NLU model to pick up the entity. I haven't been able to find this rule anywhere in our code and I suspect that, in reality, anything with one or more examples is picked up.
Exalate commented: wochinge commented: I'm hesitant to add anything to the milestone this late in the process. If we do, I'd add it to a "nice to have in 3.0" milestone.
Exalate commented: samsucik commented: @wochinge I think having it as a "nice to have" would be great. (You know the 3.0 milestone better; I myself am just trying to bump up this particular issue so it gets addressed soon.)
Exalate commented: wochinge commented: @TyDunn What do you think? Should we create a "nice to have" milestone?
➤ Maxime Verger commented: 💡 Heads up! We're moving issues to Jira: https://rasa-open-source.atlassian.net/browse/OSS. From now on, this Jira board is the place where you can browse (without an account) and create issues (you'll need a free Jira account for that). This GitHub issue has already been migrated to Jira and will be closed on January 9th, 2023. Do not forget to subscribe to the corresponding Jira issue! ➡️ More information in the forum: https://forum.rasa.com/t/migration-of-rasa-oss-issues-to-jira/56569.
Description of Problem: When we use RegexEntityExtractor we have to annotate at least one example of each entity in the training data. This is only necessary because of this condition, and it is problematic when you want to use multiple entity extractors.
We recommend that "If you use multiple entity extractors, we advise that each extractor targets an exclusive set of entity types", but we don't actually allow this. An annotation cannot say which entity extractor it is meant for, so once you annotate the entity, DIET will try to extract it too. Yet you have to annotate the entity if you use RegexEntityExtractor: it should only pay attention to those lookup tables and regexes whose names are also names of entities, but the code that checks this has no access to the domain and thus asks the training data instead, hence the required annotation.
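To make the problem concrete, here is a minimal, self-contained sketch of the behaviour described above. It does not use Rasa's actual classes or API; `PATTERNS`, `entities_in_training_data`, `select_patterns` and `extract` are hypothetical names chosen for illustration, and the data layout is an assumption.

```python
import re
from typing import Dict, List, Set

# Hypothetical stand-ins for the regexes / lookup tables in the NLU data.
PATTERNS: Dict[str, str] = {
    "zipcode": r"\b\d{5}\b",
    "help": r"\bhelp\b",  # a plain regex feature, not meant to be an entity
}


def entities_in_training_data(examples: List[dict]) -> Set[str]:
    """Collect the entity names that are annotated at least once."""
    return {e["entity"] for ex in examples for e in ex.get("entities", [])}


def select_patterns(patterns: Dict[str, str], known: Set[str]) -> Dict[str, str]:
    """Keep only patterns whose name is also a known entity name."""
    return {name: p for name, p in patterns.items() if name in known}


def extract(text: str, patterns: Dict[str, str]) -> List[dict]:
    """Return one entity dict per regex match."""
    return [
        {"entity": name, "value": m.group(0), "start": m.start(), "end": m.end()}
        for name, p in patterns.items()
        for m in re.finditer(p, text)
    ]


# Without any annotation, the "zipcode" pattern is silently dropped.
unannotated = [{"text": "my zip is 10115"}]
# The "dummy" annotation this issue wants to get rid of.
annotated = unannotated + [
    {"text": "10115", "entities": [{"entity": "zipcode", "value": "10115"}]}
]

without_dummy = extract(
    "my zip is 10115",
    select_patterns(PATTERNS, entities_in_training_data(unannotated)),
)
with_dummy = extract(
    "my zip is 10115",
    select_patterns(PATTERNS, entities_in_training_data(annotated)),
)
```

In this sketch `without_dummy` is empty, while `with_dummy` contains the `zipcode` match. The dummy annotation is also exactly what would make a statistical extractor such as DIET start learning the same entity.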
Overview of the Solution: Make the domain (or the entities in the domain) available here. That will require that Featurizers and Extractors get access to the domain during training, which isn't done there currently, but which we already do for policies.
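One way the proposed change could look, sketched under assumptions: the extractor's training step receives the entity names declared in the domain and only falls back to the annotated entities when no domain is supplied. `RegexExtractorSketch` and its `train` signature are hypothetical and not Rasa's real component interface.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class RegexExtractorSketch:
    """Hypothetical extractor that filters patterns by allowed entity names."""

    patterns: Dict[str, str] = field(default_factory=dict)

    def train(
        self,
        all_patterns: Dict[str, str],
        annotated_entities: Iterable[str],
        domain_entities: Optional[Iterable[str]] = None,
    ) -> None:
        # Prefer the domain as the source of truth for entity names;
        # fall back to the training data only when no domain is given.
        if domain_entities is not None:
            allowed = set(domain_entities)
        else:
            allowed = set(annotated_entities)
        self.patterns = {n: p for n, p in all_patterns.items() if n in allowed}


all_patterns = {"zipcode": r"\b\d{5}\b", "help": r"\bhelp\b"}

# Proposed behaviour: no annotation needed, the domain lists the entity.
with_domain = RegexExtractorSketch()
with_domain.train(all_patterns, annotated_entities=[], domain_entities=["zipcode"])

# Current behaviour: nothing annotated, so nothing is extracted.
without_domain = RegexExtractorSketch()
without_domain.train(all_patterns, annotated_entities=[])
```

With the domain passed in, `zipcode` is kept even though the training data contains no annotation for it, so the entity can stay exclusive to the regex extractor.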
Definition of Done: