Entity Recognition on sub-words #5509

Closed
tabergma opened this issue Mar 27, 2020 · 0 comments · Fixed by #5511
Assignees
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR

Comments

@tabergma
Contributor

Description of Problem:
Related to #5475

We found other edge cases that can happen if we are using a tokenizer that splits up words into sub-words. Let's take a look at an example:
Sentence: Buenos Aires is a city
Tokens: Buen, os, Ai, res, is, a, city
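To make the example concrete, here is a minimal sketch of how the sub-word tokens could be grouped back into words using character offsets. The `(text, start, end)` tuple layout and the helper name `group_sub_words` are assumptions for illustration only, not the tokenizer's actual output format.

```python
from typing import List, Tuple

# Hypothetical token representation: (text, start offset, end offset).
Token = Tuple[str, int, int]

# Sub-word tokens for "Buenos Aires is a city" with their character offsets.
TOKENS: List[Token] = [
    ("Buen", 0, 4), ("os", 4, 6),
    ("Ai", 7, 9), ("res", 9, 12),
    ("is", 13, 15), ("a", 16, 17), ("city", 18, 22),
]


def group_sub_words(tokens: List[Token]) -> List[List[Token]]:
    """Group consecutive tokens into words.

    Assumption: a token continues the previous word when it starts exactly
    where the previous token ended (no whitespace in between).
    """
    words: List[List[Token]] = []
    for token in tokens:
        if words and words[-1][-1][2] == token[1]:
            words[-1].append(token)
        else:
            words.append([token])
    return words


words = group_sub_words(TOKENS)
word_texts = ["".join(t[0] for t in word) for word in words]
print(word_texts)
```

Under this assumption, `Buen` and `os` form one word and `Ai` and `res` form another, which is the grouping the scenarios below rely on.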

Scenario 1:
One entity covers multiple words or a single word.
city entity -> Buen os Ai res
type entity -> city

Scenario 2:
An entity covers just a part of a word.
city entity -> Buen

Scenario 3:
An entity covers two words, but at least one of the words only partly.
city entity -> os Ai

Scenario 4:
The sub-words of one word are annotated with different entities.
city entity -> Ai, state entity -> res

Scenarios 1 and 4 are already handled. We need to take care of Scenarios 2 and 3.

Overview of the Solution:
We should keep the annotated labels where possible and extend each entity so that it covers complete words rather than just parts of words.
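A rough sketch of what extending an entity to complete words could look like for Scenarios 2 and 3. This is not the actual fix: the function name `extend_entity_to_words` is hypothetical, the entity dictionary (`start`, `end`, `value`, `entity`) is only loosely modelled on the usual annotation format, and words are assumed to be whitespace-separated.

```python
import re
from typing import Any, Dict, List, Tuple


def word_spans(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets of whitespace-separated words."""
    return [match.span() for match in re.finditer(r"\S+", text)]


def extend_entity_to_words(text: str, entity: Dict[str, Any]) -> Dict[str, Any]:
    """Widen an entity so it covers every word it overlaps, keeping its label."""
    start, end = entity["start"], entity["end"]
    new_start, new_end = start, end
    for w_start, w_end in word_spans(text):
        # The word overlaps the original entity span.
        if w_start < end and start < w_end:
            new_start = min(new_start, w_start)
            new_end = max(new_end, w_end)
    return {**entity, "start": new_start, "end": new_end,
            "value": text[new_start:new_end]}


text = "Buenos Aires is a city"

# Scenario 2: the entity covers only "Buen".
scenario_2 = extend_entity_to_words(
    text, {"start": 0, "end": 4, "value": "Buen", "entity": "city"}
)

# Scenario 3: the entity covers "os Ai", i.e. parts of two words.
scenario_3 = extend_entity_to_words(
    text, {"start": 4, "end": 9, "value": "os Ai", "entity": "city"}
)

print(scenario_2["value"])
print(scenario_3["value"])
```

The sketch deliberately leaves Scenario 4 alone: extending two differently labelled sub-words of the same word would produce overlapping entities, and that case is described above as already handled.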

@tabergma tabergma added type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR area:rasa-oss 🎡 Anything related to the open source Rasa framework labels Mar 27, 2020
@tabergma tabergma self-assigned this Mar 27, 2020