One word two entity labels #5475
Labels
area:rasa-oss 🎡
Anything related to the open source Rasa framework
type:bug 🐛
Inconsistencies or issues which will cause an issue or problem for users or implementors.
Description of the problem
ConveRT and other language models in our pipeline split words into sub-words during tokenization. DIETClassifier then assigns different entities to the individual sub-words of the same word. Example:
Overview of the solution:
It should not be possible to assign two different entities to one word/token. We should add a sanity check that prevents double assignments. When the sub-words of a word disagree, we might want to keep the assignment with the higher confidence.
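A minimal sketch of such a check, assuming each sub-word prediction carries the index of the word it belongs to, an entity label, and a confidence. The `SubTokenPrediction` class and the `resolve_word_entities` function are hypothetical names used for illustration only. They are not part of Rasa, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SubTokenPrediction:
    """Hypothetical container for one sub-word prediction (not a Rasa class)."""

    word_index: int    # index of the whole word this sub-word belongs to
    text: str          # the sub-word text
    entity: str        # predicted entity label ("O" = no entity)
    confidence: float  # model confidence for that label


def resolve_word_entities(
    predictions: List[SubTokenPrediction],
) -> Dict[int, SubTokenPrediction]:
    """Keep, for each word, only the sub-word prediction with the highest confidence.

    On a tie, the first prediction seen for that word is kept.
    """
    best: Dict[int, SubTokenPrediction] = {}
    for pred in predictions:
        current = best.get(pred.word_index)
        if current is None or pred.confidence > current.confidence:
            best[pred.word_index] = pred
    return best


# Illustrative input: word 0 was split into two sub-words with conflicting labels.
predictions = [
    SubTokenPrediction(word_index=0, text="ber", entity="city", confidence=0.62),
    SubTokenPrediction(word_index=0, text="lin", entity="person", confidence=0.91),
    SubTokenPrediction(word_index=1, text="today", entity="O", confidence=0.99),
]

resolved = resolve_word_entities(predictions)
word_labels = {index: pred.entity for index, pred in resolved.items()}
```

With this rule every word ends up with exactly one label. Whether a confident "no entity" prediction should be allowed to override an entity label on another sub-word is a separate design decision that this sketch does not settle.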
We need to check if this also happens with the CRFEntityExtractor.