Make entity extraction for entities separated by comma configurable #6852

Closed
akelad opened this issue Sep 30, 2020 · 2 comments
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR

akelad commented Sep 30, 2020

With PR #6191, entities separated by comma now get extracted separately. This makes sense for something like:

please add buffalo, ranch, mustard, barbeque sauces results in {'entity': 'sauces', 'start': 11, 'end': 43, 'value': 'buffalo, ranch, mustard, barbeque', 'extractor': 'DIETClassifier'}

But it probably shouldn't work this way for address entities like 5306 Walnut Ave., Building A, Sacramento, CA 95841. See also issue #6795.

We should make this behaviour configurable in the entity extractors, and maybe even on an entity level, given you might have both types of entities in one bot.
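
To make the two behaviours concrete, here is a minimal sketch of what splitting an extracted entity at commas could look like. This is not Rasa's implementation: the function name `split_entity_by_comma` is hypothetical, and the entity dictionary in the usage example simply reuses the keys from the output above with illustrative offsets.

```python
import re
from typing import Any, Dict, List


def split_entity_by_comma(entity: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Split one entity into several at commas (hypothetical helper).

    Each part keeps the other keys of the original entity; "start" and
    "end" are recomputed relative to the original "start" offset.
    """
    parts = []
    for match in re.finditer(r"[^,]+", entity["value"]):
        raw = match.group()
        stripped = raw.strip()
        if not stripped:
            continue  # skip empty segments such as ", ,"
        # Account for whitespace that followed the comma.
        leading = len(raw) - len(raw.lstrip())
        start = entity["start"] + match.start() + leading
        parts.append(
            {**entity, "start": start, "end": start + len(stripped), "value": stripped}
        )
    return parts


# Illustrative entity; offsets computed for "please add buffalo, ranch, ..."
sauces = {"entity": "sauces", "start": 11, "value": "buffalo, ranch, mustard, barbeque"}
split = split_entity_by_comma(sauces)
```

An address entity would be passed through unchanged instead of going through this helper, which is the per-entity choice the configuration needs to express.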

@tttthomasssss

@tabergma @akelad just to be clear: is the idea to make this configurable per entity type (based on the entity types annotated in the NLU training data)? For example, supposing that address and ingredients are two entity types in the data, users would simply specify something like the following in the config (with a default behaviour of True?):

language: en
pipeline:
  ...
  - name: DIETClassifier
    epochs: 100
    entity_recognition: True
    split_entities_by_comma: 
      address: False
      ingredients: True
  ...

@tabergma

Something like this sounds good I would say. I guess it would be even better if we support both variants:

language: en
pipeline:
  ...
  - name: DIETClassifier
    epochs: 100
    entity_recognition: True
    split_entities_by_comma: False  # or True
  ...

and

language: en
pipeline:
  ...
  - name: DIETClassifier
    epochs: 100
    entity_recognition: True
    split_entities_by_comma: 
      address: False
      ingredients: True
  ...

We also should be able to handle the cases where not all entities are listed in the second configuration. And this should also be available for the CRFEntityExtractor.
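
One way to support both variants, including entities missing from the per-entity mapping, is to normalise the option into a single lookup table. The sketch below is only an illustration under assumptions: `resolve_split_config` is a hypothetical helper, not an existing Rasa function, and the fallback of `True` for unlisted entities follows the default suggested earlier in this thread rather than any confirmed behaviour.

```python
from typing import Dict, Iterable, Union

SplitOption = Union[bool, Dict[str, bool]]


def resolve_split_config(
    option: SplitOption,
    entity_types: Iterable[str],
    default: bool = True,  # assumed default, as floated in the thread
) -> Dict[str, bool]:
    """Return a per-entity flag for every entity type in the training data.

    `option` is either a single bool applied to all entities, or a mapping
    that may list only some entities; unlisted ones fall back to `default`.
    """
    if isinstance(option, bool):
        return {entity: option for entity in entity_types}
    return {entity: option.get(entity, default) for entity in entity_types}


entity_types = ["address", "ingredients", "city"]

# Variant 1: one flag for everything.
global_flags = resolve_split_config(False, entity_types)

# Variant 2: per-entity mapping; "city" is not listed and uses the default.
per_entity_flags = resolve_split_config(
    {"address": False, "ingredients": True}, entity_types
)
```

Because the result is a plain dictionary, the same helper could be shared by DIETClassifier and CRFEntityExtractor.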
