Make entity extraction for entities separated by comma configurable #6852

akelad · 2020-09-30T14:12:13Z

With PR #6191, entities separated by comma now get extracted separately. This makes sense for something like:

"please add buffalo, ranch, mustard, barbeque sauces" results in {'entity': 'sauces', 'start': 11, 'end': 43, 'value': 'buffalo, ranch, mustard, barbeque', 'extractor': 'DIETClassifier'}

But it maybe shouldn't work this way for address entities like "5306 Walnut Ave., Building A, Sacramento, CA 95841". See also issue #6795.

We should make this behaviour configurable in the entity extractors, and maybe even on an entity level, given you might have both types of entities in one bot.
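
To make the behaviour concrete, here is a rough sketch of what splitting on commas does to a single extracted span. It is not the code from PR #6191. The helper name `split_entity_on_commas` is hypothetical, and it assumes an entity is a plain dict with `start`, `end` and `value` keys like the one shown above.

```python
from typing import Any, Dict, List


def split_entity_on_commas(text: str, entity: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical helper: turn one entity span into one entity per comma-separated part."""
    parts: List[Dict[str, Any]] = []
    offset = entity["start"]  # character position of the current piece in `text`
    for piece in text[entity["start"]:entity["end"]].split(","):
        stripped = piece.strip()
        if stripped:
            start = offset + piece.index(stripped)  # skip leading whitespace
            parts.append(
                {**entity, "start": start, "end": start + len(stripped), "value": stripped}
            )
        offset += len(piece) + 1  # +1 accounts for the comma that was split away
    return parts


text = "please add buffalo, ranch, mustard, barbeque sauces"
value = "buffalo, ranch, mustard, barbeque"
start = text.index(value)
entity = {"entity": "sauces", "start": start, "end": start + len(value), "value": value}

parts = split_entity_on_commas(text, entity)
for part in parts:
    print(part["value"], part["start"], part["end"])
```

Applied to an address such as the one above, the same logic would break the address into several fragments, which is why a switch to turn it off seems useful.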

tttthomasssss · 2020-10-21T13:35:23Z

@tabergma @akelad just to be clear - is the idea to make this configurable per entity type (based on the entity types that are annotated in the NLU training data)? For example, supposing that address and ingredients are two entity types in the data, users would simply specify something like the following in the config (with a default behaviour of True?):

language: en
pipeline:
  ...
  - name: DIETClassifier
    epochs: 100
    entity_recognition: True
    split_entities_by_comma: 
      address: False
      ingredients: True
  ...

tabergma · 2020-10-22T06:17:02Z

Something like this sounds good, I would say. I guess it would be even better if we supported both variants:

language: en
pipeline:
  ...
  - name: DIETClassifier
    epochs: 100
    entity_recognition: True
    split_entities_by_comma: False  # or True
  ...

and

language: en
pipeline:
  ...
  - name: DIETClassifier
    epochs: 100
    entity_recognition: True
    split_entities_by_comma: 
      address: False
      ingredients: True
  ...

We also should be able to handle the cases where not all entities are listed in the second configuration. And this should also be available for the CRFEntityExtractor.
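
A minimal sketch of how both variants could be resolved to a per-entity decision, including the case where an entity type is missing from the mapping. This is only an illustration and not the implementation that was merged. The function name `should_split_by_comma` is hypothetical, and the fallback of `True` for unlisted entity types is an assumption based on the default suggested above.

```python
from typing import Dict, Union

# Either a single switch for all entity types, or a mapping per entity type.
SplitConfig = Union[bool, Dict[str, bool]]


def should_split_by_comma(
    entity_type: str, config: SplitConfig, default: bool = True
) -> bool:
    """Hypothetical helper: decide whether entities of `entity_type` get split on commas."""
    if isinstance(config, bool):
        # Variant 1: one value applies to every entity type.
        return config
    # Variant 2: per-entity mapping; unlisted types fall back to `default` (assumed True).
    return config.get(entity_type, default)


per_entity = {"address": False, "ingredients": True}
print(should_split_by_comma("address", per_entity))
print(should_split_by_comma("ingredients", per_entity))
print(should_split_by_comma("city", per_entity))  # not listed, uses the default
print(should_split_by_comma("address", False))
```

Keeping the lookup in one place would let DIETClassifier and CRFEntityExtractor share the same behaviour.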

akelad added the type:enhancement ✨ and area:rasa-oss 🎡 labels Sep 30, 2020

akelad mentioned this issue Sep 30, 2020

CRFEntityExtractor or DIETClassifier is splitting a entity into multiple if contains punctuation #6795

Closed

tabergma assigned tttthomasssss Oct 5, 2020

tttthomasssss added a commit that referenced this issue Oct 26, 2020

adds support for configurable entity splitting by comma to DIETClassi…

ff0fb9d

…fier (#6852)

tttthomasssss added a commit that referenced this issue Oct 26, 2020

adds support for configurable entity splitting by comma to CRFEntityE…

cb9d3a7

…xtractor (#6852)

tttthomasssss added a commit that referenced this issue Oct 26, 2020

add changelog for #6852

603757a

tttthomasssss mentioned this issue Oct 26, 2020

Split entities by comma #7103

Merged

4 tasks

tabergma closed this as completed Oct 29, 2020
