DIETClassifier wrongly detects an entity with another #5563

Closed
eugeniumegherea opened this issue Apr 2, 2020 · 2 comments
Labels
type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

@eugeniumegherea

Rasa version: Rasa 1.9.4
Python version: Python 3.6.9
Operating system: Ubuntu 18.04
Issue:
I have an entity annotated as [MAD](iata:mad) with the regex [a-zA-Z]{3},
and another entity annotated as [OABC123](booking_id) with the regex [OHWFGA]{1}[\dA-Z]{6}.
The training data contains around 100 iata examples and around 40 booking_id examples.

The NLU consistently detects a booking ID as iata, which is wrong.
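
As a rough check of how the two patterns from this report behave on their own, the sketch below runs them through Python's `re` module with unanchored search. That matching mode is an assumption about how regex features are computed, and the names `PATTERNS` and `first_matches` are illustrative only, not part of Rasa. It uses the message 'BCF4D6E' from the debug log and the booking_id example OABC123.

```python
import re

# Patterns copied from the report. Unanchored search is an assumption
# about how the featurizer applies them.
PATTERNS = {
    "iata": r"[a-zA-Z]{3}",
    "booking_id": r"[OHWFGA]{1}[\dA-Z]{6}",
}


def first_matches(text):
    """Map each pattern name to its first match in text, or None."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        result[name] = match.group(0) if match else None
    return result


print(first_matches("BCF4D6E"))  # message from the debug log
print(first_matches("OABC123"))  # booking_id training example
```

Under these assumptions the iata pattern matches the first three letters of both strings ('BCF' and 'OAB'), while the booking_id pattern matches OABC123 but not BCF4D6E, because that message starts with 'B', which is not in [OHWFGA]. Whether this is what changed between 1.7.4 and 1.9.4 is not established in this thread.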

I recently migrated from 1.7.4 to 1.9.4.
In 1.7.4 the exact same dataset produced expected results, but not in 1.9.4.

Is it related to #5475?

2020-04-02 13:08:25 DEBUG    rasa.core.processor  - Received user message 'BCF4D6E' with intent '{'name': 'inform', 'confidence': 0.9982755780220032}' and entities '[{'entity': 'iata', 'start': 0, 'end': 7, 'extractor': 'DIETClassifier', 'value': 'BCF4D6E'}, {'start': 3, 'end': 4, 'text': '4', 'value': 4, 'confidence': 1.0, 'additional_info': {'value': 4, 'type': 'value'}, 'entity': 'number', 'extractor': 'DucklingHTTPExtractor'}, {'start': 5, 'end': 6, 'text': '6', 'value': 6, 'confidence': 1.0, 'additional_info': {'value': 6, 'type': 'value'}, 'entity': 'number', 'extractor': 'DucklingHTTPExtractor'}]'
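
To make the log line easier to read, here is a small sketch that lists which extracted entities sit inside another entity's span. The entity dicts are copied from the log and reduced to the fields used here; `nested_pairs` is a hypothetical helper, not a Rasa function.

```python
# Entities copied from the debug log line, reduced to the fields used here.
entities = [
    {"entity": "iata", "start": 0, "end": 7,
     "extractor": "DIETClassifier", "value": "BCF4D6E"},
    {"entity": "number", "start": 3, "end": 4,
     "extractor": "DucklingHTTPExtractor", "value": 4},
    {"entity": "number", "start": 5, "end": 6,
     "extractor": "DucklingHTTPExtractor", "value": 6},
]


def nested_pairs(ents):
    """Return (outer entity, inner entity, inner value) for nested spans."""
    pairs = []
    for outer in ents:
        for inner in ents:
            if outer is inner:
                continue
            # inner is nested when its span lies fully inside outer's span
            if outer["start"] <= inner["start"] and inner["end"] <= outer["end"]:
                pairs.append((outer["entity"], inner["entity"], inner["value"]))
    return pairs


print(nested_pairs(entities))
```

DIETClassifier labels the whole message (characters 0 to 7) as iata, and both Duckling number entities fall inside that span.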

Command or request that led to error:

rasa train && rasa shell -vv

Content of configuration file (config.yml)

language: en
pipeline:
  - name: "SpacyNLP"
    case_sensitive: false
    model: "en_core_web_lg"
  - name: "SpacyTokenizer"
  - name: "SpacyFeaturizer"
  - name: "RegexFeaturizer"
  - name: LexicalSyntacticFeaturizer
    features: [
      ["low", "title", "upper"],
      [
        "BOS",
        "EOS",
        "low",
        "prefix5",
        "prefix2",
        "suffix5",
        "suffix3",
        "suffix2",
        "upper",
        "title",
        "digit",
      ],
      ["low", "title", "upper"],
    ]
  - name: "CountVectorsFeaturizer"
  - name: "CountVectorsFeaturizer"
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: "EntitySynonymMapper"
  - name: ResponseSelector
  - name: "DucklingHTTPExtractor"
    url: "http://192.168.5.58:8000"
    dimensions: ["time", "number", "ordinal", "distance", "amount-of-money", "email", "url", "phone-number"]
    locale: "en_US"
    timezone: "Europe/Chisinau"
  - name: "SpacyEntityExtractor"
    dimensions: ["GPE", "FAC"]
  - name: "components.extractors.entity_filter.EntityFilterExtractor"
    mappings:
    - entities: ["FAC", "iata"]
      mapTo: "FAC"
    - entities: ["GPE", "iata"]
      mapTo: "GPE"


policies:
  - name: TEDPolicy
    max_history: 10
    epochs: 20
    batch_size:
    - 32
    - 64
  - name: AugmentedMemoizationPolicy
    max_history: 6
  - name: TwoStageFallbackPolicy
    core_threshold: 0.3
    nlu_threshold: 0.6
  - name: FormPolicy
  - name: MappingPolicy
@eugeniumegherea added the type:bug 🐛 label Apr 2, 2020
@eugeniumegherea changed the title from "DIETClassifier wrongly detected an entity with another" to "DIETClassifier wrongly detects an entity with another" Apr 2, 2020
@sara-tagger
Collaborator

Thanks for the issue, @chkoss will get back to you about it soon!

You may find help in the docs and the forum, too 🤗

@eugeniumegherea
Author

My issue was solved by changing the pipeline.

The migration guide from 1.7 to 1.8 says:

"If your pipeline contains CRFEntityExtractor and EmbeddingIntentClassifier you can substitute both components with DIETClassifier. You can use the following pipeline for that:"

That didn't work out for me, so I just used the spaCy pipeline from the docs.

My current working config:

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: en
pipeline:
  - name: "SpacyNLP"
    case_sensitive: false
    model: "en_core_web_lg"
  - name: "SpacyTokenizer"
  - name: "SpacyFeaturizer"
  - name: "RegexFeaturizer"
  - name: LexicalSyntacticFeaturizer
  - name: "CountVectorsFeaturizer"
  - name: "CountVectorsFeaturizer"
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: "EntitySynonymMapper"
  - name: ResponseSelector
    epochs: 100
  - name: "DucklingHTTPExtractor"
    url: "http://192.168.5.58:8000"
    dimensions: ["time", "number", "ordinal", "distance", "amount-of-money", "email", "url", "phone-number"]
    locale: "en_US"
    timezone: "Europe/Chisinau"
  - name: "SpacyEntityExtractor"
    dimensions: ["GPE", "FAC"]
  - name: "components.extractors.entity_filter.EntityFilterExtractor"
    mappings:
    - entities: ["FAC", "iata"]
      mapTo: "FAC"
    - entities: ["GPE", "iata"]
      mapTo: "GPE"


policies:
  - name: TEDPolicy
    max_history: 10
    epochs: 20
    batch_size:
    - 32
    - 64
  - name: AugmentedMemoizationPolicy
    max_history: 6
  - name: TwoStageFallbackPolicy
    core_threshold: 0.3
    nlu_threshold: 0.6
  - name: FormPolicy
  - name: MappingPolicy
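
Comparing the two configs posted in this thread, only two pipeline entries differ. The sketch below records the options of those two components and prints which options changed. `option_changes` is a hypothetical helper written for this comparison, and the dicts are transcribed by hand from the YAML above rather than loaded from the files.

```python
# Options of the two components that differ between the configs in this
# thread. All other pipeline entries and the policies are identical.
failing = {
    "LexicalSyntacticFeaturizer": {
        "features": [
            ["low", "title", "upper"],
            ["BOS", "EOS", "low", "prefix5", "prefix2", "suffix5",
             "suffix3", "suffix2", "upper", "title", "digit"],
            ["low", "title", "upper"],
        ]
    },
    "ResponseSelector": {},
}
working = {
    "LexicalSyntacticFeaturizer": {},  # no explicit features: defaults apply
    "ResponseSelector": {"epochs": 100},
}


def option_changes(old, new):
    """Return {component: {option: (old_value, new_value)}} for changed options."""
    changes = {}
    for name in old.keys() | new.keys():
        before, after = old.get(name, {}), new.get(name, {})
        changed = {
            key: (before.get(key), after.get(key))
            for key in before.keys() | after.keys()
            if before.get(key) != after.get(key)
        }
        if changed:
            changes[name] = changed
    return changes


changes = option_changes(failing, working)
for component in sorted(changes):
    print(component, sorted(changes[component]))
```

So the working config drops the custom features list from LexicalSyntacticFeaturizer and adds epochs: 100 to ResponseSelector. The thread does not say which of these changes resolved the wrong iata detection.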
