Rasa 1.6.0 breaks custom components #4995

Closed
samscudder opened this issue Dec 19, 2019 · 7 comments
Labels
type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

@samscudder

Rasa version: 1.6.0

Rasa SDK version (if used & relevant): 1.6.0

Python version: 3.6.9 (anaconda)

Operating system (windows, osx, ...): RHEL 7 (CentOS)

Issue: Custom components now fail during training, caused by this commit:

https://github.com/RasaHQ/rasa/commit/255b91088a57c45a766af37826845b8c936a0a77

The documentation doesn't reflect this change.

Error (including full traceback):

Traceback (most recent call last):
  File "/home/xxxx/anaconda3/envs/ChatAna/bin/rasa", line 8, in <module>
    sys.exit(main())
  File "/home/xxxx/anaconda3/envs/ChatAna/lib/python3.6/site-packages/rasa/__main__.py", line 76, in main
    cmdline_arguments.func(cmdline_arguments)
  File "/home/xxxx/anaconda3/envs/ChatAna/lib/python3.6/site-packages/rasa/cli/train.py", line 76, in train
    kwargs=extract_additional_arguments(args),
  File "/home/xxxx/anaconda3/envs/ChatAna/lib/python3.6/site-packages/rasa/train.py", line 46, in train
    kwargs=kwargs,
  File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
  File "/home/xxxx/anaconda3/envs/ChatAna/lib/python3.6/site-packages/rasa/train.py", line 97, in train_async
    kwargs,
  File "/home/xxxx/anaconda3/envs/ChatAna/lib/python3.6/site-packages/rasa/train.py", line 184, in _train_async_internal
    kwargs=kwargs,
  File "/home/xxxx/anaconda3/envs/ChatAna/lib/python3.6/site-packages/rasa/train.py", line 241, in _do_training
    persist_nlu_training_data=persist_nlu_training_data,
  File "/home/xxxx/anaconda3/envs/ChatAna/lib/python3.6/site-packages/rasa/train.py", line 470, in _train_nlu_with_validated_data
    persist_nlu_training_data=persist_nlu_training_data,
  File "/home/xxxx/anaconda3/envs/ChatAna/lib/python3.6/site-packages/rasa/nlu/train.py", line 74, in train
    trainer = Trainer(nlu_config, component_builder)
  File "/home/xxxx/anaconda3/envs/ChatAna/lib/python3.6/site-packages/rasa/nlu/model.py", line 147, in __init__
    self.pipeline = self._build_pipeline(cfg, component_builder)
  File "/home/xxxx/anaconda3/envs/ChatAna/lib/python3.6/site-packages/rasa/nlu/model.py", line 159, in _build_pipeline
    component = component_builder.create_component(component_cfg, cfg)
  File "/home/xxxx/anaconda3/envs/ChatAna/lib/python3.6/site-packages/rasa/nlu/components.py", line 482, in create_component
    component = registry.create_component_by_config(component_config, cfg)
  File "/home/xxxx/anaconda3/envs/ChatAna/lib/python3.6/site-packages/rasa/nlu/registry.py", line 232, in create_component_by_config
    return component_class.create(component_config, config)
  File "/home/xxxx/anaconda3/envs/ChatAna/lib/python3.6/site-packages/rasa/nlu/components.py", line 267, in create
    return cls(component_config)
  File "/home/xxxx/anaconda3/envs/ChatAna/lib/python3.6/site-packages/rasa/nlu/tokenizers/tokenizer.py", line 60, in __init__
    "No default value for 'use_cls_token' was set. Please, "
KeyError: "No default value for 'use_cls_token' was set. Please, add it to the default dict of the tokenizer."

Command or request that led to error:

rasa train --force

Content of configuration file (config.yml) (if relevant):

language: pt
pipeline:
- name: "SpacyNLP"
- name: "SpacyTokenizer"
- name: "RegexFeaturizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "CountVectorsFeaturizer"
  strip_accents: "unicode"
  analyzer: "char_wb"
  min_ngram: 1
  max_ngram: 4
- name: "CustomLemmatization"
- name: "EmbeddingIntentClassifier"
policies:
- name: FormPolicy
- name: AugmentedMemoizationPolicy
  max_history: 1
- name: KerasPolicy
  validation_split: 0.2
  batch_size: 64
  epochs: 300
- name: MappingPolicy
- name: FallbackPolicy
  nlu_threshold: 0.7
  core_threshold: 0.7
  fallback_action_name: "utter_default"

Content of domain file (domain.yml) (if relevant):

CustomLemmatization.py file:

import typing
import spacy
from typing import Any

from rasa.nlu.components import Component
from rasa.nlu.config import RasaNLUModelConfig
from rasa.nlu.tokenizers.tokenizer import Token, Tokenizer
from rasa.nlu.training_data import Message, TrainingData

spacy.load('pt')

if typing.TYPE_CHECKING:
    from spacy.tokens.doc import Doc

class CustomLemmatization(Tokenizer, Component):
    name = "tokenizer_spacy_lemma"
    provides = ["tokens"]
    requires = ["spacy_doc"]

    def train(self,
              training_data: TrainingData,
              config: RasaNLUModelConfig,
              **kwargs: Any) -> None:
        # Replace each training example's tokens with lemma-based tokens.
        for example in training_data.training_examples:
            example.set("tokens", self.tokenize(example.get("spacy_doc")))

    def process(self, message: Message, **kwargs: Any) -> None:
        # Tokenize incoming messages the same way at inference time.
        message.set("tokens", self.tokenize(message.get("spacy_doc")))

    def tokenize(self, doc: 'Doc') -> typing.List[Token]:
        # Use each spaCy token's lemma instead of its surface form.
        return [Token(t.lemma_, t.idx) for t in doc]
samscudder added the type:bug 🐛 label Dec 19, 2019
@sara-tagger
Collaborator

Thanks for the issue, @tmbo will get back to you about it soon!

You may find help in the docs and the forum, too 🤗

@samscudder
Author

samscudder commented Dec 19, 2019

Actually, it looks like this is the problem: the default value for use_cls_token in tokenizer.py was changed from False to True.

3b51563

@tmbo
Member

tmbo commented Dec 19, 2019

@samscudder thanks a lot for the report.

@tabergma in the previous implementation we checked whether the use_cls_token config key is actually set; is there a reason why we don't do that anymore? 3b51563#diff-647f9ebc7a8ac8043482039acae61395R29

(e.g. why not go with something like self.component_config.get("use_cls_token", True))
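
For illustration, the difference between the two lookups (hypothetical code, not the actual Rasa implementation):

# A custom component that never declared the key in its defaults dict.
component_config = {}

# Current behaviour implied by the traceback: strict lookup, raises KeyError.
# use_cls_token = component_config["use_cls_token"]

# Suggested alternative: fall back to a default instead of raising.
use_cls_token = component_config.get("use_cls_token", True)  # -> True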

@tabergma
Contributor

The default value should be set to False for now. We wanted to make users aware that something changed and that they should update their custom components, which is why we added the error.
We will remove the use_cls_token option again soon; then every tokenizer will set it by default. This change is needed for the new NLU model we want to release at the beginning of next year.

@samscudder
Author

Shouldn't this be a warning instead of an error then?

@samscudder
Author

To help anybody else who hits this problem while you decide what needs to be done: the recommended course of action for < 2.0 would then be to add use_cls_token: false to the config.yml entry for my component?

@tabergma
Contributor

For custom tokenizers you should add

"use_cls_token": False

to the defaults dict of your custom component.

For custom featurizers you should add

"return_sequence": False

to the defaults dict of your custom component.

I will update the docs and print a warning instead of the error.
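
For example, a minimal sketch of what that looks like for the tokenizer from this issue (only the defaults dict is new; MyCustomFeaturizer is a made-up name for illustration, and its base class import is omitted):

# Minimal sketch of the fix described above, not a complete component.
class CustomLemmatization(Tokenizer, Component):
    name = "tokenizer_spacy_lemma"
    provides = ["tokens"]
    requires = ["spacy_doc"]

    # New: the Tokenizer base class now requires a default for this key.
    defaults = {"use_cls_token": False}

    # train / process / tokenize stay exactly as before.


class MyCustomFeaturizer(Featurizer):
    # New: custom featurizers need a default for return_sequence.
    defaults = {"return_sequence": False}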

tabergma self-assigned this Dec 19, 2019
tabergma closed this as completed Jan 2, 2020