Rasa 1.6.0 breaks custom components #4995

Closed
samscudder opened this issue Dec 19, 2019 · 7 comments
Labels
type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

@samscudder

Rasa version: 1.6.0

Rasa SDK version (if used & relevant): 1.6.0

Python version: 3.6.9 (anaconda)

Operating system (windows, osx, ...): RHEL 7 (CentOS)

Issue: Custom components now fail during training, caused by this commit:

https://github.com/RasaHQ/rasa/commit/255b91088a57c45a766af37826845b8c936a0a77

The documentation doesn't reflect this change.

Error (including full traceback):

Traceback (most recent call last):
  File "/home/xxxx/anaconda3/envs/ChatAna/bin/rasa", line 8, in <module>
    sys.exit(main())
  File "/home/xxxx/anaconda3/envs/ChatAna/lib/python3.6/site-packages/rasa/__main__.py", line 76, in main
    cmdline_arguments.func(cmdline_arguments)
  File "/home/xxxx/anaconda3/envs/ChatAna/lib/python3.6/site-packages/rasa/cli/train.py", line 76, in train
    kwargs=extract_additional_arguments(args),
  File "/home/xxxx/anaconda3/envs/ChatAna/lib/python3.6/site-packages/rasa/train.py", line 46, in train
    kwargs=kwargs,
  File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
  File "/home/xxxx/anaconda3/envs/ChatAna/lib/python3.6/site-packages/rasa/train.py", line 97, in train_async
    kwargs,
  File "/home/xxxx/anaconda3/envs/ChatAna/lib/python3.6/site-packages/rasa/train.py", line 184, in _train_async_internal
    kwargs=kwargs,
  File "/home/xxxx/anaconda3/envs/ChatAna/lib/python3.6/site-packages/rasa/train.py", line 241, in _do_training
    persist_nlu_training_data=persist_nlu_training_data,
  File "/home/xxxx/anaconda3/envs/ChatAna/lib/python3.6/site-packages/rasa/train.py", line 470, in _train_nlu_with_validated_data
    persist_nlu_training_data=persist_nlu_training_data,
  File "/home/xxxx/anaconda3/envs/ChatAna/lib/python3.6/site-packages/rasa/nlu/train.py", line 74, in train
    trainer = Trainer(nlu_config, component_builder)
  File "/home/xxxx/anaconda3/envs/ChatAna/lib/python3.6/site-packages/rasa/nlu/model.py", line 147, in __init__
    self.pipeline = self._build_pipeline(cfg, component_builder)
  File "/home/xxxx/anaconda3/envs/ChatAna/lib/python3.6/site-packages/rasa/nlu/model.py", line 159, in _build_pipeline
    component = component_builder.create_component(component_cfg, cfg)
  File "/home/xxxx/anaconda3/envs/ChatAna/lib/python3.6/site-packages/rasa/nlu/components.py", line 482, in create_component
    component = registry.create_component_by_config(component_config, cfg)
  File "/home/xxxx/anaconda3/envs/ChatAna/lib/python3.6/site-packages/rasa/nlu/registry.py", line 232, in create_component_by_config
    return component_class.create(component_config, config)
  File "/home/xxxx/anaconda3/envs/ChatAna/lib/python3.6/site-packages/rasa/nlu/components.py", line 267, in create
    return cls(component_config)
  File "/home/xxxx/anaconda3/envs/ChatAna/lib/python3.6/site-packages/rasa/nlu/tokenizers/tokenizer.py", line 60, in __init__
    "No default value for 'use_cls_token' was set. Please, "
KeyError: "No default value for 'use_cls_token' was set. Please, add it to the default dict of the tokenizer."

Command or request that led to error:

rasa train --force

Content of configuration file (config.yml) (if relevant):

language: pt
pipeline:
- name: "SpacyNLP"
- name: "SpacyTokenizer"
- name: "RegexFeaturizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "CountVectorsFeaturizer"
  strip_accents: "unicode"
  analyzer: "char_wb"
  min_ngram: 1
  max_ngram: 4
- name: "CustomLemmatization"
- name: "EmbeddingIntentClassifier"
policies:
- name: FormPolicy
- name: AugmentedMemoizationPolicy
  max_history: 1
- name: KerasPolicy
  validation_split: 0.2
  batch_size: 64
  epochs: 300
- name: MappingPolicy
- name: FallbackPolicy
  nlu_threshold: 0.7
  core_threshold: 0.7
  fallback_action_name: "utter_default"

Content of domain file (domain.yml) (if relevant):

CustomLemmatization.py file:

import typing
import spacy
from typing import Any

from rasa.nlu.components import Component
from rasa.nlu.config import RasaNLUModelConfig
from rasa.nlu.tokenizers.tokenizer import Token, Tokenizer
from rasa.nlu.training_data import Message, TrainingData

spacy.load('pt')

if typing.TYPE_CHECKING:
    from spacy.tokens.doc import Doc

class CustomLemmatization(Tokenizer, Component):
    name = "tokenizer_spacy_lemma"
    provides = ["tokens"]
    requires = ["spacy_doc"]

    def train(self,
              training_data: TrainingData,
              config: RasaNLUModelConfig,
              **kwargs: Any) -> None:
        # Replace each training example's tokens with lemma-based tokens.
        for example in training_data.training_examples:
            example.set("tokens", self.tokenize(example.get("spacy_doc")))

    def process(self, message: Message, **kwargs: Any) -> None:
        # Tokenize incoming messages the same way at inference time.
        message.set("tokens", self.tokenize(message.get("spacy_doc")))

    def tokenize(self, doc: 'Doc') -> typing.List[Token]:
        # Use each spaCy token's lemma instead of its surface form.
        return [Token(t.lemma_, t.idx) for t in doc]
samscudder added the type:bug 🐛 label Dec 19, 2019
@sara-tagger
Collaborator

Thanks for the issue, @tmbo will get back to you about it soon!

You may find help in the docs and the forum, too 🤗

@samscudder
Author

samscudder commented Dec 19, 2019

Actually, it looks like this is the problem: the default value for use_cls_token in tokenizer.py was changed from False to True.

3b51563

@tmbo
Member

tmbo commented Dec 19, 2019

@samscudder thanks a lot for the report.

@tabergma in the previous implementation we checked whether the use_cls_token config key is actually set; is there a reason why we don't do that anymore? 3b51563#diff-647f9ebc7a8ac8043482039acae61395R29

(e.g. why not go with something like self.component_config.get("use_cls_token", True))
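
For illustration, the difference between the two lookups (hypothetical code, not the actual Rasa implementation):

# A custom component that never declared the key in its defaults dict.
component_config = {}

# Current behaviour implied by the traceback: strict lookup, raises KeyError.
# use_cls_token = component_config["use_cls_token"]

# Suggested alternative: fall back to a default instead of raising.
use_cls_token = component_config.get("use_cls_token", True)  # -> True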

@tabergma
Contributor

The default value should be set to False for now. We wanted to make users aware that something changed and that they should update their custom components, which is why we added the error.
We will remove the use_cls_token option again soon; then every tokenizer will set it by default. This change is needed for the new NLU model we want to release at the beginning of next year.

@samscudder
Author

Shouldn't this be a warning instead of an error then?

@samscudder
Author

To help anybody else who hits this problem while you decide what needs to be done: the recommended course of action for < 2.0 would then be to add use_cls_token: false to the config.yml entry for my component?

@tabergma
Contributor

For custom tokenizers you should add

"use_cls_token": False

to the defaults dict of your custom component.

For custom featurizers you should add

"return_sequence": False

to the defaults dict of your custom component.

I will update the docs and print a warning instead of the error.
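
For example, a minimal sketch of what that looks like for the tokenizer from this issue (only the defaults dict is new; MyCustomFeaturizer is a made-up name for illustration, and its base class import is omitted):

# Minimal sketch of the fix described above, not a complete component.
class CustomLemmatization(Tokenizer, Component):
    name = "tokenizer_spacy_lemma"
    provides = ["tokens"]
    requires = ["spacy_doc"]

    # New: the Tokenizer base class now requires a default for this key.
    defaults = {"use_cls_token": False}

    # train / process / tokenize stay exactly as before.


class MyCustomFeaturizer(Featurizer):
    # New: custom featurizers need a default for return_sequence.
    defaults = {"return_sequence": False}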

tabergma self-assigned this Dec 19, 2019
tabergma closed this as completed Jan 2, 2020