Parameter "share_hidden_layers" not compatible with RegexFeaturizer/LexicalSyntacticFeaturizer #5528

Closed
hotzenklotz opened this issue Mar 30, 2020 · 5 comments
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

@hotzenklotz
Contributor

hotzenklotz commented Mar 30, 2020

Rasa version:
1.9.3

Rasa SDK version (if used & relevant):
1.9.0

Rasa X version (if used & relevant):
not used

Python version:
3.7

Operating system (windows, osx, ...):
OSX

Issue:
Training a response selector / DIET classifier with the parameter share_hidden_layers: true leads to the following error, even though the text and label dimensions were configured to be of equal size in the config:

ValueError: If embeddings are shared text features and label features must coincide. Check the output dimensions of previous components.

Suggested fix by @Ghostvv: removing the RegexFeaturizer and the LexicalSyntacticFeaturizer from my config solved the issue. It looks like Rasa internally attaches some features to either the text or the label vectors, which breaks the share_hidden_layers parameter.

Breaking NLU config:

pipeline:
  - name: HFTransformersNLP
    model_name: "bert"
    model_weights: "bert-base-german-cased"
  - name: "LanguageModelTokenizer"
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: LanguageModelFeaturizer
  - name: CountVectorsFeaturizer
    lowercase: false
    use_shared_vocab: true
  - name: DIETClassifier
    epochs: 50
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 500
    share_hidden_layers: true
    hidden_layers_sizes:
      text: [256, 128]
      label: [256, 128]

Error (including full traceback):

ValueError: If embeddings are shared text features and label features must coincide. Check the output dimensions of previous components.

Raised in diet_classifier.py (around line 400); see the method _check_input_dimension_consistency.
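
For readers following along, here is a minimal sketch of the kind of check that seems to be failing. It is not Rasa's actual implementation: the function name is borrowed from the traceback, but the signature, the helper `without`, and all feature sizes are hypothetical. It also assumes that RegexFeaturizer and LexicalSyntacticFeaturizer only add features to the user text, which is what the workaround above suggests. Note that the comparison is on the input feature sizes coming from previous components, so equal hidden_layers_sizes would not help.

```python
from typing import Dict

# Hypothetical per-featurizer output sizes. The real values depend on the
# model weights and training data and are NOT taken from the issue.
TEXT_FEATURE_DIMS: Dict[str, int] = {
    "LanguageModelFeaturizer": 768,
    "CountVectorsFeaturizer": 1200,
    "RegexFeaturizer": 10,             # assumption: applied to user text only
    "LexicalSyntacticFeaturizer": 64,  # assumption: applied to user text only
}
LABEL_FEATURE_DIMS: Dict[str, int] = {
    "LanguageModelFeaturizer": 768,
    "CountVectorsFeaturizer": 1200,
}


def check_input_dimension_consistency(
    text_dims: Dict[str, int],
    label_dims: Dict[str, int],
    share_hidden_layers: bool,
) -> None:
    """Raise if hidden layers are shared but the input feature sizes differ."""
    if not share_hidden_layers:
        return
    num_text = sum(text_dims.values())
    num_label = sum(label_dims.values())
    if num_text != num_label:
        raise ValueError(
            f"Text features ({num_text}) and label features ({num_label}) "
            "must coincide when hidden layers are shared."
        )


def without(dims: Dict[str, int], *names: str) -> Dict[str, int]:
    """Return a copy of dims with the named featurizers removed."""
    return {name: size for name, size in dims.items() if name not in names}
```

With these made-up sizes, the full text pipeline fails the check, while `without(TEXT_FEATURE_DIMS, "RegexFeaturizer", "LexicalSyntacticFeaturizer")` passes it, which matches the behaviour reported above.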

Command or request that led to error:

rasa train / rasa train with cross_validation
@hotzenklotz hotzenklotz added the type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors. label Mar 30, 2020
@sara-tagger
Collaborator

Thanks for raising this issue, @alwx will get back to you about it soon✨

Please also check out the docs and the forum in case your issue was raised there too 🤗

@dakshvar22 dakshvar22 self-assigned this Apr 6, 2020
@dakshvar22
Contributor

Don’t think this can be solved right now without implementing #5510 because

  1. It doesn’t make sense to process responses through LexicalSyntacticFeaturizer
  2. Inside ResponseSelector there is no way of finding which feature set corresponds to the ones coming from LexicalSyntacticFeaturizer for text, so that they can be left out.
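
To illustrate the second point, here is a toy sketch, assuming that each featurizer's output is simply concatenated onto one combined feature vector per message. The function `concat_features` and the vectors are hypothetical and are not Rasa code. Once the blocks are merged, a downstream component only sees a flat vector and has no record of which columns came from which featurizer.

```python
from typing import List


def concat_features(blocks: List[List[float]]) -> List[float]:
    """Concatenate per-featurizer vectors into one flat feature vector."""
    flat: List[float] = []
    for block in blocks:
        flat.extend(block)
    return flat


# Hypothetical toy vectors for a single message.
language_model = [0.1, 0.2, 0.3]
lexical_syntactic = [1.0, 0.0]  # assumption: only produced for user text
count_vectors = [0.0, 2.0]

# User text passes through every featurizer; the response skips one of them.
text_features = concat_features([language_model, lexical_syntactic, count_vectors])
label_features = concat_features([language_model, count_vectors])

# The consumer sees only two flat vectors of different lengths and cannot
# tell which text columns it would have to drop to make them line up.
print(len(text_features), len(label_features))
```

Tracking the origin of each feature block, or letting a component choose which featurizers it consumes, would be one way around this.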

@Ghostvv
Contributor

Ghostvv commented Apr 14, 2020

@dakshvar22 I think it doesn't even make sense to have sentence-level features in LexicalSyntacticFeaturizer

@Ghostvv Ghostvv added the area:rasa-oss 🎡 Anything related to the open source Rasa framework label Apr 14, 2020
@dakshvar22
Contributor

@Ghostvv I agree, they should be excluded. cc @tabergma

@tabergma
Contributor

@dakshvar22 Can this be closed as you can now exclude specific features from the ResponseSelector? (#5863)
