Lookup Table for featurized tracker messages #9020

Closed
4 tasks
twerkmeister opened this issue Jul 5, 2021 · 1 comment
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework area:rasa-oss/ml 👁 All issues related to machine learning research:end-to-end type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR

Comments

@twerkmeister
Contributor

Description of Problem:
Currently, a lot of computation is duplicated because messages in trackers are featurized multiple times. There are two kinds of duplication:

  1. Among the positions of the sliding window across a single conversation
  2. Whenever we have identical messages across conversations

With small to medium-sized datasets this is not an issue. For larger datasets such as MultiWOZ, this duplication adds almost an hour of preprocessing time.
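To make the two kinds of duplication concrete, here is a small sketch that counts featurization calls on toy data. The `sliding_windows` helper, the `max_history` parameter name, and the example conversations are made up for illustration and are not taken from the Rasa code base.

```python
from typing import List


def sliding_windows(conversation: List[str], max_history: int) -> List[List[str]]:
    """Return the window of up to `max_history` messages ending at each turn."""
    return [
        conversation[max(0, end - max_history):end]
        for end in range(1, len(conversation) + 1)
    ]


# Toy data: "hi" and "thanks" appear in both conversations.
conversations = [
    ["hi", "book a table", "for two", "thanks"],
    ["hi", "book a hotel", "thanks"],
]

# Naive approach: every message is featurized once per window it appears in.
naive_calls = sum(
    len(window)
    for conversation in conversations
    for window in sliding_windows(conversation, max_history=3)
)

# Lower bound: each distinct message text is featurized exactly once.
unique_messages = {message for conversation in conversations for message in conversation}

print(f"naive featurization calls: {naive_calls}")
print(f"unique messages: {len(unique_messages)}")
```

Even on this tiny example the naive count is three times the number of unique messages, and the gap grows with conversation length and window size.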

Overview of the Solution:
Featurize each unique message once and store the result to be used downstream by other components.
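A minimal sketch of the idea, assuming messages can be keyed by their text. The `MessageLookupTable` class and `toy_featurize` function are hypothetical stand-ins, not the implementation from the prototype; a real table would probably need a richer key (for example intent, entities, and action names) and would store Rasa feature objects rather than plain lists.

```python
from typing import Callable, Dict, List

Features = List[float]


class MessageLookupTable:
    """Caches features per unique message so each one is featurized only once."""

    def __init__(self, featurize: Callable[[str], Features]) -> None:
        self._featurize = featurize
        self._table: Dict[str, Features] = {}
        self.num_featurize_calls = 0  # kept only to show the saving

    def lookup(self, text: str) -> Features:
        if text not in self._table:
            self._table[text] = self._featurize(text)
            self.num_featurize_calls += 1
        return self._table[text]


def toy_featurize(text: str) -> Features:
    """Stand-in for an expensive featurizer: (character count, space count)."""
    return [float(len(text)), float(text.count(" "))]


table = MessageLookupTable(toy_featurize)
for message in ["hi", "book a table", "hi", "thanks", "hi"]:
    table.lookup(message)

print(table.num_featurize_calls)
```

Downstream components would then call `lookup` instead of the featurizer, so repeated messages cost only a dictionary access.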

The v3 architecture prototype contains a prototype implementation of this feature. There is also a necessary but so far unmerged fix.

I previously extracted the code from the prototype to run tests on the current architecture; the latest version can also be found in the combined-e2e-fixes branch.

This feature would also unlock batch encoding during training, which would be too computationally expensive without having the features cached in the lookup table beforehand.
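One possible reading of this, sketched under the assumption that the table maps message text to a fixed-size feature vector: a training batch can then be assembled by gathering cached rows instead of re-featurizing. The `build_batch` function, its parameters, and the left-padding scheme are illustrative assumptions only.

```python
from typing import Dict, List

Features = List[float]


def build_batch(
    windows: List[List[str]],
    table: Dict[str, Features],
    max_history: int,
    feature_dim: int,
) -> List[List[Features]]:
    """Gather cached features for each window and left-pad to `max_history` rows."""
    padding = [0.0] * feature_dim
    batch = []
    for window in windows:
        recent = window[-max_history:]
        rows = [table[text] for text in recent]  # lookups only, no featurization
        # Left-pad short windows so every example has the same number of rows.
        rows = [list(padding) for _ in range(max_history - len(rows))] + rows
        batch.append(rows)
    return batch


table = {"hi": [1.0, 0.0], "book a table": [0.0, 1.0]}
batch = build_batch(
    [["hi"], ["hi", "book a table"]], table, max_history=2, feature_dim=2
)
print(batch)
```

Because the per-message work is already done, the cost of building each batch is limited to lookups and padding.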

Open Issues:

  • How to handle inference is still marked as a TODO in the current v3 architecture prototype
  • Entity encoding is still problematic in the current v3 architecture prototype
    • # TODO: determine why an interpreter is needed.
      interpreter = RegexInterpreter()
      message = interpreter.featurize_message(Message(entity_data))
      if not message:
          return {}
    • If you just use an empty interpreter, featurize_message returns None, resulting in no entity info

Definition of Done:

  • existing code is integrated into the new architecture
  • open issues are addressed
  • Tests are added
  • Feature mentioned in the changelog
@twerkmeister twerkmeister added type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR area:rasa-oss 🎡 Anything related to the open source Rasa framework area:rasa-oss/ml 👁 All issues related to machine learning research:end-to-end labels Jul 5, 2021
@ka-bu
Contributor

ka-bu commented Sep 24, 2021

closed via #9405

@ka-bu ka-bu closed this as completed Sep 24, 2021