Lookup Table for featurized tracker messages #9020

twerkmeister · 2021-07-05T11:45:43Z

Description of Problem:
Currently, a lot of computation is duplicated because messages in trackers are featurized multiple times. There are two kinds of duplication:

- Among the positions of the sliding window across a single conversation
- Whenever identical messages occur across conversations
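
To make the two kinds of duplication concrete, here is a small sketch that counts how often each message would be featurized under a sliding window. The `sliding_windows` helper, the toy conversations, and the `max_history` value are assumptions made for illustration; they are not taken from the Rasa code base.

```python
from collections import Counter
from typing import List


def sliding_windows(messages: List[str], max_history: int) -> List[List[str]]:
    """Return one window per turn, each holding at most max_history messages."""
    return [
        messages[max(0, end - max_history):end]
        for end in range(1, len(messages) + 1)
    ]


# Toy conversations (hypothetical data); "hello" and "thanks" occur in both.
conversations = [
    ["hello", "book a table", "for two", "thanks"],
    ["hello", "cancel my booking", "thanks"],
]

featurize_calls = Counter()
for conversation in conversations:
    for window in sliding_windows(conversation, max_history=3):
        for text in window:
            # Each entry stands for one call to a (hypothetical) featurizer.
            featurize_calls[text] += 1

naive_calls = sum(featurize_calls.values())  # calls without any caching
unique_calls = len(featurize_calls)          # calls if each text is featurized once
print(f"naive: {naive_calls}, unique: {unique_calls}")
```

In this toy setup the naive approach featurizes 15 messages, while only 5 distinct texts exist.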

With small to medium-sized datasets this is not an issue. For larger datasets such as MultiWOZ, this duplication adds almost an hour of additional preprocessing time.

Overview of the Solution:
Featurize each unique message once and store the result to be used downstream by other components.
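
A minimal sketch of the idea, assuming messages can be keyed by their text. The `MessageLookupTable` class and the toy featurizer are hypothetical and only illustrate the caching pattern; the prototype's actual implementation may look quite different.

```python
from typing import Callable, Dict, Iterable, List

Features = List[float]  # stand-in for real sparse/dense feature objects


class MessageLookupTable:
    """Caches features per unique message text (hypothetical helper)."""

    def __init__(self, featurize: Callable[[str], Features]) -> None:
        self._featurize = featurize
        self._table: Dict[str, Features] = {}
        self.featurizer_calls = 0  # counts how often the featurizer really ran

    def build(self, texts: Iterable[str]) -> None:
        """Featurize every text that is not in the table yet."""
        for text in texts:
            if text not in self._table:
                self._table[text] = self._featurize(text)
                self.featurizer_calls += 1

    def lookup(self, text: str) -> Features:
        """Return cached features; raises KeyError for unseen texts."""
        return self._table[text]

    def __len__(self) -> int:
        return len(self._table)


def toy_featurize(text: str) -> Features:
    # Assumption: a trivial featurizer (length, number of spaces) for the demo.
    return [float(len(text)), float(text.count(" "))]


table = MessageLookupTable(toy_featurize)
table.build(["hello", "book a table", "hello", "thanks", "hello"])
print(table.featurizer_calls)   # featurizer ran once per unique text
print(table.lookup("hello"))
```

Downstream components would then call `lookup` instead of re-running the featurizer.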

The v3 architecture prototype contains a prototype implementation of this feature. There is also a necessary, but so far unmerged, fix.

I previously extracted the code from the prototype to run tests on the current architecture; the latest version can be found in the combined-e2e-fixes branch.

This feature would also unlock batch encoding during training, which would be too computationally expensive without having the features cached in the lookup table beforehand.

Open Issues:

How to handle inference is still marked as a TODO in the current v3 architecture prototype

rasa/rasa/core/featurizers/single_state_featurizer.py

Line 226 in b359b4c

    # TODO: We need a fallback for unexpected user texts during prediction time

Entity encoding is still problematic in the current v3 architecture prototype

rasa/rasa/core/featurizers/single_state_featurizer.py

Lines 319 to 324 in d812376

    
           # TODO: determine why an interpreter is needed. 
        
           interpreter = RegexInterpreter() 
        
           message = interpreter.featurize_message(Message(entity_data)) 
        
           if not message: 
        
               return {}

If you just use an empty interpreter, featurize_message returns None, resulting in no entity info.
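
For the first open issue (unexpected user texts at prediction time), one possible fallback is to featurize on the fly whenever a text is missing from the table. This is only a sketch of that option, not a decision made in this issue; `lookup_with_fallback` and its parameters are hypothetical names.

```python
from typing import Callable, Dict, List

Features = List[float]  # stand-in for real feature objects


def lookup_with_fallback(
    table: Dict[str, Features],
    text: str,
    featurize: Callable[[str], Features],
    cache_misses: bool = False,
) -> Features:
    """Return cached features, featurizing on demand for unseen texts."""
    features = table.get(text)
    if features is None:
        # Text was not seen during training: compute its features now.
        features = featurize(text)
        if cache_misses:
            # Optionally remember the result for later turns.
            table[text] = features
    return features
```

Whether misses should be written back to the table (`cache_misses=True`) is a design choice: it saves repeated work for recurring texts, but lets the table grow without bound during long-running inference.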

Definition of Done:

- Existing code is integrated into the new architecture
- Open issues are addressed
- Tests are added
- Feature is mentioned in the changelog

ka-bu · 2021-09-24T08:18:16Z

closed via #9405

twerkmeister added the labels type:enhancement, area:rasa-oss, area:rasa-oss/ml, research:end-to-end on Jul 5, 2021

This was referenced Jul 5, 2021

- Tracker featurization during batch generation #9022 (Closed)
- Support training on big datasets #8595 (Closed)

ka-bu closed this as completed Sep 24, 2021
