Lookup Table for featurized tracker messages #9020
Labels
area:rasa-oss 🎡
Anything related to the open source Rasa framework
area:rasa-oss/ml 👁
All issues related to machine learning
research:end-to-end
type:enhancement ✨
Additions of new features or changes to existing ones, should be doable in a single PR
Description of Problem:
Currently, a lot of computation is duplicated because messages in trackers are featurized multiple times. There are two kinds of duplication:
With small to medium-sized datasets this is not an issue. For larger datasets such as MultiWOZ, this duplication adds almost an hour of additional preprocessing time.
Overview of the Solution:
Featurize each unique message once and store the result to be used downstream by other components.
The v3 architecture prototype contains a prototype implementation of this feature. There is also a necessary but so far unmerged fix.
I previously extracted the code from the prototype to run tests on the current architecture; the latest version can also be found in the combined-e2e-fixes branch.
This feature would also unlock batch encoding during training, which would be too computationally expensive without having the features cached in the lookup table beforehand.
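To make the idea concrete, here is a minimal sketch of such a lookup table. It is not the prototype's code and not Rasa's API: the `FeatureLookupTable` class, its methods, and the toy featurizer are hypothetical names chosen for illustration, and a tuple of floats stands in for the real feature arrays. It assumes a message can be identified by its text alone.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Stand-in for real sparse/dense feature arrays (assumption for this sketch).
Features = Tuple[float, ...]


class FeatureLookupTable:
    """Caches features per unique message text (hypothetical helper)."""

    def __init__(self, featurize: Callable[[str], Features]) -> None:
        self._featurize = featurize
        self._table: Dict[str, Features] = {}

    def build(self, trackers: Iterable[Iterable[str]]) -> None:
        """Featurize every unique message across all trackers exactly once."""
        for messages in trackers:
            for text in messages:
                if text not in self._table:
                    self._table[text] = self._featurize(text)

    def lookup(self, text: str) -> Features:
        """Return cached features; raises KeyError for unseen messages."""
        return self._table[text]

    def lookup_batch(self, texts: Iterable[str]) -> List[Features]:
        """Fetch cached features for a batch without re-featurizing."""
        return [self._table[text] for text in texts]

    def __len__(self) -> int:
        return len(self._table)


# Toy featurizer that records each call so duplicated work becomes visible.
calls: List[str] = []


def toy_featurize(text: str) -> Features:
    calls.append(text)
    return (float(len(text)), float(text.count(" ") + 1))


# Two trackers that share most of their messages.
trackers = [["hello", "book a table"], ["hello", "book a table", "bye"]]
table = FeatureLookupTable(toy_featurize)
table.build(trackers)
```

With these two toy trackers, five messages are seen but the featurizer runs only for the three unique ones. Downstream components, including batch encoding during training, would then read from the table through `lookup` or `lookup_batch` instead of featurizing again.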
Open Issues:
rasa/rasa/core/featurizers/single_state_featurizer.py
Line 226 in b359b4c
rasa/rasa/core/featurizers/single_state_featurizer.py
Lines 319 to 324 in d812376
Definition of Done: