Rasa training is very slow due to excessive copying of the domain, fails on machines with low memory #6044
Labels: area:rasa-oss 🎡, type:bug 🐛
Rasa version:
Rasa 1.10.1
Python version:
Python 3.7.6
Operating system (windows, osx, ...):
Reproduced consistently on OSX and Linux; for example, on OSX 10.14.6 with a 1.7 GHz Intel Core i7 and 16 GB of 2133 MHz LPDDR3 RAM.
Issue:
We use the memoization policy (no ML-based policy) and have a domain with a few hundred slots, the vast majority of which are unfeaturized. Training takes about 15 minutes. We ran profiling and noticed that a lot of time is spent simply copying the Domain dictionary. We have observed similar issues at runtime in the past, where time is spent copying the domain dictionary simply to load the tracker into memory.

It seems this is done because the domain is mutated with the actual values of the slots. Perhaps a better implementation would avoid mutating the domain and keep the values in a separate structure.
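To illustrate the suggestion, here is a minimal sketch of keeping per-conversation slot values on the tracker instead of copying the slot definitions. The class and attribute names (`Domain`, `Tracker`, `slot_values`) are ours and do not reflect Rasa's actual API:

```python
from typing import Any, Dict, Optional


class Domain:
    """Immutable slot definitions (name -> default), shared by all trackers."""

    def __init__(self, slot_defaults: Dict[str, Any]) -> None:
        self.slot_defaults = dict(slot_defaults)  # never mutated afterwards


class Tracker:
    """Keeps only the per-conversation slot *values* in its own small dict."""

    def __init__(self, domain: Domain) -> None:
        self.domain = domain                   # shared reference, no deep copy
        self.slot_values: Dict[str, Any] = {}  # only values that were actually set

    def set_slot(self, name: str, value: Any) -> None:
        self.slot_values[name] = value

    def get_slot(self, name: str) -> Optional[Any]:
        # Fall back to the domain default when the slot was never set.
        return self.slot_values.get(name, self.domain.slot_defaults.get(name))


# Many trackers can share one domain with no copying of the slot definitions.
domain = Domain({f"slot_{i}": None for i in range(400)})
tracker = Tracker(domain)
tracker.set_slot("slot_3", "hello")
print(tracker.get_slot("slot_3"), tracker.get_slot("slot_7"))  # hello None
```

With this layout, creating a tracker costs roughly the same regardless of how many slots the domain defines, instead of scaling with the number of slots.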
Error (including full traceback):
Here's the profiling data
stats.zip
You can parse it using:
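A minimal sketch for reading it, assuming the archive contains a raw cProfile dump extracted as a file named `stats` (the filename is an assumption):

```python
import pstats

# Load the cProfile dump (assumed to be extracted from stats.zip as "stats")
# and print the 20 entries with the highest cumulative time.
stats = pstats.Stats("stats")
stats.strip_dirs().sort_stats("cumulative").print_stats(20)
```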
Here's the output
We suspect this could be the issue: `rasa/core/trackers.py`, line 135 (at commit 850344f).
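We have not pasted the line itself here, but the pattern we suspect is that every tracker deep-copies all slot objects defined on the domain, roughly like this simplified sketch (not the actual Rasa source):

```python
import copy
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Slot:
    """Stand-in for a Rasa slot definition plus its current value."""
    name: str
    value: Optional[Any] = None


class SuspectedTracker:
    def __init__(self, domain_slots: List[Slot]) -> None:
        # One deep copy of every slot per tracker: the cost grows with
        # (number of trackers built during training) x (number of slots).
        self.slots = {slot.name: copy.deepcopy(slot) for slot in domain_slots}


domain_slots = [Slot(f"slot_{i}") for i in range(400)]
trackers = [SuspectedTracker(domain_slots) for _ in range(100)]  # 40,000 deep copies
```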
Additionally, training fails on CircleCI, probably due to a lack of memory. Note that this is using a Docker executor with 16 GB of memory and waiting 30 minutes for training to finish. We are not yet sure of the exact cause of this failure, but we do know that removing a dozen unfeaturized slots from the domain lets training complete in about 15 minutes without issues.
Command or request that led to error:
Content of configuration file (config.yml) (if relevant):
config.yml.zip
Content of domain file (domain.yml) (if relevant):
Cannot attach due to IP, but it contains about 400 slots, and only a handful are featurized (but that's not the issue here). We did reproduce this in the past with only unfeaturized slots; the main variable is the number of slots.
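For anyone trying to reproduce without our domain, a synthetic domain with a configurable number of unfeaturized slots can be generated with a small script like the one below. The slot names and the rest of the minimal domain content are made up, and stories plus config.yml are assumed to exist already:

```python
# Sketch: write a domain.yml with many unfeaturized slots (Rasa 1.x syntax).
N_SLOTS = 400

lines = ["intents:", "  - greet", "slots:"]
for i in range(N_SLOTS):
    lines += [f"  generated_slot_{i}:", "    type: unfeaturized"]
lines += ["templates:", "  utter_greet:", '    - text: "Hello!"']

with open("domain.yml", "w") as f:
    f.write("\n".join(lines) + "\n")
```

Increasing N_SLOTS should show the same scaling we observe, since the number of slots is the main variable.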