Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rasa training is very slow due to excessive copy of the domain, fails on machine with low memory. #6044

Closed
edouardlp opened this issue Jun 18, 2020 · 3 comments
Assignees
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

@edouardlp
Copy link

edouardlp commented Jun 18, 2020

Rasa version:
Rasa 1.10.1

Python version:
Python 3.7.6

Operating system (windows, osx, ...):
Consistent on osx and linux.
For example on OSX 10.14.6 using 1.7 GHz Intel Core i7 16 GB 2133 MHz LPDDR3

Issue:
We use the memoization policy (no ML-based policy), have a domain with a few hundred slots, the vast majority is unfeaturized. Training takes about 15 minutes. We ran profiling, and noticed that a lot of time is spent simply copying the Domain dictionary. We have observed in the past similar issues at runtime, where time is spent copying the domain dictionary simply to load the tracker in memory.

It seems like this is done because the domain is mutated with the actual values of the slots. Perhaps a better implementation would avoid mutating the domain, and keep values in a separate structure.

Error (including full traceback):

Here's the profiling data
stats.zip

You can parse it using :

p = pstats.Stats('stats')
p.sort_stats('cumulative').print_stats(100)

Here's the output

>>> p.sort_stats('cumulative').print_stats(100)
Thu Jun 18 13:44:27 2020    stats

         7606471166 function calls (6713924733 primitive calls) in 2791.514 seconds

   Ordered by: cumulative time
   List reduced from 1374 to 100 due to restriction <100>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        4    0.005    0.001 5542.777 1385.694 train.py:23(train)
        1    0.000    0.000 2791.515 2791.515 {built-in method builtins.exec}
        1    0.000    0.000 2791.515 2791.515 <string>:1(<module>)
        1    0.019    0.019 2791.514 2791.514 {method 'run_until_complete' of 'uvloop.loop.Loop' objects}
       10    0.031    0.003 2770.536  277.054 train.py:55(train_async)
        9    0.001    0.000 2770.418  307.824 train.py:118(_train_async_internal)
        4    0.004    0.001 2752.328  688.082 train.py:205(_do_training)
        4    1.061    0.265 2752.324  688.081 train.py:332(_train_core_with_validated_data)
        3    0.036    0.012 2655.368  885.123 agent.py:615(load_data)
        3   36.174   12.058 2655.332  885.111 __init__.py:35(load_data)
        2    9.935    4.968 2619.143 1309.572 generator.py:195(generate)
     1480    2.050    0.001 2600.574    1.757 generator.py:502(_process_step)
   192965    5.976    0.000 2298.746    0.012 generator.py:84(copy)
   196695    0.539    0.000 2194.473    0.011 generator.py:74(init_copy)
   196697    0.456    0.000 2193.916    0.011 generator.py:46(__init__)
   196697    1.112    0.000 2193.460    0.011 trackers.py:114(__init__)
   196697   52.468    0.000 2166.356    0.011 trackers.py:135(<dictcomp>)
963628955/71794526  918.577    0.000 2114.312    0.000 copy.py:132(deepcopy)
71811240/71802852  174.158    0.000 1740.057    0.000 copy.py:268(_reconstruct)
71823031/71803382  262.444    0.000 1179.439    0.000 copy.py:236(_deepcopy_dict)
 18453936   12.414    0.000  493.042    0.000 generator.py:113(update)
  1161527    6.041    0.000  331.654    0.000 generator.py:106(_append_current_state)
  1163393    2.001    0.000  325.754    0.000 domain.py:694(get_active_states)
  1163393  207.800    0.000  319.673    0.000 domain.py:621(get_parsing_states)
2044613317  188.622    0.000  188.622    0.000 {method 'get' of 'dict' objects}
146792854  130.723    0.000  160.374    0.000 copy.py:252(_keep_alive)
 18453936   14.887    0.000  129.547    0.000 trackers.py:465(update)
143622366   30.846    0.000  119.947    0.000 copy.py:273(<genexpr>)
1329017704  104.614    0.000  104.614    0.000 {built-in method builtins.id}
   349158    0.485    0.000   88.073    0.000 __init__.py:828(apply_to)
        2    0.012    0.006   85.022   42.511 agent.py:660(train)
        2    0.009    0.005   85.001   42.500 ensemble.py:116(train)
        2    0.267    0.133   74.798   37.399 ensemble.py:67(_training_events_from_trackers)
816836101   69.253    0.000   69.253    0.000 copy.py:190(_deepcopy_atomic)
   545855    1.613    0.000   67.244    0.000 trackers.py:554(_reset)
   545855   34.013    0.000   60.251    0.000 trackers.py:565(_reset_slots)
 74321166   54.157    0.000   54.157    0.000 {method 'items' of 'dict' objects}
 16287502   37.387    0.000   48.399    0.000 slots.py:218(as_feature)
   349158    4.713    0.000   46.085    0.000 trackers.py:412(replay_events)
 71811240   37.090    0.000   37.090    0.000 {method '__reduce_ex__' of 'object' objects}
 71802852   26.244    0.000   36.611    0.000 copyreg.py:87(__newobj__)
227780667/227780666   31.961    0.000   32.024    0.000 {built-in method builtins.isinstance}
 19106920   11.743    0.000   27.440    0.000 __init__.py:1041(apply_to)
199237075   26.131    0.000   26.131    0.000 slots.py:65(reset)
145847684   25.925    0.000   25.925    0.000 {built-in method builtins.getattr}
3158640/3148225    7.254    0.000   25.041    0.000 copy.py:210(_deepcopy_list)
371122367   24.844    0.000   24.844    0.000 slots.py:187(as_feature)
 74180950   24.241    0.000   24.241    0.000 {method 'update' of 'dict' objects}
 13960716   20.176    0.000   22.337    0.000 slots.py:128(as_feature)
   352752   11.608    0.000   21.672    0.000 trackers.py:382(applied_events)
        6    0.000    0.000   20.144    3.357 rasa.py:63(get_domain)
        6    0.000    0.000   20.135    3.356 domain.py:88(load)
        6    0.000    0.000   20.014    3.336 domain.py:105(from_path)
        6    0.000    0.000   20.013    3.336 domain.py:121(from_file)
        6    0.001    0.000   20.012    3.335 domain.py:125(from_yaml)
       14    0.000    0.000   19.788    1.413 io.py:102(read_yaml)
       20    0.000    0.000   19.768    0.988 constructor.py:108(get_single_data)
       14    0.006    0.000   19.673    1.405 main.py:328(load)
       20    0.000    0.000   19.637    0.982 composer.py:70(get_single_node)
       20    0.000    0.000   19.632    0.982 composer.py:95(compose_document)
 24978/20    0.093    0.000   19.632    0.982 composer.py:109(compose_node)
  4990/20    0.046    0.000   19.630    0.982 composer.py:194(compose_mapping_node)
   297943    0.130    0.000   19.157    0.000 generator.py:55(past_states)
     1866    0.007    0.000   19.028    0.010 trackers.py:191(past_states)
     1866    0.049    0.000   19.017    0.010 domain.py:701(states_for_tracker_history)
     1866    0.005    0.000   18.968    0.010 domain.py:706(<listcomp>)
     3732    0.010    0.000   18.443    0.005 trackers.py:308(generate_all_prior_trackers)
 72024127   17.952    0.000   17.952    0.000 {method 'match' of 're.Pattern' objects}
       63    0.044    0.001   17.638    0.280 tarfile.py:234(copyfileobj)
        2    0.000    0.000   16.988    8.494 model.py:442(package_model)
        2    0.001    0.001   16.988    8.494 model.py:234(create_package_rasa)
     52/6    0.002    0.000   16.913    2.819 tarfile.py:1905(add)
       52    0.001    0.000   16.809    0.323 tarfile.py:1956(addfile)
    14444    0.108    0.000   16.623    0.001 gzip.py:247(write)
    14442   16.046    0.001   16.046    0.001 {method 'compress' of 'zlib.Compress' objects}
    19586    0.059    0.000   15.703    0.001 composer.py:142(compose_scalar_node)
109841420   15.565    0.000   15.565    0.000 {method 'append' of 'list' objects}
    24978    4.685    0.000   15.553    0.001 resolver.py:357(resolve)
 19106920    9.615    0.000   12.929    0.000 trackers.py:218(set_latest_action_name)
 73924490   11.684    0.000   11.684    0.000 {built-in method builtins.hasattr}
        2    0.001    0.001   10.860    5.430 agent.py:782(persist)
        2    0.001    0.000   10.792    5.396 domain.py:785(persist)
        2    0.000    0.000   10.785    5.392 utils.py:204(dump_obj_as_yaml_to_file)
        2    0.000    0.000   10.785    5.392 io.py:228(write_yaml_file)
        2    0.000    0.000   10.784    5.392 main.py:1208(dump)
        2    0.002    0.001   10.784    5.392 main.py:1136(dump_all)
        2    0.000    0.000   10.774    5.387 representer.py:83(represent)
        2    0.001    0.001   10.734    5.367 serializer.py:102(serialize)
   7292/2    0.058    0.000   10.723    5.362 serializer.py:155(serialize_node)
      150    0.000    0.000   10.504    0.070 rasa.py:42(get_stories)
 71804979   10.368    0.000   10.368    0.000 {built-in method __new__ of type object at 0x104f64610}
    13602    3.067    0.000   10.210    0.001 resolver.py:271(resolve)
        4    0.007    0.002   10.188    2.547 memoization.py:145(train)
        6    0.002    0.000    9.936    1.656 validation.py:15(validate_yaml_schema)
        4    0.244    0.061    8.935    2.234 featurizers.py:594(training_states_and_actions)
    44196    0.228    0.000    8.889    0.000 std.py:1383(set_postfix)
 72004254    8.827    0.000    8.827    0.000 {built-in method builtins.issubclass}
    45501    0.104    0.000    8.782    0.000 std.py:1314(refresh)
    45515    0.091    0.000    8.363    0.000 std.py:1444(display)
 10440662    3.927    0.000    7.623    0.000 __init__.py:1160(apply_to)

We suspect this could be the issue

self.slots = {slot.name: copy.deepcopy(slot) for slot in slots}

Additionally, this fails on Circle CI, probably due to lack of memory. Note that this is using a docker executor with 16GB of memory, and waiting 30 minutes for training. We are not yet sure of the exact cause of this failure, but what we know is that removing a dozen unfeaturized slots for the domain trains in about 15 minutes without issues.

Too long with no output (exceeded 30m0s): context deadline exceeded 

Command or request that led to error:

rasa train -d data/domain.yml --out models -c config.yml

Content of configuration file (config.yml) (if relevant):

config.yml.zip

Content of domain file (domain.yml) (if relevant):
Cannot attach due to IP, but contains about 400 hundred slots, and only a handful are featurized ( but that's not the issue here). We did reproduce this in the past with only unfeaturized slots, the main variable is the number of slots.

@edouardlp edouardlp added area:rasa-oss 🎡 Anything related to the open source Rasa framework type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors. labels Jun 18, 2020
@sara-tagger
Copy link
Collaborator

Thanks for the issue, @tabergma will get back to you about it soon!

You may find help in the docs and the forum, too 🤗

@ricwo
Copy link
Contributor

ricwo commented Jun 19, 2020

@edouardlp thanks a lot for raising this - and especially for the detailed profiling. excessive deepcopying in preparing the trackers for training might not be necessary. we'll look into this 👍

@Shandilya21
Copy link

Hi, @ricwo @edouardlp I think we can avoid this issue by customizing the copying behavior through implementation of the memo dictionary of an object already copied during current copying pass. The memo dictionary is used to keep track of the values that have been copied already, so as to avoid infinite recursion and hence avoids to fails in machines with low memory.

 def __deepcopy__(self, memo):
    print('__deepcopy__({})'.format(memo))
    return MyClass(copy.deepcopy(self.name, memo))`

Thanks,
Your feedback would be extremely welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors.
Projects
None yet
Development

No branches or pull requests

6 participants