Rasa 1.10.14 Memory Leakage #7690

Closed
2 tasks
OmarFarag95 opened this issue Jan 7, 2021 · 10 comments · Fixed by #8319
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework area:rasa-oss/ml/nlu-components Issues focused around rasa's NLU components area:rasa-oss/ml 👁 All issues related to machine learning effort:atom-squad/4 Label which is used by the Rasa Atom squad to do internal estimation of task sizes. type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

@OmarFarag95

OmarFarag95 commented Jan 7, 2021

Rasa version: 1.10.14

Rasa SDK version (if used & relevant):

Rasa X version (if used & relevant):

Python version: 3.6.9

Operating system (windows, osx, ...): linux

Issue:

  • I was training a CRF model for entity recognition on about 120K sentences with Rasa 1.3.5, and things were going just fine. However, after upgrading to Rasa 1.10.14, I always get a memory leak: RAM usage keeps increasing significantly until the process is killed.

  • I am using Google Colab with 25 GB of available RAM.

  • One important note: training the DIET classifier on the same dataset size with Rasa 1.10.14 works just fine, so the issue is within the CRF itself.

Error (including full traceback):


Command or request that led to error:

rasa train nlu

Content of configuration file (config.yml) (if relevant):

  - name: "SpacyNLP"
  - name: "SpacyTokenizer"
  - name: "SpacyFeaturizer"
  - name: "CRFEntityExtractor"
    max_iterations: 100
    features: [
    ["pos", "pos2"],
    [
      "bias",
      "prefix5",
      "prefix2",
      "suffix5",
      "suffix3",
      "suffix2",
      "pos",
      "pos2",
      "digit",
    ],
    ["pos", "pos2"],
    ]
    L1_c: 0.01,
    L2_c: 0.05

Content of domain file (domain.yml) (if relevant):

Definition of Done

  • assert that problem still persists in last main state (consider using Vincent's spacy-3.0 branch)
  • find what's causing the memory leak and propose next steps
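
For the second item, one possible starting point is to record how much the process's peak resident memory grows around a single step. The following is only a minimal sketch under stated assumptions: it relies on the standard-library `resource` module (Unix only), and `allocate` is a hypothetical stand-in for the training call under investigation, not a Rasa function. Peak RSS is used because the CRF training runs in native code, so a Python-level tracer would likely miss most of the allocations.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Peak resident set size of this process so far, in MiB.

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def run_and_report(step, *args, **kwargs):
    """Run `step` and return (result, growth of peak RSS in MiB)."""
    before = peak_rss_mib()
    result = step(*args, **kwargs)
    return result, peak_rss_mib() - before


def allocate(n):
    # Hypothetical stand-in for the training call being investigated.
    return [0] * n


result, growth = run_and_report(allocate, 5_000_000)
print(f"peak RSS grew by {growth:.1f} MiB")
```

Because the value is a high-water mark, it never decreases; wrapping successive steps (featurization, then fitting) would show which one pushes the peak up.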
@OmarFarag95 OmarFarag95 added area:rasa-oss 🎡 Anything related to the open source Rasa framework type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors. labels Jan 7, 2021
@sara-tagger
Collaborator

Thanks for the issue, @m-vdb will get back to you about it soon!

You may find help in the docs and the forum, too 🤗

@m-vdb
Collaborator

m-vdb commented Jan 12, 2021

👋🏻 hey @OmarFarag95 thanks a lot for your bug report. Would it be possible for you to upgrade to one of the following?

  • either 1.10.20, which is the latest micro we currently have on the 1.10.x branch
  • or even better 2.2.4, which is backwards-compatible with 1.x

That would help us diagnose the issue and make it easier to come up with a fix (especially if you upgrade to 2.2.4). LMK if it's not on the table 😄

@OmarFarag95
Author

OmarFarag95 commented Jan 12, 2021

Hi @m-vdb , Thank you for your reply.

I have tried upgrading to 2.2.4. However, the bug still persists.

@m-vdb
Collaborator

m-vdb commented Jan 12, 2021

OK, (sadly) good to know. I'll circle back with the team and find out the best way forward. Sorry about the inconvenience 😓

@OmarFarag95
Author

Never mind!

I will be looking forward to your response 😸

@wochinge wochinge added area:rasa-oss/ml 👁 All issues related to machine learning area:rasa-oss/ml/nlu-components Issues focused around rasa's NLU components labels Jan 28, 2021
@wochinge wochinge added the effort:atom-squad/4 Label which is used by the Rasa Atom squad to do internal estimation of task sizes. label Mar 12, 2021
@twerkmeister
Contributor

twerkmeister commented Mar 15, 2021

I just snooped around in the code for the crf entity extractor and had a quick look at the differences between v 1.3.5 and v 1.10.14.

Although it is not a change between the two versions, one thing that caught my eye in terms of memory is all_possible_transitions=True in the CRF call. The doc for this parameter reads:

    all_possible_transitions : bool, optional (default=False)
        Specify whether CRFsuite generates transition features that
        do not even occur in the training data (i.e., negative transition
        features). When True, CRFsuite generates transition features that
        associate all of possible label pairs. Suppose that the number
        of labels in the training data is L, this function will
        generate (L * L) transition features.
        This function is disabled by default.

One hypothesis is that something minor changed between sentence_to_features and the new crf_tokens_to_features that added a few more features; given this quadratic scale-up and the size of the dataset in question, that could have an outsized effect. Just one hypothesis, could be wrong!
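
To put a rough number on the (L * L) figure from the quoted docstring, here is a small arithmetic sketch. It is an illustration only: the helper names are made up, and it assumes BILOU tagging, where each entity type expands into four tags (B-, I-, L-, U-) plus one shared "O" tag. The example uses 8 entity types.

```python
def crf_label_count(entity_types: int, bilou: bool = True) -> int:
    """Number of distinct tags the CRF sees, including the "O" tag.

    Assumption: with BILOU tagging each entity type becomes four tags;
    without it, each entity type is a single tag.
    """
    per_type = 4 if bilou else 1
    return entity_types * per_type + 1


def transition_feature_count(labels: int) -> int:
    """L * L transition features, as described in the quoted docstring."""
    return labels * labels


labels = crf_label_count(entity_types=8)
print(labels, transition_feature_count(labels))  # 33 1089
```

Note that this count depends on the number of labels only, not on the number of training sentences, so any growth tied to dataset size would have to come from the state (per-token) features rather than from the transitions themselves.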

@twerkmeister
Contributor

@OmarFarag95 would it be possible for you to provide a few more stats on the dataset you are using for context? I know you are using a sizable dataset of 120k sentences. How about (roughly)

  • the number of entity types?
  • the number of examples per entity type?

@twerkmeister
Contributor

Also, I am assuming that the system already crashes during the first iteration, is that right @OmarFarag95 ?

@OmarFarag95
Author

@twerkmeister Thanks for your response.

  • The number of distinct entity types is 8.
  • The examples are spread almost evenly across entity types: roughly 120k / 8 = 15k per type.
  • Yes, the training fails a few seconds after starting the first iteration.

@twerkmeister
Contributor

Hey @OmarFarag95, much appreciated! This will make tracking down the problem easier 👍

5 participants