Support training on big datasets #8595

Closed
2 tasks
twerkmeister opened this issue May 3, 2021 · 2 comments
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework area:rasa-oss/ml 👁 All issues related to machine learning type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR

Comments

@twerkmeister
Contributor

twerkmeister commented May 3, 2021

Description of Problem

  • With big datasets, we run into memory and computation-time issues, for example:
    • multiwoz: 113,000 conversation turns, 8,500 e2e stories
    • ubuntu: 470,000 conversation turns, 75,000 e2e stories
    • advising: 269,000 conversation turns, 24,500 e2e stories

Status as of May 2021

There have been several attempts to tackle this issue:

These approaches either haven't been finished yet or haven't directly addressed what currently seems to be the biggest bottleneck: Core training. According to this forum post, 400 stories lead to out-of-memory errors with 12 GB of (GPU?) memory. Also, 400-500 stories seems to have been the maximum in research so far, although that was possibly not with maxed-out memory (Slack thread). Part of #8593 is figuring out how much memory each part of Core e2e training consumes. This will help us decide which parts of the existing proposals to use to tackle this issue.

Product view of current status

Status as of July 2021

Definition of Done: end-to-end training on the following corpora can be run on one of our cloud GPU servers:

  • [x] multiwoz
  • [ ] ubuntu
  • [ ] advising
@twerkmeister twerkmeister added type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR area:rasa-oss 🎡 Anything related to the open source Rasa framework area:rasa-oss/ml 👁 All issues related to machine learning labels May 3, 2021
@TyDunn
Contributor

TyDunn commented Jun 7, 2021

We are also working on #8734 as part of this.

@sync-by-unito

sync-by-unito bot commented Dec 19, 2022

➤ Maxime Verger commented:

💡 Heads up! We're moving issues to Jira: https://rasa-open-source.atlassian.net/browse/OSS.

From now on, this Jira board is the place where you can browse (without an account) and create issues (you'll need a free Jira account for that). This GitHub issue has already been migrated to Jira and will be closed on January 9th, 2023. Do not forget to subscribe to the corresponding Jira issue!

➡️ More information in the forum: https://forum.rasa.com/t/migration-of-rasa-oss-issues-to-jira/56569.

@m-vdb m-vdb closed this as completed Jan 9, 2023