Support training on big datasets #8595
Comments
TyDunn commented: We are also working on #8734 as part of this.
➤ Maxime Verger commented: 💡 Heads up! We're moving issues to Jira: https://rasa-open-source.atlassian.net/browse/OSS. From now on, this Jira board is the place where you can browse (without an account) and create issues (you'll need a free Jira account for that). This GitHub issue has already been migrated to Jira and will be closed on January 9th, 2023. Do not forget to subscribe to the corresponding Jira issue! ➡️ More information in the forum: https://forum.rasa.com/t/migration-of-rasa-oss-issues-to-jira/56569.
Description of Problem
Status as of May 2021
There have been several attempts to tackle this issue:
These approaches haven't been finished yet, or haven't directly addressed the biggest bottleneck right now, which seems to be core training. According to this forum post, 400 stories led to out-of-memory errors with 12 GB of (GPU?) memory. Also, 400-500 stories seems to have been the maximum used in research so far, though potentially that wasn't with maxed-out memory (slack thread). Part of #8593 is figuring out how much memory is consumed where in core e2e training. This will help us decide which parts of the existing proposals to use to tackle this issue.
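As a starting point for the "how much memory is consumed where" investigation, a minimal sketch of per-phase memory profiling using only the Python standard library. Note the caveats: `tracemalloc` only sees Python-heap allocations, not GPU or native (e.g. TensorFlow tensor) memory, and `featurize_stories` below is a hypothetical stand-in, not a Rasa API.

```python
import tracemalloc

def profile_memory(label, fn, *args, **kwargs):
    """Run fn and report its peak Python-heap usage in MiB.

    Caveat: tracemalloc tracks Python allocations only, so this is a
    rough proxy; GPU/native memory needs framework-specific tooling.
    """
    tracemalloc.start()
    result = fn(*args, **kwargs)
    _current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(f"{label}: peak {peak / 2**20:.1f} MiB")
    return result, peak

# Hypothetical usage: wrap each training phase to see where memory goes.
def featurize_stories(n):
    # stand-in for a memory-hungry featurization step
    return [[0.0] * 1000 for _ in range(n)]

features, peak = profile_memory("featurization", featurize_stories, 400)
```

Wrapping each phase (data loading, featurization, model fit) this way gives a coarse breakdown of where memory is spent, which could then be refined with GPU-aware tooling.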
Product view of current status
Status as of July 2021
Definition of Done: End-to-end training for the following corpora can be run on one of our cloud GPU servers: