Support training on big datasets #8595
Comments
TyDunn commented: We are also working on #8734 as part of this.
➤ Maxime Verger commented: 💡 Heads up! We're moving issues to Jira: https://rasa-open-source.atlassian.net/browse/OSS. From now on, this Jira board is the place where you can browse (without an account) and create issues (you'll need a free Jira account for that). This GitHub issue has already been migrated to Jira and will be closed on January 9th, 2023. Do not forget to subscribe to the corresponding Jira issue! ➡️ More information in the forum: https://forum.rasa.com/t/migration-of-rasa-oss-issues-to-jira/56569.
Description of Problem
Status as of May 2021
There have been several attempts to tackle this issue:
These approaches haven't been finished yet, or haven't directly addressed the biggest bottleneck right now, which seems to be core training. According to this forum post, 400 stories led to out-of-memory errors with 12 GB of (GPU?) memory. Also, 400-500 stories seems to have been the maximum used in research so far, though potentially that wasn't with maxed-out memory (slack thread). Part of #8593 is figuring out how much memory is consumed where in core e2e training. This will help us decide which parts of the existing proposals to use to tackle this issue.
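As a starting point for the "how much memory is consumed where" investigation, a minimal sketch of per-phase memory profiling using only the Python standard library. Note the caveats: `tracemalloc` only sees Python-heap allocations, not GPU or native (e.g. TensorFlow tensor) memory, and `featurize_stories` below is a hypothetical stand-in, not a Rasa API.

```python
import tracemalloc

def profile_memory(label, fn, *args, **kwargs):
    """Run fn and report its peak Python-heap usage in MiB.

    Caveat: tracemalloc tracks Python allocations only, so this is a
    rough proxy; GPU/native memory needs framework-specific tooling.
    """
    tracemalloc.start()
    result = fn(*args, **kwargs)
    _current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(f"{label}: peak {peak / 2**20:.1f} MiB")
    return result, peak

# Hypothetical usage: wrap each training phase to see where memory goes.
def featurize_stories(n):
    # stand-in for a memory-hungry featurization step
    return [[0.0] * 1000 for _ in range(n)]

features, peak = profile_memory("featurization", featurize_stories, 400)
```

Wrapping each phase (data loading, featurization, model fit) this way gives a coarse breakdown of where memory is spent, which could then be refined with GPU-aware tooling.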
Product view of current status
Status as of July 2021
Definition of Done: End-to-end training for the following corpora can be run on one of our cloud GPU servers: