
Incremental Training inside Rasa Open Source #7498

Merged: 80 commits from continuous_training into master on Dec 15, 2020
Conversation

dakshvar22 (Contributor) commented Dec 9, 2020

Proposed changes:

  • Closes Incremental training #6971
  • Adds the ability to load NLU components and Core policies in finetune mode, which means they are initialized from a previously trained model and can be trained further on a mix of old and new training data.
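The feature surfaces as a `--finetune` flag on `rasa train` (the flag names appear in the CLI help-string diff further down this thread). A minimal usage sketch, assuming a project that already has at least one trained model; the `0.5` value is just an illustration:

```shell
# Train a base model from scratch first.
rasa train

# After adding new training data, start from the previous model instead
# of retraining from scratch; --epoch-fraction runs each component for
# a fraction of its configured epochs.
rasa train --finetune --epoch-fraction 0.5
```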

Individual PRs:

Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)

dakshvar22 (Contributor, Author) commented Dec 11, 2020

Okay, so this turned out to be a large PR as well, but all the smaller PRs were peer-reviewed between @joejuzl, @wochinge, and @dakshvar22.

@alwx Can you please review the following files: rasa/cli/..., rasa/core/agent.py, rasa/core/train.py, rasa/core/policies/ensemble.py, rasa/model.py, rasa/nlu/model.py, rasa/nlu/train.py, rasa/telemetry.py, rasa/train.py, tests/cli/test_rasa_train.py, tests/conftest.py, tests/core/conftest.py, tests/core/test_agent.py, tests/core/test_ensemble.py, tests/test_train.py, tests/test_model.py? These are the engineering bits of the PR.

@Ghostvv Would be great if you could glance over rasa/core/policies/..., rasa/nlu/classifiers/..., rasa/nlu/featurizers/..., rasa/nlu/constants.py, rasa/nlu/selectors.py, rasa/shared/nlu, rasa/utils/tensorflow/models.py, tests/core/policies, tests/core/test_policies.py, tests/nlu/classifiers/test_diet_classifier.py, tests/nlu/featurizers/..., tests/nlu/selectors/, tests/shared/nlu/training_data/test_training_data.py. Most of these were reviewed by Tanja as part of #7419.

dakshvar22 requested review from alwx and Ghostvv on December 11, 2020 at 11:37
Ghostvv (Contributor) left a comment:

Looks good from my side; left a couple of comments.

Review threads:

  • changelog/6971.feature.md — resolved
  • docs/docs/components.mdx (outdated) — resolved
  • docs/docs/components.mdx (outdated) — resolved
  • rasa/utils/tensorflow/models.py (outdated) — resolved
  • tests/core/policies/test_rule_policy.py — resolved
dakshvar22 requested a review from Ghostvv on December 13, 2020 at 20:57
def add_force_param(
parser: Union[argparse.ArgumentParser, argparse._ActionsContainer]
) -> None:
"""Specifies if the model should be trained from scratch."""
alwx (Contributor) commented Dec 14, 2020:

The Args section is missing in the docstring, but that's something CI would probably complain about.

A contributor replied:

It's missing on purpose because I don't see any value in having full docstrings for these helper functions. The CI actually allows one-line docstrings 😁 What do you think?
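For context, the helper under discussion can be completed into a runnable sketch with exactly the one-line docstring style defended here. The flag's action, default, and help text are assumptions for illustration, not the actual Rasa implementation:

```python
import argparse
from typing import Union


def add_force_param(
    parser: Union[argparse.ArgumentParser, argparse._ActionsContainer]
) -> None:
    """Specifies if the model should be trained from scratch."""
    # Assumed wiring: a boolean --force flag that is off by default.
    parser.add_argument(
        "--force",
        action="store_true",
        default=False,
        help="Force a model training even if the data has not changed.",
    )


parser = argparse.ArgumentParser()
add_force_param(parser)
print(parser.parse_args(["--force"]).force)  # → True
print(parser.parse_args([]).force)           # → False
```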

Ghostvv (Contributor) left a comment:

The files that I was assigned look good.

@@ -419,7 +422,8 @@ def test_train_core_help(run: Callable[..., RunResult]):
     [--augmentation AUGMENTATION] [--debug-plots] [--force]
     [--fixed-model-name FIXED_MODEL_NAME]
     [--percentages [PERCENTAGES [PERCENTAGES ...]]]
-    [--runs RUNS]"""
+    [--runs RUNS] [--finetune [FINETUNE]]
+    [--epoch-fraction EPOCH_FRACTION]"""
alwx (Contributor) commented Dec 14, 2020:

There are no tests for the command itself, right? I think it makes sense to add them (something like run_in_simple_project("train", "--finetune")).

A contributor replied:

I've done a basic CLI test with the command (which just tests that the arg is actually passed down), and I've changed test_model_finetuning to not mock the train methods and actually run the second training.
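The two flags exercised here, `[--finetune [FINETUNE]]` and `[--epoch-fraction EPOCH_FRACTION]`, can be sketched with plain argparse. The flag shapes are taken from the help-string diff above; the sentinel default and the help texts are assumptions for illustration:

```python
import argparse

parser = argparse.ArgumentParser(prog="rasa train core")

# [--finetune [FINETUNE]] in the help string means the flag takes an
# optional value (a path to the model to start from). The sentinel for
# a bare --finetune is an assumption; the real CLI presumably falls
# back to the most recent trained model.
parser.add_argument(
    "--finetune",
    nargs="?",
    const="__latest__",  # bare --finetune: use the latest model
    default=None,        # flag absent: ordinary training from scratch
    help="Fine-tune a previously trained model.",
)

# Fraction of each component's configured epochs to run while fine-tuning.
parser.add_argument(
    "--epoch-fraction",
    type=float,
    default=1.0,
    help="Fraction of epochs to train during fine-tuning.",
)

args = parser.parse_args(["--finetune", "--epoch-fraction", "0.5"])
print(args.finetune, args.epoch_fraction)  # → __latest__ 0.5
```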

dakshvar22 (Contributor, Author) commented:

A final run of model regression tests before merge is running here.

dakshvar22 requested a review from alwx on December 14, 2020 at 18:05
dakshvar22 merged commit 9f94cbc into master on Dec 15, 2020
dakshvar22 deleted the continuous_training branch on December 15, 2020 at 08:43
Labels: none yet
Projects: none yet

Successfully merging this pull request may close these issues: Incremental training.

5 participants