
Incremental training #6971

Closed
tabergma opened this issue Oct 8, 2020 · 20 comments · Fixed by #7498
Assignees
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR

Comments

@tabergma
Contributor

tabergma commented Oct 8, 2020

Description of Problem:
Once a model is trained, it cannot be updated: it is not possible to continue training the model on new data that comes in. Instead, the model needs to be retrained from scratch, which takes a lot of time.

Overview of the Solution:
It should be possible to load a model from a previous checkpoint and continue training with new data added.

@tabergma tabergma added type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR area:rasa-oss 🎡 Anything related to the open source Rasa framework labels Oct 8, 2020
@evgeniiaraz
Contributor

@tabergma how urgent is this one?

@tabergma
Contributor Author

@evgeniiaraz We want to tackle this issue this quarter. @dakshvar22 is leading this topic. Why do you ask?

@evgeniiaraz
Contributor

evgeniiaraz commented Oct 22, 2020

@tabergma I wanted to work on it to keep in shape :) but if it is urgent, I'll pick something non-essential

@dakshvar22
Contributor

dakshvar22 commented Nov 7, 2020

Based on the discussion in the document, here are more fine-grained implementation tasks that are needed -

Changes to CLI and rasa/train.py

  1. Add a parameter to rasa train called finetune_previous_model which starts training in finetuning mode.
  2. Add a parameter to rasa train called finetune_model_path which lets you specify the path to a previous model that should be finetuned (see the CLI sketch after this list).
  3. rasa.train_async_internal should be refactored to check whether training should proceed in finetuning mode (i.e. finetune_previous_model is set to True). If yes, it should then check whether finetuning is actually possible (see the doc for the constraints).
  4. rasa.nlu.train should be refactored to create the Trainer object in fine-tune mode, which means each component should be loaded from the model to be finetuned. This will involve building the pipeline similarly to how it is built during inference, i.e. when rasa shell or rasa test is run.
  5. rasa.core.train should be refactored in the same way.
  6. Add telemetry event.
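
To make the first two items concrete, here is a minimal sketch of how such flags could be wired up with argparse. This is only an illustration: the flag names follow the issue text and the example model path is made up, so it is not necessarily what the final rasa CLI will look like.

```python
import argparse

# Hypothetical flags mirroring the proposed finetune_previous_model and
# finetune_model_path parameters; not the final rasa CLI.
parser = argparse.ArgumentParser(prog="rasa train")
parser.add_argument(
    "--finetune-previous-model",
    action="store_true",
    help="Start training in finetuning mode instead of training from scratch.",
)
parser.add_argument(
    "--finetune-model-path",
    type=str,
    default=None,
    help="Path to a previously trained model that should be finetuned.",
)

# Example invocation (the model path is a placeholder).
args = parser.parse_args(
    ["--finetune-previous-model", "--finetune-model-path", "models/previous-model.tar.gz"]
)
print(args.finetune_previous_model, args.finetune_model_path)
```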

Changes to ML components

CountVectorsFeaturizer(CVF)

  1. Add a parameter max_additional_vocabulary_size which lets users specify the additional buffer size that CVF should keep to accommodate new vocabulary tokens during fine-tuning.
  2. _train_with_independent_vocab should be refactored to construct the vocabulary with the additional buffer specified above. Things to keep in mind here -
  • When a new training cycle is triggered, the ordering of existing vocabulary tokens should not be changed and the new vocabulary tokens should only occupy the empty slots in the vocabulary.
  • If the vocabulary size of CVF is exhausted, we should continue training, but warn the user that the vocabulary is exhausted and treat the new tokens that overflow as OOV tokens. At this point, the user should also be informed about the total vocabulary size of their dataset and be prompted to retrain with the full vocabulary. (A toy vocabulary-buffer sketch follows this list.)
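
As a rough illustration of the buffer idea (a toy sketch, not Rasa's actual CountVectorsFeaturizer code; all names below are hypothetical), the vocabulary could reserve placeholder slots that new tokens take over during fine-tuning, so existing indices never move:

```python
from typing import Dict, List

BUFFER_PREFIX = "__buffer_slot_"

def build_vocab(tokens: List[str], max_additional_vocabulary_size: int) -> Dict[str, int]:
    """Assign indices to known tokens and reserve empty buffer slots at the end."""
    vocab = {tok: idx for idx, tok in enumerate(dict.fromkeys(tokens))}
    for i in range(max_additional_vocabulary_size):
        vocab[f"{BUFFER_PREFIX}{i}"] = len(vocab)
    return vocab

def extend_vocab(vocab: Dict[str, int], new_tokens: List[str]) -> None:
    """New tokens fill empty buffer slots; indices of existing tokens are untouched."""
    free_slots = [key for key in vocab if key.startswith(BUFFER_PREFIX)]
    for tok in dict.fromkeys(new_tokens):
        if tok in vocab:
            continue
        if not free_slots:
            # Buffer exhausted: keep training but treat the token as OOV and
            # prompt the user to retrain with the full vocabulary.
            print(f"Vocabulary buffer exhausted; treating '{tok}' as OOV")
            continue
        slot = free_slots.pop(0)
        vocab[tok] = vocab.pop(slot)

vocab = build_vocab(["book", "a", "flight"], max_additional_vocabulary_size=2)
extend_vocab(vocab, ["cancel", "my", "booking"])  # "booking" overflows -> OOV
print(vocab)  # {'book': 0, 'a': 1, 'flight': 2, 'cancel': 3, 'my': 4}
```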

DIETClassifier, ResponseSelector and TEDPolicy

  1. load() should be refactored to load the models with weights in training mode and not in prediction mode. Currently, _load_model() builds the TF graph in predict mode, which should be changed if the classifier is being loaded for finetuning. So instead of calling _get_tf_call_model_function(), _get_tf_train_functions() should be reused to build the graph for training (a generic load-and-continue-training sketch follows this list).
  2. Make sure the signature of RasaModelData in finetune mode is the same as what is constructed during training from scratch.
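
For the first item, the general pattern is "rebuild the graph in training mode, restore the previous weights, and keep fitting". A generic Keras sketch of that pattern (not Rasa's actual _load_model() / RasaModel code; architecture, shapes and file names are invented):

```python
import numpy as np
import tensorflow as tf

def build_model() -> tf.keras.Model:
    # Toy architecture standing in for DIET/TED; shapes are arbitrary.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(100,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(3, activation="softmax"),
    ])

# Initial training from scratch.
model = build_model()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
x_old = np.random.rand(64, 100).astype("float32")
y_old = np.random.randint(0, 3, size=64)
model.fit(x_old, y_old, epochs=2, verbose=0)
model.save_weights("previous_model.weights.h5")

# Finetuning: same architecture compiled for *training*, weights restored,
# then training continues on the combined old + new data.
finetune_model = build_model()
finetune_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
finetune_model.load_weights("previous_model.weights.h5")
x_new = np.random.rand(16, 100).astype("float32")
y_new = np.random.randint(0, 3, size=16)
finetune_model.fit(np.concatenate([x_old, x_new]),
                   np.concatenate([y_old, y_new]),
                   epochs=1, verbose=0)
```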

@dakshvar22
Contributor

dakshvar22 commented Nov 9, 2020

A working version (very much a draft) of the above steps is implemented on this branch. From early observations, this is what needs to be improved or additionally done to make it mergeable as a feature:

  1. Ability to specify a model path to fine-tune from in the CLI.
  2. Implement checks here to see whether the previous model is compatible with the currently specified configuration for fine-tuning, e.g. all parameters of the two configurations should be the same except the number of training epochs (a toy compatibility check is sketched after this list).
  3. The above working version loads the pipeline in fine-tune mode only for NLU; the same still needs to be done for the Core pipeline. The refactoring needed inside TEDPolicy is straightforward and identical to what is done for DIETClassifier. What still needs to be implemented is loading the Agent instance with the old model in fine-tune mode.
  4. While loading the NLU pipeline, the config of the loaded model is currently passed to the components, which means that if I change the number of epochs in my new configuration, it is not used by the component. This will need to be refactored.
  5. Make sure fine-tuning is possible for rasa train nlu and rasa train core as well. Currently it works for rasa train.
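
A toy version of check 2 from the list above (hypothetical helpers, not the actual implementation): two configurations count as compatible for fine-tuning if they are identical except for the number of epochs.

```python
from typing import Any, Dict, List

ALLOWED_TO_DIFFER = {"epochs"}

def components_compatible(old: Dict[str, Any], new: Dict[str, Any]) -> bool:
    keys = set(old) | set(new)
    return all(key in ALLOWED_TO_DIFFER or old.get(key) == new.get(key) for key in keys)

def pipelines_compatible(old: List[Dict[str, Any]], new: List[Dict[str, Any]]) -> bool:
    # Same components in the same order, differing only in allowed keys.
    return len(old) == len(new) and all(
        components_compatible(o, n) for o, n in zip(old, new)
    )

old_pipeline = [{"name": "CountVectorsFeaturizer"}, {"name": "DIETClassifier", "epochs": 100}]
new_pipeline = [{"name": "CountVectorsFeaturizer"}, {"name": "DIETClassifier", "epochs": 30}]
print(pipelines_compatible(old_pipeline, new_pipeline))  # True
```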

Of course docs, code quality and tests also need to be added.

@wochinge
Contributor

Next steps based on the call with @dakshvar22 @joejuzl

  1. Create engineering issues from this (should be around 2-3 issues 🤔 )
  2. Get started with the engineering issues in the week of November 23rd

Other things to keep in mind:

  • Can we branch off master or does it make more sense to branch off the e2e branch?

@wochinge wochinge self-assigned this Nov 13, 2020
@dakshvar22
Contributor

dakshvar22 commented Nov 13, 2020

I ran some initial experiments using the working version on this branch -

Setup

Data: Financial Bot NLU data, split into an 80:20 train/test split. The train split is further divided 80:20 into two sets. The first set is used to train an initial model from scratch; the second set is used to finetune that first model. Consider the second set as new annotations that a user added to their training data.

Size of Set 1: 233
Size of Set 2: 59
Size of held-out test set: 73

Training: We train the first model from scratch for 100 epochs. Then add the second set to the training data and further train the first model for 30 more epochs.
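
For reference, the split sizes above can be reproduced roughly like this (assuming sklearn's train_test_split and a total of 365 examples, which is inferred from the numbers in this comment; the actual experiment may have split the data differently):

```python
from sklearn.model_selection import train_test_split

examples = list(range(365))  # stand-ins for the Financial Bot NLU examples

train, test = train_test_split(examples, test_size=0.2, random_state=42)
set_1, set_2 = train_test_split(train, test_size=0.2, random_state=42)

print(len(set_1), len(set_2), len(test))  # 233 59 73
```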

Config

Note: Finetuning is done by mixing the new data with the old data and then training on batches from the combined data.
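
In other words, something like this toy setup (not the actual training loop):

```python
import random

old_data = [f"old_example_{i}" for i in range(233)]  # Set 1
new_data = [f"new_example_{i}" for i in range(59)]   # Set 2 (new annotations)

combined = old_data + new_data
random.shuffle(combined)

batch_size = 64
batches = [combined[i:i + batch_size] for i in range(0, len(combined), batch_size)]
print(len(batches))  # 5 batches drawn from the combined data
```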

Results:

| Initial model | Training data | Number of epochs | Intent F1 (held-out test set) | Entity F1 (held-out test set) | Time for training |
| --- | --- | --- | --- | --- | --- |
| Randomly initialized | Set 1 | 100 | 0.753 | 0.9 | 48s |
| Model trained on set 1 | Set 1 + Set 2 | 30 | 0.861 | 0.927 | 16s |
| Randomly initialized | Set 1 + Set 2 | 130 | 0.876 | 0.911 | 1 min 16s |

@dakshvar22
Contributor

dakshvar22 commented Nov 15, 2020

Experiments on Sara data -

Size of Set 1: 3166
Size of Set 2: 792
Size of held-out test set: 990

Config

Note: additional_vocabulary_size was set to 1000 for the char-based CVF and 100 for the word-based CVF.

Results:

| Initial model | Training data | Number of epochs | Intent F1 | Entity F1 | Response F1 | Time for training |
| --- | --- | --- | --- | --- | --- | --- |
| Randomly initialized | Set 1 | 40 | 0.789 | 0.832 | 0.927 | 4m 10s |
| Model trained on set 1 | Set 1 + Set 2 | 10 | 0.823 | 0.861 | 0.935 | 1m 39s |
| Randomly initialized | Set 1 + Set 2 | 50 | 0.818 | 0.854 | 0.938 | 6m 2s |

@wochinge
Contributor

@dakshvar22 Do I understand correctly that incremental training is, in total, faster than training everything at once? This seems somewhat counterintuitive to me, as I'd expect overhead from loading training data, pipelines, etc.

@dakshvar22
Contributor

@wochinge The times mentioned above are the times to train DIETClassifier alone and do not include the pipeline and training data loading time. We should measure that too, but it would be much smaller in comparison to the time required to train DIETClassifier for an additional 40/70 epochs as shown in the examples above.

@wochinge
Contributor

Thanks for clarifying! Even if we measure DIETClassifier on its own - shouldn't the total time of the incremental training be greater than training everything in one go?

@dakshvar22
Contributor

@wochinge The small overhead (11s) that you see when training in one go is due to the increase in input feature vector size and hence bigger matrix multiplications. The first two experiments on Sara data have an input feature vector of size 11752 (actual vocabulary size + buffer added). The third experiment has an input feature vector of size 12752 (actual vocabulary size + buffer added). The additional 1000 dimensions are present because the model is trained from scratch and hence new buffer space is added in CountVectorsFeaturizer. I ran an additional experiment to validate this with additional_vocabulary_size set to 0 in CountVectorsFeaturizer, and the training times were then comparable, with a small stochastic difference (±2 secs) on either side. Does that help clarify?
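
As a back-of-envelope check (assuming the per-step cost of the input layer scales roughly linearly with the input feature dimension):

```python
finetune_dim = 11752  # vocabulary + buffer in the two finetuning runs
scratch_dim = 12752   # vocabulary + fresh buffer when training from scratch

overhead = scratch_dim / finetune_dim - 1
print(f"~{overhead:.1%} larger input layer per training step")  # ~8.5%
```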

@wochinge
Contributor

Thanks a lot for digging into and clarifying this! 🙌

@wochinge
Contributor

I had a short look at the e2e branch, and at least for the engineering changes we don't need to branch off e2e. However, DIETClassifier has huge changes there, @dakshvar22, so you probably want to branch off e2e for your changes. What do you think?

@dakshvar22
Contributor

@wochinge The only change we need for incremental training inside DIETClassifier is a change in the load method, which isn't touched on e2e. So we should be fine branching off master. I'd like to decouple it from e2e as much as possible.

@dakshvar22
Contributor

@wochinge @joejuzl Created a shared branch named continuous_training for us to merge our respective PRs into.

@wochinge
Contributor

wochinge commented Dec 2, 2020

@dakshvar22 cc @joejuzl Can we finetune a Core model when NLU was finetuned previously? Or do we have to train Core from scratch, as the featurization of messages will change?

@dakshvar22
Contributor

Not sure I understand the case completely. Do you mean that rasa train nlu --finetune was run and then rasa train core --finetune was run?

@wochinge
Contributor

wochinge commented Dec 2, 2020

  1. We run rasa train --finetune
  2. NLU model is finetuned
  3. Do we now finetune the core model or do we train it from scratch?

@dakshvar22
Contributor

Ohh, we can finetune the Core model as long as we stay inside our current constraints, i.e. no change to labels (intents, actions, slots, entities, etc.). Why do you think we would need to train it from scratch?
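
As a toy illustration of that constraint (hypothetical helper, not Rasa's actual validation code), the check boils down to comparing the label sets of the old and new domain:

```python
from typing import Dict, Set

LABEL_KEYS = ("intents", "actions", "slots", "entities")

def labels_unchanged(old_domain: Dict[str, Set[str]], new_domain: Dict[str, Set[str]]) -> bool:
    return all(old_domain.get(k, set()) == new_domain.get(k, set()) for k in LABEL_KEYS)

old = {"intents": {"greet", "goodbye"}, "actions": {"utter_greet"}, "slots": set(), "entities": set()}
new = {"intents": {"greet", "goodbye"}, "actions": {"utter_greet"}, "slots": set(), "entities": set()}
print(labels_unchanged(old, new))  # True -> finetuning the Core model is allowed
```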

@joejuzl joejuzl added this to the 2.2 Rasa Open Source milestone Dec 4, 2020