Incremental training #6971
Comments
@tabergma how urgent is this one?
@evgeniiaraz We want to tackle this issue this quarter. @dakshvar22 is leading this topic. Why do you ask?
@tabergma I wanted to work on it so as not to lose shape :) but if it is urgent, I'll pick something non-essential.
Based on the discussion in the document, here are the more fine-grained implementation tasks that are needed:
- Changes to the CLI
- Changes to ML components:
  - CountVectorsFeaturizer (CVF)
  - DIETClassifier, ResponseSelector and TEDPolicy
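As a rough illustration of the CLI change discussed above (a sketch only, not the actual Rasa implementation; the flag name, defaults, and wiring are assumptions):

```python
import argparse

# Hypothetical sketch of a fine-tuning CLI option; the flag name and
# defaults are assumptions, not Rasa's actual command-line API.
parser = argparse.ArgumentParser(prog="train")
parser.add_argument(
    "--finetune",
    nargs="?",
    const="latest",
    default=None,
    help="Path to a previously trained model to continue training from "
         "(defaults to the most recent model if no path is given).",
)
parser.add_argument(
    "--epochs",
    type=int,
    default=30,
    help="Number of additional epochs to train when fine-tuning.",
)
args = parser.parse_args()

if args.finetune is not None:
    # Load the existing model and continue training on the updated data
    # instead of rebuilding everything from scratch.
    print(f"Fine-tuning model '{args.finetune}' for {args.epochs} epochs")
else:
    print("Training a new model from scratch")
```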
A working version (very much a draft) of the above steps is implemented on this branch. From early observations, here is what needs to be improved or additionally done to make this mergeable as a feature: …
Of course, docs, code quality and tests also need to be added.
Next steps based on the call with @dakshvar22 @joejuzl
Other things to keep in mind:
I ran some initial experiments using the working version on this branch.
Setup:
Data: Financial Bot NLU data, split 80:20 into train and test sets. The train split is further divided into two sets, again split 80:20. The first set is used for training an initial model from scratch. The second set is used for finetuning that first model. Consider the second set as new annotations that a user added to their training data.
Size of Set 1: 233
Training: We train the first model from scratch for 100 epochs. Then we add the second set to the training data and further train the first model for 30 more epochs.
Note: Finetuning is done by mixing the new data with the old data and then training on batches from the combined data.
Results:
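A minimal sketch of the split-and-finetune schedule described above (this uses a generic Keras classifier on placeholder features, not the actual Rasa pipeline or the Financial Bot data; all shapes, sizes, and hyperparameters other than the epoch counts are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split
import tensorflow as tf

# Placeholder featurized NLU data: 365 examples, 300-dim features, 5 intents.
X = np.random.rand(365, 300).astype("float32")
y = np.random.randint(0, 5, size=365)

# 80:20 train/test split, then split the train portion again 80:20 into
# "set 1" (initial training data) and "set 2" (new annotations).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_set1, X_set2, y_set1, y_set2 = train_test_split(X_train, y_train, test_size=0.2, random_state=42)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(300,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Train the initial model from scratch on set 1 for 100 epochs.
model.fit(X_set1, y_set1, epochs=100, verbose=0)

# Fine-tune: mix the new annotations (set 2) into the old data and train
# the same model for 30 more epochs on batches from the combined data.
X_combined = np.concatenate([X_set1, X_set2])
y_combined = np.concatenate([y_set1, y_set2])
model.fit(X_combined, y_combined, epochs=30, verbose=0)

print(model.evaluate(X_test, y_test, verbose=0))
```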
Experiments on the Sara data:
Size of Set 1: 3166
Note:
Results:
@dakshvar22 Do I understand it correctly that incremental training is in total faster than training everything at once? This seems somewhat counterintuitive to me, as I'd expect overhead from loading training data / pipelines etc.
@wochinge The times mentioned above are the times to train …
Thanks for clarifying! Even if we measure the …
@wochinge The small overhead (11s) that you see when trained in one go is because of the increase in input feature vector size and hence bigger matrix multiplications. The first two experiments on Sara data have an input feature vector of size 11752 (actual vocabulary size + buffer added). The third experiment has an input feature vector of size 12752 (actual vocabulary size + buffer added). The additional 1000 dimensions are present because the model is trained from scratch and hence new buffer space is added in …
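To illustrate the buffer idea (a simplified standalone sketch, not CountVectorsFeaturizer's actual implementation; the buffer size and helper names are made up): reserving extra vocabulary slots up front keeps the input feature vector the same size when new words arrive, so downstream weight matrices keep their shape and can be fine-tuned instead of re-initialized.

```python
import numpy as np

BUFFER_SLOTS = 1000  # extra, initially unused vocabulary slots (illustrative size)

def build_vocab(tokens, buffer_slots=BUFFER_SLOTS):
    """Map each known token to an index and reserve empty buffer slots."""
    vocab = {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}
    size = len(vocab) + buffer_slots  # fixed feature-vector size incl. buffer
    return vocab, size

def extend_vocab(vocab, size, new_tokens):
    """Assign new tokens to unused buffer slots without growing the vector."""
    for tok in new_tokens:
        if tok not in vocab:
            if len(vocab) >= size:
                raise ValueError("Buffer exhausted; retrain from scratch.")
            vocab[tok] = len(vocab)
    return vocab

def featurize(tokens, vocab, size):
    """Bag-of-words vector with a fixed length of `size`."""
    vec = np.zeros(size, dtype=np.float32)
    for tok in tokens:
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

vocab, size = build_vocab(["check", "my", "balance"])
old_vec = featurize(["check", "balance"], vocab, size)

# New annotations introduce unseen words; they occupy buffer slots, so the
# feature vector (and hence the model's input layer) keeps the same size.
vocab = extend_vocab(vocab, size, ["transfer", "money"])
new_vec = featurize(["transfer", "money"], vocab, size)
assert old_vec.shape == new_vec.shape
```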
Thanks a lot for digging into and clarifying this! 🙌
I had a short look at the …
@wochinge The only change that we need for incremental training inside …
@wochinge @joejuzl Created a shared branch named …
@dakshvar22 cc @joejuzl Can we finetune a core model when NLU was finetuned previously? Or do we have to train Core from scratch as the featurization of messages will change? |
Not sure if I understand the case completely. Do you mean that …
Ohh, we can finetune the Core model as long as we stay within our current constraints, i.e. no change to labels (intents, actions, slots, entities, etc.). Why do you think we would need to train it from scratch?
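The constraint above (no change to the label space) could be checked up front before attempting to fine-tune. A minimal sketch, assuming hypothetical dictionaries describing the old and new domains (not Rasa's actual domain objects):

```python
def can_finetune(old_domain: dict, new_domain: dict) -> bool:
    """Return True if fine-tuning is allowed, i.e. no label sets changed.

    `old_domain` / `new_domain` are hypothetical dicts mapping label types
    (intents, actions, slots, entities) to their declared values.
    """
    for label_type in ("intents", "actions", "slots", "entities"):
        if set(old_domain.get(label_type, [])) != set(new_domain.get(label_type, [])):
            return False  # label space changed -> train from scratch
    return True

old = {"intents": ["greet", "check_balance"], "actions": ["utter_greet"]}
new = {"intents": ["greet", "check_balance"], "actions": ["utter_greet"]}
assert can_finetune(old, new)  # same labels -> fine-tuning is allowed

new["intents"].append("transfer_money")
assert not can_finetune(old, new)  # new intent -> retrain from scratch
```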
Description of Problem:
Once a model is trained, it cannot be updated: it is not possible to continue training the model on new data that comes in. Instead, the model needs to be retrained from scratch, which takes a lot of time.
Overview of the Solution:
It should be possible to load a model from a previous checkpoint and continue training with new data added.
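As a general illustration of the desired behavior (a generic Keras example, not the Rasa implementation; file names, data, and model shapes are placeholders): a model saved after the initial training run is loaded from its checkpoint and training continues on the enlarged dataset.

```python
import numpy as np
import tensorflow as tf

def build_model() -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(100,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(3, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model

# Initial training run on the data available so far (placeholder data).
X_old, y_old = np.random.rand(200, 100).astype("float32"), np.random.randint(0, 3, 200)
model = build_model()
model.fit(X_old, y_old, epochs=100, verbose=0)
model.save_weights("checkpoint.weights.h5")  # placeholder checkpoint path

# Later: new annotations arrive. Instead of retraining from scratch,
# load the previous checkpoint and continue training on old + new data.
X_new, y_new = np.random.rand(50, 100).astype("float32"), np.random.randint(0, 3, 50)
model = build_model()
model.load_weights("checkpoint.weights.h5")
model.fit(
    np.concatenate([X_old, X_new]),
    np.concatenate([y_old, y_new]),
    epochs=30,
    verbose=0,
)
```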