Merge pull request #7498 from RasaHQ/continuous_training
Incremental Training inside Rasa Open Source
dakshvar22 authored Dec 15, 2020
2 parents 9bb9033 + bfedf85 commit 9f94cbc
Showing 49 changed files with 2,958 additions and 398 deletions.
14 changes: 14 additions & 0 deletions changelog/6971.feature.md
@@ -0,0 +1,14 @@
Incremental training of models in a pipeline is now supported.

If you have added new NLU training examples or new stories/rules for
the dialogue manager, you don't need to retrain the pipeline from scratch.
Instead, you can initialize the pipeline with a previously trained model
and continue finetuning it on the complete dataset, which now includes the
new training examples. To do so, use `rasa train --finetune`. For a more
detailed explanation of the command, check out the docs on [incremental
training](./command-line-interface.mdx#incremental-training).

Added a configuration parameter `additional_vocabulary_size` to
[`CountVectorsFeaturizer`](./components.mdx#countvectorsfeaturizer)
and `number_additional_patterns` to [`RegexFeaturizer`](./components.mdx#regexfeaturizer).
These parameters are worth configuring when you plan to use incremental training with your pipeline, as in the sketch below.
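
For example, both parameters can be set when training the base model from scratch (the sizes shown here are illustrative, not recommendations):

```yaml-rasa
pipeline:
- name: RegexFeaturizer
  number_additional_patterns: 10
- name: CountVectorsFeaturizer
  additional_vocabulary_size:
    text: 1000
```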
2 changes: 2 additions & 0 deletions changelog/7458.removal.md
@@ -0,0 +1,2 @@
Interfaces for `Policy.__init__` and `Policy.load` have changed.
See [migration guide](./migration-guide.mdx#rasa-21-to-rasa-22) for details.
53 changes: 53 additions & 0 deletions docs/docs/command-line-interface.mdx
@@ -85,6 +85,59 @@ The following arguments can be used to configure the training process:
```text [rasa train --help]
```

### Incremental training

:::caution
This feature is experimental.
We introduce experimental features to get feedback from our community, so we encourage you to try it out!
However, the functionality might be changed or removed in the future.
If you have feedback (positive or negative) please share it with us on the [Rasa Forum](https://forum.rasa.com).

:::

In order to improve the performance of an assistant, it's helpful to practice [CDD](./conversation-driven-development.mdx)
and add new training examples based on how your users have talked to your assistant. You can use `rasa train --finetune`
to initialize the pipeline with an already trained model and further finetune it on the
full training dataset, which now includes the additional training examples. This
reduces the training time of the new model.
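
For example, to finetune the most recent model in `models/` on the updated training data:

```bash
rasa train --finetune
```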

By default, the command picks up the latest model in the `models/` directory. If there is a specific model
you want to improve, specify its path by
running `rasa train --finetune <path to model to finetune>`. Finetuning a model usually
requires fewer epochs for machine learning components like `DIETClassifier`, `ResponseSelector` and `TEDPolicy` than training from scratch.
Either use a model configuration for finetuning
that defines fewer epochs than before, or use the
`--epoch-fraction` flag. `--epoch-fraction` will use a fraction of the epochs specified for each machine learning component
in the model configuration file. For example, if `DIETClassifier` is configured to use 100 epochs,
specifying `--epoch-fraction 0.5` will only use 50 epochs for finetuning.

You can also finetune an NLU-only or dialogue management-only model by using
`rasa train nlu --finetune` and `rasa train core --finetune`, respectively.
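
A few examples (the model path is a placeholder for an actual model archive):

```bash
# finetune a specific base model instead of the latest one in models/
rasa train --finetune <path to model to finetune>

# finetune with half of the epochs configured for each machine learning component
rasa train --finetune --epoch-fraction 0.5

# finetune only the NLU model or only the dialogue management model
rasa train nlu --finetune
rasa train core --finetune
```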

To be able to finetune a model, the following conditions must be met:

1. The configuration supplied must be exactly the same as the
configuration used to train the model being finetuned.
The only parameter you can change is `epochs` for the individual machine learning components and policies (see the sketch after this list).

2. The set of labels (intents, actions, entities and slots) for which the base model was trained
must be exactly the same as the ones present in the training data used for finetuning. This
means that you cannot add new intent, action, entity or slot labels to your training data
during incremental training. You can still add new training examples for each of the existing
labels. If you have added or removed labels in the training data, the pipeline needs to be trained
from scratch.

3. The model to be finetuned was trained with a version of Rasa Open Source that is not older than the `MINIMUM_COMPATIBLE_VERSION` of the currently installed version.
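
For example, if the base model was trained with `DIETClassifier` at 100 epochs, a finetuning configuration may change only that value (a hypothetical sketch; every other setting must stay identical to the base model's config):

```yaml-rasa {4}
pipeline:
# ... all other components exactly as in the base model's config ...
- name: DIETClassifier
  epochs: 50
```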

Check out the docs for [`CountVectorsFeaturizer`](./components.mdx#countvectorsfeaturizer) and
[`RegexFeaturizer`](./components.mdx#regexfeaturizer) to understand how to configure them appropriately for incremental training.

:::note
Finetuned models are expected to be on par with the performance of models trained from scratch. However,
make sure to train your pipelines from scratch frequently to avoid running out of additional
vocabulary slots for the models.
:::

## rasa interactive

You can [use Rasa X in local mode](https://rasa.com/docs/rasa-x) to do interactive learning in a UI,
148 changes: 102 additions & 46 deletions docs/docs/components.mdx
@@ -836,6 +836,29 @@ Note: The `feature-dimension` for sequence and sentence features does not have t
"use_word_boundaries": True
```

**Configuring for incremental training**

To ensure that `sparse_features` are of fixed size during
[incremental training](./command-line-interface.mdx#incremental-training), the
component should be configured to account for additional patterns that may be
added to the training data in the future. To do so, configure the `number_additional_patterns`
parameter while training the base model from scratch:

```yaml-rasa {3}
pipeline:
- name: RegexFeaturizer
number_additional_patterns: 10
```

If not configured by the user, the component will use twice the number of
patterns currently present in the training data (including lookup tables and regex patterns)
as the default value for `number_additional_patterns`.
This number is kept at a minimum of 10 in order to avoid running out of additional
slots for new patterns too frequently during incremental training.
Once the component runs out of additional pattern slots, new patterns are dropped
and not considered during featurization. At this point, it is advisable
to retrain the model from scratch.
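
To illustrate the default rule described above, here is a small Python sketch (an illustration of the documented behavior, not the component's actual code):

```python
def default_number_additional_patterns(current_pattern_count: int) -> int:
    """Twice the number of patterns currently in the training data
    (regexes plus lookup tables), with a floor of 10."""
    return max(10, 2 * current_pattern_count)

# 3 patterns in the training data  -> 10 additional slots (the floor applies)
# 12 patterns in the training data -> 24 additional slots
print(default_number_additional_patterns(3))   # 10
print(default_number_additional_patterns(12))  # 24
```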


### CountVectorsFeaturizer

@@ -960,58 +983,91 @@ Note: The `feature-dimension` for sequence and sentence features does not have t
"use_shared_vocab": False
```

**Configuring for incremental training**

To ensure that `sparse_features` are of fixed size during
[incremental training](./command-line-interface.mdx#incremental-training), the
component should be configured to account for additional vocabulary tokens
that may be added as part of new training examples in the future.
To do so, configure the `additional_vocabulary_size` parameter while training the base model from scratch:

```yaml-rasa {3-6}
pipeline:
- name: CountVectorsFeaturizer
additional_vocabulary_size:
text: 1000
response: 1000
action_text: 1000
```

As in the above example, you can define an additional vocabulary size for each of
`text` (user messages), `response` (bot responses used by `ResponseSelector`) and
`action_text` (bot responses not used by `ResponseSelector`). If you are building a shared
vocabulary (`use_shared_vocab=True`), you only need to define a value for the `text` attribute.
If any of these attributes is not configured by the user, the component takes half of the current
vocabulary size as the default value for the attribute's `additional_vocabulary_size`.
This number is kept at a minimum of 1000 in order to avoid running out of additional vocabulary
slots too frequently during incremental training. Once the component runs out of additional vocabulary slots,
new vocabulary tokens are dropped and not considered during featurization. At this point,
it is advisable to retrain the model from scratch.
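
For example, with a shared vocabulary only the `text` attribute needs a value (a hedged sketch; the size is illustrative):

```yaml-rasa {3-5}
pipeline:
- name: CountVectorsFeaturizer
  use_shared_vocab: True
  additional_vocabulary_size:
    text: 1000
```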


The above configuration parameters are the ones you should configure to fit your model to your data.
However, additional parameters exist that can be adapted.

<details><summary>More configurable parameters</summary>

```
+---------------------------+-------------------------+--------------------------------------------------------------+
| Parameter | Default Value | Description |
+===========================+=========================+==============================================================+
| use_shared_vocab | False | If set to 'True' a common vocabulary is used for labels |
| | | and user message. |
+---------------------------+-------------------------+--------------------------------------------------------------+
| analyzer | word | Whether the features should be made of word n-gram or |
| | | character n-grams. Option 'char_wb' creates character |
| | | n-grams only from text inside word boundaries; |
| | | n-grams at the edges of words are padded with space. |
| | | Valid values: 'word', 'char', 'char_wb'. |
+---------------------------+-------------------------+--------------------------------------------------------------+
| strip_accents | None | Remove accents during the pre-processing step. |
| | | Valid values: 'ascii', 'unicode', 'None'. |
+---------------------------+-------------------------+--------------------------------------------------------------+
| stop_words | None | A list of stop words to use. |
| | | Valid values: 'english' (uses an internal list of |
| | | English stop words), a list of custom stop words, or |
| | | 'None'. |
+---------------------------+-------------------------+--------------------------------------------------------------+
| min_df | 1 | When building the vocabulary ignore terms that have a |
| | | document frequency strictly lower than the given threshold. |
+---------------------------+-------------------------+--------------------------------------------------------------+
| max_df | 1 | When building the vocabulary ignore terms that have a |
| | | document frequency strictly higher than the given threshold |
| | | (corpus-specific stop words). |
+---------------------------+-------------------------+--------------------------------------------------------------+
| min_ngram | 1 | The lower boundary of the range of n-values for different |
| | | word n-grams or char n-grams to be extracted. |
+---------------------------+-------------------------+--------------------------------------------------------------+
| max_ngram | 1 | The upper boundary of the range of n-values for different |
| | | word n-grams or char n-grams to be extracted. |
+---------------------------+-------------------------+--------------------------------------------------------------+
| max_features | None | If not 'None', build a vocabulary that only consider the top |
| | | max_features ordered by term frequency across the corpus. |
+---------------------------+-------------------------+--------------------------------------------------------------+
| lowercase | True | Convert all characters to lowercase before tokenizing. |
+---------------------------+-------------------------+--------------------------------------------------------------+
| OOV_token | None | Keyword for unseen words. |
+---------------------------+-------------------------+--------------------------------------------------------------+
| OOV_words | [] | List of words to be treated as 'OOV_token' during training. |
+---------------------------+-------------------------+--------------------------------------------------------------+
| alias | CountVectorFeaturizer | Alias name of featurizer. |
+---------------------------+-------------------------+--------------------------------------------------------------+
| use_lemma | True | Use the lemma of words for featurization. |
+---------------------------+-------------------------+--------------------------------------------------------------+
| additional_vocabulary_size| text: 1000 | Size of additional vocabulary to account for incremental |
| | response: 1000 | training while training a model from scratch |
| | action_text: 1000 | |
+---------------------------+-------------------------+--------------------------------------------------------------+
```

</details>
7 changes: 7 additions & 0 deletions docs/docs/migration-guide.mdx
@@ -24,6 +24,13 @@ Support for Markdown data will be removed entirely in Rasa Open Source 3.0.0.
Please convert your existing Markdown data by using the commands
described [here](./migration-guide.mdx#training-data-files).

### Policies

[Policies](./policies.mdx) now require a `**kwargs` argument in their constructor and `load` method.
Policies without `**kwargs` will be supported until Rasa version `3.0.0`.
However, when using [incremental training](./command-line-interface.mdx#incremental-training),
`**kwargs` **must** be included.
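
A minimal sketch of a custom policy that forwards `**kwargs` (the `priority` parameter is illustrative; check the `Policy` base class for the exact signature):

```python
from typing import Any, Text

from rasa.core.policies.policy import Policy


class MyCustomPolicy(Policy):
    def __init__(self, priority: int = 1, **kwargs: Any) -> None:
        # Forward **kwargs so incremental training can pass additional
        # arguments (e.g. whether the policy should be finetuned).
        super().__init__(priority=priority, **kwargs)

    @classmethod
    def load(cls, path: Text, **kwargs: Any) -> "Policy":
        # Accept **kwargs so a loaded policy can be prepared for finetuning.
        ...
```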

## Rasa 2.0 to Rasa 2.1

### Deprecations
4 changes: 4 additions & 0 deletions docs/docs/telemetry/events.json
@@ -88,6 +88,10 @@
"num_regexes": {
"type": "integer",
"description": "Total number of regexes defined."
},
"is_finetuning": {
"type": "boolean",
"description": "True if a model is trained by finetuning an existing model."
}
},
"additionalProperties": false,