Merge pull request #7498 from RasaHQ/continuous_training
Incremental Training inside Rasa Open Source
dakshvar22 authored Dec 15, 2020
2 parents 9bb9033 + bfedf85 commit 9f94cbc
Showing 49 changed files with 2,958 additions and 398 deletions.
14 changes: 14 additions & 0 deletions changelog/6971.feature.md
@@ -0,0 +1,14 @@
Incremental training of models in a pipeline is now supported.

If you have added new NLU training examples or new stories/rules for
the dialogue manager, you don't need to retrain the pipeline from scratch.
Instead, you can initialize the pipeline with a previously trained model
and continue finetuning it on the complete dataset, which now includes the
new training examples. To do so, use `rasa train --finetune`. For a more
detailed explanation of the command, check out the docs on [incremental
training](./command-line-interface.mdx#incremental-training).

Added a configuration parameter `additional_vocabulary_size` to
[`CountVectorsFeaturizer`](./components.mdx#countvectorsfeaturizer)
and `number_additional_patterns` to [`RegexFeaturizer`](./components.mdx#regexfeaturizer).
These parameters are worth configuring when you plan to use incremental training with your pipeline, as in the sketch below.
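
For example, both parameters can be set when training the base model from scratch (the sizes shown here are illustrative, not recommendations):

```yaml-rasa
pipeline:
- name: RegexFeaturizer
  number_additional_patterns: 10
- name: CountVectorsFeaturizer
  additional_vocabulary_size:
    text: 1000
```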
2 changes: 2 additions & 0 deletions changelog/7458.removal.md
@@ -0,0 +1,2 @@
Interfaces for `Policy.__init__` and `Policy.load` have changed.
See [migration guide](./migration-guide.mdx#rasa-21-to-rasa-22) for details.
53 changes: 53 additions & 0 deletions docs/docs/command-line-interface.mdx
@@ -85,6 +85,59 @@ The following arguments can be used to configure the training process:
```text [rasa train --help]
```

### Incremental training

:::caution
This feature is experimental.
We introduce experimental features to get feedback from our community, so we encourage you to try it out!
However, the functionality might be changed or removed in the future.
If you have feedback (positive or negative) please share it with us on the [Rasa Forum](https://forum.rasa.com).

:::

In order to improve the performance of an assistant, it's helpful to practice [CDD](./conversation-driven-development.mdx)
and add new training examples based on how your users have talked to your assistant. You can use `rasa train --finetune`
to initialize the pipeline with an already trained model and further finetune it on the
full training dataset, which now includes the additional training examples. This
reduces the training time of the new model.
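
For example, to finetune the most recent model in `models/` on the updated training data:

```bash
rasa train --finetune
```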

By default, the command picks up the latest model in the `models/` directory. If there is a specific model
you want to improve, specify its path by
running `rasa train --finetune <path to model to finetune>`. Finetuning a model usually
requires fewer epochs for machine learning components like `DIETClassifier`, `ResponseSelector` and `TEDPolicy` than training from scratch.
Either use a model configuration for finetuning
that defines fewer epochs than before, or use the
`--epoch-fraction` flag. `--epoch-fraction` will use a fraction of the epochs specified for each machine learning component
in the model configuration file. For example, if `DIETClassifier` is configured to use 100 epochs,
specifying `--epoch-fraction 0.5` will only use 50 epochs for finetuning.

You can also finetune an NLU-only or dialogue management-only model by using
`rasa train nlu --finetune` and `rasa train core --finetune`, respectively.
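
A few examples (the model path is a placeholder for an actual model archive):

```bash
# finetune a specific base model instead of the latest one in models/
rasa train --finetune <path to model to finetune>

# finetune with half of the epochs configured for each machine learning component
rasa train --finetune --epoch-fraction 0.5

# finetune only the NLU model or only the dialogue management model
rasa train nlu --finetune
rasa train core --finetune
```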

To be able to finetune a model, the following conditions must be met:

1. The configuration supplied must be exactly the same as the
configuration used to train the model being finetuned.
The only parameter you can change is `epochs` for the individual machine learning components and policies (see the sketch after this list).

2. The set of labels (intents, actions, entities and slots) for which the base model was trained
must be exactly the same as the ones present in the training data used for finetuning. This
means that you cannot add new intent, action, entity or slot labels to your training data
during incremental training. You can still add new training examples for each of the existing
labels. If you have added or removed labels in the training data, the pipeline needs to be trained
from scratch.

3. The model to be finetuned was trained with a version of Rasa Open Source that is not older than the `MINIMUM_COMPATIBLE_VERSION` of the currently installed version.
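
For example, if the base model was trained with `DIETClassifier` at 100 epochs, a finetuning configuration may change only that value (a hypothetical sketch; every other setting must stay identical to the base model's config):

```yaml-rasa {4}
pipeline:
# ... all other components exactly as in the base model's config ...
- name: DIETClassifier
  epochs: 50
```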

Check out the docs for [`CountVectorsFeaturizer`](./components.mdx#countvectorsfeaturizer) and
[`RegexFeaturizer`](./components.mdx#regexfeaturizer) to understand how to configure them appropriately for incremental training.

:::note
Finetuned models are expected to be on par with the performance of models trained from scratch. However,
make sure to train your pipelines from scratch frequently to avoid running out of additional
vocabulary slots for the models.
:::

## rasa interactive

You can [use Rasa X in local mode](https://rasa.com/docs/rasa-x) to do interactive learning in a UI,
148 changes: 102 additions & 46 deletions docs/docs/components.mdx
@@ -836,6 +836,29 @@ Note: The `feature-dimension` for sequence and sentence features does not have t
"use_word_boundaries": True
```

**Configuring for incremental training**

To ensure that `sparse_features` are of fixed size during
[incremental training](./command-line-interface.mdx#incremental-training), the
component should be configured to account for additional patterns that may be
added to the training data in the future. To do so, configure the `number_additional_patterns`
parameter while training the base model from scratch:

```yaml-rasa {3}
pipeline:
- name: RegexFeaturizer
number_additional_patterns: 10
```

If not configured by the user, the component will use twice the number of
patterns currently present in the training data (including lookup tables and regex patterns)
as the default value for `number_additional_patterns`.
This number is kept at a minimum of 10 in order to avoid running out of additional
slots for new patterns too frequently during incremental training.
Once the component runs out of additional pattern slots, new patterns are dropped
and not considered during featurization. At this point, it is advisable
to retrain the model from scratch.
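
To illustrate the default rule described above, here is a small Python sketch (an illustration of the documented behavior, not the component's actual code):

```python
def default_number_additional_patterns(current_pattern_count: int) -> int:
    """Twice the number of patterns currently in the training data
    (regexes plus lookup tables), with a floor of 10."""
    return max(10, 2 * current_pattern_count)

# 3 patterns in the training data  -> 10 additional slots (the floor applies)
# 12 patterns in the training data -> 24 additional slots
print(default_number_additional_patterns(3))   # 10
print(default_number_additional_patterns(12))  # 24
```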


### CountVectorsFeaturizer

@@ -960,58 +983,91 @@ Note: The `feature-dimension` for sequence and sentence features does not have t
"use_shared_vocab": False
```

**Configuring for incremental training**

To ensure that `sparse_features` are of fixed size during
[incremental training](./command-line-interface.mdx#incremental-training), the
component should be configured to account for additional vocabulary tokens
that may be added as part of new training examples in the future.
To do so, configure the `additional_vocabulary_size` parameter while training the base model from scratch:

```yaml-rasa {3-6}
pipeline:
- name: CountVectorsFeaturizer
additional_vocabulary_size:
text: 1000
response: 1000
action_text: 1000
```

As in the above example, you can define an additional vocabulary size for each of
`text` (user messages), `response` (bot responses used by `ResponseSelector`) and
`action_text` (bot responses not used by `ResponseSelector`). If you are building a shared
vocabulary (`use_shared_vocab=True`), you only need to define a value for the `text` attribute.
If any of these attributes is not configured by the user, the component takes half of the current
vocabulary size as the default value for the attribute's `additional_vocabulary_size`.
This number is kept at a minimum of 1000 in order to avoid running out of additional vocabulary
slots too frequently during incremental training. Once the component runs out of additional vocabulary slots,
new vocabulary tokens are dropped and not considered during featurization. At this point,
it is advisable to retrain the model from scratch.
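
For example, with a shared vocabulary only the `text` attribute needs a value (a hedged sketch; the size is illustrative):

```yaml-rasa {3-5}
pipeline:
- name: CountVectorsFeaturizer
  use_shared_vocab: True
  additional_vocabulary_size:
    text: 1000
```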


The above configuration parameters are the ones you should configure to fit your model to your data.
However, additional parameters exist that can be adapted.

<details><summary>More configurable parameters</summary>

```
+---------------------------+-------------------------+--------------------------------------------------------------+
| Parameter | Default Value | Description |
+===========================+=========================+==============================================================+
| use_shared_vocab | False | If set to 'True' a common vocabulary is used for labels |
| | | and user message. |
+---------------------------+-------------------------+--------------------------------------------------------------+
| analyzer | word | Whether the features should be made of word n-gram or |
| | | character n-grams. Option 'char_wb' creates character |
| | | n-grams only from text inside word boundaries; |
| | | n-grams at the edges of words are padded with space. |
| | | Valid values: 'word', 'char', 'char_wb'. |
+---------------------------+-------------------------+--------------------------------------------------------------+
| strip_accents | None | Remove accents during the pre-processing step. |
| | | Valid values: 'ascii', 'unicode', 'None'. |
+---------------------------+-------------------------+--------------------------------------------------------------+
| stop_words | None | A list of stop words to use. |
| | | Valid values: 'english' (uses an internal list of |
| | | English stop words), a list of custom stop words, or |
| | | 'None'. |
+---------------------------+-------------------------+--------------------------------------------------------------+
| min_df | 1 | When building the vocabulary ignore terms that have a |
| | | document frequency strictly lower than the given threshold. |
+---------------------------+-------------------------+--------------------------------------------------------------+
| max_df | 1 | When building the vocabulary ignore terms that have a |
| | | document frequency strictly higher than the given threshold |
| | | (corpus-specific stop words). |
+---------------------------+-------------------------+--------------------------------------------------------------+
| min_ngram | 1 | The lower boundary of the range of n-values for different |
| | | word n-grams or char n-grams to be extracted. |
+---------------------------+-------------------------+--------------------------------------------------------------+
| max_ngram | 1 | The upper boundary of the range of n-values for different |
| | | word n-grams or char n-grams to be extracted. |
+---------------------------+-------------------------+--------------------------------------------------------------+
| max_features | None | If not 'None', build a vocabulary that only consider the top |
| | | max_features ordered by term frequency across the corpus. |
+---------------------------+-------------------------+--------------------------------------------------------------+
| lowercase | True | Convert all characters to lowercase before tokenizing. |
+---------------------------+-------------------------+--------------------------------------------------------------+
| OOV_token | None | Keyword for unseen words. |
+---------------------------+-------------------------+--------------------------------------------------------------+
| OOV_words | [] | List of words to be treated as 'OOV_token' during training. |
+---------------------------+-------------------------+--------------------------------------------------------------+
| alias | CountVectorFeaturizer | Alias name of featurizer. |
+---------------------------+-------------------------+--------------------------------------------------------------+
| use_lemma | True | Use the lemma of words for featurization. |
+---------------------------+-------------------------+--------------------------------------------------------------+
| additional_vocabulary_size| text: 1000 | Size of additional vocabulary to account for incremental |
| | response: 1000 | training while training a model from scratch |
| | action_text: 1000 | |
+---------------------------+-------------------------+--------------------------------------------------------------+
```

</details>
7 changes: 7 additions & 0 deletions docs/docs/migration-guide.mdx
@@ -24,6 +24,13 @@ Support for Markdown data will be removed entirely in Rasa Open Source 3.0.0.
Please convert your existing Markdown data by using the commands
described [here](./migration-guide.mdx#training-data-files).

### Policies

[Policies](./policies.mdx) now require a `**kwargs` argument in their constructor and `load` method.
Policies without `**kwargs` will be supported until Rasa version `3.0.0`.
However, when using [incremental training](./command-line-interface.mdx#incremental-training),
`**kwargs` **must** be included.
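
A minimal sketch of a custom policy that forwards `**kwargs` (the `priority` parameter is illustrative; check the `Policy` base class for the exact signature):

```python
from typing import Any, Text

from rasa.core.policies.policy import Policy


class MyCustomPolicy(Policy):
    def __init__(self, priority: int = 1, **kwargs: Any) -> None:
        # Forward **kwargs so incremental training can pass additional
        # arguments (e.g. whether the policy should be finetuned).
        super().__init__(priority=priority, **kwargs)

    @classmethod
    def load(cls, path: Text, **kwargs: Any) -> "Policy":
        # Accept **kwargs so a loaded policy can be prepared for finetuning.
        ...
```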

## Rasa 2.0 to Rasa 2.1

### Deprecations
4 changes: 4 additions & 0 deletions docs/docs/telemetry/events.json
@@ -88,6 +88,10 @@
"num_regexes": {
"type": "integer",
"description": "Total number of regexes defined."
},
"is_finetuning": {
"type": "boolean",
"description": "True if a model is trained by finetuning an existing model."
}
},
"additionalProperties": false,