Merge pull request #7616 from RasaHQ/sigmoid_loss
Add sigmoid to softmax loss
rasabot authored Feb 9, 2021
2 parents aaf7de2 + 1f60b8f commit 7d79298
Showing 24 changed files with 968 additions and 134 deletions.
24 changes: 24 additions & 0 deletions changelog/7616.improvement.md
@@ -0,0 +1,24 @@
Added two new parameters, `constrain_similarities` and `model_confidence`, to the machine learning (ML) components [DIETClassifier](components.mdx#dietclassifier), [ResponseSelector](components.mdx#responseselector) and [TEDPolicy](policies.mdx#ted-policy).

Setting `constrain_similarities=True` adds a sigmoid cross-entropy loss on all similarity values in `DotProductLoss` to restrict them to an approximate range. This should help the models perform better on real-world test sets.
By default, the parameter is set to `False` to preserve the old behaviour, but users are encouraged to set it to `True` and re-train their assistants, as it will be set to `True` by default from Rasa Open Source 3.0.0 onwards.
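A minimal pure-Python sketch of the idea behind `constrain_similarities`, assuming one positive (correct-label) similarity and a list of negative similarities per example. The function names are hypothetical and this is not the `DotProductLoss` source; it only illustrates why the extra term matters: softmax cross-entropy depends only on the differences between similarities, while the added sigmoid cross-entropy terms also pin down their absolute values.

```python
import math


def softmax_cross_entropy(pos_sim, neg_sims):
    """Softmax cross-entropy of the correct label against the negative labels."""
    logits = [pos_sim] + list(neg_sims)
    top = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_sum_exp - pos_sim


def sigmoid_cross_entropy(logit, target):
    """Numerically stable binary cross-entropy of sigmoid(logit) against target (0 or 1)."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def similarity_loss(pos_sim, neg_sims, constrain_similarities=False):
    """Hypothetical per-example loss: softmax cross-entropy plus the optional sigmoid term."""
    loss = softmax_cross_entropy(pos_sim, neg_sims)
    if constrain_similarities:
        # Push sigmoid(pos_sim) towards 1 and each sigmoid(neg_sim) towards 0, so the
        # similarities are anchored around 0 instead of being free to drift together.
        loss += sigmoid_cross_entropy(pos_sim, 1.0)
        loss += sum(sigmoid_cross_entropy(s, 0.0) for s in neg_sims)
    return loss
```

Shifting every similarity by the same constant leaves the unconstrained loss unchanged (up to rounding), e.g. `similarity_loss(2.0, [0.0, -1.0])` and `similarity_loss(12.0, [10.0, 9.0])`, whereas the `constrain_similarities=True` versions of those two calls differ.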

The `model_confidence` parameter affects how the model's confidence for each label is computed during inference. It can take one of three values:
1. `softmax` - Similarities between input and label embeddings are post-processed with a softmax function, so that the confidences of all labels sum up to 1.
2. `cosine` - Cosine similarity between input and label embeddings. Confidence for each label will be in the range `[-1,1]`.
3. `inner` - Dot product similarity between input and label embeddings. Confidence for each label will be in an unbounded range.
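A minimal pure-Python sketch of the three `model_confidence` modes for a single input, assuming embeddings are plain lists of floats and that the `softmax` mode is applied to dot-product similarities (the real components may use a different similarity internally). `label_confidences` is a hypothetical name, not a Rasa function.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def label_confidences(input_emb, label_embs, model_confidence="softmax"):
    """Confidence of each label for one input under a given `model_confidence` mode."""
    sims = [dot(input_emb, emb) for emb in label_embs]
    if model_confidence == "softmax":
        # Softmax over the similarities: every value in (0, 1), all values sum to 1.
        top = max(sims)  # subtract the maximum for numerical stability
        exps = [math.exp(s - top) for s in sims]
        total = sum(exps)
        return [e / total for e in exps]
    if model_confidence == "cosine":
        # Dot product divided by both vector norms: every value in [-1, 1].
        input_norm = math.sqrt(dot(input_emb, input_emb))
        return [
            s / (input_norm * math.sqrt(dot(emb, emb)))
            for s, emb in zip(sims, label_embs)
        ]
    if model_confidence == "inner":
        # Raw dot products: no bounds at all.
        return sims
    raise ValueError(f"unknown model_confidence: {model_confidence!r}")


print(label_confidences([1.0, 0.0], [[2.0, 0.0], [0.0, 3.0], [-1.0, 0.0]], "cosine"))
# [1.0, 0.0, -1.0]
```

The cosine values stay within `[-1, 1]` regardless of embedding length, whereas `inner` would return the raw dot products `[2.0, 0.0, -1.0]` for the same call.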

Setting `model_confidence=cosine` should make it easier for users to tune their assistant's fallback thresholds. The default value is `softmax` to preserve the old behaviour, but we recommend using `cosine` as that will be the new default value from Rasa Open Source 3.0.0 onwards. The value of this option does not affect how confidences are computed for entity predictions in `DIETClassifier` and `TEDPolicy`.

Following both of the above recommendations, users should configure their ML components, e.g. `DIETClassifier`, as follows:
```yaml
- name: DIETClassifier
  model_confidence: cosine
  constrain_similarities: True
  # ...
```
Once the assistant is re-trained with the above configuration, users should also tune fallback confidence thresholds.
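Because `cosine` confidences lie in a fixed range, one way to choose a fallback threshold (for example the `threshold` of `FallbackClassifier`) is to sweep candidate values over predictions from a held-out test set. The sketch below assumes you have already collected `(confidence, is_correct)` pairs for the top-ranked label of each test example; the helper and the precision target are hypothetical, not part of Rasa.

```python
def pick_fallback_threshold(predictions, target_precision=0.9):
    """Lowest threshold at which the accepted predictions reach `target_precision`.

    `predictions` holds (confidence, is_correct) pairs for the top-ranked label
    of each held-out example; predictions below the threshold would trigger
    fallback. Returns None if no candidate threshold reaches the target.
    """
    for threshold in sorted({conf for conf, _ in predictions}):
        # Never empty: `threshold` is itself one of the observed confidences.
        accepted = [is_correct for conf, is_correct in predictions if conf >= threshold]
        precision = sum(accepted) / len(accepted)
        if precision >= target_precision:
            return threshold
    return None
```

Returning the lowest qualifying threshold keeps as many predictions as possible out of fallback. With the toy pairs `[(0.9, True), (0.8, True), (0.6, False), (0.5, True), (0.3, False)]` and `target_precision=0.75`, it returns `0.5`.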

Configuration option `loss_type=softmax` is now deprecated and will be removed in Rasa Open Source 3.0.0. Use `loss_type=cross_entropy` instead.

The default [auto-configuration](model-configuration.mdx#suggested-config) has been changed to use `constrain_similarities=True` and `model_confidence=cosine` in ML components so that new users start with the recommended configuration.
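The deprecation and the recommended settings can also be applied mechanically to an existing configuration. A minimal sketch, assuming the `pipeline` and `policies` entries have already been parsed from `config.yml` into a list of dicts; `apply_recommended_ml_settings` is a hypothetical helper, not a Rasa API, and it leaves explicitly set values untouched.

```python
ML_COMPONENTS = {"DIETClassifier", "ResponseSelector", "TEDPolicy"}


def apply_recommended_ml_settings(components):
    """Return copies of `components` updated to the recommended ML settings."""
    updated = []
    for component in components:
        component = dict(component)  # shallow copy so the caller's dicts are untouched
        if component.get("name") in ML_COMPONENTS:
            if component.get("loss_type") == "softmax":
                # `softmax` is deprecated; `cross_entropy` replaces it.
                component["loss_type"] = "cross_entropy"
            # constrain_similarities only has an effect with the cross-entropy loss,
            # which is the default when loss_type is not set.
            if component.get("loss_type", "cross_entropy") == "cross_entropy":
                component.setdefault("constrain_similarities", True)
            component.setdefault("model_confidence", "cosine")
        updated.append(component)
    return updated
```

Using `setdefault` means a user who deliberately set, say, `model_confidence: softmax` keeps that choice; only missing keys are filled in.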
6 changes: 6 additions & 0 deletions data/test_config/config_empty_en_after_dumping.yml
@@ -13,9 +13,13 @@ pipeline:
# max_ngram: 4
# - name: DIETClassifier
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: EntitySynonymMapper
# - name: ResponseSelector
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: FallbackClassifier
# threshold: 0.3
# ambiguity_threshold: 0.1
@@ -27,4 +31,6 @@ policies:
# - name: TEDPolicy
# max_history: 5
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: RulePolicy
2 changes: 2 additions & 0 deletions data/test_config/config_empty_en_after_dumping_core.yml
@@ -8,4 +8,6 @@ policies:
# - name: TEDPolicy
# max_history: 5
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: RulePolicy
4 changes: 4 additions & 0 deletions data/test_config/config_empty_en_after_dumping_nlu.yml
@@ -13,9 +13,13 @@ pipeline:
# max_ngram: 4
# - name: DIETClassifier
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: EntitySynonymMapper
# - name: ResponseSelector
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: FallbackClassifier
# threshold: 0.3
# ambiguity_threshold: 0.1
6 changes: 6 additions & 0 deletions data/test_config/config_empty_fr_after_dumping.yml
@@ -13,9 +13,13 @@ pipeline:
# max_ngram: 4
# - name: DIETClassifier
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: EntitySynonymMapper
# - name: ResponseSelector
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: FallbackClassifier
# threshold: 0.3
# ambiguity_threshold: 0.1
@@ -27,4 +31,6 @@ policies:
# - name: TEDPolicy
# max_history: 5
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: RulePolicy
2 changes: 2 additions & 0 deletions data/test_config/config_with_comments_after_dumping.yml
@@ -27,6 +27,8 @@ policies: # even here
# - name: TEDPolicy
# max_history: 5
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: RulePolicy

# comments everywhere
50 changes: 44 additions & 6 deletions docs/docs/components.mdx
@@ -1531,10 +1531,12 @@ However, additional parameters exist that can be adapted.
| similarity_type | "auto" | Type of similarity measure to use, either 'auto' or 'cosine' |
| | | or 'inner'. |
+---------------------------------+------------------+--------------------------------------------------------------+
-| loss_type                       | "softmax"        | The type of the loss function, either 'softmax' or 'margin'. |
+| loss_type                       | "cross_entropy"  | The type of the loss function, either 'cross_entropy'        |
+|                                 |                  | or 'margin'.                                                 |
+---------------------------------+------------------+--------------------------------------------------------------+
-| ranking_length                  | 10               | Number of top actions to normalize scores for loss type      |
-|                                 |                  | 'softmax'. Set to 0 to turn off normalization.               |
+| ranking_length                  | 10               | Number of top intents to normalize scores for. Applicable    |
+|                                 |                  | only with loss type 'cross_entropy' and 'softmax'            |
+|                                 |                  | confidences. Set to 0 to disable normalization.              |
+---------------------------------+------------------+--------------------------------------------------------------+
| maximum_positive_similarity | 0.8 | Indicates how similar the algorithm should try to make |
| | | embedding vectors for correct labels. |
@@ -1616,6 +1618,24 @@ However, additional parameters exist that can be adapted.
| | | ... |
| | | ``` |
+---------------------------------+------------------+--------------------------------------------------------------+
| constrain_similarities | False | If `True`, applies sigmoid on all similarity terms and adds |
| | | it to the loss function to ensure that similarity values are |
|                                 |                  | approximately bounded. Used only if                          |
|                                 |                  | `loss_type=cross_entropy`.                                   |
+---------------------------------+------------------+--------------------------------------------------------------+
| model_confidence                | "softmax"        | Affects how the model's confidence for each intent           |
|                                 |                  | is computed. It can take one of three values:                |
|                                 |                  | 1. `softmax` - Similarities between input and intent         |
|                                 |                  | embeddings are post-processed with a softmax function,       |
|                                 |                  | so that the confidences of all intents sum up to 1.          |
| | | 2. `cosine` - Cosine similarity between input and intent |
| | | embeddings. Confidence for each intent is in the |
| | | range `[-1,1]`. |
| | | 3. `inner` - Dot product similarity between input and intent |
| | | embeddings. Confidence for each intent is in an unbounded |
| | | range. |
| | | This parameter does not affect the confidence for entity |
| | | prediction. |
+---------------------------------+------------------+--------------------------------------------------------------+
```
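A minimal sketch of what the `ranking_length` normalization in the DIETClassifier table does to a list of confidences, assuming they are plain `softmax` outputs (the table notes it applies only with `cross_entropy` loss and `softmax` confidences). `normalize_top_k` is a hypothetical name, not Rasa's internal function.

```python
def normalize_top_k(confidences, ranking_length=10):
    """Keep the `ranking_length` highest confidences, zero the rest, rescale to sum 1."""
    if ranking_length <= 0 or ranking_length >= len(confidences):
        # Normalization disabled, or nothing to drop (softmax output already sums to 1).
        return list(confidences)
    # Indices sorted from highest to lowest confidence.
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    keep = set(order[:ranking_length])
    kept_total = sum(confidences[i] for i in keep)
    return [c / kept_total if i in keep else 0.0 for i, c in enumerate(confidences)]
```

For example, `normalize_top_k([0.5, 0.3, 0.2], ranking_length=2)` drops the third label and rescales the other two to 0.625 and 0.375 (up to floating-point rounding).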

:::note
@@ -2742,10 +2762,12 @@ However, additional parameters exist that can be adapted.
| similarity_type | "auto" | Type of similarity measure to use, either 'auto' or 'cosine' |
| | | or 'inner'. |
+---------------------------------+-------------------+--------------------------------------------------------------+
-| loss_type                       | "softmax"         | The type of the loss function, either 'softmax' or 'margin'. |
+| loss_type                       | "cross_entropy"   | The type of the loss function, either 'cross_entropy'        |
+|                                 |                   | or 'margin'.                                                 |
+---------------------------------+-------------------+--------------------------------------------------------------+
-| ranking_length                  | 10                | Number of top actions to normalize scores for loss type      |
-|                                 |                   | 'softmax'. Set to 0 to turn off normalization.               |
+| ranking_length                  | 10                | Number of top responses to normalize scores for. Applicable  |
+|                                 |                   | only with loss type 'cross_entropy' and 'softmax'            |
+|                                 |                   | confidences. Set to 0 to disable normalization.              |
+---------------------------------+-------------------+--------------------------------------------------------------+
| maximum_positive_similarity | 0.8 | Indicates how similar the algorithm should try to make |
| | | embedding vectors for correct labels. |
@@ -2814,6 +2836,22 @@ However, additional parameters exist that can be adapted.
| | | Requires `evaluate_on_number_of_examples > 0` and |
| | | `evaluate_every_number_of_epochs > 0` |
+---------------------------------+-------------------+--------------------------------------------------------------+
| constrain_similarities | False | If `True`, applies sigmoid on all similarity terms and adds |
| | | it to the loss function to ensure that similarity values are |
|                                 |                   | approximately bounded. Used only if                          |
|                                 |                   | `loss_type=cross_entropy`.                                   |
+---------------------------------+-------------------+--------------------------------------------------------------+
| model_confidence                | "softmax"         | Affects how the model's confidence for each response label   |
|                                 |                   | is computed. It can take one of three values:                |
|                                 |                   | 1. `softmax` - Similarities between input and response label |
|                                 |                   | embeddings are post-processed with a softmax function,       |
|                                 |                   | so that the confidences of all labels sum up to 1.           |
| | | 2. `cosine` - Cosine similarity between input and response |
| | | label embeddings. Confidence for each label is in the |
| | | range `[-1,1]`. |
| | | 3. `inner` - Dot product similarity between input and |
|                                 |                   | response label embeddings. Confidence for each label is in   |
|                                 |                   | an unbounded range.                                          |
+---------------------------------+-------------------+--------------------------------------------------------------+
```

:::note
27 changes: 27 additions & 0 deletions docs/docs/migration-guide.mdx
@@ -10,6 +10,33 @@ description: |
This page contains information about changes between major versions and
how you can migrate from one version to another.

## Rasa 2.2 to Rasa 2.3

### Machine Learning Components

A few changes have been made to the loss function inside the machine learning (ML)
components `DIETClassifier`, `ResponseSelector` and `TEDPolicy`. These include:
1. Configuration option `loss_type=softmax` is now deprecated and will be removed in Rasa Open Source 3.0.0. Use `loss_type=cross_entropy` instead.
2. The default loss function (`loss_type=cross_entropy`) can add an optional sigmoid cross-entropy loss on all similarity values to constrain
them to an approximate range. You can turn on this option by setting `constrain_similarities=True`. This should help the models perform better on real-world test sets.

Also, a new option `model_confidence` has been added to each ML component. It affects how a model's confidence for each label is computed during inference. It can take one of three values:
1. `softmax` - Similarities between input and label embeddings are post-processed with a softmax function, so that the confidences of all labels sum up to 1.
2. `cosine` - Cosine similarity between input and label embeddings. Confidence for each label will be in the range `[-1,1]`.
3. `inner` - Dot product similarity between input and label embeddings. Confidence for each label will be in an unbounded range.

The default value is `softmax`, but we recommend using `cosine` as that will be the new default value from Rasa Open Source 3.0.0 onwards.
The value of this option does not affect how confidences are computed for entity predictions in `DIETClassifier` and `TEDPolicy`.

Following both of the above recommendations, users should configure their ML components, e.g. `DIETClassifier`, as follows:
```yaml
- name: DIETClassifier
  model_confidence: cosine
  constrain_similarities: True
  # ...
```
Once the assistant is re-trained with the above configuration, users should also tune fallback confidence thresholds.


## Rasa 2.1 to Rasa 2.2

### General
24 changes: 21 additions & 3 deletions docs/docs/policies.mdx
@@ -268,10 +268,12 @@ However, additional parameters exist that can be adapted.
| similarity_type | "auto" | Type of similarity measure to use, either 'auto' or 'cosine' |
| | | or 'inner'. |
+---------------------------------------+------------------------+--------------------------------------------------------------+
-| loss_type                             | "softmax"              | The type of the loss function, either 'softmax' or 'margin'. |
+| loss_type                             | "cross_entropy"        | The type of the loss function, either 'cross_entropy'        |
+|                                       |                        | or 'margin'.                                                 |
+---------------------------------------+------------------------+--------------------------------------------------------------+
-| ranking_length                        | 10                     | Number of top actions to normalize scores for loss type      |
-|                                       |                        | 'softmax'. Set to 0 to turn off normalization.               |
+| ranking_length                        | 10                     | Number of top actions to normalize scores for. Applicable    |
+|                                       |                        | only with loss type 'cross_entropy' and 'softmax'            |
+|                                       |                        | confidences. Set to 0 to disable normalization.              |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| maximum_positive_similarity | 0.8 | Indicates how similar the algorithm should try to make |
| | | embedding vectors for correct labels. |
@@ -344,6 +346,22 @@ However, additional parameters exist that can be adapted.
| entity_recognition | True | If 'True' entity recognition is trained and entities are |
| | | extracted. |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| constrain_similarities | False | If `True`, applies sigmoid on all similarity terms and adds |
| | | it to the loss function to ensure that similarity values are |
|                                       |                        | approximately bounded. Used only if                          |
|                                       |                        | `loss_type=cross_entropy`.                                   |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| model_confidence                      | "softmax"              | Affects how the model's confidence for each action           |
|                                       |                        | is computed. It can take one of three values:                |
|                                       |                        | 1. `softmax` - Similarities between input and action         |
|                                       |                        | embeddings are post-processed with a softmax function,       |
|                                       |                        | so that the confidences of all actions sum up to 1.          |
| | | 2. `cosine` - Cosine similarity between input and action |
| | | embeddings. Confidence for each label is in the |
| | | range `[-1,1]`. |
| | | 3. `inner` - Dot product similarity between input and action |
| | | embeddings. Confidence for each label is in an |
| | | unbounded range. |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| BILOU_flag | True | If 'True', additional BILOU tags are added to entity labels. |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| split_entities_by_comma | True | Splits a list of extracted entities by comma to treat each |