Add sigmoid to softmax loss #7616

Merged: 61 commits, Feb 9, 2021
Changes from 55 commits

Commits
5c3870f
first version
dakshvar22 Dec 19, 2020
8cff4ec
remove extra terms from softmax
dakshvar22 Dec 21, 2020
1d29d00
Merge branch 'master' into sigmoid_loss
dakshvar22 Jan 6, 2021
7d971d8
refactor based on config option. Ready for test
dakshvar22 Jan 6, 2021
bcecf41
add sigmoid based prediction during inference
dakshvar22 Jan 7, 2021
6746bbd
docs, changelog, docstrings
dakshvar22 Jan 7, 2021
b674d73
add tests
dakshvar22 Jan 7, 2021
4d4d52e
review comments
dakshvar22 Jan 12, 2021
8dc36e9
Merge branch 'master' into sigmoid_loss
dakshvar22 Jan 12, 2021
db17411
review comments
dakshvar22 Jan 12, 2021
1d5527a
remove sim_neg_ii to run experiments
dakshvar22 Jan 12, 2021
8bcb4bd
merge main
dakshvar22 Jan 25, 2021
763c2cd
revert back experimental change
dakshvar22 Jan 25, 2021
2b9d532
update similarity computation during prediction, to be tested.
dakshvar22 Jan 31, 2021
e8b5eac
Merge branch 'main' into sigmoid_loss
dakshvar22 Jan 31, 2021
28e8c26
update docs, test various options
dakshvar22 Jan 31, 2021
7eeb251
assertive
dakshvar22 Jan 31, 2021
0d175d4
fix plotting
dakshvar22 Jan 31, 2021
af82d21
fix ted, add line to migration
dakshvar22 Feb 1, 2021
d11ab35
dummy change to trigger tests
dakshvar22 Feb 1, 2021
98ced0b
merge main
dakshvar22 Feb 1, 2021
d5e0199
Merge branch 'main' into sigmoid_loss
dakshvar22 Feb 2, 2021
4cf0750
add changes for autoconfig, defaults
dakshvar22 Feb 3, 2021
65e0ecf
Merge branch 'main' into sigmoid_loss
dakshvar22 Feb 3, 2021
827dc2b
fix test
dakshvar22 Feb 4, 2021
6e44c2f
Apply suggestions from code review
dakshvar22 Feb 5, 2021
f5d26e7
remove parallel iter and complex op
dakshvar22 Feb 5, 2021
789f290
merge other review comments
dakshvar22 Feb 5, 2021
ba7300b
Merge branch 'main' into sigmoid_loss
dakshvar22 Feb 5, 2021
6c71556
Merge branch 'sigmoid_loss' of github.com:RasaHQ/rasa into sigmoid_loss
dakshvar22 Feb 5, 2021
a5286eb
more review comments
dakshvar22 Feb 5, 2021
bdadebf
fix tests
dakshvar22 Feb 5, 2021
3a3b0f3
add conditions
dakshvar22 Feb 5, 2021
cf27ec4
add tests for diet and ted
dakshvar22 Feb 7, 2021
476e598
add types
dakshvar22 Feb 7, 2021
3d554e3
added tests for TED
dakshvar22 Feb 7, 2021
54f9ee4
change plotting strategy, testing
dakshvar22 Feb 7, 2021
5734612
change function call
dakshvar22 Feb 7, 2021
1dc6930
self review, add types, docformats
dakshvar22 Feb 7, 2021
ab1e7b3
revert back plotting changes
dakshvar22 Feb 7, 2021
f2da6bb
final plotting style
dakshvar22 Feb 8, 2021
8fb7ea2
change epochs to 1
dakshvar22 Feb 8, 2021
2724b19
Partial suggestions from code review
dakshvar22 Feb 8, 2021
8c66bd8
Apply doc suggestions from code review
dakshvar22 Feb 8, 2021
06a70ee
refactor loss, add docstrings
dakshvar22 Feb 8, 2021
e3a548e
merge other comments
dakshvar22 Feb 8, 2021
6f9cd90
remove none for similarity_type
dakshvar22 Feb 8, 2021
6697c0d
override defaults during load so that new parameters are filled in be…
dakshvar22 Feb 8, 2021
13d8aa8
change call to deprecated function check
dakshvar22 Feb 8, 2021
9ea25dd
more comments
dakshvar22 Feb 8, 2021
ffcfdd1
add tests for config checks
dakshvar22 Feb 8, 2021
bacc4bc
Merge branch 'main' into sigmoid_loss
dakshvar22 Feb 8, 2021
25abb8d
remove prints
dakshvar22 Feb 8, 2021
c2e74e1
Update docs/docs/migration-guide.mdx
dakshvar22 Feb 8, 2021
c2b9b93
fix test
dakshvar22 Feb 8, 2021
c69fdbb
Update rasa/utils/tensorflow/layers.py
dakshvar22 Feb 9, 2021
fbf9a71
Update docs/docs/migration-guide.mdx
dakshvar22 Feb 9, 2021
b963b52
add tflayerconfigexception
dakshvar22 Feb 9, 2021
e4872c6
add full stop
dakshvar22 Feb 9, 2021
80db144
Merge branch 'main' into sigmoid_loss
dakshvar22 Feb 9, 2021
1f60b8f
last review comments
dakshvar22 Feb 9, 2021
24 changes: 24 additions & 0 deletions changelog/7616.improvement.md
@@ -0,0 +1,24 @@
Added two new parameters, `constrain_similarities` and `model_confidence`, to the machine learning (ML) components [DIETClassifier](components.mdx#dietclassifier), [ResponseSelector](components.mdx#responseselector) and [TEDPolicy](policies.mdx#ted-policy).

Setting `constrain_similarities=True` adds a sigmoid cross-entropy loss on all similarity values in `DotProductLoss`, which restricts them to an approximate range. This should help the models perform better on real-world test sets.
By default, the parameter is set to `False` to preserve the old behaviour, but users are encouraged to set it to `True` and re-train their assistants, as it will be `True` by default from Rasa Open Source 3.0.0 onwards.
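
As a rough illustration of the idea above, the sketch below adds a sigmoid cross-entropy term on each similarity to a plain softmax cross-entropy loss. It is a minimal pure-Python sketch, not the `DotProductLoss` code from this PR: the function names are hypothetical, and summing the sigmoid terms (rather than averaging or weighting them) is an assumption.

```python
import math


def softmax_cross_entropy(pos_sim, neg_sims):
    """Cross-entropy over [pos_sim, *neg_sims] with the positive as target."""
    logits = [pos_sim, *neg_sims]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    return log_sum_exp - pos_sim


def sigmoid_cross_entropy(logit, label):
    """Numerically stable binary cross-entropy on a raw similarity value."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def constrained_loss(pos_sim, neg_sims, constrain_similarities=True):
    """Softmax loss, optionally plus a sigmoid term on every similarity."""
    loss = softmax_cross_entropy(pos_sim, neg_sims)
    if constrain_similarities:
        # Assumption: target 1 for the positive pair, 0 for each negative pair.
        loss += sigmoid_cross_entropy(pos_sim, 1.0)
        loss += sum(sigmoid_cross_entropy(s, 0.0) for s in neg_sims)
    return loss


# The softmax term only sees differences between similarities, so (100, 90)
# is almost "free"; the sigmoid term penalises the large negative similarity.
print(constrained_loss(100.0, [90.0], constrain_similarities=False))
print(constrained_loss(100.0, [90.0], constrain_similarities=True))
```

The softmax term alone is invariant to shifting all similarities by a constant, which is one way to see why an extra per-similarity term is needed to keep the values in a bounded range.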

The `model_confidence` parameter affects how the model's confidence for each label is computed during inference. It can take one of three values:
1. `softmax` - Similarities between input and label embeddings are post-processed with a softmax function, so that the confidences for all labels sum to 1.
2. `cosine` - Cosine similarity between input and label embeddings. The confidence for each label will be in the range `[-1,1]`.
3. `inner` - Dot product similarity between input and label embeddings. The confidence for each label will be in an unbounded range.

Setting `model_confidence=cosine` should help users tune the fallback thresholds of their assistant better. The default value is `softmax` to preserve the old behaviour, but we recommend using `cosine` as that will be the new default value from Rasa Open Source 3.0.0 onwards. The value of this option does not affect how confidences are computed for entity predictions in `DIETClassifier` and `TEDPolicy`.
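
To make the three `model_confidence` options concrete, here is a small pure-Python sketch that computes each of them for one input embedding and a few label embeddings. The function `label_confidences` and the toy vectors are hypothetical and only illustrate the definitions above; applying the softmax to dot-product similarities is an assumption of this sketch, not a statement about how the Rasa components compute them.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def label_confidences(input_emb, label_embs, model_confidence="softmax"):
    """Return one confidence per label embedding (illustrative only)."""
    inner = [dot(input_emb, label) for label in label_embs]
    if model_confidence == "inner":
        return inner  # unbounded range
    if model_confidence == "cosine":
        input_norm = math.sqrt(dot(input_emb, input_emb))
        return [
            sim / (input_norm * math.sqrt(dot(label, label)))
            for sim, label in zip(inner, label_embs)
        ]  # each value lies in [-1, 1]
    if model_confidence == "softmax":
        max_sim = max(inner)  # subtract the max for numerical stability
        exps = [math.exp(sim - max_sim) for sim in inner]
        total = sum(exps)
        return [e / total for e in exps]  # values sum to 1
    raise ValueError(f"Unknown model_confidence: {model_confidence!r}")


input_emb = [1.0, 0.0]
label_embs = [[3.0, 0.0], [0.0, 2.0], [-1.0, 0.0]]
for option in ("softmax", "cosine", "inner"):
    print(option, label_confidences(input_emb, label_embs, option))
```

Note that only the `softmax` values depend on the other labels; `cosine` and `inner` score each label independently.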

With both the above recommendations, users should configure their ML component, e.g. `DIETClassifier`, as
```yaml
- name: DIETClassifier
  model_confidence: cosine
  constrain_similarities: True
  ...
```
Once the assistant is re-trained with the above configuration, users should also tune fallback confidence thresholds.
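
The following sketch suggests why thresholds need re-tuning after switching from `softmax` to `cosine` confidences. It is loosely modelled on the `threshold` and `ambiguity_threshold` options of `FallbackClassifier` (defaults 0.3 and 0.1 in the auto-configuration); the function `should_fall_back` and the confidence values are made up for illustration and are not Rasa code.

```python
def should_fall_back(confidences, threshold=0.3, ambiguity_threshold=0.1):
    """Hypothetical fallback rule: low top score, or top two scores too close."""
    ranked = sorted(confidences, reverse=True)
    if not ranked or ranked[0] < threshold:
        return True
    if len(ranked) > 1 and ranked[0] - ranked[1] < ambiguity_threshold:
        return True
    return False


# Made-up scores for the same three intents under the two confidence types.
softmax_scores = [0.5, 0.3, 0.2]   # sum to 1, so a gap of 0.2 is sizeable
cosine_scores = [0.9, 0.85, 0.1]   # independent values in [-1, 1]

print(should_fall_back(softmax_scores))
print(should_fall_back(cosine_scores))
```

Because cosine confidences are not forced to sum to 1, two labels can both score highly, so a gap-based or absolute threshold tuned on softmax outputs may fire differently.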

The configuration option `loss_type=softmax` is now deprecated and will be removed in Rasa Open Source 3.0.0. Use `loss_type=cross_entropy` instead.
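
For readers who post-process configurations programmatically, the rename can be handled along these lines. This is a sketch under stated assumptions: `migrate_loss_type` is a hypothetical helper operating on a plain dictionary, and it does not reproduce the deprecation check added in this PR.

```python
import warnings


def migrate_loss_type(component_config):
    """Return a copy of the config with the deprecated loss type renamed."""
    config = dict(component_config)  # leave the caller's dict untouched
    if config.get("loss_type") == "softmax":
        warnings.warn(
            "`loss_type=softmax` is deprecated; use `loss_type=cross_entropy`.",
            FutureWarning,
        )
        config["loss_type"] = "cross_entropy"
    return config


old = {"name": "DIETClassifier", "loss_type": "softmax"}
print(migrate_loss_type(old))
```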

The default [auto-configuration](model-configuration.mdx#suggested-config) now uses `constrain_similarities=True` and `model_confidence=cosine` in ML components, so that new users start with the recommended configuration.
6 changes: 6 additions & 0 deletions data/test_config/config_empty_en_after_dumping.yml
@@ -13,9 +13,13 @@ pipeline:
# max_ngram: 4
# - name: DIETClassifier
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: EntitySynonymMapper
# - name: ResponseSelector
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: FallbackClassifier
# threshold: 0.3
# ambiguity_threshold: 0.1
@@ -27,4 +31,6 @@ policies:
# - name: TEDPolicy
# max_history: 5
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: RulePolicy
2 changes: 2 additions & 0 deletions data/test_config/config_empty_en_after_dumping_core.yml
@@ -8,4 +8,6 @@ policies:
# - name: TEDPolicy
# max_history: 5
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: RulePolicy
4 changes: 4 additions & 0 deletions data/test_config/config_empty_en_after_dumping_nlu.yml
@@ -13,9 +13,13 @@ pipeline:
# max_ngram: 4
# - name: DIETClassifier
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: EntitySynonymMapper
# - name: ResponseSelector
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: FallbackClassifier
# threshold: 0.3
# ambiguity_threshold: 0.1
6 changes: 6 additions & 0 deletions data/test_config/config_empty_fr_after_dumping.yml
@@ -13,9 +13,13 @@ pipeline:
# max_ngram: 4
# - name: DIETClassifier
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: EntitySynonymMapper
# - name: ResponseSelector
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: FallbackClassifier
# threshold: 0.3
# ambiguity_threshold: 0.1
@@ -27,4 +31,6 @@ policies:
# - name: TEDPolicy
# max_history: 5
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: RulePolicy
2 changes: 2 additions & 0 deletions data/test_config/config_with_comments_after_dumping.yml
@@ -27,6 +27,8 @@ policies: # even here
# - name: TEDPolicy
# max_history: 5
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: RulePolicy

# comments everywhere
50 changes: 44 additions & 6 deletions docs/docs/components.mdx
@@ -1531,10 +1531,12 @@ However, additional parameters exist that can be adapted.
| similarity_type | "auto" | Type of similarity measure to use, either 'auto' or 'cosine' |
| | | or 'inner'. |
+---------------------------------+------------------+--------------------------------------------------------------+
-| loss_type | "softmax" | The type of the loss function, either 'softmax' or 'margin'. |
+| loss_type | "cross_entropy" | The type of the loss function, either 'cross_entropy' |
+| | | or 'margin'. |
+---------------------------------+------------------+--------------------------------------------------------------+
-| ranking_length | 10 | Number of top actions to normalize scores for loss type |
-| | | 'softmax'. Set to 0 to turn off normalization. |
+| ranking_length | 10 | Number of top intents to normalize scores for. Applicable |
+| | | only with loss type 'cross_entropy' and 'softmax' |
+| | | confidences. Set to 0 to disable normalization. |
+---------------------------------+------------------+--------------------------------------------------------------+
| maximum_positive_similarity | 0.8 | Indicates how similar the algorithm should try to make |
| | | embedding vectors for correct labels. |
@@ -1616,6 +1618,24 @@ However, additional parameters exist that can be adapted.
| | | ... |
| | | ``` |
+---------------------------------+------------------+--------------------------------------------------------------+
| constrain_similarities | False | If `True`, applies sigmoid on all similarity terms and adds |
| | | it to the loss function to ensure that similarity values are |
| | | approximately bounded. Used only if `loss_type=cross_entropy`|
+---------------------------------+------------------+--------------------------------------------------------------+
| model_confidence | "softmax" | Affects how model's confidence for each intent |
| | | is computed. It can take three values |
| | | 1. `softmax` - Similarities between input and intent |
| | | embeddings are post-processed with a softmax function, |
| | | as a result of which confidence for all intents sum up to 1. |
| | | 2. `cosine` - Cosine similarity between input and intent |
| | | embeddings. Confidence for each intent is in the |
| | | range `[-1,1]`. |
| | | 3. `inner` - Dot product similarity between input and intent |
| | | embeddings. Confidence for each intent is in an unbounded |
| | | range. |
| | | This parameter does not affect the confidence for entity |
| | | prediction. |
+---------------------------------+------------------+--------------------------------------------------------------+
```

:::note
@@ -2742,10 +2762,12 @@ However, additional parameters exist that can be adapted.
| similarity_type | "auto" | Type of similarity measure to use, either 'auto' or 'cosine' |
| | | or 'inner'. |
+---------------------------------+-------------------+--------------------------------------------------------------+
-| loss_type | "softmax" | The type of the loss function, either 'softmax' or 'margin'. |
+| loss_type | "cross_entropy" | The type of the loss function, either 'cross_entropy' |
+| | | or 'margin'. |
+---------------------------------+-------------------+--------------------------------------------------------------+
-| ranking_length | 10 | Number of top actions to normalize scores for loss type |
-| | | 'softmax'. Set to 0 to turn off normalization. |
+| ranking_length | 10 | Number of top responses to normalize scores for. Applicable |
+| | | only with loss type 'cross_entropy' and 'softmax' |
+| | | confidences. Set to 0 to disable normalization. |
+---------------------------------+-------------------+--------------------------------------------------------------+
| maximum_positive_similarity | 0.8 | Indicates how similar the algorithm should try to make |
| | | embedding vectors for correct labels. |
@@ -2814,6 +2836,22 @@ However, additional parameters exist that can be adapted.
| | | Requires `evaluate_on_number_of_examples > 0` and |
| | | `evaluate_every_number_of_epochs > 0` |
+---------------------------------+-------------------+--------------------------------------------------------------+
| constrain_similarities | False | If `True`, applies sigmoid on all similarity terms and adds |
| | | it to the loss function to ensure that similarity values are |
| | | approximately bounded. Used only if `loss_type=cross_entropy`|
+---------------------------------+-------------------+--------------------------------------------------------------+
| model_confidence | "softmax" | Affects how model's confidence for each response label |
| | | is computed. It can take three values |
| | | 1. `softmax` - Similarities between input and response label |
| | | embeddings are post-processed with a softmax function, |
| | | as a result of which confidence for all labels sum up to 1. |
| | | 2. `cosine` - Cosine similarity between input and response |
| | | label embeddings. Confidence for each label is in the |
| | | range `[-1,1]`. |
| | | 3. `inner` - Dot product similarity between input and |
| | | response label embeddings. Confidence for each label is in an|
| | | unbounded range. |
+---------------------------------+-------------------+--------------------------------------------------------------+
```
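
The `ranking_length` row in the table above describes normalizing only the top-scoring labels. A minimal sketch of that idea follows, assuming softmax confidences as input; `normalize_top_k` is a hypothetical name and this is not the helper used by the Rasa components.

```python
def normalize_top_k(confidences, ranking_length=10):
    """Keep the top `ranking_length` scores, zero the rest, rescale to sum 1."""
    values = list(confidences)
    if ranking_length <= 0:
        return values  # 0 disables normalization entirely
    if ranking_length < len(values):
        cutoff = sorted(values, reverse=True)[ranking_length - 1]
        # Ties at the cutoff are all kept in this sketch.
        values = [v if v >= cutoff else 0.0 for v in values]
    total = sum(values)
    return [v / total for v in values] if total > 0 else values


print(normalize_top_k([0.5, 0.3, 0.1, 0.1], ranking_length=2))
print(normalize_top_k([0.5, 0.3, 0.1, 0.1], ranking_length=0))
```

Rescaling only makes sense when the scores behave like probabilities, which is consistent with the table restricting this option to `softmax` confidences.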

:::note
27 changes: 27 additions & 0 deletions docs/docs/migration-guide.mdx
@@ -10,6 +10,33 @@ description: |
This page contains information about changes between major versions and
how you can migrate from one version to another.

## Rasa 2.2 to Rasa 2.3

### Machine Learning Components

A few changes have been made to the loss function inside machine learning (ML)
components `DIETClassifier`, `ResponseSelector` and `TEDPolicy`. These include:
1. Configuration option `loss_type=softmax` is now deprecated and will be removed in Rasa Open Source 3.0.0. Use `loss_type=cross_entropy` instead.
2. The default loss function (`loss_type=cross_entropy`) can add an optional sigmoid cross-entropy loss on all similarity values to constrain
them to an approximate range. You can turn on this option by setting `constrain_similarities=True`. This should help the models perform better on real-world test sets.

Also, a new option `model_confidence` has been added to each ML component. It affects how the model's confidence for each label is computed during inference. It can take one of three values:
1. `softmax` - Similarities between input and label embeddings are post-processed with a softmax function, so that the confidences for all labels sum to 1.
2. `cosine` - Cosine similarity between input and label embeddings. The confidence for each label will be in the range `[-1,1]`.
3. `inner` - Dot product similarity between input and label embeddings. The confidence for each label will be in an unbounded range.

The default value is `softmax`, but we recommend using `cosine` as that will be the new default value from Rasa Open Source 3.0.0 onwards.
The value of this option does not affect how confidences are computed for entity predictions in `DIETClassifier` and `TEDPolicy`.

With both the above recommendations, users should configure their ML component, e.g. `DIETClassifier`, as:
```yaml
- name: DIETClassifier
  model_confidence: cosine
  constrain_similarities: True
  ...
```
Once the assistant is re-trained with the above configuration, users should also tune fallback confidence thresholds.


## Rasa 2.1 to Rasa 2.2

### General
24 changes: 21 additions & 3 deletions docs/docs/policies.mdx
@@ -268,10 +268,12 @@ However, additional parameters exist that can be adapted.
| similarity_type | "auto" | Type of similarity measure to use, either 'auto' or 'cosine' |
| | | or 'inner'. |
+---------------------------------------+------------------------+--------------------------------------------------------------+
-| loss_type | "softmax" | The type of the loss function, either 'softmax' or 'margin'. |
+| loss_type | "cross_entropy" | The type of the loss function, either 'cross_entropy' |
+| | | or 'margin'. |
+---------------------------------------+------------------------+--------------------------------------------------------------+
-| ranking_length | 10 | Number of top actions to normalize scores for loss type |
-| | | 'softmax'. Set to 0 to turn off normalization. |
+| ranking_length | 10 | Number of top actions to normalize scores for. Applicable |
+| | | only with loss type 'cross_entropy' and 'softmax' |
+| | | confidences. Set to 0 to disable normalization. |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| maximum_positive_similarity | 0.8 | Indicates how similar the algorithm should try to make |
| | | embedding vectors for correct labels. |
@@ -344,6 +346,22 @@ However, additional parameters exist that can be adapted.
| entity_recognition | True | If 'True' entity recognition is trained and entities are |
| | | extracted. |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| constrain_similarities | False | If `True`, applies sigmoid on all similarity terms and adds |
| | | it to the loss function to ensure that similarity values are |
| | | approximately bounded. Used only if `loss_type=cross_entropy`|
+---------------------------------------+------------------------+--------------------------------------------------------------+
| model_confidence | "softmax" | Affects how model's confidence for each action |
| | | is computed. It can take three values |
| | | 1. `softmax` - Similarities between input and action |
| | | embeddings are post-processed with a softmax function, |
| | | as a result of which confidence for all labels sum up to 1. |
| | | 2. `cosine` - Cosine similarity between input and action |
| | | embeddings. Confidence for each label is in the |
| | | range `[-1,1]`. |
| | | 3. `inner` - Dot product similarity between input and action |
| | | embeddings. Confidence for each label is in an |
| | | unbounded range. |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| BILOU_flag | True | If 'True', additional BILOU tags are added to entity labels. |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| split_entities_by_comma | True | Splits a list of extracted entities by comma to treat each |