
New Response Selector training data format #6591

Merged: 43 commits from new_rs_format into master, Sep 11, 2020
Conversation

@dakshvar22 (Contributor) commented Sep 7, 2020

Proposed changes:

The new training data format for using retrieval intents looks like this:

nlu.yaml

```yml
- intent: chitchat/ask_name
  examples: |
    - What do people call you?
    - Do you have a name for yourself?

- intent: chitchat/ask_weather
  examples: |
    - Oh, do you mind checking the weather for me please?
    - I like sunny days in Berlin.
```

responses.yaml

```yml
responses:
  utter_chitchat/ask_name:
  - image: "https://i.imgur.com/zTvA58i.jpeg"
    text: hello, my name is retrieval bot.
  - text: Oh yeah, I am called the retrieval bot.

  utter_chitchat/ask_weather:
  - text: Oh, it does look sunny right now in Berlin.
    image: "https://i.imgur.com/vwv7aHN.png"
  - text: I am not sure of the whole week but I can see the sun is out today.
```

domain.yaml

```yml
intents:
  - chitchat
```

rules.yml

```yml
- rule: Respond with a chitchat utterance whenever the user indulges in some chitchat
  steps:
  - intent: chitchat
  - action: utter_chitchat
```
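
For completeness (this part is unchanged by the PR), the NLU pipeline still needs a ResponseSelector component to pick the full retrieval intent such as chitchat/ask_name. A minimal config.yml sketch; the component choices below are illustrative assumptions, not part of this proposal:

```yml
# config.yml (illustrative sketch only, not part of this PR)
language: en

pipeline:
  - name: WhitespaceTokenizer
  - name: CountVectorsFeaturizer
  - name: DIETClassifier
    epochs: 100
  - name: ResponseSelector
    epochs: 100
    # optionally train one selector per retrieval intent:
    retrieval_intent: chitchat

policies:
  - name: RulePolicy
```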

Two major changes:

  1. The bot builder no longer needs to know any difference between an utterance action and a retrieval action; both now use the same utter_ prefix (a rough before/after sketch follows this list).
  2. Response templates for the response selector now also start with the utter_ prefix and are therefore consistent with other utterance templates.
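
As a rough before/after sketch of point 1: the old format used a dedicated respond_ prefix for retrieval actions (as mentioned in the review comments below). Both variants are shown here in the new YAML rule syntax purely to isolate the action-name change:

```yml
# Before this change: a dedicated respond_ prefix for retrieval actions
- rule: chitchat
  steps:
  - intent: chitchat
  - action: respond_chitchat

# After this change: the same utter_ prefix as any other utterance action
- rule: chitchat
  steps:
  - intent: chitchat
  - action: utter_chitchat
```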

Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)

@dakshvar22 (Contributor, Author) commented:

@tmbo Can you please review the implementation logic? If the way the information is exchanged between the NLU training data and the Domain looks fine, I'll clean up the code and add tests and documentation.

@tmbo (Member) left a comment:


the approach looks good 👍 - I think this is a great improvement to the usability of the response selector ⭐

Inline review comments on: examples/responseselectorbot/domain.yml, rasa/core/actions/action.py (2), rasa/importers/importer.py
@tmbo (Member) commented Sep 8, 2020

really excited about this 💯

@dakshvar22 dakshvar22 added this to the 2.0rc1 Rasa Open Source milestone Sep 8, 2020
@dakshvar22 dakshvar22 marked this pull request as ready for review September 8, 2020 15:42
@akelad akelad mentioned this pull request Sep 9, 2020
@dakshvar22 dakshvar22 requested a review from tmbo September 9, 2020 09:33
@github-actions (Contributor) commented:

Hey @dakshvar22! 👋 To run model regression tests, comment with the /modeltest command and a configuration.

Tip 💡: The model regression tests run on push events. You can re-run them by re-adding the status:model-regression-tests label or by using the Re-run jobs button in the GitHub Actions workflow.

Tip 💡: Whenever you want to change the configuration, edit the comment that contains the previous configuration.

You can copy the following into your comment and customize it:

/modeltest

```yml
##########
## Available datasets
##########
# - "Carbon Bot"
# - "Hermit"
# - "Private 1"
# - "Private 2"
# - "Private 3"
# - "Sara"

##########
## Available configurations
##########
# - "BERT + DIET(bow) + ResponseSelector(bow)"
# - "BERT + DIET(seq) + ResponseSelector(t2t)"
# - "ConveRT + DIET(bow) + ResponseSelector(bow)"
# - "ConveRT + DIET(seq) + ResponseSelector(t2t)"
# - "Spacy + DIET(bow) + ResponseSelector(bow)"
# - "Spacy + DIET(seq) + ResponseSelector(t2t)"
# - "Sparse + ConveRT + DIET(bow) + ResponseSelector(bow)"
# - "Sparse + ConveRT + DIET(seq) + ResponseSelector(t2t)"
# - "Sparse + DIET(bow) + ResponseSelector(bow)"
# - "Sparse + DIET(seq) + ResponseSelector(t2t)"
# - "Sparse + Spacy + DIET(bow) + ResponseSelector(bow)"
# - "Sparse + Spacy + DIET(seq) + ResponseSelector(t2t)"

## Example configuration
#################### syntax #################
## include:
##   - dataset: ["<dataset_name>"]
##     config: ["<configuration_name>"]
#
## Example:
## include:
##  - dataset: ["Carbon Bot"]
##    config: ["Sparse + DIET(bow) + ResponseSelector(bow)"]
#
## Shortcut:
## You can use the "all" shortcut to include all available configurations or datasets
#
## Example: Use the "Sparse + EmbeddingIntent + ResponseSelector(bow)" configuration
## for all available datasets
## include:
##  - dataset: ["all"]
##    config: ["Sparse + DIET(bow) + ResponseSelector(bow)"]
#
## Example: Use all available configurations for the "Carbon Bot" and "Sara" datasets
## and for the "Hermit" dataset use the "Sparse + DIET + ResponseSelector(T2T)" and
## "Sparse + ConveRT + DIET + ResponseSelector(T2T)" configurations:
## include:
##  - dataset: ["Carbon Bot", "Sara"]
##    config: ["all"]
##  - dataset: ["Hermit"]
##    config: ["Sparse + DIET(seq) + ResponseSelector(t2t)", "Sparse + ConveRT + DIET(seq) + ResponseSelector(t2t)"]

include:
 - dataset: ["Carbon Bot"]
   config: ["Sparse + DIET(bow) + ResponseSelector(bow)"]

```

@github-actions (Contributor) commented:

/modeltest

```yml
include:
 - dataset: ["Carbon Bot", "Sara"]
   config: ["all"]
```

@github-actions (Contributor) commented:

The model regression tests have started. It might take a while, please be patient.
As soon as results are ready you'll see a new comment with the results.

The configuration used can be found in the comment above.

@github-actions (Contributor) commented:

Commit: b7606ca. The full report is available as an artifact.

Dataset: Carbon Bot

| Configuration | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow)<br>test: 1m1s, train: 5m59s, total: 6m59s | 0.7709 | 0.6260 | 0.5133 |
| BERT + DIET(seq) + ResponseSelector(t2t)<br>test: 1m10s, train: 4m42s, total: 5m51s | 0.7748 | 0.8591 | 0.5316 |
| ConveRT + DIET(bow) + ResponseSelector(bow)<br>test: 38s, train: 2m32s, total: 3m10s | 0.8427 | 0.6260 | 0.6291 |
| ConveRT + DIET(seq) + ResponseSelector(t2t)<br>test: 48s, train: 3m29s, total: 4m16s | 0.8136 | 0.8609 | 0.6026 |
| Sparse + ConveRT + DIET(bow) + ResponseSelector(bow)<br>test: 43s, train: 3m29s, total: 4m11s | 0.8311 | 0.6260 | 0.6026 |
| Sparse + ConveRT + DIET(seq) + ResponseSelector(t2t)<br>test: 51s, train: 3m55s, total: 4m46s | 0.8408 | 0.8831 | 0.6424 |
| Sparse + DIET(bow) + ResponseSelector(bow)<br>test: 21s, train: 2m49s, total: 3m9s | 0.7359 | 0.6260 | 0.3377 |
| Sparse + DIET(seq) + ResponseSelector(t2t)<br>test: 30s, train: 3m12s, total: 3m42s | 0.7301 | 0.6618 | 0.4702 |

Dataset: Sara

| Configuration | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow)<br>test: 1m49s, train: 9m12s, total: 11m1s | 0.7777 | 0.8683 | 0.8196 |
| BERT + DIET(seq) + ResponseSelector(t2t)<br>test: 2m1s, train: 6m44s, total: 8m45s | 0.8413 | 0.8861 | 0.8152 |
| ConveRT + DIET(bow) + ResponseSelector(bow)<br>test: 57s, train: 5m34s, total: 6m30s | 0.8981 | 0.8683 | 0.9326 |
| ConveRT + DIET(seq) + ResponseSelector(t2t)<br>test: 1m11s, train: 4m26s, total: 5m37s | 0.8883 | 0.9095 | 0.9326 |
| Sparse + ConveRT + DIET(bow) + ResponseSelector(bow)<br>test: 1m3s, train: 7m5s, total: 8m8s | 0.9001 | 0.8683 | 0.9261 |
| Sparse + ConveRT + DIET(seq) + ResponseSelector(t2t)<br>test: 1m17s, train: 5m17s, total: 6m33s | 0.8962 | 0.9087 | 0.9304 |
| Sparse + DIET(bow) + ResponseSelector(bow)<br>test: 30s, train: 5m51s, total: 6m21s | 0.8384 | 0.8683 | 0.0196 |
| Sparse + DIET(seq) + ResponseSelector(t2t)<br>test: 43s, train: 4m10s, total: 4m52s | 0.8384 | 0.8428 | 0.8783 |

@wochinge (Contributor) left a comment:


I only reviewed the rasa.shared changes. They are good to go apart from the comments.

I didn't check the rest, but shouldn't we still support old stories which use respond_ instead of utter_?

Inline review comment on: rasa/shared/core/domain.py
```diff
@@ -24,3 +24,5 @@
 ENTITY_ATTRIBUTE_START = "start"
 ENTITY_ATTRIBUTE_END = "end"
 NO_ENTITY_TAG = "O"
+
+UTTER_PREFIX = "utter_"
```
Review comment on the diff above (Contributor):
Let's move it to rasa.shared then instead of re-adding this constant 👍

Inline review comments on: rasa/shared/nlu/training_data/training_data.py, rasa/shared/nlu/training_data/util.py
@rasabot rasabot merged commit c1bdfd0 into master Sep 11, 2020
@rasabot rasabot deleted the new_rs_format branch September 11, 2020 14:05
@dakshvar22 (Contributor, Author) commented:

@wochinge Since this was an experimental feature, we had the liberty to deprecate the old behaviour. It's not just respond_ actions that we would then have to keep supporting; response templates also now need an utter_ prefix, just like regular utterance templates. Both of these changes are covered in the migration guide to help users migrate.
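
As a hedged sketch of that second migration step: the old experimental format stored these templates differently, so the "before" naming here is an assumption, and both sides are written in the new YAML syntax purely to isolate the key rename from chitchat/ask_name to utter_chitchat/ask_name:

```yml
# Responses before this change (assumed old experimental naming):
# the template is keyed by the retrieval intent itself, without the utter_ prefix.
responses:
  chitchat/ask_name:
  - text: hello, my name is retrieval bot.
---
# Responses after this change: utter_ prefix, matching the responses.yaml
# example in the PR description above.
responses:
  utter_chitchat/ask_name:
  - text: hello, my name is retrieval bot.
```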

Successfully merging this pull request may close these issues:

- Proposal to change training data format for ResponseSelector