Standardize testing output: Response Selection #5824

Merged (21 commits into master, May 18, 2020)

Conversation

@tabergma (Contributor) commented May 14, 2020:

Proposed changes:

  • Plot a histogram with the confidence distribution for response selection (a minimal sketch of the idea follows below)
  • Write correct and incorrect predictions for response selection to disk

related to #5748
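
A minimal sketch of what the histogram step amounts to, assuming plain matplotlib; the function name, default file name, and input format here are illustrative, not Rasa's actual API:

import matplotlib.pyplot as plt

def plot_confidence_histogram(confidences, output_file="response_histogram.png"):
    # Hypothetical helper: bucket response selection confidences into 20 bins
    # between 0 and 1 and save the plot to disk
    plt.hist(confidences, bins=20, range=(0.0, 1.0))
    plt.xlabel("Confidence")
    plt.ylabel("Number of predictions")
    plt.title("Response selection confidence distribution")
    plt.savefig(output_file)
    plt.close()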

Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)

@tabergma requested a review from dakshvar22 on May 14, 2020 13:43
@@ -604,7 +604,7 @@ def plot_story_evaluation(
from sklearn.metrics import confusion_matrix
from sklearn.utils.multiclass import unique_labels
import matplotlib.pyplot as plt
from rasa.nlu.test import plot_confusion_matrix
A contributor commented on this import:
It's a bit unlikely, but this could break someone's code that relies on this function in glue code of their own. Should we add a deprecation warning, or keep a dummy function in rasa.nlu.test which just calls the one in rasa.utils.plotting?
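
For illustration, a minimal sketch of the suggested shim, assuming the real function now lives in rasa.utils.plotting (its exact signature is an assumption here):

import warnings
from rasa.utils.plotting import plot_confusion_matrix as _plot_confusion_matrix

def plot_confusion_matrix(*args, **kwargs):
    # Dummy function kept in rasa.nlu.test so existing glue code keeps working
    warnings.warn(
        "rasa.nlu.test.plot_confusion_matrix is deprecated; "
        "use rasa.utils.plotting.plot_confusion_matrix instead.",
        DeprecationWarning,
    )
    return _plot_confusion_matrix(*args, **kwargs)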

@tabergma (Contributor, Author) replied May 15, 2020:

Not sure that is needed, as this will be released with 2.0. I also removed two CLI options for naming the histogram and confusion matrix files, which are not useful anymore. If you think it is needed, I'll keep those and add a deprecation warning.

@tabergma (Contributor, Author):

@tmbo Just wanted to double check if this is fine.

@dakshvar22 (Contributor) commented May 18, 2020:

@tabergma I pushed a small fix to add the response_key to the intent part of the response selection results, so that when viewing the error file the dict looks like this:

{
    "text": "How do I know someone is actually doing something for my offsets?",
    "intent_target": "faq/is-this-legit",
    "response": "All offset projects have to pass a rigorous verification process set by https://goldstandard.org .",
    "response_prediction": {
      "name": "The cost per ton depends on the project you donate to. Some projects can remove a ton of CO2 for less than $1!",
      "confidence": 0.15156520903110504
    }
}

instead of

{
    "text": "How do I know someone is actually doing something for my offsets?",
    "intent_target": "faq",
    "response": "All offset projects have to pass a rigorous verification process set by https://goldstandard.org .",
    "response_prediction": {
      "name": "The cost per ton depends on the project you donate to. Some projects can remove a ton of CO2 for less than $1!",
      "confidence": 0.15156520903110504
    }
}

Otherwise every intent would just be "faq", which is a bit unhelpful.
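
A sketch of the label change being described; the helper name and delimiter constant are assumptions for illustration, while the "/" join itself is what the examples above show:

RESPONSE_KEY_DELIMITER = "/"

def full_intent_target(intent: str, response_key: str) -> str:
    # e.g. ("faq", "is-this-legit") -> "faq/is-this-legit"
    if response_key:
        return f"{intent}{RESPONSE_KEY_DELIMITER}{response_key}"
    return intent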

@dakshvar22 (Contributor):

@tabergma In the confusion matrix, should we use the intent/response_key as the row and column labels instead of the actual responses? The mapping is deterministic, since there is only one response per intent/response_key, and it would make the confusion matrix fit well when there is a large number of unique responses. What do you think?
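
A hedged sketch of what that replacement could look like, reusing the sklearn helpers already imported in this diff; the text_to_key mapping is an assumed input (well defined, since each intent/response_key has exactly one response):

from sklearn.metrics import confusion_matrix
from sklearn.utils.multiclass import unique_labels

def confusion_with_response_keys(targets, predictions, text_to_key):
    # Replace each full response text with its short intent/response_key,
    # e.g. "All offset projects have to pass ..." -> "faq/is-this-legit"
    target_keys = [text_to_key[t] for t in targets]
    predicted_keys = [text_to_key.get(p, p) for p in predictions]
    labels = unique_labels(target_keys, predicted_keys)
    return confusion_matrix(target_keys, predicted_keys, labels=labels), labels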

@tabergma (Contributor, Author):

@dakshvar22 Good idea. Will try it now.

@dakshvar22 (Contributor):

Ahh, that would need a bigger refactor, because getting the intent/response_key of the predicted response is not trivial right now. Let's keep it as a separate issue.

@tabergma (Contributor, Author):

😄 Ok. As an alternative, we could use a legend that maps each actual response to A, B, etc. What do you think?
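
Purely as an illustration of the legend idea (hypothetical helper; string.ascii_uppercase caps it at 26 responses):

import string

def letter_legend(responses):
    # Map each long response text to a short label A, B, C, ... for the axes
    # and return the legend so the full texts can be printed alongside the plot
    legend = dict(zip(string.ascii_uppercase, responses))
    short_labels = {response: letter for letter, response in legend.items()}
    return short_labels, legend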

@dakshvar22 (Contributor):

I would say let's replace them directly with the intent/response_key in a separate issue, since that is the ideal solution.

@dakshvar22 (Contributor) left a review:

Just a small key change to avoid the ambiguity. Looks great! 💯

Two review threads on rasa/nlu/test.py (outdated, resolved)
@tmbo merged commit dd954c3 into master on May 18, 2020
@tmbo deleted the response-testing-output branch on May 18, 2020 13:39
indam23 pushed a commit that referenced this pull request on Jul 27, 2022:
* create plotting utils

* write errors and successes for responses

* fix file names

* update plot filenames

* add changelog entry

* add missing docstrings

* update tests

* fix type

* address deepsource issues

* use io_utils

* fix call to write_text_file

* added complete intent and response key to the 'intent' key

* Update rasa/nlu/test.py

Co-authored-by: Daksh Varshneya <[email protected]>

* Update rasa/nlu/test.py

Co-authored-by: Daksh Varshneya <[email protected]>

* use intent_target key for confusion matrix

Co-authored-by: Daksh <[email protected]>
Co-authored-by: Roberto <[email protected]>