Clean up and log to file Conversation level performance measures. #8000

Closed
kedz opened this issue Feb 19, 2021 · 1 comment · Fixed by #8030
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR

Comments

@kedz
Contributor

kedz commented Feb 19, 2021

Description of Problem:
When running rasa test, performance measures for the core model are printed to the console and/or logged to results/story_report.json. The console output is split into two main result blocks: CONVERSATION level (or E2E when evaluating E2E data) and ACTION level performance.

2021-02-19 14:07:43 INFO     rasa.core.test  - Evaluation Results on CONVERSATION level:     #<-- CONVERSATION LEVEL BLOCK
2021-02-19 14:07:43 INFO     rasa.core.test  - 	Correct:          42 / 43
2021-02-19 14:07:43 INFO     rasa.core.test  - 	F1-Score:         0.988
2021-02-19 14:07:43 INFO     rasa.core.test  - 	Precision:        1.000
2021-02-19 14:07:43 INFO     rasa.core.test  - 	Accuracy:         0.977
2021-02-19 14:07:43 INFO     rasa.core.test  - 	In-data fraction: 0.863
2021-02-19 14:07:43 INFO     rasa.core.test  - Stories report saved to results/story_report.json.
2021-02-19 14:07:43 INFO     rasa.core.test  - Evaluation Results on ACTION level:  #<-- ACTION LEVEL BLOCK
2021-02-19 14:07:43 INFO     rasa.core.test  -  Correct:          247 / 249
2021-02-19 14:07:43 INFO     rasa.core.test  -  F1-Score:         0.996
2021-02-19 14:07:43 INFO     rasa.core.test  -  Precision:        1.000
2021-02-19 14:07:43 INFO     rasa.core.test  -  Accuracy:         0.992
2021-02-19 14:07:43 INFO     rasa.core.test  -  In-data fraction: 0.863

Digging into the CONVERSATION level measures, what is being computed is not very informative or useful. Because of how these metrics are computed, precision is always 1.0 (unless no stories are correct, in which case it is 0), recall equals accuracy (recall is not printed to the console), and F1-Score is just the harmonic mean of 1 and the recall. Additionally, in-data fraction is always the same in the CONVERSATION block as in the ACTION block. The only measure currently worth displaying in the CONVERSATION level block is accuracy (number of correct stories / total stories).
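To illustrate why these measures collapse, here is a minimal sketch in plain Python. It assumes each story is scored as a binary label whose target is always 1 ("the story should be predicted correctly") and whose prediction is 1 only when the whole story was predicted correctly. That setup is an assumption made for illustration, and `conversation_metrics` is a hypothetical helper, not a function from the Rasa codebase.

```python
from typing import Dict


def conversation_metrics(correct: int, total: int) -> Dict[str, float]:
    """Binary metrics for a hypothetical setup where every target label is 1."""
    # Assumption: one label per story, target always 1, prediction 1 if the story was correct.
    targets = [1] * total
    predictions = [1] * correct + [0] * (total - correct)
    pairs = list(zip(targets, predictions))

    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # always 0: no target is ever 0
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = sum(1 for t, p in pairs if t == p) / total if total else 0.0

    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    for name, value in conversation_metrics(42, 43).items():
        print(f"{name}: {value:.3f}")
```

Under this assumption there are never any false positives, so precision is 1.0 whenever at least one story is correct, and recall and accuracy are both correct / total. For 42 of 43 stories the sketch gives an F1 of 0.988 and an accuracy of 0.977, matching the CONVERSATION block above.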

Overview of the Solution:
I propose only printing Correct and Accuracy at the CONVERSATION level.

2021-02-18 10:55:04 INFO     rasa.core.test  - Evaluation Results on CONVERSATION level:
2021-02-18 10:55:04 INFO     rasa.core.test  -  Correct:          42 / 43
2021-02-18 10:55:04 INFO     rasa.core.test  -  Accuracy:         0.977
2021-02-18 10:55:04 INFO     rasa.core.test  - Stories report saved to results/story_report.json.
2021-02-18 10:55:04 INFO     rasa.core.test  - Evaluation Results on ACTION level:
2021-02-18 10:55:04 INFO     rasa.core.test  -  Correct:          247 / 249
2021-02-18 10:55:04 INFO     rasa.core.test  -  F1-Score:         0.996
2021-02-18 10:55:04 INFO     rasa.core.test  -  Precision:        1.000
2021-02-18 10:55:04 INFO     rasa.core.test  -  Accuracy:         0.992
2021-02-18 10:55:04 INFO     rasa.core.test  -  In-data fraction: 0.863

I also propose adding a field for conversation level accuracy to results/story_report.json:

{
...
 "conversation_accuracy": {"accuracy": 0.977, "correct": 42, "total": 43}
...
}
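As a rough sketch of how the proposed field could be built, the snippet below adds a `conversation_accuracy` entry to a report dictionary before it is serialized. The field name and its three keys come from the proposal above. The helper name `add_conversation_accuracy` and the placeholder report contents are hypothetical, and this is not how Rasa itself writes the report.

```python
import json
from typing import Any, Dict


def add_conversation_accuracy(
    report: Dict[str, Any], correct: int, total: int
) -> Dict[str, Any]:
    """Return a copy of `report` with the proposed conversation-level field added."""
    accuracy = correct / total if total else 0.0
    updated = dict(report)  # shallow copy so the caller's dict is left untouched
    updated["conversation_accuracy"] = {
        # Rounded only to mirror the example in the proposal; an assumption.
        "accuracy": round(accuracy, 3),
        "correct": correct,
        "total": total,
    }
    return updated


if __name__ == "__main__":
    # Placeholder standing in for whatever the report already contains.
    existing_report = {"existing_field": "..."}
    print(json.dumps(add_conversation_accuracy(existing_report, 42, 43), indent=4))
```

Keeping the raw counts next to the ratio lets a reader of the JSON file recover the "Correct: 42 / 43" line without re-running the evaluation.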
@kedz kedz added type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR area:rasa-oss 🎡 Anything related to the open source Rasa framework labels Feb 19, 2021
@kedz kedz self-assigned this Feb 19, 2021
@sara-tagger
Collaborator

Thanks for submitting this feature request 🚀 @joejuzl will get back to you about it soon! ✨

@kedz kedz linked a pull request Feb 23, 2021 that will close this issue
@kedz kedz closed this as completed in #8030 Mar 5, 2021