Clean up and log to file Conversation level performance measures. #8000
Labels
area:rasa-oss 🎡
Anything related to the open source Rasa framework
type:enhancement ✨
Additions of new features or changes to existing ones, should be doable in a single PR
Description of Problem:
When running `rasa test`, performance measures for the core model are printed to the console and/or logged to `results/story_report.json`. The console output is split into two main result blocks: CONVERSATION (or E2E when evaluating end-to-end data) and ACTION level performance.

Digging into the CONVERSATION level measures, what is being computed is not very informative or useful. Because of how these metrics are computed, precision is always 1.0 (unless no stories are correct, in which case it is 0), F1-score is just the harmonic mean of 1 and the recall (recall is not printed to the console), and recall equals accuracy. Additionally, the in-data fraction is always the same in the CONVERSATION block as in the ACTION block. The only measure that would currently be helpful to display in the CONVERSATION block is accuracy (number of correct stories / total stories).
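The degeneracy described above follows directly from how conversation-level labels are set up: every story is *supposed* to pass, so there are no true negatives and any passing story is a true positive. A minimal sketch (hypothetical helper, not Rasa's actual code) makes this concrete:

```python
# Hypothetical sketch (not Rasa's implementation) of why conversation-level
# metrics collapse when every true label is "story correct".

def conversation_metrics(story_results):
    """story_results: list of bools, True if the whole story was reproduced correctly."""
    total = len(story_results)
    correct = sum(story_results)
    # All true labels are positive, so a predicted-correct story can never be
    # a false positive: precision is 1.0 whenever any story passes, else 0.0.
    precision = 1.0 if correct > 0 else 0.0
    # With all true labels positive, recall reduces to plain accuracy.
    recall = accuracy = correct / total
    # F1 is then just the harmonic mean of 1.0 and the recall.
    f1 = (2 * precision * recall / (precision + recall)) if correct > 0 else 0.0
    return {"correct": correct, "total": total, "accuracy": accuracy,
            "precision": precision, "recall": recall, "f1-score": f1}

print(conversation_metrics([True, True, True, False, False]))
# precision stays 1.0; recall and accuracy are both 0.6; F1 is 0.75
```

Only `correct` and `accuracy` carry independent information here, which motivates the proposal below.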
Overview of the Solution:
I propose only printing Correct and Accuracy at the CONVERSATION level.
Additionally, I propose to include in `results/story_report.json` an additional field for conversation level accuracy:
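For illustration, the new report entry might look like the following sketch; the `conversation_accuracy` key and its sub-fields are assumptions for discussion, not an agreed-upon schema:

```python
import json

# Hypothetical shape of the extra entry in results/story_report.json.
# Field names ("conversation_accuracy", "correct", "total") are placeholders.
report_fragment = {
    "conversation_accuracy": {
        "accuracy": 0.6,   # correct stories / total stories
        "correct": 3,
        "total": 5,
    },
}
print(json.dumps(report_fragment, indent=2))
```

Keeping only the counts and the accuracy avoids logging the degenerate precision/recall/F1 values discussed above.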