Add metrics calculations to the inference pipeline #23
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
@@            Coverage Diff             @@
##             main      #23      +/-   ##
==========================================
+ Coverage   95.83%   96.32%   +0.49%
==========================================
  Files           3        3
  Lines         120      136      +16
==========================================
+ Hits          115      131      +16
  Misses          5        5
☔ View full report in Codecov by Sentry.
Looking good! Some comments/questions inline. Thx!
tests/test_main.py (Outdated)

labels = [item["output"] for item in items]

bleu, meteor = evaluate_documentation(labels, labels)
assert bleu >= 0 and bleu <= 1, "BLEU score should be between 0 and 1"
Are bleu and meteor == 1 when label == prediction?
It would actually be a bit clearer if you hard-code some examples and assert specific values, e.g. all tokens match, an extra token in the prediction, a missing token in the prediction, etc.
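A sketch of what such hard-coded cases could look like (illustration only: evaluate_documentation and its (bleu, meteor) return value are taken from the snippet above, the import path, sample strings, and expected thresholds are assumptions, and exact values depend on the tokenization and smoothing inside the function):

```python
import pytest

from main import evaluate_documentation  # assumed import path


def test_metrics_identical_prediction():
    labels = ["returns the sum of two numbers"]
    bleu, meteor = evaluate_documentation(labels, labels)
    # With identical reference and prediction, BLEU should be 1.0 and METEOR
    # at or very near 1.0 (METEOR's fragmentation penalty can keep it
    # marginally below 1 even for exact matches).
    assert bleu == pytest.approx(1.0, abs=1e-2)
    assert meteor == pytest.approx(1.0, abs=1e-2)


def test_metrics_extra_token_in_prediction():
    labels = ["returns the sum of two numbers"]
    preds = ["returns the sum of two positive numbers"]
    bleu, meteor = evaluate_documentation(labels, preds)
    # One extra token should lower both scores without driving them to 0.
    assert 0 < bleu < 1
    assert 0 < meteor < 1


def test_metrics_missing_token_in_prediction():
    labels = ["returns the sum of two numbers"]
    preds = ["returns the sum of numbers"]
    bleu, meteor = evaluate_documentation(labels, preds)
    assert 0 <= bleu < 1
    assert 0 < meteor < 1
```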
Added more test cases, and the code works as expected.
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
@@            Coverage Diff             @@
##             main      #23      +/-   ##
==========================================
+ Coverage   96.15%   96.57%   +0.42%
==========================================
  Files           3        3
  Lines         130      146      +16
==========================================
+ Hits          125      141      +16
  Misses          5        5
☔ View full report in Codecov by Sentry.
Almost done! See just a couple of suggestions inline.
Closing the Pull Request for evaluation metrics.
Change Description
Adds metrics calculations to the inference pipeline in main.py, with a unit test case added in test_main.py.
closes #2
Solution Description
Added BLEU and METEOR evaluation metrics to the inference pipeline.
BLEU score calculation
https://www.baeldung.com/cs/nlp-bleu-score#:~:text=BLEU%20(Bilingual%20Evaluation%20Understudy)%20is,%2Danswering%20systems%2C%20and%20chatbots.
METEOR score calculation
https://huggingface.co/spaces/evaluate-metric/meteor
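For context, a minimal sketch of how the two metrics in the linked references can be computed, assuming an NLTK backend; evaluate_documentation, its (references, predictions) signature, and the per-example averaging are illustrative assumptions, not necessarily the exact code merged into main.py:

```python
# Hypothetical sketch of the metric computation using NLTK.
# NLTK's METEOR needs the WordNet data: nltk.download('wordnet')
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score


def evaluate_documentation(references: list[str], predictions: list[str]) -> tuple[float, float]:
    smoothing = SmoothingFunction().method1  # avoid zero BLEU on short texts
    bleu_scores, meteor_scores = [], []
    for ref, pred in zip(references, predictions):
        ref_tokens, pred_tokens = ref.split(), pred.split()
        # sentence_bleu takes a list of reference token lists and one hypothesis token list
        bleu_scores.append(sentence_bleu([ref_tokens], pred_tokens, smoothing_function=smoothing))
        # meteor_score likewise takes a list of tokenized references and a tokenized hypothesis
        meteor_scores.append(meteor_score([ref_tokens], pred_tokens))
    n = len(references)
    return sum(bleu_scores) / n, sum(meteor_scores) / n
```

Both metrics are bounded in [0, 1], which is what the range assertion in test_main.py checks.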
Code Quality
Project-Specific Pull Request Checklists