feat: Implement DeepEvalEvaluator
#346
Conversation
Force-pushed from e03b23f to 053b71f.
Force-pushed from 35bd9ee to 60e5922.
`DeepEvalEvaluator` and `DeepEvalMetrics`
I'm not familiar with deepeval, but the code looks good to me. I left a question about the API docs, now that we merged the CI job that attempts to run `hatch run docs` on integrations' changes merged on `main`. Another option might be making that CI job more resilient.
Thanks for adding the docs, LGTM
Related to #250.
We introduce `DeepEvalEvaluator`, a component that uses the DeepEval LLM evaluation framework to calculate evaluation metrics for RAG pipelines (among others). Refer to deepset-ai/haystack#6784 for an overview of the API design.

This PR introduces the following user-facing classes:

- `DeepEvalMetric` - An enumeration that lists the supported DeepEval metrics. Currently, only metrics related to RAG pipelines are supported.
- `DeepEvalEvaluator` - The pipeline component that interfaces with the evaluation framework. It accepts a single metric and its optional parameters. The inputs to the pipeline are dynamically configured depending on the metric. This is done with the help of a metric descriptor table that contains metadata about input/output conversion formats, expected inputs/outputs, etc.

The output of the component is a nested list of metric results. Each input can have one or more results, depending on the metric. Each result is a dictionary containing the following keys and values:

- `name` - The name of the metric.
- `score` - The score of the metric.
- `explanation` - An optional explanation of the score.
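To make the descriptor-table design and the output shape concrete, here is a minimal, self-contained sketch of the idea. The metric names, the input names (`questions`, `contexts`, `responses`), and the `METRIC_DESCRIPTORS` table are illustrative assumptions, not the actual integration code; the real component delegates scoring to DeepEval rather than returning placeholder scores.

```python
from enum import Enum


class DeepEvalMetric(Enum):
    """Subset of RAG-related metrics (names are illustrative)."""
    ANSWER_RELEVANCY = "answer_relevancy"
    FAITHFULNESS = "faithfulness"


# Hypothetical descriptor table: metric -> expected pipeline inputs.
# The real table also carries input/output conversion metadata.
METRIC_DESCRIPTORS = {
    DeepEvalMetric.ANSWER_RELEVANCY: ("questions", "contexts", "responses"),
    DeepEvalMetric.FAITHFULNESS: ("questions", "contexts", "responses"),
}


def run_evaluator(metric, **inputs):
    """Validate inputs against the descriptor table and return a nested
    list of result dicts: one inner list per input sample."""
    expected = METRIC_DESCRIPTORS[metric]
    missing = [name for name in expected if name not in inputs]
    if missing:
        raise ValueError(f"Missing inputs for {metric.value}: {missing}")
    n_samples = len(inputs[expected[0]])
    # Each input can yield one or more results; a single placeholder
    # result per input stands in for the framework's actual scores.
    return [
        [{"name": metric.value, "score": 0.0, "explanation": None}]
        for _ in range(n_samples)
    ]
```

A usage example: `run_evaluator(DeepEvalMetric.FAITHFULNESS, questions=["q"], contexts=[["c"]], responses=["r"])` returns a one-element outer list whose inner list holds a single `{"name": ..., "score": ..., "explanation": ...}` dict, matching the output shape described above.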