More RAG and Summarization Metrics #701

Merged
merged 44 commits into from
Aug 29, 2024


bnativi
Contributor

@bnativi bnativi commented Aug 5, 2024

  • Add Q&A and RAG metrics that also compare text, i.e., LLM-guided metrics that use the ground truth.
  • Rework Coherence to be specific to summarization (now SummaryCoherence), since that is how coherence is used in the literature. The new name also avoids confusion with the non-summarization coherence metric that RAGAS recently introduced.
  • Improve documentation.

Metrics added:

  • AnswerCorrectness
    • Uses Prediction.annotation.text and Groundtruth.annotation.text
    • RAGAS's answer correctness score is a weighted sum of an F1 score and an answer similarity score, where the answer similarity score is the inner product of embeddings. I implemented AnswerCorrectness as the F1 score alone, so that users do not need to supply embeddings or an embedding model.
  • ContextPrecision
    • Uses Datum.text, Prediction.annotation.context_list, and Groundtruth.annotation.text
    • Evaluates a RAG retrieval mechanism against a ground truth; only the retrieved contexts are needed, not a predicted answer.
  • ContextRecall
    • Uses Prediction.annotation.context_list and Groundtruth.annotation.text
    • Evaluates a RAG retrieval mechanism against a ground truth; only the retrieved contexts are needed, not a predicted answer.
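To make the three ground-truth-based metrics above more concrete, here is a minimal Python sketch of how each score could be computed once an LLM judge has already produced its verdicts. It assumes RAGAS-style formulas: an F1 over statements for AnswerCorrectness, average precision over the ranked context list for ContextPrecision, and the fraction of ground truth statements supported by the contexts for ContextRecall. The function names and input formats (pre-computed statement counts and boolean verdicts) are hypothetical and are not Valor's API; the LLM prompting step is omitted entirely.

```python
from typing import Sequence


def answer_correctness_f1(tp: int, fp: int, fn: int) -> float:
    """Hypothetical helper: F1 over statements classified by an LLM judge.

    tp: prediction statements supported by the ground truth
    fp: prediction statements not supported by the ground truth
    fn: ground truth statements missing from the prediction
    """
    if tp + fp + fn == 0:
        raise ValueError("no statements to score")
    # Equivalent to 2*TP / (2*TP + FP + FN).
    return tp / (tp + 0.5 * (fp + fn))


def context_precision(verdicts: Sequence[bool]) -> float:
    """Hypothetical helper: average precision over ranked retrieved contexts.

    verdicts[k] is True if the LLM judged the context at rank k+1 useful
    for arriving at the ground truth answer.
    """
    relevant_so_far = 0
    total = 0.0
    for k, is_relevant in enumerate(verdicts, start=1):
        if is_relevant:
            relevant_so_far += 1
            # Add precision@k only at ranks that hold a relevant context.
            total += relevant_so_far / k
    # Normalize by the number of relevant contexts; 0.0 if none were relevant.
    return total / relevant_so_far if relevant_so_far else 0.0


def context_recall(verdicts: Sequence[bool]) -> float:
    """Hypothetical helper: share of ground truth statements the contexts support.

    verdicts[i] is True if ground truth statement i is attributable to
    at least one retrieved context.
    """
    if not verdicts:
        raise ValueError("no ground truth statements to score")
    return sum(verdicts) / len(verdicts)


if __name__ == "__main__":
    print(answer_correctness_f1(tp=2, fp=1, fn=1))
    print(context_precision([True, False, True]))
    print(context_recall([True, True, False, True]))
```

Note that in this sketch context precision rewards placing useful contexts early in the ranking: `[True, False, True]` scores about 0.833, while `[True, True, False]` would score 1.0.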

b.nativi added 24 commits July 30, 2024 17:02
…rmat and terminology in llm client instructions
@bnativi bnativi changed the title reorganize _compute_text_generation_metrics to prepare for more RAG m… More RAG and Summarization metrics Aug 5, 2024
@bnativi bnativi changed the title More RAG and Summarization metrics More RAG and Summarization Metrics Aug 5, 2024
Base automatically changed from improve_text_gen_instructions_and_tests to main August 6, 2024 20:27
@bnativi bnativi marked this pull request as ready for review August 26, 2024 18:40
@bnativi bnativi self-assigned this Aug 26, 2024
api/valor_api/backend/core/llm_clients.py Outdated
api/valor_api/backend/metrics/text_generation.py Outdated
docs/metrics.md Outdated
@bnativi bnativi merged commit 8de55e2 into main Aug 29, 2024
14 checks passed
@bnativi bnativi deleted the more_rag_and_summarization_metrics branch August 29, 2024 05:08
@bnativi bnativi restored the more_rag_and_summarization_metrics branch August 29, 2024 05:08
@bnativi bnativi deleted the more_rag_and_summarization_metrics branch August 29, 2024 05:09
3 participants