More RAG and Summarization Metrics #701

bnativi · 2024-08-05T20:34:34Z

- Add Q&A and RAG metrics that also perform text comparison, i.e. LLM-guided metrics that use the ground truth.
- Rework Coherence to be specific to summarization (now SummaryCoherence), since that is how coherence is used in the literature. The metric was renamed to SummaryCoherence because RAGAS recently introduced a non-summarization coherence metric.
- Improve documentation.

Metrics added:

AnswerCorrectness
- Uses Prediction.annotation.text and Groundtruth.annotation.text
- RAGAS computes its answer correctness score as a weighted sum of an F1 score and an answer similarity score, where the answer similarity score is an inner product of embeddings. I implemented AnswerCorrectness as just the F1 score, so that users are not required to supply embeddings or an embedding model.
ContextPrecision
- Uses Datum.text, Prediction.annotation.context_list and Groundtruth.annotation.text
- Evaluates a RAG retrieval mechanism against a ground truth (no predicted answer text is required).
ContextRecall
- Uses Prediction.annotation.context_list and Groundtruth.annotation.text
- Evaluates a RAG retrieval mechanism against a ground truth (no predicted answer text is required).
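
As a rough illustration of the three metrics above, here is a minimal sketch of the final scoring arithmetic, assuming the RAGAS-style definitions: statement-level true/false positive counts for AnswerCorrectness, per-context usefulness verdicts for ContextPrecision, and per-statement attribution verdicts for ContextRecall. The function names and input shapes are hypothetical, the LLM calls that would produce the counts and verdicts are omitted, and Valor's actual aggregation (in particular for ContextPrecision across multiple ground truths) may differ.

```python
from typing import Sequence


def answer_correctness_f1(tp: int, fp: int, fn: int) -> float:
    """F1 over statement counts (hypothetical helper, not Valor's API).

    tp: statements in the prediction that are supported by the ground truth
    fp: statements in the prediction that are not supported by the ground truth
    fn: ground truth statements that are missing from the prediction
    """
    denominator = tp + 0.5 * (fp + fn)
    # With no statements on either side there is nothing to score; return 0.0.
    return tp / denominator if denominator > 0 else 0.0


def context_precision(useful: Sequence[bool]) -> float:
    """Average of precision@k over the ranks k that hold a useful context.

    useful[i] is an assumed LLM verdict on whether the context at rank i + 1
    helps produce the ground truth answer. Order matters: useful contexts
    ranked earlier yield a higher score.
    """
    hits = 0
    total = 0.0
    for k, is_useful in enumerate(useful, start=1):
        if is_useful:
            hits += 1
            total += hits / k  # precision@k at this rank
    return total / hits if hits else 0.0


def context_recall(attributed: Sequence[bool]) -> float:
    """Fraction of ground truth statements attributable to the context list.

    attributed[i] is an assumed LLM verdict on whether ground truth
    statement i can be attributed to the retrieved contexts.
    """
    return sum(attributed) / len(attributed) if attributed else 0.0


if __name__ == "__main__":
    print(answer_correctness_f1(tp=2, fp=1, fn=1))
    print(context_precision([True, False, True]))
    print(context_recall([True, True, False, True]))
```

Only AnswerCorrectness consumes the predicted answer text; the two context metrics score the retrieved context list against the ground truth, which is why no predicted answer is needed for them.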


b.nativi added 24 commits July 30, 2024 17:02

change context to contexts where appropriate, minor improvements to test_llm_clients.py

523bf5b

add unit tests for Contexts

b1c5794

use openai and mistral client names in integration tests to improve code coverage in text_generation.py

244a171

change contexts to context_list in most places, standardization of format and terminology in llm client instructions

24e4a50

reorganize external tests

7aa1151

add bias to external integration tests

d0bd4bc

add context relevance and faithfulness to external integration tests

e6f373f

add hallucination to external integration tests

a845b2b

add toxicity to external integration tests

f833458

fix hallucination functional tests and external integration tests

74831aa

slight improvement to hallucination external integration test

6b1fc7d

Merge branch 'main' into improve_text_gen_instructions_and_tests

e36e355

minor cleanup and additions

74a2c7a

create version of llm instructions with and without analysis

37e7821

Merge branch 'main' into improve_text_gen_instructions_and_tests

e2a702e

grammar check context and contexts

b778059

more grammer context vs contexts

b632ac4

minor name changes and comment changes

b2952d6

minor name changes and comment changes part 2

be540b8

move BadValueInTestLLMClientsError to test_llm_clients.py

9b4dec0

docstrings for llm instructions

5544aa1

Merge branch 'main' into improve_text_gen_instructions_and_tests

51412a5

rename migrations after merge with main

ceba785

reorganize _compute_text_generation_metrics to prepare for more RAG metrics

4b70278

bnativi changed the title ~~reorganize _compute_text_generation_metrics to prepare for more RAG m…~~ More RAG and Summarization metrics Aug 5, 2024

bnativi changed the title ~~More RAG and Summarization metrics~~ More RAG and Summarization Metrics Aug 5, 2024

add AnswerCorrectness with tests

d6907c6

Base automatically changed from improve_text_gen_instructions_and_tests to main August 6, 2024 20:27

b.nativi added 2 commits August 15, 2024 18:58

merge with main

241537c

merge with main

ae8164c

b.nativi added 7 commits August 21, 2024 20:44

Merge branch 'main' into more_rag_and_summarization_metrics

1ee343e

external tests for AnswerCorrectness, ContextPrecision, ContextRecall part 2

a15ad81

Merge branch 'main' into more_rag_and_summarization_metrics

a3937a0

add AnswerCorrectness, ContextPrecision, ContextRecall and ContextRelevance to example notebook

7770c33

rework coherence to be summarization specific

dfc7552

example notebook and docs updated for coherence

d300dd6

Merge branch 'main' into more_rag_and_summarization_metrics

f6e9d21

bnativi commented Aug 25, 2024

api/valor_api/backend/metrics/text_generation.py (outdated, resolved)

b.nativi added 3 commits August 26, 2024 18:13

review - update code coverage, docs

002f56d

Merge branch 'main' into more_rag_and_summarization_metrics

e04337e

Improve clarity of summarization docs

4abc884

bnativi marked this pull request as ready for review August 26, 2024 18:40

bnativi requested review from czaloom, ntlind and ekorman as code owners August 26, 2024 18:40

bnativi self-assigned this Aug 26, 2024

bnativi commented Aug 26, 2024

api/valor_api/backend/core/llm_clients.py (outdated, resolved)

ntlind reviewed Aug 27, 2024

api/valor_api/backend/core/llm_clients.py (outdated, resolved)

api/valor_api/backend/metrics/text_generation.py (outdated, resolved)

docs/metrics.md (outdated, resolved)

czaloom reviewed Aug 27, 2024

api/valor_api/backend/core/llm_clients.py (resolved)

czaloom reviewed Aug 27, 2024

api/valor_api/backend/core/llm_instructions_analysis.py (resolved)

b.nativi added 4 commits August 27, 2024 18:08

rename Coherence to SummaryCoherence, metric docs additions

168a7ae

rename context_list to ordered_context_list for ContextPrecision

ee8db6e

rework ContextPrecision to aggregate over ground truths differently

dca76ea

Merge branch 'main' into more_rag_and_summarization_metrics

0c9f859

ntlind approved these changes Aug 28, 2024

Merge branch 'main' into more_rag_and_summarization_metrics

bafb1c2

bnativi merged commit 8de55e2 into main Aug 29, 2024
14 checks passed

bnativi deleted the more_rag_and_summarization_metrics branch August 29, 2024 05:08

bnativi restored the more_rag_and_summarization_metrics branch August 29, 2024 05:08

bnativi deleted the more_rag_and_summarization_metrics branch August 29, 2024 05:09
