Add text generation task type #569

Merged · 162 commits merged into main from add_llm_guided_metrics on Jul 26, 2024
Conversation

@bnativi (Contributor) commented Apr 29, 2024

Improvements

  • Added a new evaluate_text_generation task type that calculates nine new metrics. These include text comparison metrics, which compare a prediction string to a groundtruth string, and llm-guided metrics, which only sometimes require a groundtruth (see the first sketch after this list). The metrics are:
    • AnswerRelevance (Q&A, llm-guided)
    • Bias (general text generation, llm-guided)
    • BLEU (text comparison)
    • Coherence (general text generation, llm-guided)
    • ContextRelevance (RAG, llm-guided)
    • Faithfulness (RAG, llm-guided)
    • Hallucination (RAG, llm-guided)
    • ROUGE (text comparison)
    • Toxicity (general text generation, llm-guided)
  • Added a text generation notebook with three example use cases (RAG, summarization, and content generation).
  • Added WrappedOpenAIClient and WrappedMistralAIClient to handle llm calls and llm-guided metric computations (see the second sketch after this list).
  • Changed the Docker base image from alpine to slim.
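
For context on the text comparison metrics, the sketch below reproduces BLEU and ROUGE with standard libraries. It is illustrative only: it assumes the nltk and rouge-score packages are installed, the example strings are made up, and it is not the implementation added in this PR.

```python
# Illustrative only: compare a prediction string to a groundtruth string,
# mirroring what the BLEU and ROUGE text comparison metrics measure.
# Assumes `pip install nltk rouge-score`; not the code added in this PR.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

groundtruth = "Mike Trout plays center field for the Los Angeles Angels."
prediction = "Mike Trout is a center fielder for the Angels."

# BLEU: n-gram precision of the prediction against the reference.
bleu = sentence_bleu(
    [groundtruth.split()],
    prediction.split(),
    smoothing_function=SmoothingFunction().method1,
)

# ROUGE: recall-oriented overlap (unigram and longest common subsequence here).
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
rouge = scorer.score(target=groundtruth, prediction=prediction)

print(f"BLEU: {bleu:.3f}")
print({name: round(score.fmeasure, 3) for name, score in rouge.items()})
```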

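The llm-guided metrics follow a different pattern: the metric value comes from prompting an LLM to judge the generated text. The second sketch below shows that pattern with the OpenAI client. The prompt, 1-5 scale, model name, and parsing are assumptions made for illustration; they are not the prompts or logic used by WrappedOpenAIClient in this PR.

```python
# Minimal sketch of an llm-guided metric (coherence-style scoring).
# The prompt, scale, model, and parsing here are illustrative assumptions;
# the actual WrappedOpenAIClient defines its own prompts and parsing.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def coherence_score(generated_text: str, model: str = "gpt-4o-mini") -> int:
    """Ask the LLM to rate the coherence of `generated_text` on a 1-5 scale."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": "You grade text. Reply with a single integer from 1 "
                           "(incoherent) to 5 (fully coherent), and nothing else.",
            },
            {"role": "user", "content": generated_text},
        ],
    )
    return int(response.choices[0].message.content.strip())

print(coherence_score("The quick brown fox jumped over the lazy dog twice."))
```
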
Testing

  • Added API functional and unit tests for the text generation metrics and llm clients.
  • Added client-side integration tests for text generation metrics.
  • Added external integration tests to exercise the llm-guided metrics against OpenAI's API and Mistral's API. Because those APIs do not give us fully deterministic control, the integration tests only check that valid metrics are returned and do not check the exact metric values (see the test sketch after this list).
    • These should only run on merge to main, and not on pushes to other branches (evidence: tests don't run when not merging to main).
    • They pass when run (evidence: pass on GitHub when OPENAI_API_KEY is set).
    • If I purposely try to make them fail, say by setting OPENAI_API_KEY to "", then they fail as expected (evidence: fail on GitHub when OPENAI_API_KEY is not set).
    • The secret API keys should only be available to the external API integration tests and not to the rest of the integration tests. When I added a test to integration_tests/client/ that makes non-mocked OpenAI API calls, that test fails because the API key is not available to it (evidence: regular integration tests fail when they try to make non-mocked OpenAI API calls).
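
As a rough illustration of the testing approach described above, a test of this shape skips itself when the API key is not configured and asserts only that the returned metric is valid, not that it equals an exact value. The helper, model, and metric range below are hypothetical stand-ins, not the actual tests added in this PR.

```python
# Hypothetical sketch of the external-API test pattern: assert that the
# llm-guided metric is a valid value rather than an exact number, since the
# live OpenAI and Mistral APIs are not deterministic.
import os
import pytest
from openai import OpenAI

requires_openai = pytest.mark.skipif(
    not os.environ.get("OPENAI_API_KEY"),
    reason="external OpenAI integration tests only run when the key is set",
)

def llm_guided_coherence(text: str) -> int:
    """Stand-in for the llm-guided coherence call under test (hypothetical)."""
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Reply with one integer from 1 to 5 "
                                          "rating the coherence of the text."},
            {"role": "user", "content": text},
        ],
    )
    return int(response.choices[0].message.content.strip())

@requires_openai
def test_coherence_metric_is_valid():
    score = llm_guided_coherence("Some generated text to grade.")
    # Only check that a valid metric came back; the live API is not
    # deterministic, so no exact-value assertion is made.
    assert 1 <= score <= 5
```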

b.nativi added 30 commits April 19, 2024 22:07
@ntlind (Contributor) commented Jul 15, 2024

The code, tests, and notebook all look good to me. I'm ready to approve once we figure out why the benchmarks are now failing.

@bnativi merged commit b1d5030 into main on Jul 26, 2024
12 checks passed
@bnativi deleted the add_llm_guided_metrics branch on July 26, 2024 20:38