WIP: feat: Add how to run complete incremental evaluation
TASK: IL-313
MerlinKallenbornAA committed May 21, 2024
1 parent 20db23b commit c168619
Showing 4 changed files with 75 additions and 4 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -6,7 +6,7 @@
...

### New Features
...
- Add `how_to_implement_complete_incremental_evaluation_flow`

### Fixes
- The document index client now correctly URL-encodes document names in its queries.
2 changes: 1 addition & 1 deletion README.md
@@ -180,7 +180,7 @@ The how-tos are quick lookups about how to do things. Compared to the tutorials,
| [...retrieve data for analysis](./src/documentation/how_tos/how_to_retrieve_data_for_analysis.ipynb) | Retrieve experiment data in multiple different ways |
| [...implement a custom human evaluation](./src/documentation/how_tos/how_to_human_evaluation_via_argilla.ipynb) | Necessary steps to create an evaluation with humans as a judge via Argilla |
| [...implement elo evaluations](./src/documentation/how_tos/how_to_implement_elo_evaluations.ipynb) | Evaluate runs and create ELO ranking for them |
| [...implement complete incremental evaluation flow](./src/documentation/how_tos/how_to_implement_complete_incremental_evaluation_flow.ipynb) | Run a complete incremental evaluation flow, from runner to aggregation |

# Models

Currently, we support a bunch of models accessible via the Aleph Alpha API. Depending on your local setup, you may even have additional models available.
@@ -0,0 +1,71 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from documentation.how_tos.example_data import DummyEloEvaluationLogic, example_data\n",
"from intelligence_layer.evaluation import (\n",
" IncrementalEvaluator,\n",
" InMemoryEvaluationRepository,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#How to implement complete incremental evaluation flows from running (multiple) tasks to aggregation\n",
"\n",
"This notebook outlines how to:\n",
" - run multiple tasks and configurations on the same dataset\n",
" - perform evaluations in an incremental fashion, i.e., adding additional runs to your existing evaluations without the need for recalculation\n",
" - run aggregation on these evaluations\n",
" - "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Step 0 Define ne\n",
"\n",
"\n",
"my_example_data = example_data()\n",
"print()\n",
"run_ids = [my_example_data.run_overview_1.id, my_example_data.run_overview_2.id]\n",
"\n",
"# Step 1\n",
"dataset_repository = my_example_data.dataset_repository\n",
"run_repository = my_example_data.run_repository\n",
"evaluation_repository = InMemoryEvaluationRepository()\n",
"evaluation_logic = DummyEloEvaluationLogic()\n",
"\n",
"# Step 2\n",
"evaluator = IncrementalEvaluator(\n",
" dataset_repository,\n",
" run_repository,\n",
" evaluation_repository,\n",
" \"My dummy evaluation\",\n",
" evaluation_logic,\n",
")\n",
"\n",
"evaluation_overview = evaluator.evaluate_runs(*run_ids)\n",
"\n",
"# Step 3\n",
"print(evaluation_overview.id)"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
@@ -56,7 +56,7 @@
"evaluation_repository = InMemoryEvaluationRepository()\n",
"evaluation_logic = DummyEloEvaluationLogic()\n",
"\n",
"# Step 3\n",
"# Step 2\n",
"evaluator = IncrementalEvaluator(\n",
" dataset_repository,\n",
" run_repository,\n",
@@ -67,7 +67,7 @@
"\n",
"evaluation_overview = evaluator.evaluate_runs(*run_ids)\n",
"\n",
"# Step 4\n",
"# Step 3\n",
"print(evaluation_overview.id)"
]
}
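
The new notebook describes incremental evaluation as adding runs to existing evaluations without recalculation. The following is a minimal, library-independent sketch of that idea, assuming an Elo-style setup in which runs are compared pairwise. `IncrementalComparisonStore` and `longer_id_wins` are hypothetical names invented for illustration; they are not part of `intelligence_layer` and do not reflect how `IncrementalEvaluator` is implemented.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Callable


@dataclass
class IncrementalComparisonStore:
    """Hypothetical helper that remembers which run pairs were already compared."""

    results: dict[frozenset[str], str] = field(default_factory=dict)

    def evaluate(
        self, run_ids: list[str], compare: Callable[[str, str], str]
    ) -> list[frozenset[str]]:
        """Compare only the pairs without a stored result.

        Returns the pairs that were newly evaluated in this call.
        """
        new_pairs = []
        for a, b in combinations(sorted(run_ids), 2):
            pair = frozenset((a, b))
            if pair in self.results:
                continue  # evaluated in an earlier step, so skip it
            self.results[pair] = compare(a, b)
            new_pairs.append(pair)
        return new_pairs


def longer_id_wins(a: str, b: str) -> str:
    """Toy comparison standing in for a real evaluation logic."""
    return a if len(a) >= len(b) else b


store = IncrementalComparisonStore()
first = store.evaluate(["run-1", "run-2"], longer_id_wins)
second = store.evaluate(["run-1", "run-2", "run-3"], longer_id_wins)
print(len(first), len(second))  # prints "1 2"
```

The second call only compares the two pairs that involve `run-3`; the stored result for `run-1` versus `run-2` is reused. Keying results by an unordered pair is one possible design choice for this sketch, not a statement about the library's storage format.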