WIP: feat: Add how to run complete incremental evaluation

TASK: IL-313
Aleph-Alpha · May 21, 2024 · c168619
1 parent 20db23b
Showing 4 changed files with 75 additions and 4 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -6,7 +6,7 @@
 ...
 
 ### New Features
-...
+ - Add `how_to_implement_complete_incremental_evaluation_flow`
 
 ### Fixes
 - The document index client now correctly URL-encodes document names in its queries.

diff --git a/README.md b/README.md
@@ -180,7 +180,7 @@ The how-tos are quick lookups about how to do things. Compared to the tutorials,
 | [...retrieve data for analysis](./src/documentation/how_tos/how_to_retrieve_data_for_analysis.ipynb)                                                   | Retrieve experiment data in multiple different ways                        |
 | [...implement a custom human evaluation](./src/documentation/how_tos/how_to_human_evaluation_via_argilla.ipynb)                                        | Necessary steps to create an evaluation with humans as a judge via Argilla |
 | [...implement elo evaluations](./src/documentation/how_tos/how_to_implement_elo_evaluations.ipynb)                                                     | Evaluate runs and create ELO ranking for them                               |
-
+| [...implement complete incremental evaluation flow](./src/documentation/how_tos/how_to_implement_complete_incremental_evaluation_flow.ipynb)           | Run the complete incremental evaluation flow from runner to aggregation     |
 # Models
 
 Currently, we support a bunch of models accessible via the Aleph Alpha API. Depending on your local setup, you may even have additional models available.

diff --git a/src/documentation/how_tos/how_to_implement_complete_incremental_evaluation_flow.ipynb b/src/documentation/how_tos/how_to_implement_complete_incremental_evaluation_flow.ipynb
@@ -0,0 +1,71 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from documentation.how_tos.example_data import DummyEloEvaluationLogic, example_data\n",
+    "from intelligence_layer.evaluation import (\n",
+    "    IncrementalEvaluator,\n",
+    "    InMemoryEvaluationRepository,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# How to implement a complete incremental evaluation flow, from running (multiple) tasks to aggregation\n",
+    "\n",
+    "This notebook outlines how to:\n",
+    "\n",
+    "- run multiple tasks and configurations on the same dataset\n",
+    "- perform evaluations incrementally, i.e., add further runs to your existing evaluations without recalculating them\n",
+    "- run aggregation on these evaluations"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Step 0: Load the example data and collect the IDs of the runs to evaluate\n",
+    "\n",
+    "\n",
+    "my_example_data = example_data()\n",
+    "print()\n",
+    "run_ids = [my_example_data.run_overview_1.id, my_example_data.run_overview_2.id]\n",
+    "\n",
+    "# Step 1: Set up the repositories and the evaluation logic\n",
+    "dataset_repository = my_example_data.dataset_repository\n",
+    "run_repository = my_example_data.run_repository\n",
+    "evaluation_repository = InMemoryEvaluationRepository()\n",
+    "evaluation_logic = DummyEloEvaluationLogic()\n",
+    "\n",
+    "# Step 2: Create the evaluator and evaluate the runs\n",
+    "evaluator = IncrementalEvaluator(\n",
+    "    dataset_repository,\n",
+    "    run_repository,\n",
+    "    evaluation_repository,\n",
+    "    \"My dummy evaluation\",\n",
+    "    evaluation_logic,\n",
+    ")\n",
+    "\n",
+    "evaluation_overview = evaluator.evaluate_runs(*run_ids)\n",
+    "\n",
+    "# Step 3: Print the ID of the resulting evaluation overview\n",
+    "print(evaluation_overview.id)"
+   ]
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
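Review note: the notebook lists "adding additional runs to your existing evaluations without the need for recalculation" as a goal, but its code cell only evaluates both runs in a single call. The underlying idea can be sketched in plain Python as follows. This is a library-free toy, not the Intelligence Layer API: `IncrementalPairwiseEvaluator`, `higher_score_wins`, the `scores` table, and the run names are all hypothetical and exist only for illustration.

```python
from itertools import combinations
from typing import Callable, Dict, Tuple

# Hypothetical stand-in data: a fixed quality score per run.
scores: Dict[str, float] = {"run-1": 0.4, "run-2": 0.7, "run-3": 0.9}


def higher_score_wins(run_a: str, run_b: str) -> str:
    """Hypothetical comparison logic: the run with the higher score wins."""
    return run_a if scores[run_a] >= scores[run_b] else run_b


class IncrementalPairwiseEvaluator:
    """Toy evaluator (an assumption, not intelligence_layer code).

    It caches pairwise results so that adding a run only triggers
    comparisons for pairs that have not been evaluated before.
    """

    def __init__(self, compare: Callable[[str, str], str]) -> None:
        self._compare = compare
        self._results: Dict[Tuple[str, str], str] = {}  # sorted pair -> winner
        self.comparisons_made = 0

    def evaluate_runs(self, *run_ids: str) -> Dict[Tuple[str, str], str]:
        """Evaluate all pairs of known and new runs, skipping cached pairs."""
        known = {run for pair in self._results for run in pair}
        all_runs = sorted(known | set(run_ids))
        for pair in combinations(all_runs, 2):
            if pair not in self._results:
                self._results[pair] = self._compare(*pair)
                self.comparisons_made += 1
        return dict(self._results)

    def aggregate(self) -> Dict[str, int]:
        """Aggregate the cached comparisons into a win count per run."""
        wins: Dict[str, int] = {}
        for (run_a, run_b), winner in self._results.items():
            wins.setdefault(run_a, 0)
            wins.setdefault(run_b, 0)
            wins[winner] += 1
        return wins


evaluator = IncrementalPairwiseEvaluator(higher_score_wins)
evaluator.evaluate_runs("run-1", "run-2")  # one comparison
evaluator.evaluate_runs("run-3")  # only the two pairs involving run-3 are new
wins = evaluator.aggregate()
print(evaluator.comparisons_made, wins)
```

The second `evaluate_runs` call reuses the cached result for the first pair, which is the property the how-to would ideally demonstrate once it adds a third run to an existing evaluation.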
diff --git a/src/documentation/how_tos/how_to_implement_elo_evaluations.ipynb b/src/documentation/how_tos/how_to_implement_elo_evaluations.ipynb
@@ -56,7 +56,7 @@
     "evaluation_repository = InMemoryEvaluationRepository()\n",
     "evaluation_logic = DummyEloEvaluationLogic()\n",
     "\n",
-    "# Step 3\n",
+    "# Step 2\n",
     "evaluator = IncrementalEvaluator(\n",
     "    dataset_repository,\n",
     "    run_repository,\n",
@@ -67,7 +67,7 @@
     "\n",
     "evaluation_overview = evaluator.evaluate_runs(*run_ids)\n",
     "\n",
-    "# Step 4\n",
+    "# Step 3\n",
     "print(evaluation_overview.id)"
    ]
   }