Merge pull request #9 from aleph-alpha-intelligence-layer/improve-mindset-classify

Improve mindset classify
Aleph-Alpha · Oct 27, 2023 · f7da8d4
2 parents 7b3dd22 + aa89a4a

Showing 10 changed files with 310 additions and 101 deletions.
diff --git a/README.md b/README.md
@@ -18,13 +18,14 @@ The key features of the Intelligence Layer are:
 
 Not sure where to start? Familiarize yourself with the Intelligence Layer using the below notebooks.
 
-| Order | Task               | Description                             | Notebook 📓                                                   |
-| ----- | ------------------ | --------------------------------------- | ------------------------------------------------------------- |
-| 1     | Summarization      | Summarize a document                    | [summarize.ipynb](./src/examples/summarize.ipynb)             |
-| 2     | Question Answering | Various approaches for QA               | [qa.ipynb](./src/examples/qa.ipynb)                           |
-| 3     | Quickstart task    | Build a custom task for your use case   | [quickstart_task.ipynb](./src/examples/quickstart_task.ipynb) |
-| 4     | Classification     | Conduct zero-shot text classification   | [classify.ipynb](./src/examples/classify.ipynb)               |
-| 5     | Document Index     | Connect your proprietary knowledge base | [document_index.ipynb](./src/examples/document_index.ipynb)   |
+| Order | Task                           | Description                             | Notebook 📓                                                                     |
+| ----- | ------------------------------ | --------------------------------------- | ------------------------------------------------------------------------------- |
+| 1     | Summarization                  | Summarize a document                    | [summarize.ipynb](./src/examples/summarize.ipynb)                               |
+| 2     | Question Answering             | Various approaches for QA               | [qa.ipynb](./src/examples/qa.ipynb)                                             |
+| 3     | Quickstart task                | Build a custom task for your use case   | [quickstart_task.ipynb](./src/examples/quickstart_task.ipynb)                   |
+| 4     | Single-Label Classification    | Conduct zero-shot text classification   | [single_label_classify.ipynb](./src/examples/single_label_classify.ipynb)       |
+| 5     | Embedding-Based Classification | Classify texts on the basis of examples | [embedding_based_classify.ipynb](./src/examples/embedding_based_classify.ipynb) |
+| 6     | Document Index                 | Connect your proprietary knowledge base | [document_index.ipynb](./src/examples/document_index.ipynb)                     |
 
 ## Getting started with the Jupyter Notebooks
 

diff --git a/src/examples/embedding_based_classify.ipynb b/src/examples/embedding_based_classify.ipynb
@@ -0,0 +1,120 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Embedding-Based Classification\n",
+    "\n",
+    "Large language model embeddings offer a powerful approach to text classification.\n",
+    "In this method, each example of each class is transformed into a vector representation (an embedding) using a language model.\n",
+    "These vectors capture the semantic meaning of the text.\n",
+    "Together, the embedded examples of a class describe what that class means.\n",
+    "When a new piece of text needs to be classified, it is first embedded using the same model.\n",
+    "This new vector is then compared to the embedded examples of each class using cosine similarity, and the similarities are combined into one score per class.\n",
+    "The class with the highest score is assigned to the text.\n",
+    "Because the comparison happens in embedding space, this method classifies texts by their meaning rather than by their exact wording.\n",
+    "\n",
+    "### When should you use embedding-based classification?\n",
+    "\n",
+    "We recommend using this type of classification when...\n",
+    "- ...proper classification requires fine-grained control over the classes' definitions.\n",
+    "- ...the labels can be defined mostly or purely by the semantic meaning of the examples.\n",
+    "- ...examples for each label are readily available.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let's start by instantiating a classifier for sentiment classification."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from os import getenv\n",
+    "\n",
+    "from aleph_alpha_client import Client\n",
+    "\n",
+    "from intelligence_layer.use_cases.classify.embedding_based_classify import EmbeddingBasedClassify, LabelWithExamples\n",
+    "\n",
+    "\n",
+    "client = Client(getenv(\"AA_TOKEN\"))\n",
+    "labels_with_examples = [\n",
+    "    LabelWithExamples(\n",
+    "        name=\"positive\",\n",
+    "        examples=[\n",
+    "            \"I really like this.\",\n",
+    "            \"Wow, your hair looks great!\",\n",
+    "            \"We're so in love.\",\n",
+    "            \"That truly was the best day of my life!\",\n",
+    "            \"What a great movie.\"\n",
+    "        ],\n",
+    "    ),\n",
+    "    LabelWithExamples(\n",
+    "        name=\"negative\",\n",
+    "        examples=[\n",
+    "            \"I really dislike this.\",\n",
+    "            \"Ugh, your hair looks horrible!\",\n",
+    "            \"We're not in love anymore.\",\n",
+    "            \"My day was very bad, I did not have a good time.\",\n",
+    "            \"They make terrible food.\"\n",
+    "        ],\n",
+    "    ),\n",
+    "]\n",
+    "classify = EmbeddingBasedClassify(labels_with_examples, client)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Alright, let's classify a new example!"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from intelligence_layer.core.logger import InMemoryDebugLogger\n",
+    "from intelligence_layer.use_cases.classify.classify import ClassifyInput\n",
+    "\n",
+    "\n",
+    "classify_input = ClassifyInput(\n",
+    "    chunk=\"It was very awkward with him, I did not enjoy it.\",\n",
+    "    labels=frozenset(label.name for label in labels_with_examples)\n",
+    ")\n",
+    "logger = InMemoryDebugLogger(name=\"Classify\")\n",
+    "result = classify.run(classify_input, logger)\n",
+    "result"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "3.10-intelligence",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.4"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
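The notebook describes the approach in prose only. As a rough, self-contained sketch of the idea (not the library's implementation), the snippet below scores each label by the cosine similarity between a query vector and that label's example vectors, averaging the top-k similarities. The two-dimensional vectors are hand-made toy values standing in for real model embeddings, and the helper names and `top_k=5` default (which only mirrors the name of the `MEAN_TOP_5` scoring option in `embedding_based_classify.py`) are assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_by_examples(
    query: Sequence[float],
    examples_per_label: Dict[str, List[Sequence[float]]],
    top_k: int = 5,
) -> Dict[str, float]:
    """Hypothetical helper: score each label by the mean of its top-k similarities.

    Assumes every label has at least one example vector.
    """
    scores: Dict[str, float] = {}
    for label, examples in examples_per_label.items():
        # Compare the query with every example of this label, best match first.
        similarities = sorted(
            (cosine_similarity(query, example) for example in examples),
            reverse=True,
        )
        top = similarities[:top_k]
        scores[label] = sum(top) / len(top)
    return scores


# Toy 2-d "embeddings": positive examples point one way, negative ones the other.
examples = {
    "positive": [[0.9, 0.1], [0.8, 0.3], [1.0, 0.0]],
    "negative": [[0.1, 0.9], [0.2, 0.8], [0.0, 1.0]],
}
scores = classify_by_examples([0.2, 0.95], examples)
best_label = max(scores, key=lambda label: scores[label])
print(best_label)  # negative
```

Averaging over several nearest examples makes a label's score less sensitive to a single unusual example; with `top_k=1` the label of the single closest example wins.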
diff --git a/src/examples/quickstart_task.ipynb b/src/examples/quickstart_task.ipynb
@@ -436,7 +436,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.10.6"
+   "version": "3.11.4"
   }
  },
  "nbformat": 4,

diff --git a/src/examples/classify.ipynb → src/examples/single_label_classify.ipynb b/src/examples/classify.ipynb → src/examples/single_label_classify.ipynb
@@ -4,20 +4,23 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Classify\n",
+    "# Single Label Classification\n",
     "\n",
-    "Classification is a methodology that tries to match a text to the correct label.\n",
+    "Single-label classification refers to the task of assigning each input to exactly one of n distinct classes; binary classification is the special case with n = 2.\n",
+    "The classes are mutually exclusive, so no input belongs to more than one category.\n",
+    "Common applications include email spam detection, where emails are classified as either \"spam\" or \"not spam\", and sentiment classification, where a text can be \"positive\", \"negative\" or \"neutral\".\n",
+    "The goal is to predict the correct class for any given input.\n",
     "\n",
     "### Prompt-based classification\n",
     "\n",
-    "Prompt-based classification is a methodology that relies purely on prompting the LLM in a specific way.\n",
+    "Here, we'll use a purely prompt-based approach for classification.\n",
     "\n",
     "### When should you use prompt-based classification?\n",
     "\n",
-    "Some situations when you would use this methodology is when:\n",
-    "- The labels are easily understood (they don't require explanation or examples), for example sentiment analysis\n",
-    "- The labels are not recognized by their semantic meaning, e.g. \"reasoning\" tasks like classifying contradictions\n",
-    "- You don't have many examples\n",
+    "We recommend using this type of classification when...\n",
+    "- ...the labels are easily understood (they don't require explanation or examples).\n",
+    "- ...the labels cannot be recognized purely by their semantic meaning.\n",
+    "- ...many examples for each label aren't readily available.\n",
     "\n",
     "### Example snippet\n",
     "\n",
@@ -55,8 +58,7 @@
     "debug_log = InMemoryDebugLogger(name=\"classify\")\n",
     "output = task.run(input, debug_log)\n",
     "for label, score in output.scores.items():\n",
-    "    print(f\"{label}: {round(score, 4)}\")\n",
-    "# debug_log\n"
+    "    print(f\"{label}: {round(score, 4)}\")\n"
    ]
   },
   {
@@ -251,9 +253,9 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "from intelligence_layer.use_cases.classify.single_label_classify import SingleLabelClassifyEvaluator\n",
+    "from intelligence_layer.use_cases.classify.classify import ClassifyEvaluator\n",
     "\n",
-    "evaluator = SingleLabelClassifyEvaluator(task)\n",
+    "evaluator = ClassifyEvaluator(task)\n",
     "classify_input = ClassifyInput(\n",
     "        chunk=Chunk(\"This is good\"),\n",
     "        labels=frozenset({\"positive\", \"negative\"}),\n",

diff --git a/src/intelligence_layer/use_cases/classify/classify.py b/src/intelligence_layer/use_cases/classify/classify.py
@@ -1,9 +1,12 @@
 from typing import (
     Mapping,
+    Sequence,
 )
 
 from pydantic import BaseModel
-from intelligence_layer.core.task import Chunk, Probability
+from intelligence_layer.core.evaluator import Evaluator
+from intelligence_layer.core.logger import DebugLogger
+from intelligence_layer.core.task import Chunk, Probability, Task
 
 
 class ClassifyInput(BaseModel):
@@ -29,3 +32,74 @@ class ClassifyOutput(BaseModel):
     """
 
     scores: Mapping[str, Probability]
+
+
+class Classify(Task[ClassifyInput, ClassifyOutput]):
+    """Common base class for all classifier implementations."""
+
+    pass
+
+
+class ClassifyEvaluation(BaseModel):
+    """The evaluation of a single-label classification run.
+
+    Attributes:
+        correct: Whether the highest-scoring class in the output is one of the expected classes.
+        output: The actual output from the task run.
+    """
+
+    correct: bool
+    output: ClassifyOutput
+
+
+class AggregatedClassifyEvaluation(BaseModel):
+    """The aggregated evaluation of a single-label classify implementation against a dataset.
+
+    Attributes:
+        percentage_correct: Fraction (between 0 and 1) of evaluations that were considered correct.
+        evaluations: The individual evaluations.
+    """
+
+    percentage_correct: float
+    evaluations: Sequence[ClassifyEvaluation]
+
+
+class ClassifyEvaluator(
+    Evaluator[
+        ClassifyInput,
+        Sequence[str],
+        ClassifyEvaluation,
+        AggregatedClassifyEvaluation,
+    ]
+):
+    def __init__(self, task: Classify):
+        self.task = task
+
+    def evaluate(
+        self,
+        input: ClassifyInput,
+        logger: DebugLogger,
+        expected_output: Sequence[str],
+    ) -> ClassifyEvaluation:
+        output = self.task.run(input, logger)
+        sorted_classes = sorted(
+            output.scores.items(), key=lambda item: item[1], reverse=True
+        )
+        # A run counts as correct if the top-scoring label is one of the
+        # expected labels.
+        top_label = sorted_classes[0][0]
+        correct = top_label in expected_output
+        return ClassifyEvaluation(correct=correct, output=output)
+
+    def aggregate(
+        self, evaluations: Sequence[ClassifyEvaluation]
+    ) -> AggregatedClassifyEvaluation:
+        if len(evaluations) != 0:
+            correct_answers = sum(
+                1 for evaluation in evaluations if evaluation.correct
+            ) / len(evaluations)
+        else:
+            correct_answers = 0.0
+        return AggregatedClassifyEvaluation(
+            percentage_correct=correct_answers, evaluations=evaluations
+        )
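The evaluator's logic does not depend on the task itself: a run is correct when the highest-scoring label is among the expected labels, and the aggregate is the share of correct runs, a value between 0 and 1 despite the field name `percentage_correct`. The sketch below reproduces that logic with plain dictionaries in place of `ClassifyOutput` and `ClassifyEvaluation`; the function names and sample scores are hypothetical and only for illustration.

```python
from typing import Mapping, Sequence


def is_correct(scores: Mapping[str, float], expected: Sequence[str]) -> bool:
    """True if the top-scoring label is one of the expected labels."""
    # max() keeps the first of several equal scores, like sorted(..., reverse=True)[0].
    top_label = max(scores, key=lambda label: scores[label])
    return top_label in expected


def fraction_correct(results: Sequence[bool]) -> float:
    """Share of correct runs in [0, 1]; 0.0 for an empty sequence."""
    if not results:
        return 0.0
    return sum(results) / len(results)


runs = [
    ({"positive": 0.9, "negative": 0.1}, ["positive"]),
    ({"positive": 0.3, "negative": 0.7}, ["positive"]),
    ({"positive": 0.2, "negative": 0.8}, ["negative"]),
    ({"positive": 0.6, "negative": 0.4}, ["positive", "neutral"]),
]
results = [is_correct(scores, expected) for scores, expected in runs]
print(results)  # [True, False, True, True]
print(fraction_correct(results))  # 0.75
```

Using `max` instead of a full sort is a small simplification; because Python's `sorted` stays stable with `reverse=True`, both pick the same label when scores tie.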
diff --git a/src/intelligence_layer/use_cases/classify/embedding_based_classify.py b/src/intelligence_layer/use_cases/classify/embedding_based_classify.py
@@ -13,7 +13,11 @@
 )
 from intelligence_layer.core.logger import DebugLogger
 from intelligence_layer.core.task import Chunk, Probability, Task
-from intelligence_layer.use_cases.classify.classify import ClassifyInput, ClassifyOutput
+from intelligence_layer.use_cases.classify.classify import (
+    Classify,
+    ClassifyInput,
+    ClassifyOutput,
+)
 from intelligence_layer.use_cases.search.filter_search import (
     FilterSearch,
     FilterSearchInput,
@@ -46,7 +50,7 @@ class EmbeddingBasedClassifyScoring(Enum):
     MEAN_TOP_5 = 5
 
 
-class EmbeddingBasedClassify(Task[ClassifyInput, ClassifyOutput]):
+class EmbeddingBasedClassify(Classify):
     """Task that classifies a given input text based on examples.
 
     The input contains a complete set of all possible labels. The output will return a score
@@ -119,7 +123,7 @@ def run(self, input: ClassifyInput, logger: DebugLogger) -> ClassifyOutput:
         )
         unknown_labels = input.labels - available_labels
         if unknown_labels:
-            raise ValueError(f"Got unexpected labels: {unknown_labels}")
+            raise ValueError(f"Got unexpected labels: {', '.join(unknown_labels)}.")
         labels = list(input.labels)  # converting to list to preserve order
         results_per_label = [
             self._label_search(input.chunk, label, logger) for label in labels