Commit
Merge pull request #9 from aleph-alpha-intelligence-layer/improve-mindset-classify

Improve mindset classify
Showing 10 changed files with 310 additions and 101 deletions.
@@ -0,0 +1,120 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Embedding-Based Classification\n",
"\n",
"Embeddings from large language models offer a powerful approach to text classification.\n",
"In this method, every labeled example of every class is converted into a vector representation (an embedding) using the language model.\n",
"These vectors capture the semantic meaning of the text.\n",
"The embeddings of each class then form a cluster, summarized by its centroid, i.e., the average embedding of the examples in that class.\n",
"To classify a new piece of text, it is first embedded using the same language model.\n",
"This new vector is then compared to each class's cluster using cosine similarity.\n",
"The class whose cluster lies closest to the new text's embedding is assigned to the text.\n",
"This lets the classifier draw on the semantic understanding of the language model, so a text can be matched to a class by meaning even when it shares few words with that class's examples.\n",
"\n",
"### When should you use embedding-based classification?\n",
"\n",
"We recommend this type of classification when:\n",
"- proper classification requires fine-grained control over the classes' definitions,\n",
"- the labels can be defined mostly or purely by the semantic meaning of the examples, and\n",
"- examples for each label are readily available.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's start by instantiating a classifier for sentiment classification."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from os import getenv\n",
"\n",
"from aleph_alpha_client import Client\n",
"\n",
"from intelligence_layer.use_cases.classify.embedding_based_classify import EmbeddingBasedClassify, LabelWithExamples\n",
"\n",
"\n",
"# Read the Aleph Alpha API token from the AA_TOKEN environment variable.\n",
"client = Client(getenv(\"AA_TOKEN\"))\n",
"# Each label is defined by a few example texts.\n",
"labels_with_examples = [\n",
"    LabelWithExamples(\n",
"        name=\"positive\",\n",
"        examples=[\n",
"            \"I really like this.\",\n",
"            \"Wow, your hair looks great!\",\n",
"            \"We're so in love.\",\n",
"            \"That truly was the best day of my life!\",\n",
"            \"What a great movie.\",\n",
"        ],\n",
"    ),\n",
"    LabelWithExamples(\n",
"        name=\"negative\",\n",
"        examples=[\n",
"            \"I really dislike this.\",\n",
"            \"Ugh, your hair looks horrible!\",\n",
"            \"We're not in love anymore.\",\n",
"            \"My day was very bad, I did not have a good time.\",\n",
"            \"They make terrible food.\",\n",
"        ],\n",
"    ),\n",
"]\n",
"classify = EmbeddingBasedClassify(labels_with_examples, client)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alright, let's classify a new example!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from intelligence_layer.core.logger import InMemoryDebugLogger\n",
"from intelligence_layer.use_cases.classify.classify import ClassifyInput\n",
"\n",
"\n",
"# Offer exactly the labels defined above as candidates.\n",
"classify_input = ClassifyInput(\n",
"    chunk=\"It was very awkward with him, I did not enjoy it.\",\n",
"    labels=frozenset(label.name for label in labels_with_examples),\n",
")\n",
"logger = InMemoryDebugLogger(name=\"Classify\")\n",
"result = classify.run(classify_input, logger)\n",
"result"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "3.10-intelligence",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
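The notebook's first markdown cell describes the method in prose: average each class's example embeddings into a centroid, then assign a new text to the class whose centroid has the highest cosine similarity with the text's embedding. Below is a minimal, self-contained sketch of that idea in plain Python, assuming the embeddings have already been computed. The toy 3-dimensional vectors and the helper names `centroid`, `cosine_similarity`, and `classify` are hypothetical illustrations; they are not the `EmbeddingBasedClassify` implementation and not part of the `intelligence_layer` API.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Average one class's example embeddings component-wise."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """dot(a, b) / (|a| * |b|); assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify(query: Vector, examples_by_label: Dict[str, List[Vector]]) -> Dict[str, float]:
    """Score the query against each label's centroid by cosine similarity."""
    return {
        label: cosine_similarity(query, centroid(vectors))
        for label, vectors in examples_by_label.items()
    }


# Hypothetical toy "embeddings" standing in for real language model output.
examples_by_label = {
    "positive": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "negative": [[0.1, 0.9, 0.0], [0.0, 0.8, 0.2]],
}
query = [0.2, 0.7, 0.1]

scores = classify(query, examples_by_label)
# The label whose centroid is most similar to the query wins.
best_label = max(scores, key=scores.get)
print(best_label, scores)
```

In practice the vectors would come from the language model, with the same model embedding both the examples and the new text; the sketch leaves that step out so it runs without an API token.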