Skip to content

Commit

Permalink
adding text embedder sample for python
Browse files Browse the repository at this point in the history
  • Loading branch information
PaulTR committed Feb 6, 2023
1 parent e91e8ca commit e6eae7a
Showing 1 changed file with 213 additions and 0 deletions.
213 changes: 213 additions & 0 deletions examples/text_embedder/python/text_embedder.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,213 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "h2q27gKz1H20"
},
"source": [
"**Copyright 2023 The MediaPipe Authors. All Rights Reserved.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "TUfAcER1oUS6"
},
"outputs": [],
"source": [
"#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "L_cQX8dWu4Dv"
},
"source": [
"# Text Embedding with MediaPipe Tasks"
]
},
{
"cell_type": "markdown",
"source": [
"This notebook will show you how to use the MediaPipe Tasks Python API to compare text items, giving you a value for how similar they are. These values will range from -1 to 1 with 1 being the same text. This is done through cosine similarity."
],
"metadata": {
"id": "jERPUr5-vbYk"
}
},
{
"cell_type": "markdown",
"source": [
"##Preparation\n",
"\n",
"Let's start with installing MediaPipe.\n",
"\n",
"* If you see an error about flatbuffers incompatibility, it's fine to ignore it.\n",
"MediaPipe requires a newer version of flatbuffers (v2), which is incompatible with the older version of Tensorflow (v2.9) currently preinstalled on Colab.\n",
"\n",
"* If you install MediaPipe outside of Colab, you only need to run pip install mediapipe. It isn't necessary to explicitly install flatbuffers."
],
"metadata": {
"id": "7LSy09XEvuJy"
}
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "gxbHBsF-8Y_l",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "a6944bf4-3d62-4810-9b1a-339d953c9f69"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
"tensorflow 2.9.2 requires flatbuffers<2,>=1.12, but you have flatbuffers 2.0 which is incompatible.\u001b[0m\u001b[31m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m33.0/33.0 MB\u001b[0m \u001b[31m13.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25h"
]
}
],
"source": [
"!pip install -q flatbuffers==2.0.0\n",
"!pip install -q mediapipe==0.9.0.1"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "a49D7h4TVmru"
},
"source": [
"After installing your dependencies, you can download the text embedder model that will be used for this example."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"id": "OMjuVQiDYJKF"
},
"outputs": [],
"source": [
"#@title Start downloading here.\n",
"!wget -O embedder.tflite -q https://storage.googleapis.com/mediapipe-assets/mobilebert_embedding_with_metadata.tflite"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Iy4r2_ePylIa"
},
"source": [
"## Running inference\n",
"\n",
"To run inference using the MediaPipe Text Embedder task, you just need to initialize the `TextEmbedder` using the model that you downloaded earlier, and then use that `TextEmbedder` to compare two separate strings. You can edit the two strings that will be used on the side of this section where you see `first_text` and `second_text`."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"id": "Yl_Oiye4mUuo",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "d26b3a4b-c4d4-42d8-8ff0-38074817f4af"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"0.9332579993217945\n"
]
}
],
"source": [
"from mediapipe.tasks import python\n",
"from mediapipe.tasks.python.components import processors\n",
"from mediapipe.tasks.python import text\n",
"from scipy.spatial import distance\n",
"\n",
"# Create your base options with the model that was downloaded earlier\n",
"base_options = python.BaseOptions(model_asset_path='embedder.tflite')\n",
"\n",
"# Set your values for using normalization and quantization\n",
"l2_normalize = True #@param {type:\"boolean\"}\n",
"quantize = False #@param {type:\"boolean\"}\n",
"\n",
"# Create the final set of options for the Embedder\n",
"options = text.TextEmbedderOptions(\n",
" base_options=base_options, l2_normalize=l2_normalize, quantize=quantize)\n",
"\n",
"with text.TextEmbedder.create_from_options(options) as embedder:\n",
" # Retrieve the first and second sets of text that will be compared\n",
" first_text = \"I'm feeling so good\" #@param {type:\"string\"}\n",
" second_text = \"I'm okay I guess\" #@param {type:\"string\"}\n",
"\n",
" # Convert both sets of text to embeddings\n",
" first_embedding_result = embedder.embed(first_text)\n",
" second_embedding_result = embedder.embed(second_text)\n",
" first_embedding_result.embeddings[0]\n",
"\n",
" # Retrieve the cosine similarity value from both sets of text, then take the\n",
" # cosine of that value to receie a decimal similarity value.\n",
" similarity = 1 - distance.cosine(first_embedding_result.embeddings[0].embedding.astype('float'),\n",
" second_embedding_result.embeddings[0].embedding.astype('float'))\n",
" print(similarity)"
]
},
{
"cell_type": "code",
"source": [],
"metadata": {
"id": "Yn0qjHhVVlc_"
},
"execution_count": null,
"outputs": []
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.13"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

0 comments on commit e6eae7a

Please sign in to comment.