Tutorial for using Asymmetric models #3258
base: main
Conversation
After replicating the local model embeddings, I am able to provide a high-level solution of what the tutorial entails. Signed-off-by: Brian Flores <[email protected]>
Provides more context to each step Signed-off-by: Brian Flores <[email protected]>
Expands on the context of the previous commit and improves grammar and structure. Signed-off-by: Brian Flores <[email protected]>
Signed-off-by: Brian Flores <[email protected]>
There is a flaky test in the CI, can I get a retry please?
## Step 1: Spin Up a Docker OpenSearch Cluster

To run OpenSearch in a local development environment, you can use Docker and a pre-configured `docker-compose` file.
This is not a requirement, right? I think steps 2-6 can also be done if you run an OpenSearch cluster locally without Docker?
That's true, I chose a Docker setup since there aren't many tutorials using it. It also makes building the tutorial easier, since I don't have to register and deploy the model again whenever I go back to the Docker cluster.
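For anyone following along, a single-node cluster is enough for the later steps whether or not you use the PR's compose file. A minimal sketch (the image tag, password placeholder, and ports are assumptions, not taken from the tutorial):

```bash
# Start a single-node OpenSearch container and check that it responds.
docker run -d --name opensearch-node \
  -p 9200:9200 -p 9600:9600 \
  -e "discovery.type=single-node" \
  -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=<a-strong-password>" \
  opensearchproject/opensearch:latest

# -k skips TLS verification for the demo self-signed certificate.
curl -ku admin:<a-strong-password> https://localhost:9200
```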
Can you please add configuring the k-NN index using ML inference ingest processors, and also search using ML inference search request processors? That way we can give a full tutorial on how to use this model during ingest and search.
### b. Zip the Model Files

In order to upload the model to OpenSearch, you must zip the necessary model files (`model.onnx`, `sentencepiece.bpe.model`, and `tokenizer.json`). The `model.onnx` file is located in the `onnx` directory of the cloned repository.
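For readers following along, packaging those three files might look like the sketch below (the checkout directory name is an assumption; adjust paths to match your clone):

```bash
cd multilingual-e5-small
# -j stores the files at the archive root instead of keeping the onnx/ prefix.
zip -j intfloat-multilingual-e5-small.zip \
  onnx/model.onnx sentencepiece.bpe.model tokenizer.json
# The register step later needs a SHA-256 hash of this archive.
sha256sum intfloat-multilingual-e5-small.zip
```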
Is ONNX the only format that this model provides? Can we add a PyTorch format tutorial here?
Hey Zane! I can see why you ask. I'll clarify that this is only for ONNX; I haven't used PyTorch models, so that's why I wrote it like that.
Got it. My suggestion is that we add both ONNX and PyTorch cases so that users can choose between them based on their use case.
Signed-off-by: Brian Flores <[email protected]>
In order for this tutorial to work for all users following it, the following PR has to be merged to avoid the MLInput being null.
Some suggestions to clarify the text. In general, use sentence case capitalization and refer to the user as "you". Thanks!
@@ -0,0 +1,539 @@
# Tutorial: Running Asymmetric Semantic Search within OpenSearch
This tutorial demonstrates how to generate text embeddings using an asymmetric embedding model in OpenSearch which will be used
Suggested change:
- This tutorial demonstrates how to generate text embeddings using an asymmetric embedding model in OpenSearch which will be used
+ This tutorial demonstrates generating text embeddings using an asymmetric embedding model in OpenSearch. The embeddings will be used
# Tutorial: Running Asymmetric Semantic Search within OpenSearch

This tutorial demonstrates how to generate text embeddings using an asymmetric embedding model in OpenSearch which will be used
to run semantic search. This is implemented within a Docker container, the example model used in this tutorial is the multilingual
Can you replace "this" with a noun? Is it the search that is implemented within a Docker container?
Suggested change:
- to run semantic search. This is implemented within a Docker container, the example model used in this tutorial is the multilingual
+ to run semantic search, implemented within a Docker container. The example model used in this tutorial is the multilingual
This tutorial demonstrates how to generate text embeddings using an asymmetric embedding model in OpenSearch which will be used
to run semantic search. This is implemented within a Docker container, the example model used in this tutorial is the multilingual
`intfloat/multilingual-e5-small` model from Hugging Face.
You will learn how to prepare the model, register it in OpenSearch, and run inference to generate embeddings.
Suggested change:
- You will learn how to prepare the model, register it in OpenSearch, and run inference to generate embeddings.
+ In this tutorial, you'll learn how to prepare the model, register it in OpenSearch, and run inference to generate embeddings.
- Docker Desktop installed and running on your local machine.
- Basic familiarity with Docker and OpenSearch.
- Access to the Hugging Face model `intfloat/multilingual-e5-small` (or another model of your choice).
Suggested change:
- - Access to the Hugging Face model `intfloat/multilingual-e5-small` (or another model of your choice).
+ - Access to the Hugging Face `intfloat/multilingual-e5-small` model (or another model of your choice).
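If you don't already have the model files locally, one way to fetch them (assuming Git LFS is installed) is:

```bash
# Clone the model repository from Hugging Face; the ONNX weights and tokenizer
# files referenced later in the tutorial live inside this checkout.
git lfs install
git clone https://huggingface.co/intfloat/multilingual-e5-small
```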
- Access to the Hugging Face model `intfloat/multilingual-e5-small` (or another model of your choice).

---

## Step 1: Spin Up a Docker OpenSearch Cluster
Suggested change:
- ## Step 1: Spin Up a Docker OpenSearch Cluster
+ ## Step 1: Spin up a Docker OpenSearch cluster
### 2.4 Test ingest data
Perform bulk ingestion, this will now trigger the ingest pipeline to have embeddings for each document.
Suggested change:
- Perform bulk ingestion, this will now trigger the ingest pipeline to have embeddings for each document.
+ When you perform bulk ingestion, the ingest pipeline will generate embeddings for each document:
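For context, a bulk request along these lines would exercise that pipeline (the index name, field name, pipeline name, and sample documents are illustrative assumptions, not the ones defined in the PR's diff):

```bash
# Two sample documents; the ingest pipeline adds an embedding to each one.
cat > bulk-docs.ndjson <<'EOF'
{ "index": { "_index": "nyc_facts" } }
{ "description": "Central Park hosts softball leagues and running events all summer." }
{ "index": { "_index": "nyc_facts" } }
{ "description": "The New York City Marathon passes through all five boroughs." }
EOF

# The ?pipeline= parameter is unnecessary if the index sets a default_pipeline.
curl -ku admin:<a-strong-password> -X POST \
  "https://localhost:9200/_bulk?pipeline=asymmetric-embedding-pipeline" \
  -H 'Content-Type: application/x-ndjson' --data-binary @bulk-docs.ndjson
```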
## 3. Run Semantic Search

### 3.1 Create the Search Pipeline
Suggested change:
- ### 3.1 Create the Search Pipeline
+ ### 3.1 Create a search pipeline
## 3. Run Semantic Search

### 3.1 Create the Search Pipeline
Create the search pipeline which will convert your query into a embedding and run KNN on the index to return the best documents.
Suggested change:
- Create the search pipeline which will convert your query into a embedding and run KNN on the index to return the best documents.
+ Create a search pipeline that will convert your query into an embedding and run a k-NN search on the index to return the best-matching documents:
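As a rough sketch of one way to do this, separate from whatever the tutorial's diff defines for asymmetric query prefixes: a search pipeline whose `neural_query_enricher` request processor supplies a default model ID for neural queries on the index (pipeline name and model ID are placeholders).

```bash
curl -ku admin:<a-strong-password> -X PUT \
  "https://localhost:9200/_search/pipeline/asymmetric-search-pipeline" \
  -H 'Content-Type: application/json' -d '{
    "request_processors": [
      { "neural_query_enricher": { "default_model_id": "<your-model-id>" } }
    ]
  }'
```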
### 3.1 Run Semantic Search
In this scenario we are going to see the top 3 results, when asking about sporting activities in New York City.
Suggested change:
- In this scenario we are going to see the top 3 results, when asking about sporting activities in New York City.
+ Run a query about sporting activities in New York City:
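Roughly, such a query could look like the following (index, vector field, and model ID are placeholders; the tutorial's own request may rely on its search pipeline instead of an inline model ID):

```bash
# Ask a natural-language question and return the 3 nearest documents by embedding similarity.
curl -ku admin:<a-strong-password> -X GET "https://localhost:9200/nyc_facts/_search" \
  -H 'Content-Type: application/json' -d '{
    "size": 3,
    "query": {
      "neural": {
        "description_embedding": {
          "query_text": "What sporting activities can I do in New York City?",
          "model_id": "<your-model-id>",
          "k": 3
        }
      }
    }
  }'
```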
Which yields the following
Suggested change:
- Which yields the following
+ The response contains the top 3 matching documents:
Description
This PR implements the tutorial required for local asymmetric model embeddings. In this tutorial, the setup uses a Docker container to help users take advantage of multi-node clusters with ML nodes.
Related Issues
Resolves #3255
Check List
Commits are signed per the DCO using --signoff.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.