add titan embedding v2 to blueprint (opensearch-project#2480)
Signed-off-by: Yaliang Wu <[email protected]>
ylwu-amzn authored May 28, 2024
1 parent 2c11e7f commit 9b072c4
Showing 2 changed files with 38 additions and 10 deletions.
@@ -19,6 +19,7 @@ PUT /_cluster/settings

If you are using self-managed Opensearch, you should supply AWS credentials:

If you are using Titan Text Embedding V2, change "model" to `amazon.titan-embed-text-v2:0`
```json
POST /_plugins/_ml/connectors/_create
{
@@ -28,7 +29,8 @@ POST /_plugins/_ml/connectors/_create
"protocol": "aws_sigv4",
"parameters": {
"region": "<PLEASE ADD YOUR AWS REGION HERE>",
"service_name": "bedrock"
"service_name": "bedrock",
"model": "amazon.titan-embed-text-v1"
},
"credential": {
"access_key": "<PLEASE ADD YOUR AWS ACCESS KEY HERE>",
@@ -39,14 +41,14 @@ POST /_plugins/_ml/connectors/_create
{
"action_type": "predict",
"method": "POST",
"url": "https://bedrock-runtime.us-east-1.amazonaws.com/model/amazon.titan-embed-text-v1/invoke",
"url": "https://bedrock-runtime.${parameters.region}.amazonaws.com/model/${parameters.model}/invoke",
"headers": {
"content-type": "application/json",
"x-amz-content-sha256": "required"
},
"request_body": "{ \"inputText\": \"${parameters.inputText}\" }",
"pre_process_function": "\n StringBuilder builder = new StringBuilder();\n builder.append(\"\\\"\");\n String first = params.text_docs[0];\n builder.append(first);\n builder.append(\"\\\"\");\n def parameters = \"{\" +\"\\\"inputText\\\":\" + builder + \"}\";\n return \"{\" +\"\\\"parameters\\\":\" + parameters + \"}\";",
"post_process_function": "\n def name = \"sentence_embedding\";\n def dataType = \"FLOAT32\";\n if (params.embedding == null || params.embedding.length == 0) {\n return params.message;\n }\n def shape = [params.embedding.length];\n def json = \"{\" +\n \"\\\"name\\\":\\\"\" + name + \"\\\",\" +\n \"\\\"data_type\\\":\\\"\" + dataType + \"\\\",\" +\n \"\\\"shape\\\":\" + shape + \",\" +\n \"\\\"data\\\":\" + params.embedding +\n \"}\";\n return json;\n "
"pre_process_function": "connector.pre_process.bedrock.embedding",
"post_process_function": "connector.post_process.bedrock.embedding"
}
]
}
@@ -64,7 +66,8 @@ POST /_plugins/_ml/connectors/_create
"protocol": "aws_sigv4",
"parameters": {
"region": "<PLEASE ADD YOUR AWS REGION HERE>",
"service_name": "bedrock"
"service_name": "bedrock",
"model": "amazon.titan-embed-text-v1"
},
"credential": {
"roleArn": "<PLEASE ADD YOUR AWS ROLE ARN HERE>"
@@ -79,8 +82,8 @@ POST /_plugins/_ml/connectors/_create
"x-amz-content-sha256": "required"
},
"request_body": "{ \"inputText\": \"${parameters.inputText}\" }",
"pre_process_function": "\n StringBuilder builder = new StringBuilder();\n builder.append(\"\\\"\");\n String first = params.text_docs[0];\n builder.append(first);\n builder.append(\"\\\"\");\n def parameters = \"{\" +\"\\\"inputText\\\":\" + builder + \"}\";\n return \"{\" +\"\\\"parameters\\\":\" + parameters + \"}\";",
"post_process_function": "\n def name = \"sentence_embedding\";\n def dataType = \"FLOAT32\";\n if (params.embedding == null || params.embedding.length == 0) {\n return params.message;\n }\n def shape = [params.embedding.length];\n def json = \"{\" +\n \"\\\"name\\\":\\\"\" + name + \"\\\",\" +\n \"\\\"data_type\\\":\\\"\" + dataType + \"\\\",\" +\n \"\\\"shape\\\":\" + shape + \",\" +\n \"\\\"data\\\":\" + params.embedding +\n \"}\";\n return json;\n "
"pre_process_function": "connector.pre_process.bedrock.embedding",
"post_process_function": "connector.post_process.bedrock.embedding"
}
]
}
@@ -151,7 +154,7 @@ POST /_plugins/_ml/models/sKR9PIsBQRofe4CSlUov/_predict
}
```

Sample response:
Sample response of Titan Text Embedding V1:
```json
{
"inference_results": [
@@ -177,3 +180,29 @@ Sample response:
}
```

Sample response of Titan Text Embedding V2:
```json
{
"inference_results": [
{
"output": [
{
"name": "sentence_embedding",
"data_type": "FLOAT32",
"shape": [
1024
],
"data": [
-0.041385926,
0.08503958,
0.0026220535,
...
]
}
],
"status_code": 200
}
]
}
```
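
The connector change in this commit moves the region and model ID into `parameters`, so switching to Titan Text Embedding V2 only requires a different `model` value. The following is a minimal Python sketch of that idea, not the ml-commons implementation: `resolve_url` and `extract_embedding` are hypothetical helper names, and the substitution logic is an assumption that simply mimics the `${parameters.<name>}` placeholders shown in the `url` field. The second helper checks that a `_predict` response carries as many values as its declared `shape` (1024 in the V2 sample).

```python
import re

# Placeholder syntax copied from the connector "url" field in this diff.
URL_TEMPLATE = (
    "https://bedrock-runtime.${parameters.region}.amazonaws.com"
    "/model/${parameters.model}/invoke"
)


def resolve_url(template, parameters):
    """Replace each ${parameters.<name>} placeholder with its value.

    Illustrative only: this mimics the substitution and is an assumption
    about its behavior, not code taken from ml-commons.
    """
    def lookup(match):
        # group(1) is the parameter name, e.g. "region" or "model".
        return str(parameters[match.group(1)])

    return re.sub(r"\$\{parameters\.(\w+)\}", lookup, template)


def extract_embedding(response):
    """Return the embedding vector after checking it against "shape"."""
    output = response["inference_results"][0]["output"][0]
    data, shape = output["data"], output["shape"]
    if len(data) != shape[0]:
        raise ValueError(f"expected {shape[0]} values, got {len(data)}")
    return data


if __name__ == "__main__":
    v2 = {"region": "us-east-1", "model": "amazon.titan-embed-text-v2:0"}
    print(resolve_url(URL_TEMPLATE, v2))
```

Passing `amazon.titan-embed-text-v1` as `model` instead yields the URL that was previously hard-coded in the blueprint.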

@@ -81,7 +81,7 @@ PUT my_books
Create sub-pipeline to generate embedding for one item in the array.

This pipeline contains 3 processors
- set processor: The `text_embedding` processor is unable to identify "_ingest._value.title". You need to copy "_ingest._value.title" to a temporary field for text_embedding to process it.
- set processor: The `text_embedding` processor is unable to identify "_ingest._value.title". You need to copy "_ingest._value.title" to a non-existing temporary field for text_embedding to process it.
- text_embedding processor: convert value of the temporary field to embedding
- remove processor: remove temporary field
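
The three processors listed above can be sketched as a pipeline definition built in Python. This is an illustration under stated assumptions, not the blueprint's actual pipeline body, which is not visible in this diff: the temporary field name `title_tmp`, the target `_ingest._value.title_embedding`, the description text, and the model ID placeholder are all hypothetical.

```python
import json

# Hypothetical names: the real temporary field, target field, and model ID
# are not shown in this diff.
TMP_FIELD = "title_tmp"
MODEL_ID = "<YOUR EMBEDDING MODEL ID>"

sub_pipeline = {
    "description": "Generate an embedding for one item of the array",
    "processors": [
        # 1. Copy the current array item's title into a temporary field
        #    that does not already exist in the document.
        {"set": {"field": TMP_FIELD, "value": "{{_ingest._value.title}}"}},
        # 2. Embed the temporary field and map the vector to a field
        #    on the current array item (assumed target name).
        {
            "text_embedding": {
                "model_id": MODEL_ID,
                "field_map": {TMP_FIELD: "_ingest._value.title_embedding"},
            }
        },
        # 3. Drop the temporary field so it is not indexed.
        {"remove": {"field": TMP_FIELD}},
    ],
}

if __name__ == "__main__":
    # The printed JSON is the kind of body an ingest pipeline PUT would take.
    print(json.dumps(sub_pipeline, indent=2))
```

The order matters: the `remove` processor has to run last, because `text_embedding` reads the temporary field that `set` creates.
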
```
@@ -228,7 +228,6 @@ Response
"description": "This is first book"
},
{
"title": "second book",
"description": "This is second book"
}
]