[Inference API] Add Amazon Bedrock support to Inference API #110248
Conversation
Pinging @elastic/ml-core (Team:ML)
Pinging @elastic/ent-search-eng (Team:Enterprise Search)
LGTM
Epic PR 👏
Overall, great PR! Very well organized and easy to understand. I do have a concern about the embedding task settings.
@@ -0,0 +1 @@
is this intentionally left empty?
I think so -- these license files were brought over from the repository-s3 module pieces, which also use the AWS SDK (and are blank there too)
private final Integer dimensions;
private final Boolean dimensionsSetByUser;
private final Integer maxInputTokens;
private final SimilarityMeasure similarity;
Shouldn't these be task settings, since they only apply to the embedding task?
Intuitively, I would think so, but all our services (OpenAI, Cohere, etc.) have these in the service settings - I assume because these are rarely (if ever) changed when performing inference... @jonathan-buttner or @davidkyle - thoughts here? I'll keep them in the service settings for now, but just a note for the future if we ever want to change these to be task settings...
Cool, we will have to think about refactoring this in the future.
> Shouldn't these be task settings, since they only apply to the embedding task?

Hmm, I'm not sure what you mean @maxhniebergall. I believe these settings are only used for embedding tasks.
Do you mean that they should be task settings because they should be modifiable after an inference endpoint is created?
I don't think we'd want a user to be able to change these after setting up an inference endpoint (I suppose max input tokens maybe?). If a user generated text embeddings using a certain number of dimensions and expected similarity I would think it'd break the index if we stored different dimension sizes or potentially a different similarity during an inference request.
Is that what the difference between task settings and service settings is? Maybe we should rename them to MutableSettings and ImmutableSettings? I had thought that service settings were for every task in a service, and task settings were for when the settings were specific to a task.
> Is that what the difference between task settings and service settings is?

Yep!

> Maybe we should rename them to MutableSettings and ImmutableSettings?

That's true, it isn't intuitive that service settings aren't mutable and task settings are. I think the convention we went with was to reflect the field names from the body of the request, and we add the task type in the class name to indicate what it's for.
I guess we could have the reverse problem if we named them mutable and immutable, as you wouldn't know immediately what they refer to without looking at the class contents. I suppose we could include it in the name too.
Right, these are also serialized as part of the request, so we can't change the naming without a breaking change.
import java.util.List;
import java.util.Map;

public final class AmazonBedrockProviderCapabilities {
Awesome! Love how all of this info is in one spot
var actionCreator = new AmazonBedrockActionCreator(amazonBedrockSender, this.getServiceComponents(), timeout);
if (model instanceof AmazonBedrockModel baseAmazonBedrockModel) {
    var maxBatchSize = getEmbeddingsMaxBatchSize(baseAmazonBedrockModel.provider());
    var batchedRequests = new EmbeddingRequestChunker(input, maxBatchSize, EmbeddingRequestChunker.EmbeddingType.FLOAT)
Do all of the bedrock models use Float embeddings?
Yep - both Amazon Titan and Cohere use floats in their output vectors
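For reference, hedged sketches of the provider-native embedding response bodies (field names are my recollection of the Bedrock model docs, not taken from this PR); both carry plain float vectors:

```
Amazon Titan (assumed shape):
{ "embedding": [0.418, 0.024, -0.113], "inputTextTokenCount": 5 }

Cohere via Bedrock (assumed shape):
{ "embeddings": [[0.002, -0.009, 0.871]], "texts": ["hello world"] }
```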
import java.util.Map;

public final class AmazonBedrockProviderCapabilities {
    private static final int DEFAULT_MAX_CHUNK_SIZE = 2048;
Why is this value here instead of in AmazonBedrockConstants?
import static org.elasticsearch.xpack.inference.services.amazonbedrock.AmazonBedrockConstants.TOP_K_FIELD;
import static org.elasticsearch.xpack.inference.services.amazonbedrock.AmazonBedrockConstants.TOP_P_FIELD;

public record AmazonBedrockChatCompletionRequestTaskSettings(
Why do we need this class and AmazonBedrockChatCompletionTaskSettings?
The Request task settings are the ones that can be overridden during an inference POST request to Elasticsearch... the AmazonBedrockChatCompletionTaskSettings are the default task settings a user can set up when they create the model (PUT).
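A minimal hedged sketch of that split (endpoint paths and payload shapes assumed from the inference API conventions, not copied from this PR): task settings stored at PUT time act as defaults, and a POST inference request may override them.

```
PUT _inference/completion/bedrock_completion
{
  "service": "amazonbedrock",
  "service_settings": { ... },
  "task_settings": { "temperature": 0.2 }
}

POST _inference/completion/bedrock_completion
{
  "input": "Hello",
  "task_settings": { "temperature": 0.9 }
}
```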
Great work Mark! Thanks for all the refactoring. I left a comment about the dimensions set by user. The others are just nits.
static final String DIMENSIONS_SET_BY_USER = "dimensions_set_by_user";

private final Integer dimensions;
private final Boolean dimensionsSetByUser;
I might have missed it, but do we or do we plan on sending this to an external provider? I think I only saw that we use it in the updateModelWithEmbeddingDetails() method. If we don't send it to an external provider, then the user-set dimensions might mess up the semantic text field 🤔. What I mean is, if a user sets it to 5 but the actual stored embeddings are 1000, I'm not sure what happens.
I think my vote would be to remove this if we don't plan on using it in the near future. If we do plan on using it soon, we might want to prevent a user from setting it for now 🤷
I think this is an oversight on my part... for Amazon Titan embeddings you can pass the dimensions, but Cohere does not have this... I'll fix this up, with some validation on this.
Ah ok cool 👍 yeah don't forget to pass it along in the titan request, or at least I missed where we are doing that.
Sooooo - here's something silly... Amazon Bedrock G1 models don't allow the dimensions parameter, but version 2 does:

> Titan Embeddings G1 - Text doesn't support the use of inference parameters. The following sections detail the request and response formats and provides a code example.

I think for now I'll omit the dimensions altogether, because there's no definitive way we can easily tell which model the end user is using (we could check whether the model ID string matches the v2 version, but if they are using a custom ARN it'd require a lot more hoops to jump through than I think is worth it at this point in time...)
> If we do plan on using it soon, we might want to prevent a user from setting it for now

I'll go this route for now as it's probably the safest.
> I think for now, I'll omit the dimensions all together because there's no definitive way we can tell which model the end user is using easily

👍 sounds good
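For context on the G1/V2 difference above, a hedged sketch of the Bedrock-native request body for the V2 Titan embeddings model (the amazon.titan-embed-text-v2:0 model ID and field names are my reading of the Bedrock docs, not part of this PR); the G1 body accepts only inputText:

```
{
  "inputText": "hello world",
  "dimensions": 512,
  "normalize": true
}
```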
ValidationException validationException = new ValidationException();

var temperature = extractOptionalDoubleInRange(
nit: I know I previously (in a different PR) asked that we add functionality to validate the range. I think I've changed my position on that 😅. To better support provider changes in these fields, it probably makes sense to not validate them on our end and let the upstream provider return an error if they're not in a specific range/positive/negative etc.
I know we have this in a few places, so you don't need to change it here, but maybe going forward we rely more on the upstream provider to do this for us and we try to ensure the returned error message is clear enough for the user to know what needs to be changed.
While this is still in the works, I'll see if it's easy enough to change this...
I think for now, I'll keep this as-is... providing an invalid value does have an error message available in the exception from the SDK, but it's quite buried:
(about 30 more lines above this here...)
...
(RequestExecutorService.java:192)\n\tat [email protected]/org.elasticsearch.xpack.inference.external.http.sender.AmazonBedrockRequestExecutorService.start(AmazonBedrockRequestExecutorService.java:19)\n\tat [email protected]/org.elasticsearch.xpack.inference.external.amazonbedrock.AmazonBedrockRequestSender.lambda$start$0(AmazonBedrockRequestSender.java:89)\n\tat [email protected]/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:917)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.base/java.lang.Thread.run(Thread.java:1570)\nCaused by: java.util.concurrent.ExecutionException: com.amazonaws.services.bedrockruntime.model.ValidationException: 1 validation error detected: Value '500.0' at 'inferenceConfig.temperature' failed to satisfy constraint: Member must have value less than or equal to 1 (Service: AmazonBedrockRuntime; Status Code: 400; Error Code: ValidationException; Request ID: fe7b8dbd-31f9-4821-a663-d00319d36e89; Proxy: null)
Sounds good.
…110248)

* Initial commit; setup Gradle; start service
* initial commit
* minor cleanups, builds green; needs tests
* bug fixes; tested working embeddings & completion
* use custom json builder for embeddings request
* Ensure auto-close; fix forbidden API
* start of adding unit tests; abstraction layers
* adding additional tests; cleanups
* add requests unit tests
* all tests created
* fix cohere embeddings response
* fix cohere embeddings response
* fix lint
* better test coverage for secrets; inference client
* update thread-safe syncs; make dims/tokens + int
* add tests for dims and max tokens positive integer
* use requireNonNull;override settings type;cleanups
* use r/w lock for client cache
* remove client reference counting
* update locking in cache; client errors; noop doc
* remove extra block in internalGetOrCreateClient
* remove duplicate dependencies; cleanup
* add fxn to get default embeddings similarity
* use async calls to Amazon Bedrock; cleanups
* use Clock in cache; simplify locking; cleanups
* cleanups around executor; remove some instanceof
* cleanups; use EmbeddingRequestChunker
* move max chunk size to constants
* oof - swapped transport vers w/ master node req
* use XContent instead of Jackson JsonFactory
* remove gradle versions; do not allow dimensions

(cherry picked from commit 52e591d)
💚 All backports created successfully
Adds Amazon Bedrock text embeddings and chat completion support for the inference API.
Prerequisites to Model Creation
Inference Model Creation:
Service Settings
Where provider is one of:

* amazontitan
* cohere

for text_embedding tasks, or one of:

* amazontitan
* anthropic
* ai21labs
* cohere
* meta
* mistral

for completion tasks.
The model_id should match a model you have access to when using a base model, or the ARN of the model if you are using a custom model based on a base model. Also, be sure that the model you are using is supported in the region that you specify.
text_embedding tasks for Amazon Bedrock have the following additional, optional service settings:

* dimensions: the output dimensions to use for the inference
* max_input_tokens: the maximum number of input tokens
* similarity: the similarity measure to use

completion tasks for Amazon Bedrock do not have any additional service settings.
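A hedged sketch of creating an endpoint with these settings (the amazonbedrock service name and the access_key/secret_key/region credential fields are assumptions based on the description above, not copied verbatim from this PR):

```
PUT _inference/text_embedding/bedrock_embeddings
{
  "service": "amazonbedrock",
  "service_settings": {
    "access_key": "<aws-access-key>",
    "secret_key": "<aws-secret-key>",
    "region": "us-east-1",
    "provider": "amazontitan",
    "model_id": "amazon.titan-embed-text-v1"
  }
}
```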
Task Settings

text_embedding tasks for Amazon Bedrock do not have any task settings, and none should be sent.

chat_completion tasks have the following task settings available (all optional):

* max_new_tokens: the max new tokens to produce
* temperature: the temperature (0.0 to 1.0) to use
* top_p: the top P metric to use (0.0 to 1.0)
* top_k: an alternative to top_p, from 0.0 to 1.0 (only available for Anthropic, Cohere, and Mistral)

Inference Model Inference
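A hedged sketch of performing inference against a completion endpoint, overriding task settings per request (paths and field names assumed from the inference API conventions described above; treat as illustrative):

```
POST _inference/completion/bedrock_completion
{
  "input": "Why is the sky blue?",
  "task_settings": {
    "temperature": 0.4,
    "max_new_tokens": 256
  }
}
```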
Manual Testing Completed:
Tested Embeddings Providers:
Tested Chat Completion Providers: