[Inference API] Add Amazon Bedrock support to Inference API #110248
Conversation
Pinging @elastic/ml-core (Team:ML)
Pinging @elastic/ent-search-eng (Team:Enterprise Search)
LGTM
Epic PR 👏
Overall, great PR! Very well organized and easy to understand. I do have a concern about the embedding task settings.
@@ -0,0 +1 @@
is this intentionally left empty?
I think so -- these license files were brought over from the repository-s3 module pieces, which also use the AWS SDK (and are blank there too)
private final Integer dimensions;
private final Boolean dimensionsSetByUser;
private final Integer maxInputTokens;
private final SimilarityMeasure similarity;
Shouldn't these be task settings, since they only apply to the embedding task?
Intuitively, I would think so, but all our services (OpenAI, Cohere, etc.) have these in the service settings - I assume because these are rarely (if ever) changed when performing inference... @jonathan-buttner or @davidkyle - thoughts here? I'll keep them in the service settings for now, but just a note for the future if we ever want to change these to be task settings...
Cool, we will have to think about refactoring this in the future.
> Shouldn't these be task settings, since they only apply to the embedding task?

Hmm, I'm not sure what you mean @maxhniebergall. I believe these settings are only used for embedding tasks.
Do you mean that they should be task settings because they should be modifiable after an inference endpoint is created?
I don't think we'd want a user to be able to change these after setting up an inference endpoint (I suppose max input tokens maybe?). If a user generated text embeddings using a certain number of dimensions and expected similarity I would think it'd break the index if we stored different dimension sizes or potentially a different similarity during an inference request.
Is that what the difference between task settings and service settings is? Maybe we should rename them to MutableSettings and ImmutableSettings? I had thought that service settings were for every task in a service, and task settings were for when the settings were specific to a task.
> Is that what the difference between task settings and service settings is?

Yep!

> Maybe we should rename them to MutableSettings and ImmutableSettings?

That's true, it isn't intuitive that service settings aren't mutable and task settings are. I think the convention we went with was to reflect the field names from the body of the request, and we add the task type in the class name to indicate what it's for.
I guess we could have the reverse problem if we named them mutable and immutable, as you wouldn't know immediately what they refer to without looking at the class contents. I suppose we could include it in the name too.
Right, these are also serialized as part of the request, so we can't change the naming without a breaking change.
import java.util.List;
import java.util.Map;

public final class AmazonBedrockProviderCapabilities {
Awesome! Love how all of this info is in one spot
var actionCreator = new AmazonBedrockActionCreator(amazonBedrockSender, this.getServiceComponents(), timeout);
if (model instanceof AmazonBedrockModel baseAmazonBedrockModel) {
    var maxBatchSize = getEmbeddingsMaxBatchSize(baseAmazonBedrockModel.provider());
    var batchedRequests = new EmbeddingRequestChunker(input, maxBatchSize, EmbeddingRequestChunker.EmbeddingType.FLOAT)
Do all of the bedrock models use Float embeddings?
Yep - both Amazon Titan and Cohere use floats in their output vectors
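For reference, hedged sketches of the provider-native embedding response bodies (field names are my recollection of the Bedrock model docs, not taken from this PR); both carry plain float vectors:

```
Amazon Titan (assumed shape):
{ "embedding": [0.418, 0.024, -0.113], "inputTextTokenCount": 5 }

Cohere via Bedrock (assumed shape):
{ "embeddings": [[0.002, -0.009, 0.871]], "texts": ["hello world"] }
```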
import java.util.Map;

public final class AmazonBedrockProviderCapabilities {
    private static final int DEFAULT_MAX_CHUNK_SIZE = 2048;
Why is this value here instead of in AmazonBedrockConstants?
import static org.elasticsearch.xpack.inference.services.amazonbedrock.AmazonBedrockConstants.TOP_K_FIELD;
import static org.elasticsearch.xpack.inference.services.amazonbedrock.AmazonBedrockConstants.TOP_P_FIELD;

public record AmazonBedrockChatCompletionRequestTaskSettings(
Why do we need this class and AmazonBedrockChatCompletionTaskSettings?
The Request task settings are the ones that can be overridden during an inference POST request to Elasticsearch... the AmazonBedrockChatCompletionTaskSettings are the default task settings a user can set up when they create the model (PUT).
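A minimal hedged sketch of that split (endpoint paths and payload shapes assumed from the inference API conventions, not copied from this PR): task settings stored at PUT time act as defaults, and a POST inference request may override them.

```
PUT _inference/completion/bedrock_completion
{
  "service": "amazonbedrock",
  "service_settings": { ... },
  "task_settings": { "temperature": 0.2 }
}

POST _inference/completion/bedrock_completion
{
  "input": "Hello",
  "task_settings": { "temperature": 0.9 }
}
```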
Great work Mark! Thanks for all the refactoring. I left a comment about the dimensions set by user. The others are just nits.
static final String DIMENSIONS_SET_BY_USER = "dimensions_set_by_user";

private final Integer dimensions;
private final Boolean dimensionsSetByUser;
I might have missed it, but do we or do we plan on sending this to an external provider? I think I only saw that we use it in the updateModelWithEmbeddingDetails() method. If we don't send it to an external provider, then the user-set dimensions might mess up the semantic text field 🤔. What I mean is, if a user sets it to 5 but the actual stored embeddings are 1000, I'm not sure what happens.
I think my vote would be to remove this if we don't plan on using it in the near future. If we do plan on using it soon, we might want to prevent a user from setting it for now 🤷
I think this is an oversight on my part... for Amazon Titan embeddings you can pass the dimensions, but Cohere does not have this... I'll fix this up, with some validation on this.
Ah ok cool 👍 yeah don't forget to pass it along in the titan request, or at least I missed where we are doing that.
Sooooo - here's something silly... Amazon Bedrock G1 models don't allow the dimensions parameter, but version 2 does:

> Titan Embeddings G1 - Text doesn't support the use of inference parameters. The following sections detail the request and response formats and provides a code example.

I think for now I'll omit the dimensions altogether, because there's no definitive way we can easily tell which model the end user is using (we could check whether the model ID string matches the v2 version, but if they are using a custom ARN it'd require a lot more hoops to jump through than I think is worth it at this point in time...)
> If we do plan on using it soon, we might want to prevent a user from setting it for now

I'll go this route for now as it's probably the safest.
> I think for now, I'll omit the dimensions all together because there's no definitive way we can tell which model the end user is using easily

👍 sounds good
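For context on the G1/V2 difference above, a hedged sketch of the Bedrock-native request body for the V2 Titan embeddings model (the amazon.titan-embed-text-v2:0 model ID and field names are my reading of the Bedrock docs, not part of this PR); the G1 body accepts only inputText:

```
{
  "inputText": "hello world",
  "dimensions": 512,
  "normalize": true
}
```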
ValidationException validationException = new ValidationException();

var temperature = extractOptionalDoubleInRange(
nit: I know I previously (in a different PR) asked that we add functionality to validate the range. I think I've changed my position on that 😅. To better support provider changes in these fields, it probably makes sense to not validate them on our end and let the upstream provider return an error if they're not in a specific range/positive/negative etc.
I know we have this in a few places, so you don't need to change it here, but maybe going forward we rely more on the upstream provider to do this for us and we try to ensure the returned error message is clear enough for the user to know what needs to be changed.
While this is still in the works, I'll see if it's easy enough to change this...
I think for now, I'll keep this as-is... providing an invalid value does have an error message available in the exception from the SDK, but it's quite buried:
(about 30 more lines above this here...)
...
(RequestExecutorService.java:192)\n\tat [email protected]/org.elasticsearch.xpack.inference.external.http.sender.AmazonBedrockRequestExecutorService.start(AmazonBedrockRequestExecutorService.java:19)\n\tat [email protected]/org.elasticsearch.xpack.inference.external.amazonbedrock.AmazonBedrockRequestSender.lambda$start$0(AmazonBedrockRequestSender.java:89)\n\tat [email protected]/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:917)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.base/java.lang.Thread.run(Thread.java:1570)\nCaused by: java.util.concurrent.ExecutionException: com.amazonaws.services.bedrockruntime.model.ValidationException: 1 validation error detected: Value '500.0' at 'inferenceConfig.temperature' failed to satisfy constraint: Member must have value less than or equal to 1 (Service: AmazonBedrockRuntime; Status Code: 400; Error Code: ValidationException; Request ID: fe7b8dbd-31f9-4821-a663-d00319d36e89; Proxy: null)
Sounds good.
…110248)

* Initial commit; setup Gradle; start service
* initial commit
* minor cleanups, builds green; needs tests
* bug fixes; tested working embeddings & completion
* use custom json builder for embeddings request
* Ensure auto-close; fix forbidden API
* start of adding unit tests; abstraction layers
* adding additional tests; cleanups
* add requests unit tests
* all tests created
* fix cohere embeddings response
* fix cohere embeddings response
* fix lint
* better test coverage for secrets; inference client
* update thread-safe syncs; make dims/tokens + int
* add tests for dims and max tokens positive integer
* use requireNonNull;override settings type;cleanups
* use r/w lock for client cache
* remove client reference counting
* update locking in cache; client errors; noop doc
* remove extra block in internalGetOrCreateClient
* remove duplicate dependencies; cleanup
* add fxn to get default embeddings similarity
* use async calls to Amazon Bedrock; cleanups
* use Clock in cache; simplify locking; cleanups
* cleanups around executor; remove some instanceof
* cleanups; use EmbeddingRequestChunker
* move max chunk size to constants
* oof - swapped transport vers w/ master node req
* use XContent instead of Jackson JsonFactory
* remove gradle versions; do not allow dimensions

(cherry picked from commit 52e591d)
💚 All backports created successfully
Adds Amazon Bedrock text embeddings and chat completion support for the inference API.
Prerequisites to Model Creation
Inference Model Creation:
Service Settings
Where provider is one of:

* amazontitan
* cohere

for text_embedding tasks, or one of:

* amazontitan
* anthropic
* ai21labs
* cohere
* meta
* mistral

for completion tasks.
The model_id should match a model you have access to when using a base model, or the ARN of the model if you are using a custom model based on a base model. Also, be sure that the model you are using is supported in the region that you specify.
text_embedding tasks for Amazon Bedrock have the following additional, optional service settings:

* dimensions: the output dimensions to use for the inference
* max_input_tokens: the maximum number of input tokens
* similarity: the similarity measure to use

completion tasks for Amazon Bedrock do not have any additional service settings.
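A hedged sketch of creating an endpoint with these settings (the amazonbedrock service name and the access_key/secret_key/region credential fields are assumptions based on the description above, not copied verbatim from this PR):

```
PUT _inference/text_embedding/bedrock_embeddings
{
  "service": "amazonbedrock",
  "service_settings": {
    "access_key": "<aws-access-key>",
    "secret_key": "<aws-secret-key>",
    "region": "us-east-1",
    "provider": "amazontitan",
    "model_id": "amazon.titan-embed-text-v1"
  }
}
```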
Task Settings

text_embedding tasks for Amazon Bedrock do not have any task settings, and none should be sent.

chat_completion tasks have the following task settings available (all optional):

* max_new_tokens: the max new tokens to produce
* temperature: the temperature (0.0 to 1.0) to use
* top_p: the top P metric to use (0.0 to 1.0)
* top_k: an alternative to top_p, from 0.0 to 1.0 (only available for Anthropic, Cohere, and Mistral)

Inference Model Inference
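A hedged sketch of performing inference against a completion endpoint, overriding task settings per request (paths and field names assumed from the inference API conventions described above; treat as illustrative):

```
POST _inference/completion/bedrock_completion
{
  "input": "Why is the sky blue?",
  "task_settings": {
    "temperature": 0.4,
    "max_new_tokens": 256
  }
}
```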
Manual Testing Completed:
Tested Embeddings Providers:
Tested Chat Completion Providers: