diff --git a/docs/reference/inference/service-anthropic.asciidoc b/docs/reference/inference/service-anthropic.asciidoc new file mode 100644 index 0000000000000..41419db7a6069 --- /dev/null +++ b/docs/reference/inference/service-anthropic.asciidoc @@ -0,0 +1,124 @@ +[[infer-service-anthropic]] +=== Anthropic {infer} service + +Creates an {infer} endpoint to perform an {infer} task with the `anthropic` service. + + +[discrete] +[[infer-service-anthropic-api-request]] +==== {api-request-title} + +`PUT /_inference//` + +[discrete] +[[infer-service-anthropic-api-path-params]] +==== {api-path-parms-title} + +``:: +(Required, string) +include::inference-shared.asciidoc[tag=inference-id] + +``:: +(Required, string) +include::inference-shared.asciidoc[tag=task-type] ++ +-- +Available task types: + +* `completion` +-- + +[discrete] +[[infer-service-anthropic-api-request-body]] +==== {api-request-body-title} + +`service`:: +(Required, string) +The type of service supported for the specified task type. In this case, +`anthropic`. + +`service_settings`:: +(Required, object) +include::inference-shared.asciidoc[tag=service-settings] ++ +-- +These settings are specific to the `anthropic` service. +-- + +`api_key`::: +(Required, string) +A valid API key for the Anthropic API. + +`model_id`::: +(Required, string) +The name of the model to use for the {infer} task. +You can find the supported models at https://docs.anthropic.com/en/docs/about-claude/models#model-names[Anthropic models]. + +`rate_limit`::: +(Optional, object) +By default, the `anthropic` service sets the number of requests allowed per minute to `50`. +This helps to minimize the number of rate limit errors returned from Anthropic. +To modify this, set the `requests_per_minute` setting of this object in your service settings: ++ +-- +include::inference-shared.asciidoc[tag=request-per-minute-example] +-- + +`task_settings`:: +(Required, object) +include::inference-shared.asciidoc[tag=task-settings] ++ +.`task_settings` for the `completion` task type +[%collapsible%closed] +===== +`max_tokens`::: +(Required, integer) +The maximum number of tokens to generate before stopping. + +`temperature`::: +(Optional, float) +The amount of randomness injected into the response. ++ +For more details about the supported range, see the https://docs.anthropic.com/en/api/messages[Anthropic messages API]. + +`top_k`::: +(Optional, integer) +Specifies to only sample from the top K options for each subsequent token. ++ +Recommended for advanced use cases only. You usually only need to use `temperature`. ++ +For more details, see the https://docs.anthropic.com/en/api/messages[Anthropic messages API]. + +`top_p`::: +(Optional, float) +Specifies to use Anthropic's nucleus sampling. ++ +In nucleus sampling, Anthropic computes the cumulative distribution over all the options for each subsequent token in decreasing probability order and cut it off once it reaches a particular probability specified by `top_p`. You should either alter `temperature` or `top_p`, but not both. ++ +Recommended for advanced use cases only. You usually only need to use `temperature`. ++ +For more details, see the https://docs.anthropic.com/en/api/messages[Anthropic messages API]. +===== + +[discrete] +[[inference-example-anthropic]] +==== Anthropic service example + +The following example shows how to create an {infer} endpoint called +`anthropic_completion` to perform a `completion` task type. + +[source,console] +------------------------------------------------------------ +PUT _inference/completion/anthropic_completion +{ + "service": "anthropic", + "service_settings": { + "api_key": "", + "model_id": "" + }, + "task_settings": { + "max_tokens": 1024 + } +} +------------------------------------------------------------ +// TEST[skip:TBD]