Commit

[ML DOCS] Timeout only applies to ELSER and built-in E5 models (#111159) (#111182)
davidkyle authored Jul 23, 2024
1 parent 4b5a510 commit 1cc0311
Showing 3 changed files with 22 additions and 14 deletions.
8 changes: 0 additions & 8 deletions docs/reference/inference/put-inference.asciidoc
@@ -43,11 +43,3 @@ The following services are available through the {infer} API, click the links to
* <<infer-service-hugging-face,Hugging Face>>
* <<infer-service-mistral,Mistral>>
* <<infer-service-openai,OpenAI>>

[NOTE]
====
You might see a 502 Bad Gateway error in the response when using the {kib} Console.
This error usually indicates a timeout while the model downloads in the background.
You can check the download progress in the {ml-app} UI.
If using the Python client, you can set the `timeout` parameter to a higher value.
====
14 changes: 11 additions & 3 deletions docs/reference/inference/service-elasticsearch.asciidoc
@@ -35,7 +35,7 @@ Available task types:

`service`::
(Required, string)
The type of service supported for the specified task type. In this case,
`elasticsearch`.

`service_settings`::
@@ -58,7 +58,7 @@ The total number of allocations this model is assigned across machine learning n

`num_threads`:::
(Required, integer)
Sets the number of threads used by each model allocation during inference. This generally increases the speed per inference request. The inference process is compute-bound, so `num_threads` must not exceed the number of available allocated processors per node.
Must be a power of 2. Max allowed value is 32.

`task_settings`::
Expand Down Expand Up @@ -98,6 +98,14 @@ PUT _inference/text_embedding/my-e5-model
Valid values are `.multilingual-e5-small` and `.multilingual-e5-small_linux-x86_64`.
For further details, refer to the {ml-docs}/ml-nlp-e5.html[E5 model documentation].

[NOTE]
====
You might see a 502 Bad Gateway error in the response when using the {kib} Console.
This error usually indicates a timeout while the model downloads in the background.
You can check the download progress in the {ml-app} UI.
If using the Python client, you can set the `timeout` parameter to a higher value.
====
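
For Python client users, the note above can be addressed by raising the client-side timeout when creating the endpoint. A minimal sketch, assuming a local cluster at `http://localhost:9200` (substitute your own connection details) and an arbitrary 600-second timeout:

[source,python]
------------------------------------------------------------
from elasticsearch import Elasticsearch

# Hypothetical connection details; a higher request_timeout lets the
# call wait out the background model download instead of timing out.
client = Elasticsearch("http://localhost:9200", request_timeout=600)

resp = client.perform_request(
    "PUT",
    "/_inference/text_embedding/my-e5-model",
    headers={"accept": "application/json",
             "content-type": "application/json"},
    body={
        "service": "elasticsearch",
        "service_settings": {
            "num_allocations": 1,
            "num_threads": 1,
            "model_id": ".multilingual-e5-small",
        },
    },
)
print(resp)
------------------------------------------------------------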

[discrete]
[[inference-example-eland]]
==== Models uploaded by Eland via the elasticsearch service
@@ -119,4 +127,4 @@ PUT _inference/text_embedding/my-msmarco-minilm-model
------------------------------------------------------------
// TEST[skip:TBD]
<1> The `model_id` must be the ID of a text embedding model which has already been
{ml-docs}/ml-nlp-import-model.html#ml-nlp-import-script[uploaded through Eland].
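
A rough Python sketch of that Eland upload, based on Eland's documented PyTorch import API (the `eland_import_hub_model` CLI wraps essentially these steps). Signatures vary across eland versions, so treat it as illustrative rather than definitive:

[source,python]
------------------------------------------------------------
import tempfile

from elasticsearch import Elasticsearch
from eland.ml.pytorch import PyTorchModel
from eland.ml.pytorch.transformers import TransformerModel

es = Elasticsearch("http://localhost:9200")  # hypothetical endpoint

# Download the Hugging Face model and convert it to TorchScript.
tm = TransformerModel("sentence-transformers/msmarco-MiniLM-L-12-v3",
                      "text_embedding")

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path, config, vocab_path = tm.save(tmp_dir)

    # Upload the converted model; its ID is then usable as `model_id`.
    ptm = PyTorchModel(es, tm.elasticsearch_model_id())
    ptm.import_model(model_path=model_path, config_path=None,
                     vocab_path=vocab_path, config=config)
------------------------------------------------------------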
14 changes: 11 additions & 3 deletions docs/reference/inference/service-elser.asciidoc
@@ -34,7 +34,7 @@ Available task types:

`service`::
(Required, string)
The type of service supported for the specified task type. In this case,
`elser`.

`service_settings`::
@@ -51,7 +51,7 @@ The total number of allocations this model is assigned across machine learning n

`num_threads`:::
(Required, integer)
Sets the number of threads used by each model allocation during inference. This generally increases the speed per inference request. The inference process is compute-bound, so `num_threads` must not exceed the number of available allocated processors per node.
Must be a power of 2. Max allowed value is 32.


@@ -92,4 +92,12 @@ Example response:
"task_settings": {}
}
------------------------------------------------------------
// NOTCONSOLE

[NOTE]
====
You might see a 502 Bad Gateway error in the response when using the {kib} Console.
This error usually indicates a timeout while the model downloads in the background.
You can check the download progress in the {ml-app} UI.
If using the Python client, you can set the `timeout` parameter to a higher value.
====
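
If you would rather poll from the Python client than watch the {ml-app} UI, one possible check uses the trained models API's `definition_status` view. A sketch, assuming the ELSER v2 model ID `.elser_model_2` (adjust for your deployment):

[source,python]
------------------------------------------------------------
import time

from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200")  # hypothetical endpoint

# Poll until the model definition has fully downloaded.
while True:
    resp = client.ml.get_trained_models(
        model_id=".elser_model_2", include="definition_status"
    )
    if resp["trained_model_configs"][0].get("fully_defined"):
        print("ELSER download complete")
        break
    time.sleep(5)
------------------------------------------------------------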
