[DOCS] Resolves merge conflicts. (elastic#91610)
szabosteve authored Nov 16, 2022
1 parent 44421ff commit 30e51f1
Showing 3 changed files with 48 additions and 37 deletions.
@@ -46,8 +46,8 @@ Controls the amount of time to wait for {infer} results. Defaults to 10 seconds.
`docs`::
(Required, array)
An array of objects to pass to the model for inference. The objects should
-contain a field matching your configured trained model input. Typically, the field
-name is `text_field`. Currently, only a single value is allowed.
+contain a field matching your configured trained model input. Typically, the
+field name is `text_field`. Currently, only a single value is allowed.
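
For illustration, a request passing a single document in `text_field` to a
deployed model might look like the following sketch (`my_model` is a
placeholder model ID, not from this commit):

[source,console]
--------------------------------------------------
POST _ml/trained_models/my_model/deployment/_infer
{
  "docs": [
    {
      "text_field": "The fox jumped over the lazy dog."
    }
  ]
}
--------------------------------------------------
// TEST[skip:TBD]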

////
[[infer-trained-model-deployment-results]]
@@ -62,8 +62,8 @@ name is `text_field`. Currently, only a single value is allowed.
[[infer-trained-model-deployment-example]]
== {api-examples-title}

-The response depends on the task the model is trained for. If it is a
-text classification task, the response is the score. For example:
+The response depends on the task the model is trained for. If it is a text
+classification task, the response is the score. For example:

[source,console]
--------------------------------------------------
@@ -123,8 +123,8 @@ The API returns in this case:
----
// NOTCONSOLE

-Zero-shot classification tasks require extra configuration defining the class labels.
-These labels are passed in the zero-shot inference config.
+Zero-shot classification tasks require extra configuration defining the class
+labels. These labels are passed in the zero-shot inference config.

[source,console]
--------------------------------------------------
@@ -150,7 +150,8 @@ POST _ml/trained_models/model2/deployment/_infer
--------------------------------------------------
// TEST[skip:TBD]

-The API returns the predicted label and the confidence, as well as the top classes:
+The API returns the predicted label and the confidence, as well as the top
+classes:

[source,console-result]
----
@@ -204,8 +205,8 @@ POST _ml/trained_models/model2/deployment/_infer
--------------------------------------------------
// TEST[skip:TBD]

-When the input has been truncated due to the limit imposed by the model's `max_sequence_length`
-the `is_truncated` field appears in the response.
+When the input has been truncated due to the limit imposed by the model's
+`max_sequence_length`, the `is_truncated` field appears in the response.

[source,console-result]
----
19 changes: 13 additions & 6 deletions docs/reference/ml/trained-models/apis/infer-trained-model.asciidoc
@@ -6,7 +6,11 @@
<titleabbrev>Infer trained model</titleabbrev>
++++

-Evaluates a trained model. The model may be any supervised model either trained by {dfanalytics} or imported.
+Evaluates a trained model. The model may be any supervised model either trained
+by {dfanalytics} or imported.
+
+NOTE: For model deployments with caching enabled, results may be returned
+directly from the {infer} cache.
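
For illustration, a minimal call to this API might look like the sketch below
(`my_model` is a hypothetical model ID, not from this commit):

[source,console]
--------------------------------------------------
POST _ml/trained_models/my_model/_infer
{
  "docs": [
    {
      "text_field": "The quick brown fox jumps over the lazy dog."
    }
  ]
}
--------------------------------------------------
// TEST[skip:TBD]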

beta::[]

@@ -102,7 +106,8 @@ include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-fill-mask]
=====
`num_top_classes`::::
(Optional, integer)
-Number of top predicted tokens to return for replacing the mask token. Defaults to `0`.
+Number of top predicted tokens to return for replacing the mask token. Defaults
+to `0`.
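
For illustration, a hypothetical request asking for the top two replacements
for the mask token might look like this sketch (`my_model` is a placeholder;
the `[MASK]` token shown is BERT-style and depends on the model):

[source,console]
--------------------------------------------------
POST _ml/trained_models/my_model/_infer
{
  "docs": [
    {
      "text_field": "The capital of France is [MASK]."
    }
  ],
  "inference_config": {
    "fill_mask": {
      "num_top_classes": 2
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]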

`results_field`::::
(Optional, string)
@@ -272,7 +277,8 @@ The maximum amount of words in the answer. Defaults to `15`.

`num_top_classes`::::
(Optional, integer)
-The number the top found answers to return. Defaults to `0`, meaning only the best found answer is returned.
+The number of top found answers to return. Defaults to `0`, meaning only the
+best found answer is returned.
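
For illustration, a sketch of a question answering request asking for the
three best answers (`my_model` is a placeholder model ID):

[source,console]
--------------------------------------------------
POST _ml/trained_models/my_model/_infer
{
  "docs": [
    {
      "text_field": "The Amazon rainforest covers most of the Amazon basin in South America."
    }
  ],
  "inference_config": {
    "question_answering": {
      "question": "Where is the Amazon rainforest?",
      "num_top_classes": 3
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]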

`question`::::
(Required, string)
@@ -368,7 +374,8 @@ include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-text-classific

`num_top_classes`::::
(Optional, integer)
-Specifies the number of top class predictions to return. Defaults to all classes (-1).
+Specifies the number of top class predictions to return. Defaults to all classes
+(-1).
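
For illustration, a sketch limiting a text classification response to the two
highest-scoring classes (`my_model` is a placeholder model ID):

[source,console]
--------------------------------------------------
POST _ml/trained_models/my_model/_infer
{
  "docs": [
    {
      "text_field": "This movie was awesome!"
    }
  ],
  "inference_config": {
    "text_classification": {
      "num_top_classes": 2
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]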

`results_field`::::
(Optional, string)
@@ -879,8 +886,8 @@ POST _ml/trained_models/model2/_infer
--------------------------------------------------
// TEST[skip:TBD]

-When the input has been truncated due to the limit imposed by the model's `max_sequence_length`
-the `is_truncated` field appears in the response.
+When the input has been truncated due to the limit imposed by the model's
+`max_sequence_length`, the `is_truncated` field appears in the response.

[source,console-result]
----
@@ -30,20 +30,20 @@ in an ingest pipeline or directly in the <<infer-trained-model>> API.
Scaling inference performance can be achieved by setting the parameters
`number_of_allocations` and `threads_per_allocation`.

-Increasing `threads_per_allocation` means more threads are used when
-an inference request is processed on a node. This can improve inference speed
-for certain models. It may also result in improvement to throughput.
+Increasing `threads_per_allocation` means more threads are used when an
+inference request is processed on a node. This can improve inference speed for
+certain models. It may also improve throughput.

-Increasing `number_of_allocations` means more threads are used to
-process multiple inference requests in parallel resulting in throughput
-improvement. Each model allocation uses a number of threads defined by
+Increasing `number_of_allocations` means more threads are used to process
+multiple inference requests in parallel, resulting in improved throughput.
+Each model allocation uses a number of threads defined by
`threads_per_allocation`.

-Model allocations are distributed across {ml} nodes. All allocations assigned
-to a node share the same copy of the model in memory. To avoid
-thread oversubscription which is detrimental to performance, model allocations
-are distributed in such a way that the total number of used threads does not
-surpass the node's allocated processors.
+Model allocations are distributed across {ml} nodes. All allocations assigned to
+a node share the same copy of the model in memory. To avoid thread
+oversubscription, which is detrimental to performance, model allocations are
+distributed in such a way that the total number of used threads does not surpass
+the node's allocated processors.
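
To make the arithmetic concrete, the following illustrative sketch (`my_model`
is a placeholder model ID, not from this commit) starts a deployment with 2
allocations of 4 threads each, so the model uses at most 2 * 4 = 8 threads in
total across the {ml} nodes it is allocated on:

[source,console]
--------------------------------------------------
POST _ml/trained_models/my_model/deployment/_start?number_of_allocations=2&threads_per_allocation=4
--------------------------------------------------
// TEST[skip:TBD]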

[[start-trained-model-deployment-path-params]]
== {api-path-parms-title}
@@ -57,33 +57,36 @@ include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=model-id]

`cache_size`::
(Optional, <<byte-units,byte value>>)
-The inference cache size (in memory outside the JVM heap) per node for the model.
-The default value is the same size as the `model_size_bytes`. To disable the cache, `0b` can be provided.
+The inference cache size (in memory outside the JVM heap) per node for the
+model. The default value is the size of the model as reported by the
+`model_size_bytes` field in the <<get-trained-models-stats>>. To disable the
+cache, `0b` can be provided.
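
For example (an illustrative sketch, with `my_model` as a placeholder model
ID), caching can be disabled when the deployment is started:

[source,console]
--------------------------------------------------
POST _ml/trained_models/my_model/deployment/_start?cache_size=0b
--------------------------------------------------
// TEST[skip:TBD]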

`number_of_allocations`::
(Optional, integer)
The total number of allocations this model is assigned across {ml} nodes.
-Increasing this value generally increases the throughput.
-Defaults to 1.
+Increasing this value generally increases the throughput. Defaults to 1.

`queue_capacity`::
(Optional, integer)
Controls how many inference requests are allowed in the queue at a time.
Every machine learning node in the cluster where the model can be allocated
has a queue of this size; when the number of requests exceeds the total value,
-new requests are rejected with a 429 error. Defaults to 1024. Max allowed value is 1000000.
+new requests are rejected with a 429 error. Defaults to 1024. Max allowed value
+is 1000000.

`threads_per_allocation`::
(Optional, integer)
-Sets the number of threads used by each model allocation during inference. This generally increases
-the speed per inference request. The inference process is a compute-bound process;
-`threads_per_allocations` must not exceed the number of available allocated processors per node.
-Defaults to 1. Must be a power of 2. Max allowed value is 32.
+Sets the number of threads used by each model allocation during inference. This
+generally increases the speed per inference request. The inference process is a
+compute-bound process; `threads_per_allocation` must not exceed the number of
+available allocated processors per node. Defaults to 1. Must be a power of 2.
+Max allowed value is 32.

`timeout`::
(Optional, time)
-Controls the amount of time to wait for the model to deploy. Defaults
-to 20 seconds.
+Controls the amount of time to wait for the model to deploy. Defaults to 20
+seconds.

`wait_for`::
(Optional, string)
