[DOCS] tiny article name changes (#25910)
kblaszczak-intel authored Aug 5, 2024
1 parent 3cf2744 commit 5264c99
Showing 9 changed files with 131 additions and 71 deletions.
@@ -1,13 +1,13 @@
-Install Intel® Distribution of OpenVINO™ toolkit from a Docker Image
+Install Intel® Distribution of OpenVINO™ Toolkit From a Docker Image
 =======================================================================
 
 .. meta::
    :description: Learn how to use a prebuilt Docker image or create an image
                  manually to install OpenVINO™ Runtime on Linux and Windows operating systems.
 
-This guide presents information on how to use a pre-built Docker image/create an image manually to install OpenVINO™ Runtime.
-
-Supported host operating systems for the Docker Base image:
+This guide presents information on how to use a pre-built Docker image or create a new image
+manually, to install OpenVINO™ Runtime. The supported host operating systems for the Docker
+base image are:
 
 - Linux
 - Windows (WSL2)
6 changes: 2 additions & 4 deletions docs/articles_en/learn-openvino/llm_inference_guide.rst
@@ -1,5 +1,3 @@
-.. {#native_vs_hugging_face_api}
-
 Large Language Model Inference Guide
 ========================================

@@ -14,8 +12,8 @@ Large Language Model Inference Guide
    :hidden:
 
    Run LLMs with Optimum Intel <llm_inference_guide/llm-inference-hf>
-   Run LLMs with OpenVINO GenAI Flavor <llm_inference_guide/genai-guide>
-   Run LLMs with Base OpenVINO <llm_inference_guide/llm-inference-native-ov>
+   Run LLMs on OpenVINO GenAI Flavor <llm_inference_guide/genai-guide>
+   Run LLMs on Base OpenVINO <llm_inference_guide/llm-inference-native-ov>
    OpenVINO Tokenizers <llm_inference_guide/ov-tokenizers>
 
 Large Language Models (LLMs) like GPT are transformative deep learning networks capable of a
@@ -1,5 +1,5 @@
-Run LLMs with OpenVINO GenAI Flavor
-=====================================
+Run LLM Inference on OpenVINO with the GenAI Flavor
+===============================================================================================
 
 .. meta::
    :description: Learn how to use the OpenVINO GenAI flavor to execute LLM models.
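
A minimal sketch of what the renamed guide covers, assuming the ``openvino-genai``
package and a model already exported to OpenVINO IR in a local ``model_dir``
(a placeholder path):

.. code-block:: python

   import openvino_genai

   # Build an LLM pipeline from an exported OpenVINO model directory.
   pipe = openvino_genai.LLMPipeline("model_dir", "CPU")

   # Greedy decoding of up to 100 new tokens.
   print(pipe.generate("What is OpenVINO?", max_new_tokens=100))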
@@ -1,7 +1,9 @@
-.. {#llm_inference}
-
 Run LLMs with Hugging Face and Optimum Intel
-=====================================================
+===============================================================================================
 
 .. meta::
    :description: Learn how to use the native OpenVINO package to execute LLM models.
 
 
 The steps below show how to load and infer LLMs from
 `Hugging Face <https://huggingface.co/models>`__ using Optimum Intel. They also show how to
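
A rough sketch of the Optimum Intel flow described above, assuming the
``optimum-intel`` package with its OpenVINO extras (the model name is only an
example):

.. code-block:: python

   from optimum.intel import OVModelForCausalLM
   from transformers import AutoTokenizer

   model_id = "meta-llama/Llama-2-7b-chat-hf"  # example Hugging Face model

   # export=True converts the checkpoint to OpenVINO IR on the fly.
   model = OVModelForCausalLM.from_pretrained(model_id, export=True)
   tokenizer = AutoTokenizer.from_pretrained(model_id)

   inputs = tokenizer("What is OpenVINO?", return_tensors="pt")
   outputs = model.generate(**inputs, max_new_tokens=50)
   print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])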
@@ -1,7 +1,5 @@
-.. {#llm_inference_native_ov}
-
-Run LLMs with Base OpenVINO
-===============================
+Run LLM Inference on Native OpenVINO (not recommended)
+===============================================================================================
 
 To run Generative AI models using native OpenVINO APIs you need to follow regular
 **Convert -> Optimize -> Deploy** path with a few simplifications.
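
A minimal sketch of that **Convert -> Optimize -> Deploy** path with the core
Python API, leaving out the model-specific pre- and post-processing that LLMs
require (``model.xml`` is a placeholder IR path):

.. code-block:: python

   import openvino as ov

   core = ov.Core()

   # Convert: read a model already converted to OpenVINO IR;
   # ov.convert_model handles other source formats.
   model = core.read_model("model.xml")

   # Deploy: compile for a target device and create an inference request.
   compiled = core.compile_model(model, "CPU")
   request = compiled.create_infer_request()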
@@ -1,12 +1,10 @@
-.. {#tokenizers}
-
 OpenVINO Tokenizers
 ===============================
 
-Tokenization is a necessary step in text processing using various models, including text generation with LLMs.
-Tokenizers convert the input text into a sequence of tokens with corresponding IDs, so that
-the model can understand and process it during inference. The transformation of a sequence of numbers into a
-string is calleddetokenization.
+Tokenization is a necessary step in text processing using various models, including text
+generation with LLMs. Tokenizers convert the input text into a sequence of tokens with
+corresponding IDs, so that the model can understand and process it during inference. The
+transformation of a sequence of numbers into a string is called detokenization.
 
 .. image:: ../../assets/images/tokenization.svg
    :align: center
@@ -338,7 +336,7 @@ Additional Resources
 
 * `OpenVINO Tokenizers repo <https://github.com/openvinotoolkit/openvino_tokenizers>`__
 * `OpenVINO Tokenizers Notebook <https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/openvino-tokenizers>`__
-* `Text generation C++ samples that support most popular models like LLaMA 2 <https://github.com/openvinotoolkit/openvino.genai/tree/master/samples/cpp/greedy_causal_lm>`__
+* `Text generation C++ samples that support most popular models like LLaMA 2 <https://github.com/openvinotoolkit/openvino.genai/tree/master/text_generation/causal_lm/cpp>`__
 * `OpenVINO GenAI Repo <https://github.com/openvinotoolkit/openvino.genai>`__
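
A short sketch of the tokenize/detokenize round trip described above, assuming
the ``openvino-tokenizers`` package and its ``convert_tokenizer`` helper (the
model name is only an example):

.. code-block:: python

   import openvino as ov
   from openvino_tokenizers import convert_tokenizer
   from transformers import AutoTokenizer

   hf_tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

   # Convert the Hugging Face tokenizer into a pair of OpenVINO models.
   ov_tokenizer, ov_detokenizer = convert_tokenizer(hf_tokenizer, with_detokenizer=True)

   compiled_tokenizer = ov.compile_model(ov_tokenizer)
   compiled_detokenizer = ov.compile_model(ov_detokenizer)

   # Tokenization: text becomes a sequence of token IDs the model can process.
   token_ids = compiled_tokenizer(["What is OpenVINO?"])["input_ids"]

   # Detokenization: a sequence of numbers back into a string.
   print(compiled_detokenizer(token_ids)[0])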


@@ -1,6 +1,4 @@
-.. {#openvino_docs_OV_UG_Hetero_execution}
-
-Heterogeneous execution
+Heterogeneous Execution
 =======================
 
 
@@ -9,24 +7,28 @@ Heterogeneous execution
    the inference of one model on several computing devices.
 
 
-Heterogeneous execution enables executing inference of one model on several devices. Its purpose is to:
+Heterogeneous execution enables executing inference of one model on several devices.
+Its purpose is to:
 
-* Utilize the power of accelerators to process the heaviest parts of the model and to execute unsupported operations on fallback devices, like the CPU.
+* Utilize the power of accelerators to process the heaviest parts of the model and to execute
+  unsupported operations on fallback devices, like the CPU.
 * Utilize all available hardware more efficiently during one inference.
 
 Execution via the heterogeneous mode can be divided into two independent steps:
 
 1. Setting hardware affinity to operations (`ov::Core::query_model <https://docs.openvino.ai/2024/api/c_cpp_api/classov_1_1_core.html#doxid-classov-1-1-core-1acdf8e64824fe4cf147c3b52ab32c1aab>`__ is used internally by the Hetero device).
 2. Compiling a model to the Heterogeneous device assumes splitting the model to parts, compiling them on the specified devices (via `ov::device::priorities <https://docs.openvino.ai/2024/api/c_cpp_api/structov_1_1device_1_1_priorities.html>`__), and executing them in the Heterogeneous mode. The model is split to subgraphs in accordance with the affinities, where a set of connected operations with the same affinity is to be a dedicated subgraph. Each subgraph is compiled on a dedicated device and multiple `ov::CompiledModel <https://docs.openvino.ai/2024/api/c_cpp_api/classov_1_1_compiled_model.html#doxid-classov-1-1-compiled-model>`__ objects are made, which are connected via automatically allocated intermediate tensors.
 
 If you set pipeline parallelism (via ``ov::hint::model_distribution_policy``), the model is split into multiple stages, and each stage is assigned to a different device. The output of one stage is fed as input to the next stage.
 
 These two steps are not interconnected and affinities can be set in one of two ways, used separately or in combination (as described below): in the ``manual`` or the ``automatic`` mode.
 
-Defining and Configuring the Hetero Device
+Defining and configuring the Hetero device
 ##########################################
 
-Following the OpenVINO™ naming convention, the Hetero execution plugin is assigned the label of ``"HETERO".`` It may be defined with no additional parameters, resulting in defaults being used, or configured further with the following setup options:
+Following the OpenVINO™ naming convention, the Hetero execution plugin is assigned the label of
+``"HETERO".`` It may be defined with no additional parameters, resulting in defaults being used,
+or configured further with the following setup options:
 
 
 +--------------------------------------------+-------------------------------------------------------------+-----------------------------------------------------------+
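
A minimal sketch of the automatic mode in Python (``model.xml`` is a placeholder
IR path; device names depend on the machine):

.. code-block:: python

   import openvino as ov

   core = ov.Core()
   model = core.read_model("model.xml")

   # The Hetero plugin queries each device in priority order, runs the
   # heaviest supported subgraphs on GPU, and falls back to CPU for the rest.
   compiled = core.compile_model(model, "HETERO:GPU,CPU")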
@@ -1,6 +1,4 @@
-.. {#torchvision_preprocessing_converter}
-
-Torchvision preprocessing converter
+Torchvision Preprocessing Converter
 =======================================
 
 
@@ -9,13 +7,14 @@ Torchvision preprocessing converter
    to optimize model inference.
 
 
-The Torchvision-to-OpenVINO converter enables automatic translation of operators from the torchvision
-preprocessing pipeline to the OpenVINO format and embed them in your model. It is often used to adjust
-images serving as input for AI models to have proper dimensions or data types.
+The Torchvision-to-OpenVINO converter enables automatic translation of operators from the
+torchvision preprocessing pipeline to the OpenVINO format and embed them in your model. It is
+often used to adjust images serving as input for AI models to have proper dimensions or data
+types.
 
-As the converter is fully based on the **openvino.preprocess** module, you can implement the **torchvision.transforms**
-feature easily and without the use of external libraries, reducing the overall application complexity
-and enabling additional performance optimizations.
+As the converter is fully based on the **openvino.preprocess** module, you can implement the
+**torchvision.transforms** feature easily and without the use of external libraries, reducing
+the overall application complexity and enabling additional performance optimizations.
 
 
 .. note::
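
A sketch of embedding such a pipeline, under the assumption that
``PreprocessConverter`` from ``openvino.preprocess.torchvision`` is available
(model and image paths are placeholders):

.. code-block:: python

   import openvino as ov
   import torchvision.transforms as transforms
   from PIL import Image
   from openvino.preprocess.torchvision import PreprocessConverter

   # A typical torchvision preprocessing pipeline.
   preprocess = transforms.Compose([
       transforms.Resize(256),
       transforms.CenterCrop(224),
       transforms.ToTensor(),
       transforms.Normalize(mean=[0.485, 0.456, 0.406],
                            std=[0.229, 0.224, 0.225]),
   ])

   core = ov.Core()
   model = core.read_model("model.xml")

   # Embed the transforms into the model graph so raw images can be fed directly.
   model = PreprocessConverter.from_torchvision(
       model=model,
       transform=preprocess,
       input_example=Image.open("sample.png"),
   )
   compiled = core.compile_model(model, "CPU")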
123 changes: 93 additions & 30 deletions docs/sphinx_setup/_static/download/llm_models_ovms.csv
@@ -1,37 +1,100 @@
Product,Model,Framework,Precision,Node,Request Rate,Throughput [tok/s],TPOT Mean Latency
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,inf,270.55,839.37
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,2.0,269.6,847.81
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,1.0,268.92,840.1
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.2,92.75,75.75
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,inf,702.42,307.82
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,2.0,680.45,302.09
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,1.0,442.69,169.24
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.3,137.89,98.6
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.4,182.68,144.36
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.5,227.02,238.54
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.6,259.06,679.07
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.7,267.24,785.75
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.8,267.77,815.11
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.9,270.01,827.09
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,1.0,268.92,840.1
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,2.0,269.6,847.81
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,inf,270.55,839.37
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,0.2,92.63,63.23
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,inf,701.91,305.9
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,2.0,684.4,299.41
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,1.0,442.46,170.65
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,0.4,183.51,105.0
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,0.6,272.59,95.34
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,0.8,359.28,126.61
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,1.0,442.69,169.24
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,1.2,521.61,195.94
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,1.4,589.34,267.43
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,1.6,650.25,291.68
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,1.8,655.39,308.64
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,2.0,680.45,302.09
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,inf,702.42,307.82
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,0.2,92.89,54.69
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,inf,290.39,793.52
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,2.0,284.14,815.09
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,1.0,290.67,783.47
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.2,87.18,74.96
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,inf,873.93,245.31
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,2.0,774.3,233.49
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,1.0,427.37,114.16
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,0.2,88.9,60.04
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,inf,839.74,253.74
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,2.0,771.17,232.08
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,1.0,427.85,128.33
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,0.2,88.92,56.33
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,inf,275.71,810.89
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,2.0,278.07,809.3
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,1.0,272.54,811.37
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,0.4,184.37,77.0
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,0.6,273.06,101.81
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,0.8,360.22,135.38
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,1.0,442.46,170.65
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,1.2,519.5,208.44
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,1.4,590.11,252.86
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,1.6,651.09,286.93
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,1.8,670.74,298.02
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,2.0,684.4,299.41
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,inf,701.91,305.9
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.2,79.24,73.06
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,inf,799.46,265.5
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,2.0,707.46,234.44
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,1.0,380.61,104.71
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.3,118.42,90.31
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.4,157.04,113.23
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.5,193.85,203.97
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.6,232.36,253.17
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.7,260.56,581.45
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.8,271.97,761.05
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.9,273.36,787.74
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,1.0,272.54,811.37
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,2.0,278.07,809.3
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,inf,275.71,810.89
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,0.2,78.3,60.37
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,inf,843.51,252.12
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,2.0,698.38,247.77
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,1.0,376.36,139.62
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,0.4,156.42,69.27
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,0.6,232.27,77.79
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,0.8,307.37,90.07
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,1.0,380.61,104.71
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,1.2,452.18,127.36
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,1.4,519.44,156.18
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,1.6,587.62,169.44
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,1.8,649.94,198.44
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,2.0,707.46,234.44
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,inf,799.46,265.5
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,0.2,78.61,54.12
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,0.4,156.19,70.38
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,0.6,232.36,81.83
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,0.8,307.01,101.66
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,1.0,376.36,139.62
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,1.2,447.75,158.53
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,1.4,519.74,160.26
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,1.6,582.37,190.22
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,1.8,635.46,231.31
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,2.0,698.38,247.77
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,inf,843.51,252.12
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.2,87.18,74.96
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.3,130.74,92.67
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.4,172.94,117.03
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.5,214.71,172.69
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.6,255.45,282.74
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.7,280.38,629.68
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.8,280.55,765.16
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.9,289.65,765.65
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,1.0,290.67,783.47
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,2.0,284.14,815.09
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,inf,290.39,793.52
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,0.2,88.9,60.04
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,0.4,176.5,70.24
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,0.6,262.04,77.01
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,0.8,346.01,95.29
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,1.0,427.37,114.16
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,1.2,507.86,138.56
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,1.4,582.58,150.72
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,1.6,655.61,166.64
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,1.8,717.9,216.76
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,2.0,774.3,233.49
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,inf,873.93,245.31
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,0.2,88.92,56.33
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,0.4,175.99,72.72
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,0.6,261.96,84.24
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,0.8,346.78,101.67
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,1.0,427.85,128.33
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,1.2,506.17,150.01
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,1.4,581.72,167.61
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,1.6,651.97,190.91
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,1.8,713.2,222.56
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,2.0,771.17,232.08
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,inf,839.74,253.74
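
Since each row in the new file is keyed by model, node, and request rate, a few
lines of pandas are enough to slice it; a sketch, assuming the CSV is available
locally:

.. code-block:: python

   import pandas as pd

   df = pd.read_csv("llm_models_ovms.csv")

   # Throughput and latency vs. request rate for one model on one node.
   subset = df[(df["Model"] == "meta-llama/Llama-2-7b-chat-hf")
               & (df["Node"] == "Xeon Platinum 8480+")]
   print(subset[["Request Rate", "Throughput [tok/s]", "TPOT Mean Latency"]]
         .sort_values("Request Rate"))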
