[DOCS] tiny article name changes (#25910)
kblaszczak-intel authored Aug 5, 2024
1 parent 3cf2744 commit 5264c99
Showing 9 changed files with 131 additions and 71 deletions.
@@ -1,13 +1,13 @@
-Install Intel® Distribution of OpenVINO™ toolkit from a Docker Image
+Install Intel® Distribution of OpenVINO™ Toolkit From a Docker Image
 =======================================================================
 
 .. meta::
    :description: Learn how to use a prebuilt Docker image or create an image
                  manually to install OpenVINO™ Runtime on Linux and Windows operating systems.
 
-This guide presents information on how to use a pre-built Docker image/create an image manually to install OpenVINO™ Runtime.
-
-Supported host operating systems for the Docker Base image:
+This guide presents information on how to use a pre-built Docker image or create a new image
+manually, to install OpenVINO™ Runtime. The supported host operating systems for the Docker
+base image are:
 
 - Linux
 - Windows (WSL2)
6 changes: 2 additions & 4 deletions docs/articles_en/learn-openvino/llm_inference_guide.rst
@@ -1,5 +1,3 @@
-.. {#native_vs_hugging_face_api}
-
 Large Language Model Inference Guide
 ========================================

@@ -14,8 +12,8 @@ Large Language Model Inference Guide
    :hidden:
 
    Run LLMs with Optimum Intel <llm_inference_guide/llm-inference-hf>
-   Run LLMs with OpenVINO GenAI Flavor <llm_inference_guide/genai-guide>
-   Run LLMs with Base OpenVINO <llm_inference_guide/llm-inference-native-ov>
+   Run LLMs on OpenVINO GenAI Flavor <llm_inference_guide/genai-guide>
+   Run LLMs on Base OpenVINO <llm_inference_guide/llm-inference-native-ov>
    OpenVINO Tokenizers <llm_inference_guide/ov-tokenizers>
 
 Large Language Models (LLMs) like GPT are transformative deep learning networks capable of a
@@ -1,5 +1,5 @@
-Run LLMs with OpenVINO GenAI Flavor
-=====================================
+Run LLM Inference on OpenVINO with the GenAI Flavor
+===============================================================================================
 
 .. meta::
    :description: Learn how to use the OpenVINO GenAI flavor to execute LLM models.
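
A minimal sketch of what the renamed guide covers, assuming the ``openvino-genai``
package and a model already exported to OpenVINO IR in a local ``model_dir``
(a placeholder path):

.. code-block:: python

   import openvino_genai

   # Build an LLM pipeline from an exported OpenVINO model directory.
   pipe = openvino_genai.LLMPipeline("model_dir", "CPU")

   # Greedy decoding of up to 100 new tokens.
   print(pipe.generate("What is OpenVINO?", max_new_tokens=100))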
@@ -1,7 +1,9 @@
-.. {#llm_inference}
-
 Run LLMs with Hugging Face and Optimum Intel
-=====================================================
+===============================================================================================
 
 .. meta::
    :description: Learn how to use the native OpenVINO package to execute LLM models.
 
 
 The steps below show how to load and infer LLMs from
 `Hugging Face <https://huggingface.co/models>`__ using Optimum Intel. They also show how to
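
A rough sketch of the Optimum Intel flow described above, assuming the
``optimum-intel`` package with its OpenVINO extras (the model name is only an
example):

.. code-block:: python

   from optimum.intel import OVModelForCausalLM
   from transformers import AutoTokenizer

   model_id = "meta-llama/Llama-2-7b-chat-hf"  # example Hugging Face model

   # export=True converts the checkpoint to OpenVINO IR on the fly.
   model = OVModelForCausalLM.from_pretrained(model_id, export=True)
   tokenizer = AutoTokenizer.from_pretrained(model_id)

   inputs = tokenizer("What is OpenVINO?", return_tensors="pt")
   outputs = model.generate(**inputs, max_new_tokens=50)
   print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])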
@@ -1,7 +1,5 @@
-.. {#llm_inference_native_ov}
-
-Run LLMs with Base OpenVINO
-===============================
+Run LLM Inference on Native OpenVINO (not recommended)
+===============================================================================================
 
 To run Generative AI models using native OpenVINO APIs you need to follow regular
 **Convert -> Optimize -> Deploy** path with a few simplifications.
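
A minimal sketch of that **Convert -> Optimize -> Deploy** path with the core
Python API, leaving out the model-specific pre- and post-processing that LLMs
require (``model.xml`` is a placeholder IR path):

.. code-block:: python

   import openvino as ov

   core = ov.Core()

   # Convert: read a model already converted to OpenVINO IR;
   # ov.convert_model handles other source formats.
   model = core.read_model("model.xml")

   # Deploy: compile for a target device and create an inference request.
   compiled = core.compile_model(model, "CPU")
   request = compiled.create_infer_request()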
@@ -1,12 +1,10 @@
-.. {#tokenizers}
-
 OpenVINO Tokenizers
 ===============================
 
-Tokenization is a necessary step in text processing using various models, including text generation with LLMs.
-Tokenizers convert the input text into a sequence of tokens with corresponding IDs, so that
-the model can understand and process it during inference. The transformation of a sequence of numbers into a
-string is calleddetokenization.
+Tokenization is a necessary step in text processing using various models, including text
+generation with LLMs. Tokenizers convert the input text into a sequence of tokens with
+corresponding IDs, so that the model can understand and process it during inference. The
+transformation of a sequence of numbers into a string is called detokenization.
 
 .. image:: ../../assets/images/tokenization.svg
    :align: center
@@ -338,7 +336,7 @@ Additional Resources
 
 * `OpenVINO Tokenizers repo <https://github.com/openvinotoolkit/openvino_tokenizers>`__
 * `OpenVINO Tokenizers Notebook <https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/openvino-tokenizers>`__
-* `Text generation C++ samples that support most popular models like LLaMA 2 <https://github.com/openvinotoolkit/openvino.genai/tree/master/samples/cpp/greedy_causal_lm>`__
+* `Text generation C++ samples that support most popular models like LLaMA 2 <https://github.com/openvinotoolkit/openvino.genai/tree/master/text_generation/causal_lm/cpp>`__
 * `OpenVINO GenAI Repo <https://github.com/openvinotoolkit/openvino.genai>`__
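
A short sketch of the tokenize/detokenize round trip described above, assuming
the ``openvino-tokenizers`` package and its ``convert_tokenizer`` helper (the
model name is only an example):

.. code-block:: python

   import openvino as ov
   from openvino_tokenizers import convert_tokenizer
   from transformers import AutoTokenizer

   hf_tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

   # Convert the Hugging Face tokenizer into a pair of OpenVINO models.
   ov_tokenizer, ov_detokenizer = convert_tokenizer(hf_tokenizer, with_detokenizer=True)

   compiled_tokenizer = ov.compile_model(ov_tokenizer)
   compiled_detokenizer = ov.compile_model(ov_detokenizer)

   # Tokenization: text becomes a sequence of token IDs the model can process.
   token_ids = compiled_tokenizer(["What is OpenVINO?"])["input_ids"]

   # Detokenization: a sequence of numbers back into a string.
   print(compiled_detokenizer(token_ids)[0])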


@@ -1,6 +1,4 @@
-.. {#openvino_docs_OV_UG_Hetero_execution}
-
-Heterogeneous execution
+Heterogeneous Execution
 =======================
 
 
@@ -9,24 +7,28 @@ Heterogeneous execution
    the inference of one model on several computing devices.
 
 
-Heterogeneous execution enables executing inference of one model on several devices. Its purpose is to:
+Heterogeneous execution enables executing inference of one model on several devices.
+Its purpose is to:
 
-* Utilize the power of accelerators to process the heaviest parts of the model and to execute unsupported operations on fallback devices, like the CPU.
+* Utilize the power of accelerators to process the heaviest parts of the model and to execute
+  unsupported operations on fallback devices, like the CPU.
 * Utilize all available hardware more efficiently during one inference.
 
 Execution via the heterogeneous mode can be divided into two independent steps:
 
 1. Setting hardware affinity to operations (`ov::Core::query_model <https://docs.openvino.ai/2024/api/c_cpp_api/classov_1_1_core.html#doxid-classov-1-1-core-1acdf8e64824fe4cf147c3b52ab32c1aab>`__ is used internally by the Hetero device).
 2. Compiling a model to the Heterogeneous device assumes splitting the model to parts, compiling them on the specified devices (via `ov::device::priorities <https://docs.openvino.ai/2024/api/c_cpp_api/structov_1_1device_1_1_priorities.html>`__), and executing them in the Heterogeneous mode. The model is split to subgraphs in accordance with the affinities, where a set of connected operations with the same affinity is to be a dedicated subgraph. Each subgraph is compiled on a dedicated device and multiple `ov::CompiledModel <https://docs.openvino.ai/2024/api/c_cpp_api/classov_1_1_compiled_model.html#doxid-classov-1-1-compiled-model>`__ objects are made, which are connected via automatically allocated intermediate tensors.
 
 If you set pipeline parallelism (via ``ov::hint::model_distribution_policy``), the model is split into multiple stages, and each stage is assigned to a different device. The output of one stage is fed as input to the next stage.
 
 These two steps are not interconnected and affinities can be set in one of two ways, used separately or in combination (as described below): in the ``manual`` or the ``automatic`` mode.
 
-Defining and Configuring the Hetero Device
+Defining and configuring the Hetero device
 ##########################################
 
-Following the OpenVINO™ naming convention, the Hetero execution plugin is assigned the label of ``"HETERO".`` It may be defined with no additional parameters, resulting in defaults being used, or configured further with the following setup options:
+Following the OpenVINO™ naming convention, the Hetero execution plugin is assigned the label of
+``"HETERO".`` It may be defined with no additional parameters, resulting in defaults being used,
+or configured further with the following setup options:
 
 
 +--------------------------------------------+-------------------------------------------------------------+-----------------------------------------------------------+
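
A minimal sketch of the automatic mode in Python (``model.xml`` is a placeholder
IR path; device names depend on the machine):

.. code-block:: python

   import openvino as ov

   core = ov.Core()
   model = core.read_model("model.xml")

   # The Hetero plugin queries each device in priority order, runs the
   # heaviest supported subgraphs on GPU, and falls back to CPU for the rest.
   compiled = core.compile_model(model, "HETERO:GPU,CPU")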
@@ -1,6 +1,4 @@
-.. {#torchvision_preprocessing_converter}
-
-Torchvision preprocessing converter
+Torchvision Preprocessing Converter
 =======================================
 
 
@@ -9,13 +7,14 @@ Torchvision preprocessing converter
    to optimize model inference.
 
 
-The Torchvision-to-OpenVINO converter enables automatic translation of operators from the torchvision
-preprocessing pipeline to the OpenVINO format and embed them in your model. It is often used to adjust
-images serving as input for AI models to have proper dimensions or data types.
+The Torchvision-to-OpenVINO converter enables automatic translation of operators from the
+torchvision preprocessing pipeline to the OpenVINO format and embed them in your model. It is
+often used to adjust images serving as input for AI models to have proper dimensions or data
+types.
 
-As the converter is fully based on the **openvino.preprocess** module, you can implement the **torchvision.transforms**
-feature easily and without the use of external libraries, reducing the overall application complexity
-and enabling additional performance optimizations.
+As the converter is fully based on the **openvino.preprocess** module, you can implement the
+**torchvision.transforms** feature easily and without the use of external libraries, reducing
+the overall application complexity and enabling additional performance optimizations.
 
 
 .. note::
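
A sketch of embedding such a pipeline, under the assumption that
``PreprocessConverter`` from ``openvino.preprocess.torchvision`` is available
(model and image paths are placeholders):

.. code-block:: python

   import openvino as ov
   import torchvision.transforms as transforms
   from PIL import Image
   from openvino.preprocess.torchvision import PreprocessConverter

   # A typical torchvision preprocessing pipeline.
   preprocess = transforms.Compose([
       transforms.Resize(256),
       transforms.CenterCrop(224),
       transforms.ToTensor(),
       transforms.Normalize(mean=[0.485, 0.456, 0.406],
                            std=[0.229, 0.224, 0.225]),
   ])

   core = ov.Core()
   model = core.read_model("model.xml")

   # Embed the transforms into the model graph so raw images can be fed directly.
   model = PreprocessConverter.from_torchvision(
       model=model,
       transform=preprocess,
       input_example=Image.open("sample.png"),
   )
   compiled = core.compile_model(model, "CPU")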
123 changes: 93 additions & 30 deletions docs/sphinx_setup/_static/download/llm_models_ovms.csv
@@ -1,37 +1,100 @@
Product,Model,Framework,Precision,Node,Request Rate,Throughput [tok/s],TPOT Mean Latency
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,inf,270.55,839.37
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,2.0,269.6,847.81
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,1.0,268.92,840.1
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.2,92.75,75.75
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,inf,702.42,307.82
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,2.0,680.45,302.09
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,1.0,442.69,169.24
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.3,137.89,98.6
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.4,182.68,144.36
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.5,227.02,238.54
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.6,259.06,679.07
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.7,267.24,785.75
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.8,267.77,815.11
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,0.9,270.01,827.09
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,1.0,268.92,840.1
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,2.0,269.6,847.81
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8380,inf,270.55,839.37
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,0.2,92.63,63.23
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,inf,701.91,305.9
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,2.0,684.4,299.41
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,1.0,442.46,170.65
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,0.4,183.51,105.0
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,0.6,272.59,95.34
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,0.8,359.28,126.61
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,1.0,442.69,169.24
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,1.2,521.61,195.94
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,1.4,589.34,267.43
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,1.6,650.25,291.68
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,1.8,655.39,308.64
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,2.0,680.45,302.09
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8480+,inf,702.42,307.82
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,0.2,92.89,54.69
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,inf,290.39,793.52
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,2.0,284.14,815.09
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,1.0,290.67,783.47
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.2,87.18,74.96
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,inf,873.93,245.31
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,2.0,774.3,233.49
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,1.0,427.37,114.16
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,0.2,88.9,60.04
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,inf,839.74,253.74
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,2.0,771.17,232.08
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,1.0,427.85,128.33
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,0.2,88.92,56.33
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,inf,275.71,810.89
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,2.0,278.07,809.3
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,1.0,272.54,811.37
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,0.4,184.37,77.0
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,0.6,273.06,101.81
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,0.8,360.22,135.38
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,1.0,442.46,170.65
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,1.2,519.5,208.44
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,1.4,590.11,252.86
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,1.6,651.09,286.93
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,1.8,670.74,298.02
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,2.0,684.4,299.41
ovms,meta-llama/Llama-2-7b-chat-hf,PT,INT8-CW,Xeon Platinum 8580,inf,701.91,305.9
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.2,79.24,73.06
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,inf,799.46,265.5
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,2.0,707.46,234.44
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,1.0,380.61,104.71
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.3,118.42,90.31
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.4,157.04,113.23
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.5,193.85,203.97
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.6,232.36,253.17
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.7,260.56,581.45
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.8,271.97,761.05
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,0.9,273.36,787.74
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,1.0,272.54,811.37
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,2.0,278.07,809.3
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8380,inf,275.71,810.89
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,0.2,78.3,60.37
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,inf,843.51,252.12
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,2.0,698.38,247.77
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,1.0,376.36,139.62
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,0.4,156.42,69.27
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,0.6,232.27,77.79
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,0.8,307.37,90.07
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,1.0,380.61,104.71
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,1.2,452.18,127.36
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,1.4,519.44,156.18
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,1.6,587.62,169.44
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,1.8,649.94,198.44
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,2.0,707.46,234.44
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8480+,inf,799.46,265.5
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,0.2,78.61,54.12
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,0.4,156.19,70.38
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,0.6,232.36,81.83
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,0.8,307.01,101.66
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,1.0,376.36,139.62
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,1.2,447.75,158.53
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,1.4,519.74,160.26
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,1.6,582.37,190.22
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,1.8,635.46,231.31
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,2.0,698.38,247.77
ovms,meta-llama/Meta-Llama-3-8B-Instruct,PT,INT8-CW,Xeon Platinum 8580,inf,843.51,252.12
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.2,87.18,74.96
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.3,130.74,92.67
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.4,172.94,117.03
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.5,214.71,172.69
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.6,255.45,282.74
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.7,280.38,629.68
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.8,280.55,765.16
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,0.9,289.65,765.65
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,1.0,290.67,783.47
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,2.0,284.14,815.09
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8380,inf,290.39,793.52
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,0.2,88.9,60.04
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,0.4,176.5,70.24
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,0.6,262.04,77.01
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,0.8,346.01,95.29
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,1.0,427.37,114.16
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,1.2,507.86,138.56
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,1.4,582.58,150.72
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,1.6,655.61,166.64
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,1.8,717.9,216.76
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,2.0,774.3,233.49
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8480+,inf,873.93,245.31
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,0.2,88.92,56.33
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,0.4,175.99,72.72
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,0.6,261.96,84.24
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,0.8,346.78,101.67
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,1.0,427.85,128.33
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,1.2,506.17,150.01
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,1.4,581.72,167.61
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,1.6,651.97,190.91
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,1.8,713.2,222.56
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,2.0,771.17,232.08
ovms,mistralai/Mistral-7B-v0.1,PT,INT8-CW,Xeon Platinum 8580,inf,839.74,253.74
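
Since each row in the new file is keyed by model, node, and request rate, a few
lines of pandas are enough to slice it; a sketch, assuming the CSV is available
locally:

.. code-block:: python

   import pandas as pd

   df = pd.read_csv("llm_models_ovms.csv")

   # Throughput and latency vs. request rate for one model on one node.
   subset = df[(df["Model"] == "meta-llama/Llama-2-7b-chat-hf")
               & (df["Node"] == "Xeon Platinum 8480+")]
   print(subset[["Request Rate", "Throughput [tok/s]", "TPOT Mean Latency"]]
         .sort_values("Request Rate"))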
