Update genai-guide-npu.rst (#26125)
### Details:
 - *item1*
 - *...*

### Tickets:
 - *ticket-id*

---------

Co-authored-by: Karol Blaszczak <[email protected]>
TolyaTalamanov and kblaszczak-intel authored Aug 21, 2024
1 parent 808b7e9 commit 1176df0
Showing 1 changed file with 34 additions and 38 deletions.
This guide will give you extra details on how to utilize NPU with the GenAI flavor.
:doc:`See the installation guide <../../get-started/install-openvino/install-openvino-genai>`
for information on how to start.

Prerequisites
#############

Install the required dependencies:

.. code-block:: console

   python -m venv npu-env
   npu-env\Scripts\activate
   pip install optimum-intel nncf==2.11 onnx==1.16.1
   pip install --pre openvino==2024.3.0.dev20240807 openvino-tokenizers==2024.3.0.0.dev20240807 openvino-genai==2024.3.0.0.dev20240807 --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/pre-release
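After installing, you may want to confirm the packages are actually present in the active environment. The helper below is an illustrative sketch, not part of the guide or of OpenVINO; it uses only the standard library to report the installed version of each package (or flag it as missing):

```python
# Minimal sketch (assumption: not part of the official guide) that checks
# whether the packages installed above are visible in this environment.
from importlib import metadata

REQUIRED = ["openvino", "openvino-tokenizers", "openvino-genai", "nncf", "onnx"]

def report(packages):
    """Return {package name: installed version, or None if not installed}."""
    out = {}
    for name in packages:
        try:
            out[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            out[name] = None
    return out

if __name__ == "__main__":
    for name, version in report(REQUIRED).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Running the script in the `npu-env` environment should list the pinned versions from the `pip install` commands above.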
Export an LLM model via Hugging Face Optimum-Intel
##################################################

A chat-tuned TinyLlama model is used in this example. The following conversion and optimization settings are recommended when using the NPU:

.. code-block:: console

   optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --weight-format int4 --sym --group-size 128 --ratio 1.0 TinyLlama
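If you export several models with the same NPU-friendly settings, it can help to keep the command in one place. The hypothetical helper below (not part of Optimum-Intel) assembles the argv for the `optimum-cli` invocation shown above, so only the model ID and output directory vary:

```python
# Hypothetical helper (an assumption, not an Optimum-Intel API) that builds
# the optimum-cli export command with the NPU-recommended settings:
# int4 weights, symmetric quantization, group size 128, ratio 1.0.
def export_command(model_id: str, out_dir: str,
                   group_size: int = 128, ratio: float = 1.0) -> list:
    """Return the argv list for an NPU-friendly int4 export."""
    return [
        "optimum-cli", "export", "openvino",
        "-m", model_id,
        "--weight-format", "int4",
        "--sym",
        "--group-size", str(group_size),
        "--ratio", str(ratio),
        out_dir,
    ]

# To actually run the export (requires the packages from Prerequisites):
# import subprocess
# subprocess.run(export_command("TinyLlama/TinyLlama-1.1B-Chat-v1.0", "TinyLlama"), check=True)
```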
Run generation using OpenVINO GenAI
###################################

Use the following code snippet to perform generation with the OpenVINO GenAI API:

.. tab-set::

   .. tab-item:: Python
      :sync: py

      .. code-block:: python

         import openvino_genai as ov_genai
         pipe = ov_genai.LLMPipeline(model_path, "NPU")
         print(pipe.generate("The Sun is yellow because", max_new_tokens=100))

   .. tab-item:: C++
      :sync: cpp

      .. code-block:: cpp

         #include "openvino/genai/llm_pipeline.hpp"
         #include <iostream>

         int main(int argc, char* argv[]) {
             std::string model_path = argv[1];
             ov::genai::LLMPipeline pipe(model_path, "NPU");
             std::cout << pipe.generate("The Sun is yellow because", ov::genai::max_new_tokens(100));
         }
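Since the exported model is chat-tuned, responses are typically better when the raw prompt is wrapped in the model's chat template before generation. The sketch below is an assumption, not part of the guide: it builds a Zephyr-style template, which is the format TinyLlama-1.1B-Chat-v1.0 is commonly documented to use (verify against the model card before relying on it):

```python
# Illustrative sketch (assumption): wrap a user message in a Zephyr-style
# chat template before passing it to pipe.generate().
def chat_prompt(user_message: str,
                system: str = "You are a helpful assistant.") -> str:
    """Return a chat-formatted prompt for a Zephyr-style chat-tuned model."""
    return (
        f"<|system|>\n{system}</s>\n"
        f"<|user|>\n{user_message}</s>\n"
        f"<|assistant|>\n"
    )

# Usage with the pipeline from the Python tab above:
# print(pipe.generate(chat_prompt("What is OpenVINO?"), max_new_tokens=100))
```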
Additional configuration options
################################
Additional Resources
####################
* :doc:`NPU Device <../../openvino-workflow/running-inference/inference-devices-and-modes/npu-device>`
* `OpenVINO GenAI Repo <https://github.com/openvinotoolkit/openvino.genai>`__
* `Neural Network Compression Framework <https://github.com/openvinotoolkit/nncf>`__
