From bed4e1711939d47b0318d80e1f713e7986b7d88e Mon Sep 17 00:00:00 2001
From: sgolebiewski-intel
Date: Tue, 17 Dec 2024 13:05:20 +0100
Subject: [PATCH] Fixing code snippet for GenAI inference on NPU

Signed-off-by: sgolebiewski-intel
---
 .../llm_inference_guide/genai-guide-npu.rst            | 2 ++
 .../learn-openvino/llm_inference_guide/genai-guide.rst | 8 ++++----
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/docs/articles_en/learn-openvino/llm_inference_guide/genai-guide-npu.rst b/docs/articles_en/learn-openvino/llm_inference_guide/genai-guide-npu.rst
index bd18ed29860c00..25cc1ef59fbed5 100644
--- a/docs/articles_en/learn-openvino/llm_inference_guide/genai-guide-npu.rst
+++ b/docs/articles_en/learn-openvino/llm_inference_guide/genai-guide-npu.rst
@@ -90,6 +90,7 @@ which do not require specifying quantization parameters:
 | Below is a list of such models:
 
 * meta-llama/Meta-Llama-3-8B-Instruct
+* meta-llama/Llama-3.1-8B
 * microsoft/Phi-3-mini-4k-instruct
 * Qwen/Qwen2-7B
 * mistralai/Mistral-7B-Instruct-v0.2
@@ -136,6 +137,7 @@ you need to add ``do_sample=False`` **to the** ``generate()`` **method:**
         ov::genai::GenerationConfig config;
         config.do_sample=false;
         config.max_new_tokens=100;
+        ov::genai::LLMPipeline pipe(models_path, "NPU");
         std::cout << pipe.generate("The Sun is yellow because", config);
     }
 
diff --git a/docs/articles_en/learn-openvino/llm_inference_guide/genai-guide.rst b/docs/articles_en/learn-openvino/llm_inference_guide/genai-guide.rst
index eff30eed054295..2c2491de7b74cf 100644
--- a/docs/articles_en/learn-openvino/llm_inference_guide/genai-guide.rst
+++ b/docs/articles_en/learn-openvino/llm_inference_guide/genai-guide.rst
@@ -583,9 +583,9 @@ compression is done by NNCF at the model export stage. The exported model contai
 information necessary for execution, including the tokenizer/detokenizer and the generation
 config, ensuring that its results match those generated by Hugging Face.
 
-The `LLMPipeline` is the main object to setup the model for text generation. You can provide the
-converted model to this object, specify the device for inference, and provide additional
-parameters.
+The ``LLMPipeline`` is the main object used to set up the model for text generation.
+You can provide the converted model to this object, specify the device for inference,
+and pass any additional parameters.
 
 .. tab-set::
 
@@ -916,7 +916,7 @@ running the following code:
 GenAI API
 #######################################
 
-The use case described here uses the following OpenVINO GenAI API classes:
+The use case described here covers the following OpenVINO GenAI API classes:
 
 * generation_config - defines a configuration class for text generation,
   enabling customization of the generation process such as the maximum length of
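
Note: for reference, a minimal self-contained version of the corrected NPU snippet
from genai-guide-npu.rst is sketched below, as it would read once this patch is
applied. The header include, the ``main()`` scaffolding, and the argv-based
``models_path`` handling are assumptions drawn from the surrounding guide text,
not part of the hunk itself.

    #include <iostream>
    #include <string>

    #include "openvino/genai/llm_pipeline.hpp"

    int main(int argc, char* argv[]) {
        // Directory with the exported OpenVINO model
        // (assumed to arrive as a command-line argument).
        std::string models_path = argv[1];

        // The guide requires greedy decoding on NPU, hence do_sample = false.
        ov::genai::GenerationConfig config;
        config.do_sample = false;
        config.max_new_tokens = 100;

        // The line this patch adds: construct the pipeline on the NPU device,
        // so that 'pipe' is declared before the generate() call below uses it.
        ov::genai::LLMPipeline pipe(models_path, "NPU");
        std::cout << pipe.generate("The Sun is yellow because", config);
    }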