GenAI: Add a draft for NPU doc #25841
Conversation
openvino==2024.2.0
openvino-tokenizers==2024.2.0
nncf==2.11.0
optimum-intel @ git+https://github.com/huggingface/optimum-intel.git@439d61f79cf55d5d0b28334f577b6ac3c5ced28f
??
optimum-intel has a different release cadence and is usually taken from the main branch.
.. code-block:: text

   # requirements.txt
   openvino==2024.3.1
2024.3.1 hasn't been released yet!
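(As a side note, a quick way to verify which OpenVINO build an environment actually resolved is to query the runtime; a minimal sketch:)

```python
# Print the installed OpenVINO runtime version, e.g. to check whether a
# requirements pin resolved to a released build.
import openvino as ov

print(ov.get_version())
```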
openvino==2024.2.0
openvino-tokenizers==2024.2.0
nncf==2.11.0
optimum-intel @ git+https://github.com/huggingface/optimum-intel.git@439d61f79cf55d5d0b28334f577b6ac3c5ced28f
Where does this point to? Is there a specific version or tag instead of a commit hash?
This is the commit hash from the main branch that was verified to work with NPU.
In general, we could avoid specifying the exact commit and hope it won't break NPU (not guaranteed). openvino-notebooks takes optimum-intel straight from the main branch, see:
https://github.com/openvinotoolkit/openvino_notebooks/blob/a99c0ec648fc6414fb5c169e2dd0ef396c71f613/notebooks/llm-question-answering/llm-question-answering.ipynb#L28
%pip install -q "torch>=2.1" "nncf>=2.7" "transformers>=4.40.0" onnx "optimum>=1.16.1" "accelerate" "datasets>=2.14.6" "gradio>=4.19" "git+https://github.com/huggingface/optimum-intel.git" --extra-index-url https://download.pytorch.org/whl/cpu
Can you find a proper fixed version or tag here?
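For illustration, pinning to a release tag (or to a PyPI release) instead of a raw commit could look like the snippet below; the tag and version shown are hypothetical placeholders, not values verified for NPU:

```text
# requirements.txt -- hypothetical alternatives to pinning a commit hash
optimum-intel @ git+https://github.com/huggingface/optimum-intel.git@v1.18.0
# or, if a suitable PyPI release exists:
# optimum-intel==1.18.0
```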
pip install -r requirements.txt

2. A chat-tuned TinyLlama model is used in this example. The following conversion & optimization settings are recommended when using NPU:
Is "chat-tuned" the right term here?
Why not? It is a model fine-tuned for chat scenarios.
I can confirm that this term is used in the main article as well, though it is only scarcely found on the net. Changing it to "fine-tuned for chat" may be a good idea. We would change it in the GenAI article too, in that case.
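To make the conversion step quoted above concrete, here is a minimal sketch using optimum-intel's Python API; the int4 weight-quantization values (sym, group_size, ratio) and the output directory name are illustrative assumptions, not the guide's verified NPU settings:

```python
# Sketch: export a chat-tuned TinyLlama to OpenVINO IR with 4-bit
# weight-only quantization via optimum-intel.
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig
from transformers import AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# Quantization values below are assumptions for illustration only.
quant = OVWeightQuantizationConfig(bits=4, sym=True, group_size=128, ratio=1.0)

model = OVModelForCausalLM.from_pretrained(
    model_id, export=True, quantization_config=quant
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

out_dir = "TinyLlama-1.1B-Chat-v1.0-int4-ov"  # hypothetical output path
model.save_pretrained(out_dir)
tokenizer.save_pretrained(out_dir)
```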
Additional configuration options
################################

Compiling models for NPU may take a while. By default, LLMPipeline for NPU is
configured for faster compilation, but it may result in lower performance. To achieve
better performance at the expense of compilation time, you may try these settings:
This doesn't alter the way it looks in RST, but please limit lines to ~80-100 characters.
In Emacs it is easy; in Vim I don't know.
I thought it was your piece, so I didn't touch it. No problem, I will format it.
Don't worry too much about formatting; we can polish the whole thing: make sure line breaks are fine, references work, directives render properly, and all that stuff :)
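For reference, passing such settings to the pipeline could look like the sketch below, assuming the openvino_genai Python API; the GENERATE_HINT property and its BEST_PERF value are an assumption about the options the draft documents:

```python
# Sketch: run the converted model on NPU, asking the plugin to favor
# generation performance over compilation time. The config key/value
# is an assumption, not a verified option name.
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline(
    "TinyLlama-1.1B-Chat-v1.0-int4-ov",  # hypothetical path from conversion
    "NPU",
    GENERATE_HINT="BEST_PERF",
)

print(pipe.generate("What is OpenVINO?", max_new_tokens=64))
```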
@dmatveev Could you have a look one more time, please?
Reviewed, looks good to me, but I think it is now a dry how-to rather than a useful educational text. What I mean is that there is no explanation of, or emphasis on, why a specific type of quantization (asymmetric, in this case) is preferable. Why does this instruction exist at all? What makes it different from the default one, except the device name? Also, no other options are covered, e.g. how to achieve better performance at the cost of compile time.
The option to achieve better performance is covered in the "Additional configuration options" part.
LGTM 👍
@kblaszczak-intel, can you please put your approval here? It seems merging is still blocked for this PR, as @TolyaTalamanov's approval alone seems not to be enough.
Some tweaks added but nothing major.