
GenAI: Add a draft for NPU doc #25841

Merged: 19 commits merged into openvinotoolkit:master on Aug 13, 2024

Conversation

dmatveev (Contributor):

Details:

  • item1
  • ...

Tickets:

  • ticket-id

@dmatveev dmatveev self-assigned this Jul 31, 2024
@github-actions github-actions bot added the category: docs OpenVINO documentation label Jul 31, 2024
@dmatveev dmatveev added the category: NPU OpenVINO NPU plugin label Jul 31, 2024
@github-actions github-actions bot removed the category: NPU OpenVINO NPU plugin label Jul 31, 2024
openvino==2024.2.0
openvino-tokenizers==2024.2.0
nncf==2.11.0
optimum-intel @ git+https://github.com/huggingface/optimum-intel.git@439d61f79cf55d5d0b28334f577b6ac3c5ced28f
dmatveev (Contributor Author):

??

Contributor:

optimum-intel has a different release cadence and is usually taken from the main branch.

.. code-block:: text

# requirements.txt
openvino==2024.3.1
TolyaTalamanov (Contributor), Aug 7, 2024:

2024.3.1 hasn't been released yet!

openvino==2024.2.0
openvino-tokenizers==2024.2.0
nncf==2.11.0
optimum-intel @ git+https://github.com/huggingface/optimum-intel.git@439d61f79cf55d5d0b28334f577b6ac3c5ced28f
dmatveev (Contributor Author):

Where does this point to? Is there a specific version or tag to use instead of a commit hash?

Contributor:

This is the commit hash from the main branch that was verified to work with NPU.

In general, we may not need to pin the exact commit; hopefully main won't break NPU (not guaranteed).
openvino_notebooks takes optimum-intel straight from the main branch, see:
https://github.com/openvinotoolkit/openvino_notebooks/blob/a99c0ec648fc6414fb5c169e2dd0ef396c71f613/notebooks/llm-question-answering/llm-question-answering.ipynb#L28

%pip install -q "torch>=2.1" "nncf>=2.7" "transformers>=4.40.0" onnx "optimum>=1.16.1" "accelerate" "datasets>=2.14.6" "gradio>=4.19" "git+https://github.com/huggingface/optimum-intel.git" --extra-index-url https://download.pytorch.org/whl/cpu

dmatveev (Contributor Author):

Can you find a proper fixed version or tag here?
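
For illustration, pinning to a release tag instead of a commit hash would look like the requirements line below. Here v1.18.1 is a hypothetical example tag, not one confirmed in this thread; whichever tag is chosen would still need to be verified to work with NPU:

.. code-block:: text

   # requirements.txt
   optimum-intel @ git+https://github.com/huggingface/optimum-intel.git@v1.18.1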


pip install -r requirements.txt

2. A chat-tuned TinyLlama model is used in this example. The following conversion & optimization settings are recommended when using NPU:
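
The recommended settings themselves are not quoted in this thread. As a sketch only, assuming the optimum-cli exporter and taking the review note below that asymmetric quantization (optimum-cli's default for INT4 weights) is preferred on NPU, the export could look like this; the group size and ratio values are illustrative, and the exact flags in the merged guide may differ:

.. code-block:: console

   optimum-cli export openvino \
       --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
       --weight-format int4 --group-size 128 --ratio 1.0 \
       TinyLlama-1.1B-Chat-v1.0-npu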
dmatveev (Contributor Author):

Is chat-tuned the right term here?

Contributor:

Why not? It means a model fine-tuned for chat scenarios.

Contributor:

I can confirm that this term is used in the main article as well, but it appears only rarely on the web. Changing it to "fine-tuned for chat" may be a good idea. In that case, we would change it in the GenAI article too.

Additional configuration options
################################

Compiling models for NPU may take a while. By default, LLMPipeline for NPU is configured for faster compilation, but it may result in lower performance. To achieve better performance at the expense of compilation time, you may try these settings:
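
The concrete settings are not quoted in this thread. Below is a minimal sketch, assuming the openvino_genai Python API and a GENERATE_HINT pipeline option; that key and its "BEST_PERF" value are assumptions here, so verify them against the merged guide:

.. code-block:: python

   import openvino_genai as ov_genai

   # Hypothetical path to a model exported for NPU (see the conversion step above).
   model_path = "TinyLlama-1.1B-Chat-v1.0-npu"

   # Trade longer compilation time for better generation performance.
   # The GENERATE_HINT key and its "BEST_PERF" value are assumptions.
   pipeline_config = {"GENERATE_HINT": "BEST_PERF"}
   pipe = ov_genai.LLMPipeline(model_path, "NPU", pipeline_config)
   print(pipe.generate("What is OpenVINO?", max_new_tokens=64))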
dmatveev (Contributor Author):

This doesn't alter the way it looks in RST, but please limit lines to ~80-100 characters.

In Emacs it is easy; in Vim I don't know.

Contributor:

I thought it was your piece, so I didn't touch it. No problem, I will format it.

Contributor:

Don't worry too much about formatting; we can polish the whole thing, make sure line breaks are fine, references work, directives render properly, and all that stuff :)

TolyaTalamanov (Contributor):

Preview: http://openvino-doc.iotg.sclab.intel.com/genai-npu-preview/learn-openvino/llm_inference_guide/genai-guide-npu.html

@dmatveev Could you have a look one more time, please?

dmatveev (Contributor Author) commented Aug 8, 2024:

Reviewed, looks good to me, but I think it is now a dry how-to rather than a useful educational text. What I mean is that there is no explanation of, or emphasis on, why a specific type of quantization is preferable (asymmetric in this case). Why does this instruction exist at all? What makes it different from the default one, apart from the device name?

Also, no other options are covered, e.g. how to achieve better performance at the cost of compile time.

@dmatveev dmatveev marked this pull request as ready for review August 8, 2024 14:17
@dmatveev dmatveev requested a review from a team as a code owner August 8, 2024 14:17
@dmatveev dmatveev requested review from tsavina and removed request for a team August 8, 2024 14:17
TolyaTalamanov (Contributor):

> Also, no other options are covered, e.g. how to achieve better performance at the cost of compile time.

The option to achieve better performance is covered in the "Additional configuration options" part.

TolyaTalamanov (Contributor):

LGTM 👍

dmatveev (Contributor Author):

@kblaszczak-intel can you please add your approval here? It seems merging is still blocked for this PR, as @TolyaTalamanov's approval alone is not enough.

kblaszczak-intel (Contributor) left a comment:

Some tweaks added, but nothing major.

@kblaszczak-intel kblaszczak-intel added this pull request to the merge queue Aug 13, 2024
Merged via the queue into openvinotoolkit:master with commit 55ffb33 Aug 13, 2024
103 checks passed
github-merge-queue bot pushed a commit that referenced this pull request Aug 13, 2024