Update NPU example readme #11931

plusbang · 2024-08-27T03:02:46Z

Description

Update NPU readme

jason-dai · 2024-08-27T03:21:15Z

python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md

-> To run Qwen2 and Llama2 with IPEX-LLM on Intel NPUs, we recommend using version **32.0.100.2540** for the Intel NPU.
-> 
-> Go to https://www.intel.com/content/www/us/en/download/794734/825735/intel-npu-driver-windows.html to download and unzip the driver. Then follow the same steps on [Requirements](#0-requirements).


Do we still need this notice?


Our troubleshooting section (https://github.com/intel-analytics/ipex-llm/blob/4fcffc40503bd92aad164bb2d4a9cbb69c66342d/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md#troubleshooting) includes a workaround that involves the transpose value setting, and driver version 32.0.100.2540 has mainly been verified on MTL (Meteor Lake).

jason-dai · 2024-08-27T04:03:00Z

python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md

@@ -77,45 +76,9 @@ Inference time: xxxx s
 done
 ```

-## Example 2: Predict Tokens using `generate()` API using multi processes
+## 4. Run Optimized Models (Experimental)
 In the example [llama2.py](./llama2.py) and [qwen2.py](./qwen2.py), we show an experimental support for a Llama2 / Qwen2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimization and fused decoderlayer optimization on Intel NPUs.


The example below shows how to run the optimized model implementations on Intel NPU, including

- Llama2-7B
- Qwen2-1.5B


Updated.

jason-dai

LGTM

update

9c8b052

plusbang requested a review from jason-dai August 27, 2024 03:07

small fix

4fcffc4

jason-dai reviewed Aug 27, 2024


fix

cf02b15

plusbang mentioned this pull request Aug 27, 2024

Update llama3 npu example #11933

Merged

7 tasks

plusbang requested a review from jason-dai August 27, 2024 04:22

jason-dai approved these changes Aug 27, 2024


plusbang merged commit 14dddfc into intel-analytics:main Aug 27, 2024
1 check passed
