
Update NPU example readme #11931

Merged
3 commits merged into intel-analytics:main on Aug 27, 2024
Conversation

plusbang
Contributor

Description

Update NPU readme

@plusbang plusbang requested a review from jason-dai August 27, 2024 03:07
Comment on lines -84 to -86
> To run Qwen2 and Llama2 with IPEX-LLM on Intel NPUs, we recommend using version **32.0.100.2540** for the Intel NPU.
>
> Go to https://www.intel.com/content/www/us/en/download/794734/825735/intel-npu-driver-windows.html to download and unzip the driver. Then follow the same steps on [Requirements](#0-requirements).
Contributor

Do we still need this notice?

Contributor Author

> Do we still need this notice?

Our troubleshooting guide (https://github.com/intel-analytics/ipex-llm/blob/4fcffc40503bd92aad164bb2d4a9cbb69c66342d/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md#troubleshooting) includes a workaround that sets the transpose value, and 32.0.100.2540 has mainly been verified on MTL.

@@ -77,45 +76,9 @@ Inference time: xxxx s
done
```

## Example 2: Predict Tokens using `generate()` API using multi processes
## 4. Run Optimized Models (Experimental)
In the example [llama2.py](./llama2.py) and [qwen2.py](./qwen2.py), we show an experimental support for a Llama2 / Qwen2 model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimization and fused decoderlayer optimization on Intel NPUs.
Contributor

@jason-dai jason-dai Aug 27, 2024

The example below shows how to run the optimized model implementations on Intel NPU, including

Contributor Author

> The example below shows how to run the optimized model implementations on Intel NPU, including

Updated.

@plusbang plusbang mentioned this pull request Aug 27, 2024
@plusbang plusbang requested a review from jason-dai August 27, 2024 04:22
Contributor

@jason-dai jason-dai left a comment

LGTM

@plusbang plusbang merged commit 14dddfc into intel-analytics:main Aug 27, 2024
1 check passed