Quick start: Install bigdl-llm on windows gpu #10195 (Closed)

+1,364 −231
Commits (46)

- `c875f38` add windows quick start (ivy-lv11)
- `e1a10ce` modify fig size (ivy-lv11)
- `f7a8fab` update (ivy-lv11)
- `b7dc6dd` modify demo (ivy-lv11)
- `c12c40f` add sample output (ivy-lv11)
- `74b2638` modify link (ivy-lv11)
- `da43ab1` add cpu_embedding (ivy-lv11)
- `6cc7c9f` LLM: support iq2 for mixtral (#10191) (rnwang04)
- `e93951b` Update README (#10186) (jason-dai)
- `6e12fec` LLM: add qlora finetuning example using `trl.SFTTrainer` (#10183) (plusbang)
- `66cde46` Change the nightly test time of ppl and harness (#10198) (hxsz1997)
- `5baff9b` [LLM] Small updates to Win GPU Install Doc (#10199) (Oscilloscope98)
- `5b7071d` Bump org.apache.commons:commons-compress from 1.21 to 1.26.0 in /scal… (dependabot[bot])
- `82c5032` add pdf (ivy-lv11)
- `787fc29` pdf (ivy-lv11)
- `bbd7049` scale the figs (ivy-lv11)
- `b702e25` rename (ivy-lv11)
- `556848f` typo (ivy-lv11)
- `8f22a42` resize (ivy-lv11)
- `7609401` resize fig (ivy-lv11)
- `34ce4ad` resize pic (ivy-lv11)
- `50bd8fd` resize (ivy-lv11)
- `247f05d` modify format (ivy-lv11)
- `e42c292` reformat (ivy-lv11)
- `baebf3b` modify code block format (ivy-lv11)
- `32c77eb` Fix C-Eval ChatGLM loading issue (#10206) (NovTi)
- `8e84efb` update code style (ivy-lv11)
- `951df11` run on arc (ivy-lv11)
- `eceff71` add GPU info (ivy-lv11)
- `b25e5ab` update fig (ivy-lv11)
- `6043bab` update transformers (ivy-lv11)
- `c031da2` LLM: add esimd sdp support for chatglm3 (#10205) (rnwang04)
- `9e3422c` [LLM] Add quantize kv_cache for Baichuan2-13B (#10203) (sgwhat)
- `d0e1459` LLM: Add mlp layer unit tests (#10200) (Mingyu-Wei)
- `64d40b4` [LLM] Add model loading time record for all-in-one benchmark (#10201) (Oscilloscope98)
- `c9d8420` modify pics path (ivy-lv11)
- `f2dd41c` change path (ivy-lv11)
- `1a4dbbf` LLM: add GGUF-IQ2 examples (#10207) (rnwang04)
- `d5c4e47` Support for MPT rotary embedding (#10208) (Uxito-Ada)
- `1793305` modify path (ivy-lv11)
- `6c9c48e` update path (ivy-lv11)
- `c6dbd57` LLM: Update IPEX to 2.2.0+cpu and Refactor for _ipex_optimize (#10189) (xiangyuT)
- `2615a9b` modify usage (ivy-lv11)
- `4afdcfc` GPT-J rope optimization on xpu (#10182) (cyita)
- `5c79d89` Merge branch 'install-win-gpu' of https://github.com/ivy-lv11/BigDL i… (ivy-lv11)
- `714cd77` remove figs (ivy-lv11)
File changed: docs/readthedocs/source/doc/LLM/QuickStart/install_windows_gpu.md (130 additions, 0 deletions)
# Install BigDL-LLM on Windows for Intel GPU

## MTL & iGPU & Arc

### Install GPU driver

* Install Visual Studio 2022 Community Edition from [here](https://visualstudio.microsoft.com/downloads/).

> Note: select the `Desktop development with C++` workload during installation.
> The installation can be slow and may take around 15 minutes; it needs at least 7 GB of disk space.
> If you did not select this workload during installation, go to Tools > Get Tools and Features... to change workloads, following [this page](https://learn.microsoft.com/en-us/cpp/build/vscpp-step-0-installation?view=msvc-170#step-4---choose-workloads).
> <img src="https://llm-assets.readthedocs.io/en/latest/_images/quickstart_windows_gpu_1.png" alt="image-20240221102252560" width=80% />

* Install the latest GPU driver from [here](https://www.intel.com/content/www/us/en/download/785597/intel-arc-iris-xe-graphics-windows.html). Note that the process can be slow: it takes about 10 minutes to download and install, and a reboot is needed afterwards. After rebooting, if the driver is installed correctly you will see Arc Control as in the figure below.

> <img src="https://llm-assets.readthedocs.io/en/latest/_images/quickstart_windows_gpu_3.png" width=70% />
>
> You can check the GPU status from Arc Control (left in the figure) or Task Manager (right in the figure).
> <img src="https://llm-assets.readthedocs.io/en/latest/_images/quickstart_windows_gpu_4.png" width=70% />

### Install conda

We recommend using Miniconda to create the environment. Please refer to [this page](https://docs.anaconda.com/free/miniconda/) to install Miniconda.

* Choose the Windows Miniconda installer, then download and install it. This takes a few minutes.

> <img src="https://llm-assets.readthedocs.io/en/latest/_images/quickstart_windows_gpu_5.png" width=50% />

* After installation, open the `Anaconda prompt` and create an environment with `conda create -n llm python=3.9 libuv`, as summarized in the sketch below.

> Note: if you encounter a CondaHTTPError and fail to create the environment, please check your internet connection and proxy settings. You can define your proxy settings with `conda config --set proxy_servers.http your_http_proxy_IP:port` and `conda config --set proxy_servers.https your_https_proxy_IP:port`.
>
> <img src="https://llm-assets.readthedocs.io/en/latest/_images/quickstart_windows_gpu_6.png" width=50% />
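
For reference, the complete sequence of conda commands from this step looks like the following. The two proxy lines are only needed if you are behind a proxy, and the environment is activated again in the install step further below:

```bash
# (optional) only needed if you are behind a proxy
conda config --set proxy_servers.http your_http_proxy_IP:port
conda config --set proxy_servers.https your_https_proxy_IP:port

# create the environment used throughout this guide, then activate it
conda create -n llm python=3.9 libuv
conda activate llm
```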

### Install oneAPI

* Install the oneAPI Base Toolkit with the help of pip. After ensuring `conda` is ready, we can use `pip` to install the oneAPI Base Toolkit:

```bash
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
```

> If you encounter an HTTP timeout error, also check your internet and proxy settings in the `pip.ini` file, which is under the "C:\Users\YourName\AppData\Roaming\pip" folder.
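
For reference, a minimal `pip.ini` with a proxy entry might look like the sketch below; the address is a placeholder, so substitute your actual proxy:

```
[global]
proxy = http://your_proxy_IP:port
```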

### Install bigdl-llm

* Run the commands below in the Anaconda prompt.

```bash
conda activate llm

pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
```
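
As an optional sanity check (our suggestion, not part of the original steps), you can confirm which version of the package was installed:

```bash
pip show bigdl-llm
```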

* Now you can test whether all the components have been installed correctly from the interactive Python prompt. If all the packages below can be imported without errors, the installation is correct.

```python
import torch
import time
import argparse
import numpy as np

from bigdl.llm.transformers import AutoModel, AutoModelForCausalLM
from transformers import AutoTokenizer, GenerationConfig
```
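
If the imports succeed, you can additionally check that PyTorch can see the XPU device. This is a suggested extra check of ours, not part of the original guide; it relies on the `intel_extension_for_pytorch` package that `bigdl-llm[xpu]` installs:

```python
import torch
import intel_extension_for_pytorch as ipex  # installed with bigdl-llm[xpu]; registers the 'xpu' device

# Both should indicate an available XPU on a correctly configured machine
print(torch.xpu.is_available())   # expect: True
print(torch.xpu.device_count())   # expect: >= 1
```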

### A quick example

Then we use phi-1.5 as an example to show how to run a model with bigdl-llm on Windows. Here we provide `demo.py`, and you can run it with `python demo.py`.

> Note that the transformers version should match the model you want to use. For example, here we use transformers 4.37.0 to run the demo:
> ```
> pip install transformers==4.37.0
> ```

```python
# demo.py
import time

import torch
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer, GenerationConfig

PHI1_5_PROMPT_FORMAT = " Question:{prompt}\n\n Answer:"
generation_config = GenerationConfig(use_cache=True)

if __name__ == '__main__':
    model_path = "microsoft/phi-1_5"
    prompt = "What is AI?"
    n_predict = 32

    # Load the model in 4 bit, which converts the relevant layers
    # in the model into INT4 format.
    # When running LLMs on Intel iGPUs on Windows, we recommend setting
    # `cpu_embedding=True` in the from_pretrained function.
    # This allows the memory-intensive embedding layer to utilize the CPU
    # instead of the iGPU.
    model = AutoModelForCausalLM.from_pretrained(model_path,
                                                 load_in_4bit=True,
                                                 # cpu_embedding=True,
                                                 trust_remote_code=True)

    model = model.to('xpu')

    # Load tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_path,
                                              trust_remote_code=True)

    # Generate predicted tokens
    with torch.inference_mode():
        prompt = PHI1_5_PROMPT_FORMAT.format(prompt=prompt)
        input_ids = tokenizer.encode(prompt, return_tensors="pt").to('xpu')
        st = time.time()
        output = model.generate(input_ids,
                                do_sample=False,
                                max_new_tokens=n_predict,
                                generation_config=generation_config)
        torch.xpu.synchronize()
        end = time.time()
        output = output.cpu()
        output_str = tokenizer.decode(output[0], skip_special_tokens=True)
        # Timing added so that the script produces the "Inference time"
        # line shown in the sample output below
        print(f'Inference time: {end - st} s')
        print('-'*20, 'Prompt', '-'*20)
        print(prompt)
        print('-'*20, 'Output', '-'*20)
        print(output_str)
```
Here is the sample output on a laptop equipped with an 11th Gen Intel(R) Core(TM) i7-1185G7 and Intel(R) Iris(R) Xe Graphics, after running the example program above.

```
Inference time: 3.526491641998291 s
-------------------- Prompt --------------------
 Question:What is AI?

 Answer:
-------------------- Output --------------------
 Question:What is AI?

 Answer: AI stands for Artificial Intelligence, which is the simulation of human intelligence in machines.
```
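
Note that the first generation on an XPU device typically includes one-time warm-up and compilation overhead, so timings usually stabilize from the second run. A minimal sketch (our addition, reusing the names from the demo above) of a warm-up pass before the timed run:

```python
# Warm-up pass: run one untimed generation first so that the timed run
# excludes one-time XPU compilation overhead
_ = model.generate(input_ids, do_sample=False, max_new_tokens=n_predict,
                   generation_config=generation_config)
torch.xpu.synchronize()
# ...then run the timed generation from demo.py as before
```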
Comment: How does the user run this example?

Reply: We provide the contents of `demo.py`, and users can run it with `python demo.py`.