Quick start: Install bigdl-llm on windows gpu #10195

Closed
wants to merge 46 commits
Changes from 30 commits · 46 commits
c875f38
add windows quick start
ivy-lv11 Feb 21, 2024
e1a10ce
modify fig size
ivy-lv11 Feb 21, 2024
f7a8fab
update
ivy-lv11 Feb 21, 2024
b7dc6dd
modify demo
ivy-lv11 Feb 21, 2024
c12c40f
add sample output
ivy-lv11 Feb 21, 2024
74b2638
modify link
ivy-lv11 Feb 21, 2024
da43ab1
add cpu_embedding
ivy-lv11 Feb 21, 2024
6cc7c9f
LLM: support iq2 for mixtral (#10191)
rnwang04 Feb 21, 2024
e93951b
Update README (#10186)
jason-dai Feb 21, 2024
6e12fec
LLM: add qlora finetuning example using `trl.SFTTrainer` (#10183)
plusbang Feb 21, 2024
66cde46
Change the nightly test time of ppl and harness (#10198)
hxsz1997 Feb 21, 2024
5baff9b
[LLM] Small updates to Win GPU Install Doc (#10199)
Oscilloscope98 Feb 21, 2024
5b7071d
Bump org.apache.commons:commons-compress from 1.21 to 1.26.0 in /scal…
dependabot[bot] Feb 21, 2024
82c5032
add pdf
ivy-lv11 Feb 22, 2024
787fc29
pdf
ivy-lv11 Feb 22, 2024
bbd7049
scale the figs
ivy-lv11 Feb 22, 2024
b702e25
rename
ivy-lv11 Feb 22, 2024
556848f
typo
ivy-lv11 Feb 22, 2024
8f22a42
resize
ivy-lv11 Feb 22, 2024
7609401
resize fig
ivy-lv11 Feb 22, 2024
34ce4ad
resize pic
ivy-lv11 Feb 22, 2024
50bd8fd
resize
ivy-lv11 Feb 22, 2024
247f05d
modify format
ivy-lv11 Feb 22, 2024
e42c292
reformat
ivy-lv11 Feb 22, 2024
baebf3b
modify code block format
ivy-lv11 Feb 22, 2024
32c77eb
Fix C-Eval ChatGLM loading issue (#10206)
NovTi Feb 22, 2024
8e84efb
update code style
ivy-lv11 Feb 22, 2024
951df11
run on arc
ivy-lv11 Feb 22, 2024
eceff71
add GPU info
ivy-lv11 Feb 22, 2024
b25e5ab
update fig
ivy-lv11 Feb 22, 2024
6043bab
update transformers
ivy-lv11 Feb 22, 2024
c031da2
LLM: add esimd sdp support for chatglm3 (#10205)
rnwang04 Feb 22, 2024
9e3422c
[LLM] Add quantize kv_cache for Baichuan2-13B (#10203)
sgwhat Feb 22, 2024
d0e1459
LLM: Add mlp layer unit tests (#10200)
Mingyu-Wei Feb 22, 2024
64d40b4
[LLM] Add model loading time record for all-in-one benchmark (#10201)
Oscilloscope98 Feb 22, 2024
c9d8420
modify pics path
ivy-lv11 Feb 22, 2024
f2dd41c
change path
ivy-lv11 Feb 22, 2024
1a4dbbf
LLM: add GGUF-IQ2 examples (#10207)
rnwang04 Feb 22, 2024
d5c4e47
Support for MPT rotary embedding (#10208)
Uxito-Ada Feb 22, 2024
1793305
modify path
ivy-lv11 Feb 22, 2024
6c9c48e
update path
ivy-lv11 Feb 22, 2024
c6dbd57
LLM: Update IPEX to 2.2.0+cpu and Refactor for _ipex_optimize (#10189)
xiangyuT Feb 22, 2024
2615a9b
modify usage
ivy-lv11 Feb 22, 2024
4afdcfc
GPT-J rope optimization on xpu (#10182)
cyita Feb 22, 2024
5c79d89
Merge branch 'install-win-gpu' of https://github.com/ivy-lv11/BigDL i…
ivy-lv11 Feb 22, 2024
714cd77
remove figs
ivy-lv11 Feb 22, 2024
130 changes: 130 additions & 0 deletions docs/readthedocs/source/doc/LLM/QuickStart/install_windows_gpu.md
@@ -0,0 +1,130 @@
# Install BigDL-LLM on Windows for Intel GPU

## MTL & iGPU & Arc

### Install GPU driver

* Install Visual Studio 2022 Community Edition from [here](https://visualstudio.microsoft.com/downloads/).

> Note: make sure to select `Desktop development with C++` during installation.
> The installation can be slow and may take around 15 minutes; it requires at least 7GB of disk space.
> If you did not select this workload during installation, go to Tools > Get Tools and Features... to add it, following [this page](https://learn.microsoft.com/en-us/cpp/build/vscpp-step-0-installation?view=msvc-170#step-4---choose-workloads).
> <img src="https://llm-assets.readthedocs.io/en/latest/_images/quickstart_windows_gpu_1.png" alt="image-20240221102252560" width=80% />

* Install the latest GPU driver from [here](https://www.intel.com/content/www/us/en/download/785597/intel-arc-iris-xe-graphics-windows.html). Note that the process can be slow: downloading and installing takes about 10 minutes, and a reboot is required afterwards.
After rebooting, if the driver is installed correctly, you will see Arc Control as in the figure below.
> <img src="https://llm-assets.readthedocs.io/en/latest/_images/quickstart_windows_gpu_3.png" width=70% />
>
> You can check the GPU status from Arc Control (left in the figure) or Task Manager (right in the figure).
> <img src="https://llm-assets.readthedocs.io/en/latest/_images/quickstart_windows_gpu_4.png" width=70% />

### Install conda

We recommend using Miniconda to create the environment. Please refer to [this page](https://docs.anaconda.com/free/miniconda/) to install Miniconda.

* Choose the Windows Miniconda installer, then download and install it. The installation takes a few minutes.

> <img src="https://llm-assets.readthedocs.io/en/latest/_images/quickstart_windows_gpu_5.png" width=50% />

* After installation, open `Anaconda Prompt` and create an environment with `conda create -n llm python=3.9 libuv`.

> Note: if you encounter a CondaHTTPError and fail to create the environment, check your internet connection and proxy settings. You can configure the proxy with `conda config --set proxy_servers.http your_http_proxy_IP:port` and `conda config --set proxy_servers.https your_https_proxy_IP:port`.
>
> <img src="https://llm-assets.readthedocs.io/en/latest/_images/quickstart_windows_gpu_6.png" width=50% />
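Optionally, after activating the new environment (`conda activate llm`), you can confirm from Python that it uses the expected interpreter:

```python
import sys
print(sys.version)  # expect a 3.9.x interpreter inside the llm environment
```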

### Install oneAPI

* Install the oneAPI Base Toolkit with pip. After ensuring `conda` is ready, use `pip` to install the oneAPI Base Toolkit:

```bash
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
```

> If you encounter an HTTP timeout error, also check your internet connection and the proxy settings in the `pip.ini` file under the "C:\Users\YourName\AppData\Roaming\pip" folder.

### Install bigdl-llm

* Run the commands below in the Anaconda Prompt.

```bash
conda activate llm

pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
```


* Now you can verify that all the components have been installed correctly from an interactive Python prompt. If all of the imports below succeed, the installation is correct.
```python
import torch
import time
import argparse
import numpy as np

from bigdl.llm.transformers import AutoModel, AutoModelForCausalLM
from transformers import AutoTokenizer, GenerationConfig
```
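Optionally, you can also sanity-check that the XPU device is visible to PyTorch. The sketch below assumes that installing `bigdl-llm[xpu]` also pulled in `intel_extension_for_pytorch` (IPEX), which registers the `torch.xpu` backend:

```python
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401 -- importing registers the XPU backend

# With a correctly installed GPU driver and oneAPI runtime,
# this should print True followed by the name of your Intel GPU.
print(torch.xpu.is_available())
print(torch.xpu.get_device_name(0))
```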

### A quick example
Next, we use phi-1.5 as an example to show how to run a model with bigdl-llm on Windows. We provide the full contents of `demo.py` below, and you can run it with `python demo.py`.
> Note that the transformers version should match the model you want to use. For example, we use transformers 4.37.0 to run this demo:
> ```
> pip install transformers==4.37.0
> ```
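
You can confirm the installed version from within Python:

```python
import transformers
print(transformers.__version__)  # expect 4.37.0 for this demo
```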

```python
# demo.py
import time
import torch
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer, GenerationConfig

PHI1_5_PROMPT_FORMAT = " Question:{prompt}\n\n Answer:"
generation_config = GenerationConfig(use_cache=True)

if __name__ == '__main__':
    model_path = "microsoft/phi-1_5"
    prompt = "What is AI?"
    n_predict = 32

    # Load the model in 4-bit, which converts the relevant layers
    # in the model into INT4 format.
    # When running LLMs on Intel iGPUs on Windows, we recommend setting
    # `cpu_embedding=True` in `from_pretrained`; this allows the
    # memory-intensive embedding layer to run on the CPU instead of the iGPU.
    model = AutoModelForCausalLM.from_pretrained(model_path,
                                                 load_in_4bit=True,
                                                 # cpu_embedding=True,
                                                 trust_remote_code=True)
    model = model.to('xpu')

    # Load tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_path,
                                              trust_remote_code=True)

    # Generate predicted tokens
    with torch.inference_mode():
        prompt = PHI1_5_PROMPT_FORMAT.format(prompt=prompt)
        input_ids = tokenizer.encode(prompt, return_tensors="pt").to('xpu')
        st = time.time()
        output = model.generate(input_ids,
                                do_sample=False,
                                max_new_tokens=n_predict,
                                generation_config=generation_config)
        torch.xpu.synchronize()
        end = time.time()
        output = output.cpu()
        output_str = tokenizer.decode(output[0], skip_special_tokens=True)
        print(f'Inference time: {end - st} s')
        print('-'*20, 'Prompt', '-'*20)
        print(prompt)
        print('-'*20, 'Output', '-'*20)
        print(output_str)
```
Here is the sample output on a laptop equipped with an 11th Gen Intel(R) Core(TM) i7-1185G7 and Intel(R) Iris(R) Xe Graphics, after running the example program above:

```
Inference time: 3.526491641998291 s
-------------------- Prompt --------------------
Question:What is AI?

Answer:
-------------------- Output --------------------
Question:What is AI?

Answer: AI stands for Artificial Intelligence, which is the simulation of human intelligence in machines.
```

> **Review comment (Contributor):** How does the user run this example?
>
> **Reply (Contributor Author):** We provide the contents of `demo.py`, and users can run it with `python demo.py`.
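Note that the INT4 conversion is performed every time the model is loaded with `load_in_4bit=True`. For repeated runs, you can save the converted weights once and reload them directly afterwards. Below is a short sketch using BigDL-LLM's `save_low_bit`/`load_low_bit` helpers; the local save path is just an example:

```python
from bigdl.llm.transformers import AutoModelForCausalLM

save_dir = "./phi-1_5-int4"  # example path; any local folder works

# One-time conversion: load the model in 4-bit and save the converted weights.
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5",
                                             load_in_4bit=True,
                                             trust_remote_code=True)
model.save_low_bit(save_dir)

# Later runs: reload the already-converted weights, skipping the conversion step.
model = AutoModelForCausalLM.load_low_bit(save_dir, trust_remote_code=True)
model = model.to('xpu')
```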