Quick start: Install bigdl-llm on windows gpu #10195

Closed
wants to merge 46 commits
Changes from 7 commits
Commits (46)
c875f38  add windows quick start (ivy-lv11, Feb 21, 2024)
e1a10ce  modify fig size (ivy-lv11, Feb 21, 2024)
f7a8fab  update (ivy-lv11, Feb 21, 2024)
b7dc6dd  modify demo (ivy-lv11, Feb 21, 2024)
c12c40f  add sample output (ivy-lv11, Feb 21, 2024)
74b2638  modify link (ivy-lv11, Feb 21, 2024)
da43ab1  add cpu_embedding (ivy-lv11, Feb 21, 2024)
6cc7c9f  LLM: support iq2 for mixtral (#10191) (rnwang04, Feb 21, 2024)
e93951b  Update README (#10186) (jason-dai, Feb 21, 2024)
6e12fec  LLM: add qlora finetuning example using `trl.SFTTrainer` (#10183) (plusbang, Feb 21, 2024)
66cde46  Change the nightly test time of ppl and harness (#10198) (hxsz1997, Feb 21, 2024)
5baff9b  [LLM] Small updates to Win GPU Install Doc (#10199) (Oscilloscope98, Feb 21, 2024)
5b7071d  Bump org.apache.commons:commons-compress from 1.21 to 1.26.0 in /scal… (dependabot[bot], Feb 21, 2024)
82c5032  add pdf (ivy-lv11, Feb 22, 2024)
787fc29  pdf (ivy-lv11, Feb 22, 2024)
bbd7049  scale the figs (ivy-lv11, Feb 22, 2024)
b702e25  rename (ivy-lv11, Feb 22, 2024)
556848f  typo (ivy-lv11, Feb 22, 2024)
8f22a42  resize (ivy-lv11, Feb 22, 2024)
7609401  resize fig (ivy-lv11, Feb 22, 2024)
34ce4ad  resize pic (ivy-lv11, Feb 22, 2024)
50bd8fd  resize (ivy-lv11, Feb 22, 2024)
247f05d  modify format (ivy-lv11, Feb 22, 2024)
e42c292  reformat (ivy-lv11, Feb 22, 2024)
baebf3b  modify code block format (ivy-lv11, Feb 22, 2024)
32c77eb  Fix C-Eval ChatGLM loading issue (#10206) (NovTi, Feb 22, 2024)
8e84efb  update code style (ivy-lv11, Feb 22, 2024)
951df11  run on arc (ivy-lv11, Feb 22, 2024)
eceff71  add GPU info (ivy-lv11, Feb 22, 2024)
b25e5ab  update fig (ivy-lv11, Feb 22, 2024)
6043bab  update transformers (ivy-lv11, Feb 22, 2024)
c031da2  LLM: add esimd sdp support for chatglm3 (#10205) (rnwang04, Feb 22, 2024)
9e3422c  [LLM] Add quantize kv_cache for Baichuan2-13B (#10203) (sgwhat, Feb 22, 2024)
d0e1459  LLM: Add mlp layer unit tests (#10200) (Mingyu-Wei, Feb 22, 2024)
64d40b4  [LLM] Add model loading time record for all-in-one benchmark (#10201) (Oscilloscope98, Feb 22, 2024)
c9d8420  modify pics path (ivy-lv11, Feb 22, 2024)
f2dd41c  change path (ivy-lv11, Feb 22, 2024)
1a4dbbf  LLM: add GGUF-IQ2 examples (#10207) (rnwang04, Feb 22, 2024)
d5c4e47  Support for MPT rotary embedding (#10208) (Uxito-Ada, Feb 22, 2024)
1793305  modify path (ivy-lv11, Feb 22, 2024)
6c9c48e  update path (ivy-lv11, Feb 22, 2024)
c6dbd57  LLM: Update IPEX to 2.2.0+cpu and Refactor for _ipex_optimize (#10189) (xiangyuT, Feb 22, 2024)
2615a9b  modify usage (ivy-lv11, Feb 22, 2024)
4afdcfc  GPT-J rope optimization on xpu (#10182) (cyita, Feb 22, 2024)
5c79d89  Merge branch 'install-win-gpu' of https://github.com/ivy-lv11/BigDL i… (ivy-lv11, Feb 22, 2024)
714cd77  remove figs (ivy-lv11, Feb 22, 2024)
# Install BigDL-LLM on Windows for Intel GPU

## iGPU

### Install GPU driver

1. Install Visual Studio 2022 Community Edition from [here](https://visualstudio.microsoft.com/downloads/).

<img src="./figs/fig1.png" style="zoom:20%;" />

> Note: select the `Desktop development with C++` workload during installation.
>
> <img src="./figs/fig2.png" alt="image-20240221102252560" style="zoom:40%;" />
>
> The installation may take around 15 minutes and requires at least 7 GB of disk space.
>
> If you did not select this workload during installation, go to Tools > Get Tools and Features... to add it, following [this page](https://learn.microsoft.com/en-us/cpp/build/vscpp-step-0-installation?view=msvc-170#step-4---choose-workloads).

2. Install the latest GPU driver from [here](https://www.intel.com/content/www/us/en/download/785597/intel-arc-iris-xe-graphics-windows.html). Note that the process may be slow; downloading and installing takes about 10 minutes, and a reboot is required. After rebooting, you can check the GPU status from the GUI.

<img src="./figs/fig3.png" alt="image-20240221102217795" style="zoom:20%;" />

<img src="./figs/fig4.png" alt="image-20240221105834031" style="zoom:20%;" />

### Install conda

We recommend using Miniconda to create the environment. Please refer to [this page](https://docs.anaconda.com/free/miniconda/) to install Miniconda.

* Choose the Windows Miniconda installer, then download and install it. This takes a few minutes.

<img src="./figs/fig5.png" alt="image-20240221110402278" style="zoom:20%;" />

* After installation, open the `Anaconda Prompt` and create an environment with `conda create -n llm python=3.9 libuv`.

> Note: if you encounter a `CondaHTTPError` and fail to create the environment, please check your internet connection and proxy settings. You can set your proxy with `conda config --set proxy_servers.http your_http_proxy_IP:port` and `conda config --set proxy_servers.https your_https_proxy_IP:port`.
>
> <img src="./figs/fig6.png" alt="image-20240221122852777" style="zoom:20%;" />

### Install oneAPI

Install the oneAPI Base Toolkit components with pip. After ensuring `conda` is ready, run the following:

```bash
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
```

> If you encounter an HTTP timeout error, also check your internet and proxy settings in the `pip.ini` file, which is located in the "C:\Users\YourName\AppData\Roaming\pip" folder.

When oneAPI is installed successfully from pip, you will see output similar to the following in the Anaconda Prompt. <img src="./figs/fig7.png" alt="image-20240221130508668" style="zoom:20%;" />

### Install bigdl-llm

1. Run the commands below in the Anaconda Prompt.

```bash
conda create -n llm python=3.9 libuv # Already done in "Install conda" section
conda activate llm
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0 # Already done in "Install oneAPI" section
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
```

> Review comment (Contributor): you already created the `llm` env before, so just remove the `conda create` line to avoid confusion.

> Review comment (Contributor): remove the oneAPI line as it is done in the previous section.


2. Now we can test whether all the components have been installed correctly. If all the packages in the Python snippet below can be imported without errors, the installation was successful.
```python
import torch
import time
import argparse
import numpy as np

from bigdl.llm.transformers import AutoModel, AutoModelForCausalLM
from transformers import AutoTokenizer, GenerationConfig
```
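
Optionally, as a further sanity check that is not part of the original guide, you can confirm that PyTorch can see the Intel GPU. This minimal sketch assumes `intel_extension_for_pytorch` was installed as part of `bigdl-llm[xpu]` and that the GPU driver and oneAPI runtime are set up correctly:

```python
import torch
import intel_extension_for_pytorch as ipex  # registers the 'xpu' device with PyTorch

# If the driver and oneAPI runtime are configured correctly, this should print True
# followed by the name of your Intel GPU (exact output depends on the IPEX version).
print(torch.xpu.is_available())
print(torch.xpu.get_device_name(0))
```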

Next, we use phi-1.5 as an example to show how to run a model with bigdl-llm on Windows.
> Review comment (@shane-huang, Feb 21, 2024): make the phi-1.5 example a new section "A Quick Example".

> Review comment (@shane-huang, Feb 21, 2024): Make this example as simple as possible without much code:
> * remove the arg parse section (put just the arg values in the code)
> * remove the timing code
> * make the comments concise

```python
import torch
import time
import argparse
import numpy as np

from bigdl.llm.transformers import AutoModel, AutoModelForCausalLM
from transformers import AutoTokenizer, GenerationConfig

PHI1_5_PROMPT_FORMAT = " Question:{prompt}\n\n Answer:"
generation_config = GenerationConfig(use_cache=True)

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Predict Tokens using `generate()` API for phi-1_5 model')
    parser.add_argument('--repo-id-or-model-path', type=str, default="microsoft/phi-1_5",
                        help='The huggingface repo id for the phi-1_5 model to be downloaded'
                             ', or the path to the huggingface checkpoint folder')
    parser.add_argument('--prompt', type=str, default="What is AI?",
                        help='Prompt to infer')
    parser.add_argument('--n-predict', type=int, default=32,
                        help='Max tokens to predict')

    args = parser.parse_args()
    model_path = args.repo_id_or_model_path

    # Load the model in 4-bit precision, which converts the relevant layers into INT4 format.
    # When running LLMs on Intel iGPUs on Windows, we recommend setting `cpu_embedding=True`
    # in `from_pretrained`, so the memory-intensive embedding layer uses the CPU instead of the iGPU.
    model = AutoModelForCausalLM.from_pretrained(model_path,
                                                 load_in_4bit=True,
                                                 cpu_embedding=True,
                                                 trust_remote_code=True)

    model = model.to('xpu')

    # Load tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_path,
                                              trust_remote_code=True)

    # Generate predicted tokens
    with torch.inference_mode():
        prompt = PHI1_5_PROMPT_FORMAT.format(prompt=args.prompt)
        input_ids = tokenizer.encode(prompt, return_tensors="pt").to('xpu')

        # The IPEX model needs a warmup run before inference time can be measured accurately
        output = model.generate(input_ids,
                                max_new_tokens=args.n_predict,
                                generation_config=generation_config)

        # Start inference.
        # If your selected model can use previous key/value attentions to speed up decoding
        # but has `"use_cache": false` in its model config, set `use_cache=True` explicitly
        # in the `generate` function to obtain optimal performance with BigDL-LLM INT4 optimizations.
        # Note that phi-1_5 uses GenerationConfig to enable `use_cache`.
        st = time.time()
        output = model.generate(input_ids,
                                do_sample=False,
                                max_new_tokens=args.n_predict,
                                generation_config=generation_config)
        torch.xpu.synchronize()
        end = time.time()

        output = output.cpu()
        output_str = tokenizer.decode(output[0], skip_special_tokens=True)
        print(f'Inference time: {end-st} s')
        print('-'*20, 'Prompt', '-'*20)
        print(prompt)
        print('-'*20, 'Output', '-'*20)
        print(output_str)
```
Here is a sample output on a laptop:
```
Inference time: 3.526491641998291 s
-------------------- Prompt --------------------
Question:What is AI?

Answer:
-------------------- Output --------------------
Question:What is AI?

Answer: AI stands for Artificial Intelligence, which is the simulation of human intelligence in machines.
```
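
Following the review suggestion above, here is a minimal sketch of the same example without argparse or timing code. It is an illustration rather than part of the original guide, and it assumes the same `microsoft/phi-1_5` model and the environment set up earlier:

```python
import torch
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer, GenerationConfig

MODEL_PATH = "microsoft/phi-1_5"  # huggingface repo id or local checkpoint folder
PROMPT = " Question:What is AI?\n\n Answer:"

# Load the model in INT4, keep the embedding layer on the CPU (recommended for iGPUs),
# and move the model to the Intel GPU ('xpu') device.
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH,
                                             load_in_4bit=True,
                                             cpu_embedding=True,
                                             trust_remote_code=True).to('xpu')
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)

with torch.inference_mode():
    input_ids = tokenizer.encode(PROMPT, return_tensors="pt").to('xpu')
    # phi-1_5 enables `use_cache` via GenerationConfig for best INT4 performance
    output = model.generate(input_ids,
                            max_new_tokens=32,
                            generation_config=GenerationConfig(use_cache=True))
    print(tokenizer.decode(output.cpu()[0], skip_special_tokens=True))
```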


