IPEX-LLM is a low-bit LLM library for Intel XPU (Xeon/Core/Flex/Arc/PVC), featuring broad model support, low latency, and a small memory footprint. It is released under the Apache 2.0 License.
You can use IPEX-LLM to run any PyTorch model (e.g., Hugging Face transformers models). It automatically optimizes and accelerates LLMs using low-bit optimizations, modern hardware acceleration, and the latest software optimizations.
Using IPEX-LLM is easy. With just a one-line code change, you can immediately observe significant speedup [1].
from ipex_llm import optimize_model
from transformers import LlamaForCausalLM, LlamaTokenizer
model = LlamaForCausalLM.from_pretrained(model_path,...)
# apply IPEX-LLM low-bit optimization; INT4 is used by default
model = optimize_model(model)
...
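Once optimized, the model is used in the same way as the original Hugging Face model. The snippet below is a minimal inference sketch; the prompt text and generation settings are illustrative assumptions, not part of the example above:

tokenizer = LlamaTokenizer.from_pretrained(model_path)
prompt = "What is AI?"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
# generate with the low-bit optimized model
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))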
IPEX-LLM provides a variety of low-bit optimizations (e.g., INT3/NF3/INT4/NF4/INT5/INT8) and lets you run LLMs on low-cost PCs (CPU-only), on PCs with a GPU, or in the cloud.
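For instance, a different precision can be selected when calling optimize_model. The one-liner below is a hedged sketch: it assumes the low_bit argument and the value name "nf4" described in IPEX-LLM's documentation; consult the API docs for the exact names supported by your version.

# select NF4 instead of the default INT4 (the low_bit value name is an assumption)
model = optimize_model(model, low_bit="nf4")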
The demos below show the experience of running 7B and 13B models on a laptop with 16 GB of memory.
The following chapters of this tutorial explain in more detail how to use IPEX-LLM to build LLM applications, e.g. best practices for setting up your environment, APIs, Chinese language support, GPU usage, and application development guides with case studies. Most chapters provide runnable notebooks using popular open-source models. Read along to learn more and run the code on your laptop.
Also, you can check out our GitHub repo for more information and the latest news.
We have already verified many models on IPEX-LLM and provided ready-to-run examples, such as Llama2, Vicuna, ChatGLM, ChatGLM2, Baichuan, MOSS, Falcon, Dolly-v1, Dolly-v2, StarCoder, Mistral, RedPajama, Whisper, etc. You can find more model examples here.
Footnotes

[1] Performance varies by use, configuration and other factors. ipex-llm may not optimize to the same degree for non-Intel products. Learn more at www.Intel.com/PerformanceIndex.