[LLM] BigDL-LLM Documentation Initial Version (#8833)
* Change order of LLM in header
* Some updates to footer
* Add BigDL-LLM index page and basic file structure
* Update index page for key features
* Add initial content for BigDL-LLM in 5 mins
* Improvement to footnote
* Add initial contents based on current contents we have
* Add initial quick links
* Small fix
* Rename file
* Hide cli section for now and change model supports to examples
* Hugging Face format -> Hugging Face transformers format
* Add placeholder for GPU supports
* Add GPU related content structure
* Add cpu/gpu installation initial contents
* Add initial contents for GPU supports
* Add image link to LLM index page
* Hide tips and known issues for now
* Small fix
* Update based on comments
* Small fix
* Add notes for Python 3.9
* Add placehoder optimize model & reveal CLI; small revision
* examples add gpu part
* Hide CLI part again for first version of merging
* add keyfeatures-optimize_model part (#1)
* change gif link to the ones hosted on github
* Small fix

---------

Co-authored-by: plusbang <binbin1.deng@intel.com>
Co-authored-by: binbin Deng <108676127+plusbang@users.noreply.github.com>
1 parent 49a3945, commit cf6a620
Showing 21 changed files with 689 additions and 21 deletions.
40 changes: 40 additions & 0 deletions
docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/cli.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
# CLI (Command Line Interface) Tool | ||
|
||
```eval_rst | ||
.. note:: | ||
Currently the ``bigdl-llm`` CLI supports the *LLaMA* (e.g., vicuna), *GPT-NeoX* (e.g., redpajama), *BLOOM* (e.g., phoenix) and *GPT2* (e.g., starcoder) model architectures; for other models, you may use the ``transformers``-style or LangChain APIs. | ||
``` | ||
|
||
## Convert Model | ||
|
||
You may convert the downloaded model into native INT4 format using `llm-convert`. | ||
|
||
```bash | ||
# convert PyTorch (fp16 or fp32) model; | ||
# llama/bloom/gptneox/starcoder model family is currently supported | ||
llm-convert "/path/to/model/" --model-format pth --model-family "bloom" --outfile "/path/to/output/" | ||
|
||
# convert GPTQ-4bit model | ||
# only llama model family is currently supported | ||
llm-convert "/path/to/model/" --model-format gptq --model-family "llama" --outfile "/path/to/output/" | ||
``` | ||
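The same conversion can be scripted. The following is a minimal Python sketch, assuming `llm-convert` is on the `PATH`; the helper names `build_convert_command` and `run_convert` are hypothetical, and only the flags shown in the commands above are used.

```python
import shlex
import subprocess


def build_convert_command(model_path, outfile, model_family, model_format="pth"):
    """Return the argv list for an `llm-convert` call (flags as shown above)."""
    return [
        "llm-convert", model_path,
        "--model-format", model_format,
        "--model-family", model_family,
        "--outfile", outfile,
    ]


def run_convert(model_path, outfile, model_family, model_format="pth"):
    """Run the conversion; raises CalledProcessError on a non-zero exit code."""
    cmd = build_convert_command(model_path, outfile, model_family, model_format)
    subprocess.run(cmd, check=True)


# Build (but do not run) the PyTorch-format command from the example above.
cmd = build_convert_command("/path/to/model/", "/path/to/output/", "bloom")
print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when a model path contains spaces.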
|
||
## Run Model | ||
|
||
You may run the converted model using `llm-cli` or `llm-chat` (built on top of `main.cpp` in [`llama.cpp`](https://github.com/ggerganov/llama.cpp)). | ||
|
||
```bash | ||
# help | ||
# llama/bloom/gptneox/starcoder model family is currently supported | ||
llm-cli -x gptneox -h | ||
|
||
# text completion | ||
# llama/bloom/gptneox/starcoder model family is currently supported | ||
llm-cli -t 16 -x gptneox -m "/path/to/output/model.bin" -p 'Once upon a time,' | ||
|
||
# chat mode | ||
# llama/gptneox model family is currently supported | ||
llm-chat -m "/path/to/output/model.bin" -x llama | ||
``` |
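The comments above list different model families for text completion and for chat mode. A small Python sketch of how a wrapper script might check the family before building the command; the function names are hypothetical, and the family sets are copied from the comments above rather than from the tools themselves.

```python
# Model families per tool, copied from the comments in the commands above.
CLI_FAMILIES = {"llama", "bloom", "gptneox", "starcoder"}
CHAT_FAMILIES = {"llama", "gptneox"}


def build_cli_command(model_bin, family, prompt, threads=None):
    """Build the argv list for a text-completion call to `llm-cli`."""
    if family not in CLI_FAMILIES:
        raise ValueError(f"llm-cli does not list model family {family!r}")
    cmd = ["llm-cli"]
    if threads is not None:
        cmd += ["-t", str(threads)]
    cmd += ["-x", family, "-m", model_bin, "-p", prompt]
    return cmd


def build_chat_command(model_bin, family):
    """Build the argv list for an interactive `llm-chat` session."""
    if family not in CHAT_FAMILIES:
        raise ValueError(f"llm-chat does not list model family {family!r}")
    return ["llm-chat", "-m", model_bin, "-x", family]


completion_cmd = build_cli_command(
    "/path/to/output/model.bin", "gptneox", "Once upon a time,", threads=16
)
chat_cmd = build_chat_command("/path/to/output/model.bin", "llama")
```

Either list can be passed to `subprocess.run(..., check=True)` once the converted model exists.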
47 changes: 47 additions & 0 deletions
docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/gpu_supports.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
# GPU Supports | ||
|
||
You may apply INT4 optimizations to any Hugging Face *Transformers* model on devices with Intel GPUs as follows: | ||
|
||
```python | ||
# import ipex | ||
import intel_extension_for_pytorch as ipex | ||
|
||
# load Hugging Face Transformers model with INT4 optimizations on Intel GPUs | ||
from bigdl.llm.transformers import AutoModelForCausalLM | ||
|
||
model = AutoModelForCausalLM.from_pretrained('/path/to/model/', | ||
                                             load_in_4bit=True, | ||
                                             optimize_model=False) | ||
model = model.to('xpu') | ||
``` | ||
|
||
```eval_rst | ||
.. note:: | ||
You may apply INT8 optimizations as follows: | ||
.. code-block:: python | ||
model = AutoModelForCausalLM.from_pretrained('/path/to/model/', | ||
load_in_low_bit="sym_int8", | ||
optimize_model=False) | ||
model = model.to('xpu') | ||
``` | ||
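The INT4 and INT8 calls above differ only in their keyword arguments. A minimal sketch of selecting them by name; the helper `low_bit_kwargs` is hypothetical, and it returns only the keyword arguments that appear in the two snippets above.

```python
def low_bit_kwargs(precision="int4"):
    """Return the from_pretrained keyword arguments shown above for a precision."""
    if precision == "int4":
        return {"load_in_4bit": True, "optimize_model": False}
    if precision == "int8":
        return {"load_in_low_bit": "sym_int8", "optimize_model": False}
    raise ValueError(f"unsupported precision in this sketch: {precision!r}")


kwargs = low_bit_kwargs("int8")

# Intended use, with bigdl-llm and intel_extension_for_pytorch installed:
# model = AutoModelForCausalLM.from_pretrained('/path/to/model/', **kwargs)
# model = model.to('xpu')
```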
|
||
After loading the Hugging Face *Transformers* model, you may easily run the optimized model as follows: | ||
|
||
```python | ||
# run the optimized model | ||
from transformers import AutoTokenizer | ||
|
||
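# `model_path` and `input_str` are placeholders for your model directory and prompt; | ||
# `...` stands for further arguments (`.to('xpu')` requires tensor output, e.g. return_tensors="pt") | ||
tokenizer = AutoTokenizer.from_pretrained(model_path) | ||
input_ids = tokenizer.encode(input_str, ...).to('xpu') | ||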
output_ids = model.generate(input_ids, ...) | ||
output = tokenizer.batch_decode(output_ids) | ||
``` | ||
|
||
```eval_rst | ||
.. seealso:: | ||
See the complete examples `here <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/GPU>`_ | ||
``` |
54 changes: 54 additions & 0 deletions
docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/hugging_face_format.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
# Hugging Face ``transformers`` Format | ||
|
||
## Load in Low Precision | ||
You may apply INT4 optimizations to any Hugging Face *Transformers* model as follows: | ||
|
||
```python | ||
# load Hugging Face Transformers model with INT4 optimizations | ||
from bigdl.llm.transformers import AutoModelForCausalLM | ||
|
||
model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_4bit=True) | ||
``` | ||
|
||
After loading the Hugging Face *Transformers* model, you may easily run the optimized model as follows: | ||
|
||
```python | ||
# run the optimized model | ||
from transformers import AutoTokenizer | ||
|
||
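# `model_path` and `input_str` are placeholders for your model directory and prompt; | ||
# `...` stands for further arguments (e.g. return_tensors="pt", so that `generate` receives a tensor) | ||
tokenizer = AutoTokenizer.from_pretrained(model_path) | ||
input_ids = tokenizer.encode(input_str, ...) | ||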
output_ids = model.generate(input_ids, ...) | ||
output = tokenizer.batch_decode(output_ids) | ||
``` | ||
|
||
```eval_rst | ||
.. seealso:: | ||
See the complete examples `here <https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/transformers_int4>`_ | ||
.. note:: | ||
You may also apply other low-bit optimizations (including INT8, INT5 and INT4) as follows: | ||
.. code-block:: python | ||
model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_low_bit="sym_int5") | ||
See the complete example `here <https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/transformers_low_bit>`_. | ||
``` | ||
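For readers unfamiliar with names such as `sym_int5`, the sketch below illustrates symmetric low-bit quantization in general terms, using plain Python. It is a conceptual illustration only: it assumes a single scale for the whole list, and the scheme `bigdl-llm` actually uses may differ (for example in how weights are grouped).

```python
def quantize_symmetric(values, bits=4):
    """Map floats to signed integers in [-qmax, qmax] using one shared scale."""
    qmax = 2 ** (bits - 1) - 1          # 7 for 4 bits, 15 for 5 bits, 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integers and the shared scale."""
    return [q * scale for q in quantized]


weights = [0.7, -0.3, 0.1, 0.0]
q, scale = quantize_symmetric(weights, bits=4)
restored = dequantize(q, scale)
```

Fewer bits mean fewer representable levels and therefore a larger rounding error per weight, which is the trade-off behind choosing among INT4, INT5 and INT8.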
|
||
## Save & Load | ||
After the model is optimized using INT4 (or INT8/INT5), you may save and load the optimized model as follows: | ||
|
||
```python | ||
model.save_low_bit(model_path) | ||
|
||
new_model = AutoModelForCausalLM.load_low_bit(model_path) | ||
``` | ||
|
||
```eval_rst | ||
.. seealso:: | ||
See the examples `here <https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/transformers_low_bit>`_ | ||
``` |
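One common use of save and load is to quantize once and reuse the saved low-bit model on later runs. A minimal sketch of that pattern; `load_or_quantize` is a hypothetical helper, the two callables are supplied by the caller, and the sketch assumes the saved model lives in a directory.

```python
import os


def load_or_quantize(model_path, low_bit_dir, quantize_fn, load_fn):
    """Reuse a saved low-bit model if one exists, otherwise quantize and save it.

    quantize_fn(model_path) must return a model with a save_low_bit(path) method;
    load_fn(low_bit_dir) must return the previously saved model.
    """
    if os.path.isdir(low_bit_dir) and os.listdir(low_bit_dir):
        return load_fn(low_bit_dir)
    model = quantize_fn(model_path)
    model.save_low_bit(low_bit_dir)
    return model


# Intended use, with bigdl-llm installed (paths are placeholders):
# model = load_or_quantize(
#     '/path/to/model/', '/path/to/low_bit_model/',
#     quantize_fn=lambda p: AutoModelForCausalLM.from_pretrained(p, load_in_4bit=True),
#     load_fn=AutoModelForCausalLM.load_low_bit,
# )
```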
19 changes: 19 additions & 0 deletions
docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/index.rst
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
BigDL-LLM Key Features | ||
================================ | ||
|
||
You may run the LLMs using ``bigdl-llm`` through one of the following APIs: | ||
|
||
* |transformers_style_api|_ | ||
|
||
* |hugging_face_transformers_format|_ | ||
* `Native Format <./native_format.html>`_ | ||
|
||
* `General PyTorch Model Supports <./langchain_api.html>`_ | ||
* `LangChain API <./langchain_api.html>`_ | ||
* `GPU Supports <./gpu_supports.html>`_ | ||
|
||
.. |transformers_style_api| replace:: ``transformers``-style API | ||
.. _transformers_style_api: ./transformers_style_api.html | ||
|
||
.. |hugging_face_transformers_format| replace:: Hugging Face ``transformers`` Format | ||
.. _hugging_face_transformers_format: ./hugging_face_format.html |
57 changes: 57 additions & 0 deletions
docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/langchain_api.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,57 @@ | ||
# LangChain API | ||
|
||
You may run the models using the LangChain API in `bigdl-llm`. | ||
|
||
## Using Hugging Face `transformers` INT4 Format | ||
|
||
You may run any Hugging Face *Transformers* model (with INT4 optimizations applied) using the LangChain API as follows: | ||
|
||
```python | ||
from bigdl.llm.langchain.llms import TransformersLLM | ||
from bigdl.llm.langchain.embeddings import TransformersEmbeddings | ||
from langchain.chains.question_answering import load_qa_chain | ||
|
||
embeddings = TransformersEmbeddings.from_model_id(model_id=model_path) | ||
bigdl_llm = TransformersLLM.from_model_id(model_id=model_path, ...) | ||
|
||
doc_chain = load_qa_chain(bigdl_llm, ...) | ||
output = doc_chain.run(...) | ||
``` | ||
|
||
```eval_rst | ||
.. seealso:: | ||
See the examples `here <https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/langchain/transformers_int4>`_. | ||
``` | ||
|
||
## Using Native INT4 Format | ||
|
||
You may also convert Hugging Face *Transformers* models into native INT4 format, and then run the converted models using the LangChain API as follows. | ||
|
||
```eval_rst | ||
.. note:: | ||
* Currently only llama/bloom/gptneox/starcoder/chatglm model families are supported; for other models, you may use the Hugging Face ``transformers`` INT4 format as described `above <./langchain_api.html#using-hugging-face-transformers-int4-format>`_. | ||
* You may choose the corresponding API developed for specific native models to load the converted model. | ||
``` | ||
|
||
```python | ||
from bigdl.llm.langchain.llms import LlamaLLM | ||
from bigdl.llm.langchain.embeddings import LlamaEmbeddings | ||
from langchain.chains.question_answering import load_qa_chain | ||
|
||
# switch to ChatGLMEmbeddings/GptneoxEmbeddings/BloomEmbeddings/StarcoderEmbeddings to load other models | ||
embeddings = LlamaEmbeddings(model_path='/path/to/converted/model.bin') | ||
# switch to ChatGLMLLM/GptneoxLLM/BloomLLM/StarcoderLLM to load other models | ||
bigdl_llm = LlamaLLM(model_path='/path/to/converted/model.bin') | ||
|
||
doc_chain = load_qa_chain(bigdl_llm, ...) | ||
doc_chain.run(...) | ||
``` | ||
|
||
```eval_rst | ||
.. seealso:: | ||
See the examples `here <https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/langchain/native_int4>`_. | ||
``` |
32 changes: 32 additions & 0 deletions
docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/native_format.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
# Native Format | ||
|
||
You may also convert Hugging Face *Transformers* models into native INT4 format for maximum performance as follows. | ||
|
||
```eval_rst | ||
.. note:: | ||
Currently only llama/bloom/gptneox/starcoder/chatglm model families are supported; you may use the corresponding API to load the converted model. (For other models, you can use the Hugging Face ``transformers`` format as described `here <./hugging_face_format.html>`_.) | ||
``` | ||
|
||
```python | ||
# convert the model | ||
from bigdl.llm import llm_convert | ||
bigdl_llm_path = llm_convert(model='/path/to/model/', | ||
                             outfile='/path/to/output/', outtype='int4', model_family="llama") | ||
|
||
# load the converted model | ||
# switch to ChatGLMForCausalLM/GptneoxForCausalLM/BloomForCausalLM/StarcoderForCausalLM to load other models | ||
from bigdl.llm.transformers import LlamaForCausalLM | ||
llm = LlamaForCausalLM.from_pretrained("/path/to/output/model.bin", native=True, ...) | ||
|
||
# run the converted model | ||
input_ids = llm.tokenize(prompt) | ||
output_ids = llm.generate(input_ids, ...) | ||
output = llm.batch_decode(output_ids) | ||
``` | ||
|
||
```eval_rst | ||
.. seealso:: | ||
See the complete example `here <https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/native_int4/native_int4_pipeline.py>`_ | ||
``` |
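The comment in the code above names one loader class per model family. A small Python sketch of selecting the class from the family string; the mapping and helper names are hypothetical, the class names are taken from that comment, and the sketch assumes every class can be imported from `bigdl.llm.transformers` in the same way as `LlamaForCausalLM`.

```python
import importlib

# Family name -> loader class name, taken from the comment in the code above.
NATIVE_CLASS_BY_FAMILY = {
    "llama": "LlamaForCausalLM",
    "chatglm": "ChatGLMForCausalLM",
    "gptneox": "GptneoxForCausalLM",
    "bloom": "BloomForCausalLM",
    "starcoder": "StarcoderForCausalLM",
}


def native_class_name(model_family):
    """Return the loader class name for a supported native model family."""
    try:
        return NATIVE_CLASS_BY_FAMILY[model_family]
    except KeyError:
        raise ValueError(f"unsupported native model family: {model_family!r}") from None


def load_native_class(model_family):
    """Import and return the loader class (requires bigdl-llm to be installed)."""
    module = importlib.import_module("bigdl.llm.transformers")  # assumed location
    return getattr(module, native_class_name(model_family))


name = native_class_name("gptneox")
```

Keeping the mapping in one place means the same `model_family` string can be used for both the conversion and the loading step.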