From 2943475ff32aa1b62ccb70b3b52e6062277c6a9f Mon Sep 17 00:00:00 2001
From: gc-fu
Date: Tue, 26 Mar 2024 16:40:47 +0800
Subject: [PATCH 1/2] fix fastchat

---
 python/llm/src/ipex_llm/serving/fastchat/README.md   | 12 ++++++------
 .../fastchat/{bigdl_worker.py => ipex_llm_worker.py} |  0
 2 files changed, 6 insertions(+), 6 deletions(-)
 rename python/llm/src/ipex_llm/serving/fastchat/{bigdl_worker.py => ipex_llm_worker.py} (100%)

diff --git a/python/llm/src/ipex_llm/serving/fastchat/README.md b/python/llm/src/ipex_llm/serving/fastchat/README.md
index b40f3956ae1..056d1bea439 100644
--- a/python/llm/src/ipex_llm/serving/fastchat/README.md
+++ b/python/llm/src/ipex_llm/serving/fastchat/README.md
@@ -33,7 +33,7 @@ pip install --pre --upgrade ipex-llm[all]
 To add GPU support for FastChat, you may install **`ipex-llm`** as follows:
 
 ```bash
-pip install --pre --upgrade ipex-llm[xpu, serving] -f https://developer.intel.com/ipex-whl-stable-xpu
+pip install --pre --upgrade ipex-llm[xpu,serving] -f https://developer.intel.com/ipex-whl-stable-xpu
 ```
 
@@ -87,24 +87,24 @@ INFO - Converting the current model to sym_int4 format......
 
 #### IPEX-LLM worker
 
-To integrate IPEX-LLM with `FastChat` efficiently, we have provided a new model_worker implementation named `ipex_worker.py`.
+To integrate IPEX-LLM with `FastChat` efficiently, we have provided a new model_worker implementation named `ipex_llm_worker.py`.
 
-To run the `ipex_worker` on CPU, using the following code:
+To run the `ipex_llm_worker` on CPU, use the following command:
 ```bash
 source ipex-llm-init -t
 
 # Available low_bit format including sym_int4, sym_int8, bf16 etc.
-python3 -m ipex_llm.serving.fastchat.ipex_worker --model-path lmsys/vicuna-7b-v1.5 --low-bit "sym_int4" --trust-remote-code --device "cpu"
+python3 -m ipex_llm.serving.fastchat.ipex_llm_worker --model-path lmsys/vicuna-7b-v1.5 --low-bit "sym_int4" --trust-remote-code --device "cpu"
 ```
 
 For GPU example:
 ```bash
 # Available low_bit format including sym_int4, sym_int8, fp16 etc.
-python3 -m ipex_llm.serving.fastcaht.ipex_worker --model-path lmsys/vicuna-7b-v1.5 --low-bit "sym_int4" --trust-remote-code --device "xpu"
+python3 -m ipex_llm.serving.fastchat.ipex_llm_worker --model-path lmsys/vicuna-7b-v1.5 --low-bit "sym_int4" --trust-remote-code --device "xpu"
 ```
 
-For a full list of accepted arguments, you can refer to the main method of the `ipex_worker.py`
+For a full list of accepted arguments, you can refer to the `main` method of `ipex_llm_worker.py`.
 
 #### IPEX vLLM model worker
 
diff --git a/python/llm/src/ipex_llm/serving/fastchat/bigdl_worker.py b/python/llm/src/ipex_llm/serving/fastchat/ipex_llm_worker.py
similarity index 100%
rename from python/llm/src/ipex_llm/serving/fastchat/bigdl_worker.py
rename to python/llm/src/ipex_llm/serving/fastchat/ipex_llm_worker.py

From 11c8111a860645334acca47bf8ff0f0a7c9b95a3 Mon Sep 17 00:00:00 2001
From: gc-fu
Date: Tue, 26 Mar 2024 16:56:30 +0800
Subject: [PATCH 2/2] fix doc

---
 python/llm/src/ipex_llm/serving/fastchat/README.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/python/llm/src/ipex_llm/serving/fastchat/README.md b/python/llm/src/ipex_llm/serving/fastchat/README.md
index 056d1bea439..041f7efeb1e 100644
--- a/python/llm/src/ipex_llm/serving/fastchat/README.md
+++ b/python/llm/src/ipex_llm/serving/fastchat/README.md
@@ -11,9 +11,9 @@ IPEX-LLM can be easily integrated into FastChat so that user can use `IPEX-LLM`
   - [Start the service](#start-the-service)
   - [Launch controller](#launch-controller)
   - [Launch model worker(s) and load models](#launch-model-workers-and-load-models)
-    - [IPEX model worker (deprecated)](#ipex-model-worker-deprecated)
-    - [IPEX worker](#ipex-llm-worker)
-    - [IPEX vLLM model worker](#vllm-model-worker)
+    - [IPEX-LLM model worker (deprecated)](#ipex-llm-model-worker-deprecated)
+    - [IPEX-LLM worker](#ipex-llm-worker)
+    - [IPEX-LLM vLLM worker](#ipex-llm-vllm-worker)
   - [Launch Gradio web server](#launch-gradio-web-server)
   - [Launch RESTful API server](#launch-restful-api-server)
 
@@ -51,7 +51,7 @@ python3 -m fastchat.serve.controller
 
 Using IPEX-LLM in FastChat does not impose any new limitations on model usage. Therefore, all Hugging Face Transformer models can be utilized in FastChat.
 
-#### IPEX model worker (deprecated)
+#### IPEX-LLM model worker (deprecated)
 
 <details>

@@ -106,7 +106,7 @@ python3 -m ipex_llm.serving.fastchat.ipex_llm_worker --model-path lmsys/vicuna-7
 
 For a full list of accepted arguments, you can refer to the `main` method of `ipex_llm_worker.py`.
 
-#### IPEX vLLM model worker
+#### IPEX-LLM vLLM worker
 
 We also provide the `vllm_worker` which uses the [vLLM](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/vLLM-Serving) engine for better hardware utilization.
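
A quick end-to-end smoke test of the renamed worker is sketched below. This is a hypothetical session, not part of the patch: it assumes FastChat's stock `openai_api_server` entry point with its default ports, and that FastChat registers the model under the basename of `--model-path`.

```bash
# 1. Start the FastChat controller (default port 21001).
python3 -m fastchat.serve.controller &

# 2. Start the renamed IPEX-LLM worker on CPU with sym_int4 quantization.
python3 -m ipex_llm.serving.fastchat.ipex_llm_worker \
  --model-path lmsys/vicuna-7b-v1.5 --low-bit "sym_int4" \
  --trust-remote-code --device "cpu" &

# 3. Expose an OpenAI-compatible REST endpoint.
python3 -m fastchat.serve.openai_api_server --host localhost --port 8000 &

# 4. Wait for the worker to finish loading, then query the served model.
#    (Model name assumed to be the basename of --model-path.)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "vicuna-7b-v1.5", "messages": [{"role": "user", "content": "Hello"}]}'
```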