Update Readme for FastChat docker demo #12354

Merged: 4 commits merged into intel-analytics:main on Nov 7, 2024

Conversation

@ATMxsp01 (Contributor) commented on Nov 7, 2024

Update Readme for FastChat docker demo

@@ -63,6 +63,70 @@ For convenience, we have included a file `/llm/start-pp_serving-service.sh` in t

To run model serving using `IPEX-LLM` as the backend with FastChat, you can refer to this [quickstart](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/fastchat_quickstart.html#).

In short, you need to start a Docker container with `--device=/dev/dri`; a recommended command is:
Review comment from a Contributor:

To set up model serving using IPEX-LLM as the backend with FastChat, you can refer to this Quickstart guide or follow these quick steps to deploy a demo.

Quick Setup for FastChat with IPEX-LLM

  1. Start the Docker Container

    Run the following command to launch a Docker container with device access:

    #!/bin/bash
    export DOCKER_IMAGE=intelanalytics/ipex-llm-serving-xpu:latest

    # The -v option maps a host model directory into the container (example path);
    # the http_proxy/https_proxy variables are optional and only needed behind a proxy.
    # (Comments are kept off the continuation lines: a trailing "#" after "\" breaks the command.)
    sudo docker run -itd \
        --net=host \
        --device=/dev/dri \
        --name=demo-container \
        -v /LLM_MODELS/:/llm/models/ \
        --shm-size="16g" \
        -e http_proxy=... \
        -e https_proxy=... \
        -e no_proxy="127.0.0.1,localhost" \
        $DOCKER_IMAGE
  2. Start the FastChat Service

    Enter the container and start the FastChat service (a sample command for entering the container is shown after these steps):

    #!/bin/bash
    
    # Stop any existing FastChat processes (grep -v grep keeps this pipeline from matching itself)
    ps -ef | grep "fastchat" | grep -v grep | awk '{print $2}' | xargs -r kill -9
    
    # Install the required Gradio version
    pip install -U gradio==4.43.0
    
    # Launch the FastChat controller
    python -m fastchat.serve.controller &
    
    # Set environment variables for CCL
    export TORCH_LLM_ALLREDUCE=0
    export CCL_DG2_ALLREDUCE=1
    export CCL_WORKER_COUNT=2
    # Optional: Pin CCL workers to specific cores
    # export CCL_WORKER_AFFINITY=32,33,34,35
    export FI_PROVIDER=shm
    export CCL_ATL_TRANSPORT=ofi
    export CCL_ZE_IPC_EXCHANGE=sockets
    export CCL_ATL_SHM=1
    
    # Load Intel CCL settings
    source /opt/intel/1ccl-wks/setvars.sh
    
    # Start the model worker (replace "Yi-1.5-34B" with your model name)
    python -m ipex_llm.serving.fastchat.vllm_worker \
        --model-path /llm/models/Yi-1.5-34B \
        --device xpu \
        --enforce-eager \
        --dtype float16 \
        --load-in-low-bit fp8 \
        --tensor-parallel-size 4 \
        --gpu-memory-utilization 0.9 \
        --max-model-len 4096 \
        --max-num-batched-tokens 8000 &
    
    # Wait for initialization
    sleep 120
    
    # Start the Gradio web server for FastChat
    python -m fastchat.serve.gradio_web_server &

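As a sample of how to enter the running container before executing the step-2 commands (an illustration, not text from the README; the container name `demo-container` comes from the `--name` flag in step 1):

    #!/bin/bash
    # Confirm the container from step 1 is running
    sudo docker ps --filter "name=demo-container"

    # Open an interactive shell inside it; the step-2 commands are then run from this shell
    sudo docker exec -it demo-container /bin/bash
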
This quick setup allows you to deploy FastChat with IPEX-LLM efficiently.
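
As a rough sanity check of the deployment (again an illustration rather than part of the suggested README text): the controller, worker, and Gradio server should all appear as running processes inside the container, and since the container uses `--net=host`, the Gradio web UI is reachable from the host on Gradio's default port (7860) unless a different port was configured:

    #!/bin/bash
    # Inside the container: controller, vllm_worker, and gradio_web_server should all be listed
    ps -ef | grep -E "controller|vllm_worker|gradio_web_server" | grep -v grep

    # From the host, open the web UI in a browser: http://localhost:7860

    # Optional (an assumption, not part of the original steps): FastChat also provides an
    # OpenAI-compatible API server that can be started alongside the web UI
    python -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8000 &

    # Query it; the model name assumes the Yi-1.5-34B worker started in step 2
    curl http://localhost:8000/v1/chat/completions \
        -H "Content-Type: application/json" \
        -d '{"model": "Yi-1.5-34B", "messages": [{"role": "user", "content": "Hello"}]}'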

@liu-shaojun requested a review from glorysdj on November 7, 2024 06:33
@glorysdj (Contributor) left a comment

LGTM

@liu-shaojun (Contributor) left a comment

LGTM

@liu-shaojun merged commit ce0c6ae into intel-analytics:main on Nov 7, 2024
@ATMxsp01 deleted the doc-update branch on November 14, 2024 02:28