diff --git a/README.md b/README.md
index deda1b1fe..22bc2e4cd 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,7 @@ LLM-on-Ray is a comprehensive solution designed to empower users in building, cu
 
 LLM-on-Ray harnesses the power of Ray, an industry-leading framework for distributed computing, to scale your AI workloads efficiently. This integration ensures robust fault tolerance and cluster resource management, making your LLM projects more resilient and scalable.
 
-LLM-on-Ray is built to operate across various hardware setups, including Intel CPU, Intel GPU and Intel Gaudi2. It incorporates several industry and Intel optimizations to maximize performance, including [vLLM](https://github.com/vllm-project/vllm), [llama.cpp](https://github.com/ggerganov/llama.cpp), [Intel Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch)/[Deepspeed](https://github.com/intel/intel-extension-for-deepspeed), [BigDL-LLM](https://github.com/intel-analytics/BigDL), [RecDP-LLM](https://github.com/intel/e2eAIOK/tree/main/RecDP/pyrecdp/LLM), [NeuralChat](https://huggingface.co/Intel/neural-chat-7b-v3-1) and more. 
+LLM-on-Ray is built to operate across various hardware setups, including Intel CPU, Intel GPU and Intel Gaudi2. It incorporates several industry and Intel optimizations to maximize performance, including [vLLM](https://github.com/vllm-project/vllm), [llama.cpp](https://github.com/ggerganov/llama.cpp), [Intel Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch)/[Deepspeed](https://github.com/intel/intel-extension-for-deepspeed), [BigDL-LLM](https://github.com/intel-analytics/BigDL), [RecDP-LLM](https://github.com/intel/e2eAIOK/tree/main/RecDP/pyrecdp/LLM), [NeuralChat](https://huggingface.co/Intel/neural-chat-7b-v3-1) and more.
 
 ## Solution Technical Overview
 LLM-on-Ray's modular workflow structure is designed to comprehensively cater to the various stages of LLM development, from pretraining and finetuning to serving. These workflows are intuitive, highly configurable, and tailored to meet the specific needs of each phase in the LLM lifecycle:
@@ -44,12 +44,15 @@ git clone https://github.com/intel/llm-on-ray.git
 cd llm-on-ray
 conda create -n llm-on-ray python=3.9
 conda activate llm-on-ray
-pip install .[cpu] -f https://developer.intel.com/ipex-whl-stable-cpu -f https://download.pytorch.org/whl/torch_stable.html
-# Dynamic link oneCCL and Intel MPI libraries
-source $(python -c "import oneccl_bindings_for_pytorch as torch_ccl;print(torch_ccl.cwd)")/env/setvars.sh
+pip install .[cpu] --extra-index-url https://download.pytorch.org/whl/cpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
 ```
 
 #### 2. Start Ray
+__[Optional]__ If DeepSpeed is enabled or doing distributed finetuning, oneCCL and Intel MPI libraries should be dynamically linked on every node before Ray starts:
+```bash
+source $(python -c "import oneccl_bindings_for_pytorch as torch_ccl; print(torch_ccl.cwd)")/env/setvars.sh
+```
+
 Start Ray locally using the following command. To launch a Ray cluster, please follow the [setup](docs/setup.md) document.
 ```bash
 ray start --head
@@ -117,7 +120,7 @@ The following are detailed guidelines for pretraining, finetuning and serving LL
 ### Web UI
 * [Finetune and Deploy LLMs through Web UI](docs/web_ui.md)
 
-## Disclaimer
-To the extent that any public datasets are referenced by Intel or accessed using tools or code on this site those datasets are provided by the third party indicated as the data source. Intel does not create the data, or datasets, and does not warrant their accuracy or quality. By accessing the public dataset(s), or using a model trained on those datasets, you agree to the terms associated with those datasets and that your use complies with the applicable license. 
-
+## Disclaimer
+To the extent that any public datasets are referenced by Intel or accessed using tools or code on this site those datasets are provided by the third party indicated as the data source. Intel does not create the data, or datasets, and does not warrant their accuracy or quality. By accessing the public dataset(s), or using a model trained on those datasets, you agree to the terms associated with those datasets and that your use complies with the applicable license.
+
 Intel expressly disclaims the accuracy, adequacy, or completeness of any public datasets, and is not liable for any errors, omissions, or defects in the data, or for any reliance on the data. Intel is not liable for any liability or damages relating to your use of public datasets.
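A quick way to sanity-check the new `--extra-index-url` based install above is to confirm that the CPU builds actually resolved from those indexes. This is a minimal sketch, not part of the patch, and it assumes the `llm-on-ray` conda environment created above is active:

```bash
# Both versions should carry a +cpu local tag (e.g. 2.1.0+cpu) if the wheels came from
# download.pytorch.org/whl/cpu and pytorch-extension.intel.com rather than PyPI.
python -c "import torch; print(torch.__version__)"
python -c "import intel_extension_for_pytorch as ipex; print(ipex.__version__)"
# Same import the setvars.sh snippet relies on; it fails loudly if oneCCL bindings are missing.
python -c "import oneccl_bindings_for_pytorch as torch_ccl; print(torch_ccl.cwd)"
```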
diff --git a/dev/docker/Dockerfile.bigdl-cpu b/dev/docker/Dockerfile.bigdl-cpu
index 8eb38d4bf..411449e41 100644
--- a/dev/docker/Dockerfile.bigdl-cpu
+++ b/dev/docker/Dockerfile.bigdl-cpu
@@ -29,8 +29,8 @@ COPY ./MANIFEST.in .
 
 RUN mkdir ./finetune && mkdir ./inference
 
-RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[bigdl-cpu] -f https://developer.intel.com/ipex-whl-stable-cpu \
-    -f https://download.pytorch.org/whl/torch_stable.html
+RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[bigdl-cpu] --extra-index-url https://download.pytorch.org/whl/cpu \
+    --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
 
 # Used to invalidate docker build cache with --build-arg CACHEBUST=$(date +%s)
 ARG CACHEBUST=1
diff --git a/dev/docker/Dockerfile.cpu_and_deepspeed b/dev/docker/Dockerfile.cpu_and_deepspeed
index 7f4847ae0..5371fae78 100644
--- a/dev/docker/Dockerfile.cpu_and_deepspeed
+++ b/dev/docker/Dockerfile.cpu_and_deepspeed
@@ -29,8 +29,8 @@ COPY ./MANIFEST.in .
 
 RUN mkdir ./finetune && mkdir ./inference
 
-RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[cpu,deepspeed] -f https://developer.intel.com/ipex-whl-stable-cpu \
-    -f https://download.pytorch.org/whl/torch_stable.html
+RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[cpu,deepspeed] --extra-index-url https://download.pytorch.org/whl/cpu \
+    --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
 
 RUN ds_report
 
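For reference, a hypothetical invocation for one of the updated Dockerfiles (the image tag is a placeholder, not something defined by this patch): BuildKit must be enabled for the `--mount=type=cache` pip layer used above, and `CACHEBUST` can be bumped to invalidate the later layers, as the comment in `Dockerfile.bigdl-cpu` notes:

```bash
# Run from the repository root; DOCKER_BUILDKIT=1 enables the cached pip layer.
DOCKER_BUILDKIT=1 docker build \
    -f dev/docker/Dockerfile.cpu_and_deepspeed \
    -t llm-on-ray:cpu-deepspeed \
    --build-arg CACHEBUST=$(date +%s) \
    .
```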
diff --git a/dev/docker/Dockerfile.cpu_and_deepspeed.pip_non_editable b/dev/docker/Dockerfile.cpu_and_deepspeed.pip_non_editable
index 217bf4a23..82eef4aa8 100644
--- a/dev/docker/Dockerfile.cpu_and_deepspeed.pip_non_editable
+++ b/dev/docker/Dockerfile.cpu_and_deepspeed.pip_non_editable
@@ -27,8 +27,8 @@ RUN --mount=type=cache,target=/opt/conda/pkgs conda init bash && \
 
 # copy all checkedout file for later non-editable pip
 COPY . .
 
-RUN --mount=type=cache,target=/root/.cache/pip pip install .[cpu,deepspeed] -f https://developer.intel.com/ipex-whl-stable-cpu \
-    -f https://download.pytorch.org/whl/torch_stable.html
+RUN --mount=type=cache,target=/root/.cache/pip pip install .[cpu,deepspeed] --extra-index-url https://download.pytorch.org/whl/cpu \
+    --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
 
 RUN ds_report
diff --git a/dev/docker/Dockerfile.vllm b/dev/docker/Dockerfile.vllm
index e4eb63d06..23d4bbe48 100644
--- a/dev/docker/Dockerfile.vllm
+++ b/dev/docker/Dockerfile.vllm
@@ -30,8 +30,8 @@ COPY ./dev/scripts/install-vllm-cpu.sh .
 
 RUN mkdir ./finetune && mkdir ./inference
 
-RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[cpu] -f https://developer.intel.com/ipex-whl-stable-cpu \
-    -f https://download.pytorch.org/whl/torch_stable.html
+RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[cpu] --extra-index-url https://download.pytorch.org/whl/cpu \
+    --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
 
 # Install vllm-cpu
 # Activate base first for loading g++ envs ($CONDA_PREFIX/etc/conda/activate.d/*)
diff --git a/dev/k8s/Dockerfile b/dev/k8s/Dockerfile
index 7b99fc808..8dcccdb5e 100644
--- a/dev/k8s/Dockerfile
+++ b/dev/k8s/Dockerfile
@@ -23,7 +23,7 @@ RUN "$HOME/anaconda3/bin/pip install accelerate==0.19.0" \
     "$HOME/anaconda3/bin/pip install gymnasium" \
     "$HOME/anaconda3/bin/pip install dm-tree" \
     "$HOME/anaconda3/bin/pip install scikit-image" \
-    "$HOME/anaconda3/bin/pip install oneccl_bind_pt==1.13 -f https://developer.intel.com/ipex-whl-stable-cpu"
+    "$HOME/anaconda3/bin/pip install oneccl_bind_pt==1.13 --extra-index-url https://developer.intel.com/ipex-whl-stable-cpu"
 
 # set http_proxy & https_proxy
 ENV http_proxy=${http_proxy}
diff --git a/dev/scripts/install-vllm-cpu.sh b/dev/scripts/install-vllm-cpu.sh
index 64b3690a4..7e96ba5ba 100755
--- a/dev/scripts/install-vllm-cpu.sh
+++ b/dev/scripts/install-vllm-cpu.sh
@@ -17,4 +17,4 @@ version_greater_equal "${gcc_version}" 12.3.0 || { echo "GNU C++ Compiler 12.3.0
 
 # Install from source
 MAX_JOBS=8 pip install -v git+https://github.com/bigPYJ1151/vllm@PR_Branch \
-    -f https://download.pytorch.org/whl/torch_stable.html
+    --extra-index-url https://download.pytorch.org/whl/cpu
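Outside of the editable installs above, the same index switch can be exercised as a standalone command. The pins below mirror the `cpu` extra in the `pyproject.toml` hunk further down in this patch; this is illustrative only, not an additional change to the repository:

```bash
# Pull the CPU builds directly from the two indexes introduced by this patch.
pip install "torch==2.1.0+cpu" "intel_extension_for_pytorch==2.1.0+cpu" "oneccl_bind_pt==2.1.0+cpu" \
    --extra-index-url https://download.pytorch.org/whl/cpu \
    --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
```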
diff --git a/docs/serve_bigdl.md b/docs/serve_bigdl.md
index 7bfb4001b..ae697bec9 100644
--- a/docs/serve_bigdl.md
+++ b/docs/serve_bigdl.md
@@ -6,7 +6,7 @@ The integration with BigDL-LLM currently only supports running on Intel CPU.
 ## Setup
 Please follow [setup.md](setup.md) to setup the environment first. Additional, you will need to install bigdl dependencies as below.
 ```bash
-pip install .[bigdl-cpu] -f https://developer.intel.com/ipex-whl-stable-cpu -f https://download.pytorch.org/whl/torch_stable.html
+pip install .[bigdl-cpu] --extra-index-url https://download.pytorch.org/whl/cpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
 ```
 
 ## Configure Serving Parameters
diff --git a/docs/setup.md b/docs/setup.md
index 5748d9785..82ed07160 100644
--- a/docs/setup.md
+++ b/docs/setup.md
@@ -40,15 +40,15 @@ conda activate llm-on-ray
 ```
 For CPU:
 ```bash
-pip install .[cpu] -f https://developer.intel.com/ipex-whl-stable-cpu -f https://download.pytorch.org/whl/torch_stable.html
+pip install .[cpu] --extra-index-url https://download.pytorch.org/whl/cpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
 ```
 For GPU:
 ```bash
-pip install .[gpu] --extra-index-url https://developer.intel.com/ipex-whl-stable-xpu
+pip install .[gpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
 If DeepSpeed is enabled or doing distributed finetuing, oneCCL and Intel MPI libraries should be dynamically linked in every node before Ray starts:
 ```bash
-source $(python -c "import oneccl_bindings_for_pytorch as torch_ccl;print(torch_ccl.cwd)")/env/setvars.sh
+source $(python -c "import oneccl_bindings_for_pytorch as torch_ccl; print(torch_ccl.cwd)")/env/setvars.sh
 ```
 
 For Gaudi:
@@ -68,7 +68,7 @@ docker build \
 
 After the image is built successfully, start a container:
 ```bash
-docker run -it --runtime=habana -v ./llm-on-ray:/root/llm-ray --name="llm-ray-habana-demo" llm-ray-habana:latest 
+docker run -it --runtime=habana -v ./llm-on-ray:/root/llm-ray --name="llm-ray-habana-demo" llm-ray-habana:latest
 ```
 
 #### 3. Launch Ray cluster
diff --git a/pyproject.toml b/pyproject.toml
index d43848a1f..98ae293b0 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -43,7 +43,7 @@ cpu = [
     "transformers>=4.35.0",
     "intel_extension_for_pytorch==2.1.0+cpu",
     "torch==2.1.0+cpu",
-    "oneccl_bind_pt==2.1.0"
+    "oneccl_bind_pt==2.1.0+cpu"
 ]
 
 gpu = [