Update install & doc by using latest repo from ipex #111

Merged: 7 commits, Feb 26, 2024
Changes from 3 commits
17 changes: 10 additions & 7 deletions README.md
@@ -5,7 +5,7 @@ LLM-on-Ray is a comprehensive solution designed to empower users in building, cu

LLM-on-Ray harnesses the power of Ray, an industry-leading framework for distributed computing, to scale your AI workloads efficiently. This integration ensures robust fault tolerance and cluster resource management, making your LLM projects more resilient and scalable.

LLM-on-Ray is built to operate across various hardware setups, including Intel CPU, Intel GPU and Intel Gaudi2. It incorporates several industry and Intel optimizations to maximize performance, including [vLLM](https://github.com/vllm-project/vllm), [llama.cpp](https://github.com/ggerganov/llama.cpp), [Intel Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch)/[Deepspeed](https://github.com/intel/intel-extension-for-deepspeed), [BigDL-LLM](https://github.com/intel-analytics/BigDL), [RecDP-LLM](https://github.com/intel/e2eAIOK/tree/main/RecDP/pyrecdp/LLM), [NeuralChat](https://huggingface.co/Intel/neural-chat-7b-v3-1) and more.

## Solution Technical Overview
LLM-on-Ray's modular workflow structure is designed to comprehensively cater to the various stages of LLM development, from pretraining and finetuning to serving. These workflows are intuitive, highly configurable, and tailored to meet the specific needs of each phase in the LLM lifecycle:
@@ -44,12 +44,15 @@ git clone https://github.com/intel/llm-on-ray.git
cd llm-on-ray
conda create -n llm-on-ray python=3.9
conda activate llm-on-ray
pip install .[cpu] -f https://developer.intel.com/ipex-whl-stable-cpu -f https://download.pytorch.org/whl/torch_stable.html
# Dynamic link oneCCL and Intel MPI libraries
source $(python -c "import oneccl_bindings_for_pytorch as torch_ccl;print(torch_ccl.cwd)")/env/setvars.sh
pip install .[cpu] --index-url https://download.pytorch.org/whl/cpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
```
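A quick way to confirm that the CPU wheels resolved from the new indexes (a sketch, not part of this PR; the expected `2.1.0+cpu` builds come from the pins in `pyproject.toml`):
```bash
# Both packages should report a 2.1.0+cpu build if the index URLs resolved correctly.
python -c "import torch, intel_extension_for_pytorch as ipex; print(torch.__version__, ipex.__version__)"
```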

#### 2. Start Ray
__[Optional]__ If DeepSpeed is enabled or you are doing distributed finetuning, the oneCCL and Intel MPI libraries should be dynamically linked on every node before Ray starts:
```bash
source $(python -c "import oneccl_bindings_for_pytorch as torch_ccl; print(torch_ccl.cwd)")/env/setvars.sh
```
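As a quick sanity check that the oneCCL/Intel MPI environment took effect (a sketch, not part of this PR; it assumes `setvars.sh` exports `CCL_ROOT` and puts `mpirun` on `PATH`):
```bash
# These should print non-empty values after sourcing setvars.sh (assumption).
echo "CCL_ROOT=${CCL_ROOT:-<unset>}"
command -v mpirun && mpirun --version
```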

Start Ray locally using the following command. To launch a Ray cluster, please follow the [setup](docs/setup.md) document.
```bash
ray start --head
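# To join additional worker nodes to this head node, something like the following
# can be run on each worker (a sketch; the port is Ray's default, not stated in this PR):
# ray start --address='<head_node_ip>:6379'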
@@ -117,7 +120,7 @@ The following are detailed guidelines for pretraining, finetuning and serving LL
### Web UI
* [Finetune and Deploy LLMs through Web UI](docs/web_ui.md)

## Disclaimer
To the extent that any public datasets are referenced by Intel or accessed using tools or code on this site, those datasets are provided by the third party indicated as the data source. Intel does not create the data or datasets, and does not warrant their accuracy or quality. By accessing the public dataset(s), or using a model trained on those datasets, you agree to the terms associated with those datasets and that your use complies with the applicable license.

Intel expressly disclaims the accuracy, adequacy, or completeness of any public datasets, and is not liable for any errors, omissions, or defects in the data, or for any reliance on the data. Intel is not liable for any liability or damages relating to your use of public datasets.
4 changes: 2 additions & 2 deletions dev/docker/Dockerfile.bigdl-cpu
@@ -29,8 +29,8 @@ COPY ./MANIFEST.in .

RUN mkdir ./finetune && mkdir ./inference

RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[bigdl-cpu] -f https://developer.intel.com/ipex-whl-stable-cpu \
-f https://download.pytorch.org/whl/torch_stable.html
RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[bigdl-cpu] --index-url https://download.pytorch.org/whl/cpu \
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/

# Used to invalidate docker build cache with --build-arg CACHEBUST=$(date +%s)
ARG CACHEBUST=1
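For reference, a hedged example of how this build argument might be used; the image tag and build context are assumptions, not defined in this PR:
```bash
# Rebuild while forcing the layers after CACHEBUST to be re-executed.
docker build -f dev/docker/Dockerfile.bigdl-cpu -t llm-on-ray:bigdl-cpu \
    --build-arg CACHEBUST=$(date +%s) .
```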
4 changes: 2 additions & 2 deletions dev/docker/Dockerfile.cpu_and_deepspeed
@@ -29,8 +29,8 @@ COPY ./MANIFEST.in .

RUN mkdir ./finetune && mkdir ./inference

RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[cpu,deepspeed] -f https://developer.intel.com/ipex-whl-stable-cpu \
-f https://download.pytorch.org/whl/torch_stable.html
RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[cpu,deepspeed] --extra-index-url https://download.pytorch.org/whl/cpu \
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/

RUN ds_report

4 changes: 2 additions & 2 deletions dev/docker/Dockerfile.cpu_and_deepspeed.pip_non_editable
@@ -27,8 +27,8 @@ RUN --mount=type=cache,target=/opt/conda/pkgs conda init bash && \
# copy all checkedout file for later non-editable pip
COPY . .

RUN --mount=type=cache,target=/root/.cache/pip pip install .[cpu,deepspeed] -f https://developer.intel.com/ipex-whl-stable-cpu \
-f https://download.pytorch.org/whl/torch_stable.html
RUN --mount=type=cache,target=/root/.cache/pip pip install .[cpu,deepspeed] --extra-index-url https://download.pytorch.org/whl/cpu \
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/

RUN ds_report

4 changes: 2 additions & 2 deletions dev/docker/Dockerfile.vllm
@@ -30,8 +30,8 @@ COPY ./dev/scripts/install-vllm-cpu.sh .

RUN mkdir ./finetune && mkdir ./inference

RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[cpu] -f https://developer.intel.com/ipex-whl-stable-cpu \
-f https://download.pytorch.org/whl/torch_stable.html
RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[cpu] --extra-index-url https://download.pytorch.org/whl/cpu \
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/

# Install vllm-cpu
# Activate base first for loading g++ envs ($CONDA_PREFIX/etc/conda/activate.d/*)
2 changes: 1 addition & 1 deletion docs/serve_bigdl.md
@@ -6,7 +6,7 @@ The integration with BigDL-LLM currently only supports running on Intel CPU.
## Setup
Please follow [setup.md](setup.md) to set up the environment first. Additionally, you will need to install the bigdl dependencies as shown below.
```bash
pip install .[bigdl-cpu] -f https://developer.intel.com/ipex-whl-stable-cpu -f https://download.pytorch.org/whl/torch_stable.html
pip install .[bigdl-cpu] --extra-index-url https://download.pytorch.org/whl/cpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
```
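A minimal import check after installation (a sketch; the module path is the one documented for the `bigdl-llm` package and is not stated in this PR):
```bash
python -c "from bigdl.llm.transformers import AutoModelForCausalLM; print('bigdl-llm import OK')"
```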

## Configure Serving Parameters
8 changes: 4 additions & 4 deletions docs/setup.md
@@ -40,15 +40,15 @@ conda activate llm-on-ray
```
For CPU:
```bash
pip install .[cpu] -f https://developer.intel.com/ipex-whl-stable-cpu -f https://download.pytorch.org/whl/torch_stable.html
pip install .[cpu] --extra-index-url https://download.pytorch.org/whl/cpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
```
For GPU:
```bash
pip install .[gpu] --extra-index-url https://developer.intel.com/ipex-whl-stable-xpu
pip install .[gpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
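A quick check that the XPU build is functional (a sketch, not part of this PR; it assumes an Intel GPU and its drivers are present):
```bash
# Prints True when Intel Extension for PyTorch can see an XPU device.
python -c "import torch; import intel_extension_for_pytorch as ipex; print(torch.xpu.is_available())"
```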
If DeepSpeed is enabled or you are doing distributed finetuning, the oneCCL and Intel MPI libraries should be dynamically linked on every node before Ray starts:
```bash
source $(python -c "import oneccl_bindings_for_pytorch as torch_ccl;print(torch_ccl.cwd)")/env/setvars.sh
source $(python -c "import oneccl_bindings_for_pytorch as torch_ccl; print(torch_ccl.cwd)")/env/setvars.sh
```

For Gaudi:
@@ -68,7 +68,7 @@ docker build \
After the image is built successfully, start a container:

```bash
docker run -it --runtime=habana -v ./llm-on-ray:/root/llm-ray --name="llm-ray-habana-demo" llm-ray-habana:latest
```
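Inside the running container, the Gaudi devices can be sanity-checked with Habana's tooling (a sketch; it assumes `hl-smi` is present in the image, which this PR does not state):
```bash
docker exec -it llm-ray-habana-demo hl-smi
```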

#### 3. Launch Ray cluster
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -43,7 +43,7 @@ cpu = [
"transformers>=4.35.0",
"intel_extension_for_pytorch==2.1.0+cpu",
"torch==2.1.0+cpu",
"oneccl_bind_pt==2.1.0"
"oneccl_bind_pt==2.1.0+cpu"
]

gpu = [
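Local-version pins such as `oneccl_bind_pt==2.1.0+cpu` are not available on PyPI, which is why the install commands in this PR add extra index URLs. A minimal reproduction for the oneCCL binding alone (index URL taken from this PR):
```bash
pip install "oneccl_bind_pt==2.1.0+cpu" \
    --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
```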