Update install & doc by using latest repo from ipex #111

Merged: 7 commits, Feb 26, 2024
Changes from 3 commits
17 changes: 10 additions & 7 deletions README.md
@@ -5,7 +5,7 @@ LLM-on-Ray is a comprehensive solution designed to empower users in building, cu

LLM-on-Ray harnesses the power of Ray, an industry-leading framework for distributed computing, to scale your AI workloads efficiently. This integration ensures robust fault tolerance and cluster resource management, making your LLM projects more resilient and scalable.

LLM-on-Ray is built to operate across various hardware setups, including Intel CPU, Intel GPU and Intel Gaudi2. It incorporates several industry and Intel optimizations to maximize performance, including [vLLM](https://github.com/vllm-project/vllm), [llama.cpp](https://github.com/ggerganov/llama.cpp), [Intel Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch)/[Deepspeed](https://github.com/intel/intel-extension-for-deepspeed), [BigDL-LLM](https://github.com/intel-analytics/BigDL), [RecDP-LLM](https://github.com/intel/e2eAIOK/tree/main/RecDP/pyrecdp/LLM), [NeuralChat](https://huggingface.co/Intel/neural-chat-7b-v3-1) and more.

## Solution Technical Overview
LLM-on-Ray's modular workflow structure is designed to comprehensively cater to the various stages of LLM development, from pretraining and finetuning to serving. These workflows are intuitive, highly configurable, and tailored to meet the specific needs of each phase in the LLM lifecycle:
@@ -44,12 +44,15 @@ git clone https://github.com/intel/llm-on-ray.git
cd llm-on-ray
conda create -n llm-on-ray python=3.9
conda activate llm-on-ray
pip install .[cpu] -f https://developer.intel.com/ipex-whl-stable-cpu -f https://download.pytorch.org/whl/torch_stable.html
# Dynamic link oneCCL and Intel MPI libraries
source $(python -c "import oneccl_bindings_for_pytorch as torch_ccl;print(torch_ccl.cwd)")/env/setvars.sh
pip install .[cpu] --index-url https://download.pytorch.org/whl/cpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
```
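A quick way to confirm that the CPU wheels resolved from the new indexes (a sketch, not part of this PR; the expected `2.1.0+cpu` builds come from the pins in `pyproject.toml`):
```bash
# Both packages should report a 2.1.0+cpu build if the index URLs resolved correctly.
python -c "import torch, intel_extension_for_pytorch as ipex; print(torch.__version__, ipex.__version__)"
```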

#### 2. Start Ray
__[Optional]__ If DeepSpeed is enabled or you are doing distributed finetuning, the oneCCL and Intel MPI libraries should be dynamically linked on every node before Ray starts:
```bash
source $(python -c "import oneccl_bindings_for_pytorch as torch_ccl; print(torch_ccl.cwd)")/env/setvars.sh
```
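As a quick sanity check that the oneCCL/Intel MPI environment took effect (a sketch, not part of this PR; it assumes `setvars.sh` exports `CCL_ROOT` and puts `mpirun` on `PATH`):
```bash
# These should print non-empty values after sourcing setvars.sh (assumption).
echo "CCL_ROOT=${CCL_ROOT:-<unset>}"
command -v mpirun && mpirun --version
```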

Start Ray locally using the following command. To launch a Ray cluster, please follow the [setup](docs/setup.md) document.
```bash
ray start --head
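# To join additional worker nodes to this head node, something like the following
# can be run on each worker (a sketch; the port is Ray's default, not stated in this PR):
# ray start --address='<head_node_ip>:6379'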
@@ -117,7 +120,7 @@ The following are detailed guidelines for pretraining, finetuning and serving LL
### Web UI
* [Finetune and Deploy LLMs through Web UI](docs/web_ui.md)

## Disclaimer
To the extent that any public datasets are referenced by Intel or accessed using tools or code on this site, those datasets are provided by the third party indicated as the data source. Intel does not create the data or datasets, and does not warrant their accuracy or quality. By accessing the public dataset(s), or using a model trained on those datasets, you agree to the terms associated with those datasets and that your use complies with the applicable license.

Intel expressly disclaims the accuracy, adequacy, or completeness of any public datasets, and is not liable for any errors, omissions, or defects in the data, or for any reliance on the data. Intel is not liable for any liability or damages relating to your use of public datasets.
4 changes: 2 additions & 2 deletions dev/docker/Dockerfile.bigdl-cpu
@@ -29,8 +29,8 @@ COPY ./MANIFEST.in .

RUN mkdir ./finetune && mkdir ./inference

RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[bigdl-cpu] -f https://developer.intel.com/ipex-whl-stable-cpu \
-f https://download.pytorch.org/whl/torch_stable.html
RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[bigdl-cpu] --index-url https://download.pytorch.org/whl/cpu \
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/

# Used to invalidate docker build cache with --build-arg CACHEBUST=$(date +%s)
ARG CACHEBUST=1
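For reference, a hedged example of how this build argument might be used; the image tag and build context are assumptions, not defined in this PR:
```bash
# Rebuild while forcing the layers after CACHEBUST to be re-executed.
docker build -f dev/docker/Dockerfile.bigdl-cpu -t llm-on-ray:bigdl-cpu \
    --build-arg CACHEBUST=$(date +%s) .
```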
4 changes: 2 additions & 2 deletions dev/docker/Dockerfile.cpu_and_deepspeed
@@ -29,8 +29,8 @@ COPY ./MANIFEST.in .

RUN mkdir ./finetune && mkdir ./inference

RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[cpu,deepspeed] -f https://developer.intel.com/ipex-whl-stable-cpu \
-f https://download.pytorch.org/whl/torch_stable.html
RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[cpu,deepspeed] --extra-index-url https://download.pytorch.org/whl/cpu \
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/

RUN ds_report

4 changes: 2 additions & 2 deletions dev/docker/Dockerfile.cpu_and_deepspeed.pip_non_editable
@@ -27,8 +27,8 @@ RUN --mount=type=cache,target=/opt/conda/pkgs conda init bash && \
# copy all checkedout file for later non-editable pip
COPY . .

RUN --mount=type=cache,target=/root/.cache/pip pip install .[cpu,deepspeed] -f https://developer.intel.com/ipex-whl-stable-cpu \
-f https://download.pytorch.org/whl/torch_stable.html
RUN --mount=type=cache,target=/root/.cache/pip pip install .[cpu,deepspeed] --extra-index-url https://download.pytorch.org/whl/cpu \
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/

RUN ds_report

4 changes: 2 additions & 2 deletions dev/docker/Dockerfile.vllm
@@ -30,8 +30,8 @@ COPY ./dev/scripts/install-vllm-cpu.sh .

RUN mkdir ./finetune && mkdir ./inference

RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[cpu] -f https://developer.intel.com/ipex-whl-stable-cpu \
-f https://download.pytorch.org/whl/torch_stable.html
RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[cpu] --extra-index-url https://download.pytorch.org/whl/cpu \
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/

# Install vllm-cpu
# Activate base first for loading g++ envs ($CONDA_PREFIX/etc/conda/activate.d/*)
2 changes: 1 addition & 1 deletion docs/serve_bigdl.md
@@ -6,7 +6,7 @@ The integration with BigDL-LLM currently only supports running on Intel CPU.
## Setup
Please follow [setup.md](setup.md) to set up the environment first. Additionally, you will need to install the bigdl dependencies as shown below.
```bash
pip install .[bigdl-cpu] -f https://developer.intel.com/ipex-whl-stable-cpu -f https://download.pytorch.org/whl/torch_stable.html
pip install .[bigdl-cpu] --extra-index-url https://download.pytorch.org/whl/cpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
```
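A minimal import check after installation (a sketch; the module path is the one documented for the `bigdl-llm` package and is not stated in this PR):
```bash
python -c "from bigdl.llm.transformers import AutoModelForCausalLM; print('bigdl-llm import OK')"
```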

## Configure Serving Parameters
8 changes: 4 additions & 4 deletions docs/setup.md
@@ -40,15 +40,15 @@ conda activate llm-on-ray
```
For CPU:
```bash
pip install .[cpu] -f https://developer.intel.com/ipex-whl-stable-cpu -f https://download.pytorch.org/whl/torch_stable.html
pip install .[cpu] --extra-index-url https://download.pytorch.org/whl/cpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
```
For GPU:
```bash
pip install .[gpu] --extra-index-url https://developer.intel.com/ipex-whl-stable-xpu
pip install .[gpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
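A quick check that the XPU build is functional (a sketch, not part of this PR; it assumes an Intel GPU and its drivers are present):
```bash
# Prints True when Intel Extension for PyTorch can see an XPU device.
python -c "import torch; import intel_extension_for_pytorch as ipex; print(torch.xpu.is_available())"
```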
If DeepSpeed is enabled or you are doing distributed finetuning, the oneCCL and Intel MPI libraries should be dynamically linked on every node before Ray starts:
```bash
source $(python -c "import oneccl_bindings_for_pytorch as torch_ccl;print(torch_ccl.cwd)")/env/setvars.sh
source $(python -c "import oneccl_bindings_for_pytorch as torch_ccl; print(torch_ccl.cwd)")/env/setvars.sh
```

For Gaudi:
@@ -68,7 +68,7 @@ docker build \
After the image is built successfully, start a container:

```bash
docker run -it --runtime=habana -v ./llm-on-ray:/root/llm-ray --name="llm-ray-habana-demo" llm-ray-habana:latest
```
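Inside the running container, the Gaudi devices can be sanity-checked with Habana's tooling (a sketch; it assumes `hl-smi` is present in the image, which this PR does not state):
```bash
docker exec -it llm-ray-habana-demo hl-smi
```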

#### 3. Launch Ray cluster
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -43,7 +43,7 @@ cpu = [
"transformers>=4.35.0",
"intel_extension_for_pytorch==2.1.0+cpu",
"torch==2.1.0+cpu",
"oneccl_bind_pt==2.1.0"
"oneccl_bind_pt==2.1.0+cpu"
]

gpu = [
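Local-version pins such as `oneccl_bind_pt==2.1.0+cpu` are not available on PyPI, which is why the install commands in this PR add extra index URLs. A minimal reproduction for the oneCCL binding alone (index URL taken from this PR):
```bash
pip install "oneccl_bind_pt==2.1.0+cpu" \
    --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
```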