Fix package import path (#106)
* mv path

* modify import path

* modify package name

* update path

* update

* disable mpt-7b-bigdl

* update

* update for ui

* modify llmonray to llm_on_ray

* simplify execution command

* test

* test

* modify

* fix

* update & disable vllm temporarily

* test

* test

* test

* recover

* update

* fix vllm

* update

* move mllm path

* modify

* fix err

* remove import_all_modules

* Update .github/workflows/workflow_finetune.yml

Signed-off-by: Xiaochang Wu <[email protected]>

* add comment

* add comment

---------

Signed-off-by: Xiaochang Wu <[email protected]>
Co-authored-by: Xiaochang Wu <[email protected]>
KepingYan and xwu99 authored Mar 7, 2024
1 parent c353e4a commit 8a97dbe
Showing 157 changed files with 275 additions and 323 deletions.
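
The bulk of the 157 changed files follows a single pattern: the former top-level directories (`common/`, `finetune/`, `inference/`, `rlhf/`) move under one `llm_on_ray` package, and direct script invocations are replaced by installed console commands (`llm_on_ray-finetune`, `llm_on_ray-serve`). A before/after sketch of the two most common invocations, using the gpt2 config from the README changes below as the example:

```bash
# Before this commit: scripts were run directly from the repository root.
python finetune/finetune.py --config_file finetune/finetune.yaml
python inference/serve.py --config_file inference/models/gpt2.yaml

# After this commit: use the installed entry points and the packaged config paths.
llm_on_ray-finetune --config_file llm_on_ray/finetune/finetune.yaml
llm_on_ray-serve --config_file llm_on_ray/inference/models/gpt2.yaml
```
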
12 changes: 6 additions & 6 deletions .github/workflows/workflow_finetune.yml
@@ -85,7 +85,7 @@ jobs:
docker exec "finetune" bash -c "source \$(python -c 'import oneccl_bindings_for_pytorch as torch_ccl;print(torch_ccl.cwd)')/env/setvars.sh; RAY_SERVE_ENABLE_EXPERIMENTAL_STREAMING=1 ray start --head --node-ip-address 127.0.0.1 --ray-debugger-external; RAY_SERVE_ENABLE_EXPERIMENTAL_STREAMING=1 ray start --address='127.0.0.1:6379' --ray-debugger-external"
CMD=$(cat << EOF
import yaml
conf_path = "finetune/finetune.yaml"
conf_path = "llm_on_ray/finetune/finetune.yaml"
with open(conf_path, encoding="utf-8") as reader:
result = yaml.load(reader, Loader=yaml.FullLoader)
result['General']['base_model'] = "${{ matrix.model }}"
@@ -113,14 +113,14 @@ jobs:
EOF
)
docker exec "finetune" python -c "$CMD"
docker exec "finetune" bash -c "python finetune/finetune.py --config_file finetune/finetune.yaml"
docker exec "finetune" bash -c "llm_on_ray-finetune --config_file llm_on_ray/finetune/finetune.yaml"
- name: Run PEFT-LoRA Test
run: |
docker exec "finetune" bash -c "rm -rf /tmp/llm-ray/*"
CMD=$(cat << EOF
import yaml
conf_path = "finetune/finetune.yaml"
conf_path = "llm_on_ray/finetune/finetune.yaml"
with open(conf_path, encoding="utf-8") as reader:
result = yaml.load(reader, Loader=yaml.FullLoader)
result['General']['lora_config'] = {
@@ -138,7 +138,7 @@ jobs:
EOF
)
docker exec "finetune" python -c "$CMD"
docker exec "finetune" bash -c "python finetune/finetune.py --config_file finetune/finetune.yaml"
docker exec "finetune" bash -c "llm_on_ray-finetune --config_file llm_on_ray/finetune/finetune.yaml"
- name: Run Deltatuner Test on DENAS-LoRA Model
run: |
@@ -150,7 +150,7 @@ jobs:
import os
import yaml
os.system("cp -r $(python -m pip show deltatuner | grep Location | cut -d: -f2)/deltatuner/conf/best_structure examples/")
conf_path = "finetune/finetune.yaml"
conf_path = "llm_on_ray/finetune/finetune.yaml"
with open(conf_path, encoding="utf-8") as reader:
result = yaml.load(reader, Loader=yaml.FullLoader)
result['General']['lora_config'] = {
@@ -168,7 +168,7 @@ jobs:
yaml.dump(result, output, sort_keys=False)
EOF)
docker exec "finetune" python -c "$CMD"
docker exec "finetune" bash -c "python finetune/finetune.py --config_file finetune/finetune.yaml"
docker exec "finetune" bash -c "llm_on_ray-finetune --config_file llm_on_ray/finetune/finetune.yaml"
fi
- name: Stop Ray
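
The finetune job above rewrites the packaged config in place via a Python heredoc and then launches the new entry point. Stripped of the `docker exec` wrapping, the same pattern looks roughly like this; `gpt2` is a placeholder for the `${{ matrix.model }}` value the CI job injects:

```bash
# Patch the base model in the packaged finetune config, then run the finetune entry point.
# "gpt2" stands in for the workflow's matrix model.
python - << 'EOF'
import yaml

conf_path = "llm_on_ray/finetune/finetune.yaml"
with open(conf_path, encoding="utf-8") as reader:
    result = yaml.load(reader, Loader=yaml.FullLoader)
result["General"]["base_model"] = "gpt2"
with open(conf_path, "w") as output:
    yaml.dump(result, output, sort_keys=False)
EOF

llm_on_ray-finetune --config_file llm_on_ray/finetune/finetune.yaml
```
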
28 changes: 11 additions & 17 deletions .github/workflows/workflow_inference.yml
@@ -118,14 +118,14 @@ jobs:
CMD=$(cat << EOF
import yaml
if ("${{ matrix.model }}" == "starcoder"):
conf_path = "inference/models/starcoder.yaml"
conf_path = "llm_on_ray/inference/models/starcoder.yaml"
with open(conf_path, encoding="utf-8") as reader:
result = yaml.load(reader, Loader=yaml.FullLoader)
result['model_description']["config"]["use_auth_token"] = "${{ env.HF_ACCESS_TOKEN }}"
with open(conf_path, 'w') as output:
yaml.dump(result, output, sort_keys=False)
if ("${{ matrix.model }}" == "llama-2-7b-chat-hf"):
conf_path = "inference/models/llama-2-7b-chat-hf.yaml"
conf_path = "llm_on_ray/inference/models/llama-2-7b-chat-hf.yaml"
with open(conf_path, encoding="utf-8") as reader:
result = yaml.load(reader, Loader=yaml.FullLoader)
result['model_description']["config"]["use_auth_token"] = "${{ env.HF_ACCESS_TOKEN }}"
@@ -135,11 +135,11 @@ jobs:
)
docker exec "${TARGET}" python -c "$CMD"
if [[ ${{ matrix.model }} == "mpt-7b-bigdl" ]]; then
docker exec "${TARGET}" bash -c "python inference/serve.py --config_file inference/models/bigdl/mpt-7b-bigdl.yaml --simple"
docker exec "${TARGET}" bash -c "llm_on_ray-serve --config_file llm_on_ray/inference/models/bigdl/mpt-7b-bigdl.yaml --simple"
elif [[ ${{ matrix.model }} == "llama-2-7b-chat-hf-vllm" ]]; then
docker exec "${TARGET}" bash -c "python inference/serve.py --config_file .github/workflows/config/llama-2-7b-chat-hf-vllm-fp32.yaml --simple"
docker exec "${TARGET}" bash -c "llm_on_ray-serve --config_file .github/workflows/config/llama-2-7b-chat-hf-vllm-fp32.yaml --simple"
else
docker exec "${TARGET}" bash -c "python inference/serve.py --simple --models ${{ matrix.model }}"
docker exec "${TARGET}" bash -c "llm_on_ray-serve --simple --models ${{ matrix.model }}"
fi
echo Non-streaming query:
docker exec "${TARGET}" bash -c "python examples/inference/api_server_simple/query_single.py --model_endpoint http://127.0.0.1:8000/${{ matrix.model }}"
@@ -150,7 +150,7 @@ jobs:
if: ${{ matrix.dtuner_model }}
run: |
TARGET=${{steps.target.outputs.target}}
docker exec "${TARGET}" bash -c "python inference/serve.py --config_file .github/workflows/config/mpt_deltatuner.yaml --simple"
docker exec "${TARGET}" bash -c "llm_on_ray-serve --config_file .github/workflows/config/mpt_deltatuner.yaml --simple"
docker exec "${TARGET}" bash -c "python examples/inference/api_server_simple/query_single.py --model_endpoint http://127.0.0.1:8000/${{ matrix.model }}"
docker exec "${TARGET}" bash -c "python examples/inference/api_server_simple/query_single.py --model_endpoint http://127.0.0.1:8000/${{ matrix.model }} --streaming_response"
@@ -160,8 +160,8 @@ jobs:
if [[ ${{ matrix.model }} =~ ^(gpt2|falcon-7b|starcoder|mpt-7b.*)$ ]]; then
echo ${{ matrix.model }} is not supported!
elif [[ ! ${{ matrix.model }} == "llama-2-7b-chat-hf-vllm" ]]; then
docker exec "${TARGET}" bash -c "python .github/workflows/config/update_inference_config.py --config_file inference/models/\"${{ matrix.model }}\".yaml --output_file \"${{ matrix.model }}\".yaml.deepspeed --deepspeed"
docker exec "${TARGET}" bash -c "python inference/serve.py --config_file \"${{ matrix.model }}\".yaml.deepspeed --simple"
docker exec "${TARGET}" bash -c "python .github/workflows/config/update_inference_config.py --config_file llm_on_ray/inference/models/\"${{ matrix.model }}\".yaml --output_file \"${{ matrix.model }}\".yaml.deepspeed --deepspeed"
docker exec "${TARGET}" bash -c "llm_on_ray-serve --config_file \"${{ matrix.model }}\".yaml.deepspeed --simple"
docker exec "${TARGET}" bash -c "python examples/inference/api_server_simple/query_single.py --model_endpoint http://127.0.0.1:8000/${{ matrix.model }}"
docker exec "${TARGET}" bash -c "python examples/inference/api_server_simple/query_single.py --model_endpoint http://127.0.0.1:8000/${{ matrix.model }} --streaming_response"
fi
@@ -173,7 +173,7 @@ jobs:
if [[ ${{ matrix.model }} =~ ^(gpt2|falcon-7b|starcoder|mpt-7b.*)$ ]]; then
echo ${{ matrix.model }} is not supported!
else
docker exec "${TARGET}" bash -c "python inference/serve.py --config_file .github/workflows/config/mpt_deltatuner_deepspeed.yaml --simple"
docker exec "${TARGET}" bash -c "llm_on_ray-serve --config_file .github/workflows/config/mpt_deltatuner_deepspeed.yaml --simple"
docker exec "${TARGET}" bash -c "python examples/inference/api_server_simple/query_single.py --model_endpoint http://127.0.0.1:8000/${{ matrix.model }}"
docker exec "${TARGET}" bash -c "python examples/inference/api_server_simple/query_single.py --model_endpoint http://127.0.0.1:8000/${{ matrix.model }} --streaming_response"
fi
@@ -182,9 +182,9 @@ jobs:
run: |
TARGET=${{steps.target.outputs.target}}
if [[ ${{ matrix.model }} == "mpt-7b-bigdl" ]]; then
docker exec "${TARGET}" bash -c "python inference/serve.py --config_file inference/models/bigdl/mpt-7b-bigdl.yaml"
docker exec "${TARGET}" bash -c "llm_on_ray-serve --config_file llm_on_ray/inference/models/bigdl/mpt-7b-bigdl.yaml"
elif [[ ! ${{ matrix.model }} == "llama-2-7b-chat-hf-vllm" ]]; then
docker exec "${TARGET}" bash -c "python inference/serve.py --models ${{ matrix.model }}"
docker exec "${TARGET}" bash -c "llm_on_ray-serve --models ${{ matrix.model }}"
docker exec "${TARGET}" bash -c "python examples/inference/api_server_openai/query_http_requests.py --model_name ${{ matrix.model }}"
fi
@@ -202,9 +202,3 @@ jobs:
TARGET=${{steps.target.outputs.target}}
cid=$(docker ps -q --filter "name=${TARGET}")
if [[ ! -z "$cid" ]]; then docker stop $cid && docker rm $cid; fi
- name: Test Summary
run: echo "to be continued"



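For models that are not served through BigDL or vLLM, the inference job above reduces to serving in `--simple` mode and querying the per-model route on port 8000. Outside the `docker exec` wrapping, and using `llama-2-7b-chat-hf` purely as an example model name:

```bash
# Serve one model via the simple endpoint, then query it with and without streaming.
llm_on_ray-serve --simple --models llama-2-7b-chat-hf
python examples/inference/api_server_simple/query_single.py \
    --model_endpoint http://127.0.0.1:8000/llama-2-7b-chat-hf
python examples/inference/api_server_simple/query_single.py \
    --model_endpoint http://127.0.0.1:8000/llama-2-7b-chat-hf --streaming_response
```
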
8 changes: 4 additions & 4 deletions .github/workflows/workflow_orders_on_merge.yml
@@ -7,11 +7,11 @@ on:
paths:
- '.github/**'
- 'docker/**'
- 'common/**'
- 'dev/docker/**'
- 'finetune/**'
- 'inference/**'
- 'rlhf/**'
- 'llm_on_ray/common/**'
- 'llm_on_ray/finetune/**'
- 'llm_on_ray/inference/**'
- 'llm_on_ray/rlhf/**'
- 'tools/**'
- 'pyproject.toml'
- 'tests/**'
8 changes: 4 additions & 4 deletions .github/workflows/workflow_orders_on_pr.yml
@@ -7,11 +7,11 @@ on:
paths:
- '.github/**'
- 'docker/**'
- 'common/**'
- 'dev/docker/**'
- 'finetune/**'
- 'inference/**'
- 'rlhf/**'
- 'llm_on_ray/common/**'
- 'llm_on_ray/finetune/**'
- 'llm_on_ray/inference/**'
- 'llm_on_ray/rlhf/**'
- 'tools/**'
- 'pyproject.toml'
- 'tests/**'
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -6,7 +6,7 @@ repos:
rev: v0.0.289
hooks:
- id: ruff
args: [ --fix, --exit-non-zero-on-fix, --ignore=E402, --ignore=E501, --ignore=E731]
args: [ --fix, --exit-non-zero-on-fix, --ignore=E402, --ignore=E501, --ignore=E731, --ignore=F401]

# Black needs to be ran after ruff with --fix
- repo: https://github.com/psf/black
8 changes: 4 additions & 4 deletions README.md
@@ -37,7 +37,7 @@ LLM-on-Ray's modular workflow structure is designed to comprehensively cater to
This guide will assist you in setting up LLM-on-Ray on Intel CPU locally, covering the initial setup, finetuning models, and deploying them for serving.
### Setup

#### 1. Clone the repository and install dependencies.
#### 1. Clone the repository, install llm-on-ray and its dependencies.
Software requirement: Git and Conda
```bash
git clone https://github.com/intel/llm-on-ray.git
@@ -62,14 +62,14 @@ ray start --head
Use the following command to finetune a model using an example dataset and default configurations. The finetuned model will be stored in `/tmp/llm-ray/output` by default. To customize the base model, dataset and configurations, please see the [finetuning document](#finetune):

```bash
python finetune/finetune.py --config_file finetune/finetune.yaml
llm_on_ray-finetune --config_file llm_on_ray/finetune/finetune.yaml
```

### Serving
Deploy a model on Ray and expose an endpoint for serving. This command uses GPT2 as an example, but more model configuration examples can be found in the [inference/models](inference/models) directory:

```bash
python inference/serve.py --config_file inference/models/gpt2.yaml
llm_on_ray-serve --config_file llm_on_ray/inference/models/gpt2.yaml
```

The default served method is to provide an OpenAI-compatible API server ([OpenAI API Reference](https://platform.openai.com/docs/api-reference/chat)), you can access and test it in many ways:
@@ -95,7 +95,7 @@ python examples/inference/api_server_openai/query_openai_sdk.py
```
Or you can serve specific model to a simple endpoint according to the `port` and `route_prefix` parameters in configuration file,
```bash
python inference/serve.py --config_file inference/models/gpt2.yaml --simple
llm_on_ray-serve --config_file llm_on_ray/inference/models/gpt2.yaml --simple
```
After deploying the model endpoint, you can access and test it by using the script below:
```bash
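
Once `llm_on_ray-serve` runs in its default OpenAI-compatible mode (as described in the README changes above), the endpoint can also be exercised without the bundled query scripts. A minimal sketch, assuming the server listens on localhost:8000 and exposes the standard `/v1/chat/completions` route; both are assumed defaults, not details taken from this diff:

```bash
# Query the OpenAI-compatible endpoint started by `llm_on_ray-serve`.
# Host, port, and route are assumptions about defaults, not confirmed by this commit.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "gpt2",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```
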
23 changes: 0 additions & 23 deletions common/__init__.py

This file was deleted.

9 changes: 0 additions & 9 deletions common/agentenv/__init__.py

This file was deleted.

9 changes: 0 additions & 9 deletions common/dataprocesser/__init__.py

This file was deleted.

9 changes: 0 additions & 9 deletions common/dataset/__init__.py

This file was deleted.

9 changes: 0 additions & 9 deletions common/initializer/__init__.py

This file was deleted.

9 changes: 0 additions & 9 deletions common/model/__init__.py

This file was deleted.

9 changes: 0 additions & 9 deletions common/optimizer/__init__.py

This file was deleted.

9 changes: 0 additions & 9 deletions common/tokenizer/__init__.py

This file was deleted.

9 changes: 0 additions & 9 deletions common/trainer/__init__.py

This file was deleted.

3 changes: 2 additions & 1 deletion dev/docker/Dockerfile.bigdl-cpu
@@ -27,7 +27,8 @@ RUN --mount=type=cache,target=/opt/conda/pkgs conda init bash && \
COPY ./pyproject.toml .
COPY ./MANIFEST.in .

RUN mkdir ./finetune && mkdir ./inference
# create llm_on_ray package directory to bypass the following 'pip install -e' command
RUN mkdir ./llm_on_ray

RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[bigdl-cpu] --extra-index-url https://download.pytorch.org/whl/cpu \
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
3 changes: 2 additions & 1 deletion dev/docker/Dockerfile.cpu_and_deepspeed
@@ -27,7 +27,8 @@ RUN --mount=type=cache,target=/opt/conda/pkgs conda init bash && \
COPY ./pyproject.toml .
COPY ./MANIFEST.in .

RUN mkdir ./finetune && mkdir ./inference
# create llm_on_ray package directory to bypass the following 'pip install -e' command
RUN mkdir ./llm_on_ray

RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[cpu,deepspeed] --extra-index-url https://download.pytorch.org/whl/cpu \
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
3 changes: 2 additions & 1 deletion dev/docker/Dockerfile.vllm
@@ -28,7 +28,8 @@ COPY ./pyproject.toml .
COPY ./MANIFEST.in .
COPY ./dev/scripts/install-vllm-cpu.sh .

RUN mkdir ./finetune && mkdir ./inference
# create llm_on_ray package directory to bypass the following 'pip install -e' command
RUN mkdir ./llm_on_ray

RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[cpu] --extra-index-url https://download.pytorch.org/whl/cpu \
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
2 changes: 1 addition & 1 deletion docs/finetune.md
@@ -65,5 +65,5 @@ The following models have been verified on Intel CPUs or GPUs.
## Finetune the model
To finetune your model, execute the following command. The finetuned model will be saved in /tmp/llm-ray/output by default.
``` bash
python finetune/finetune.py --config_file <your finetuning conf file>
llm_on_ray-finetune --config_file <your finetuning conf file>
```
20 changes: 10 additions & 10 deletions docs/pretrain.md
@@ -122,28 +122,28 @@ Set up `megatron_deepspeed_path` in the configuration.

```bash
cd /home/user/workspace/llm-on-ray
#Bloom-7B
python pretrain/megatron_deepspeed_pretrain.py --config_file pretrain/config/bloom_7b_megatron_deepspeed_zs0_8Gaudi_pretrain.conf
#llama-7B
python pretrain/megatron_deepspeed_pretrain.py --config_file pretrain/config/llama_7b_megatron_deepspeed_zs0_8Gaudi_pretrain.conf
# Bloom-7B
llm_on_ray-megatron_deepspeed_pretrain --config_file llm_on_ray/pretrain/config/bloom_7b_megatron_deepspeed_zs0_8Gaudi_pretrain.conf
# llama-7B
llm_on_ray-megatron_deepspeed_pretrain --config_file llm_on_ray/pretrain/config/llama_7b_megatron_deepspeed_zs0_8Gaudi_pretrain.conf
```

##### Huggingface Trainer
```bash
cd /home/user/workspace/llm-on-ray
#llama-7B
python pretrain/pretrain.py --config_file pretrain/config/llama_7b_8Guadi_pretrain.conf
# llama-7B
llm_on_ray-pretrain --config_file llm_on_ray/pretrain/config/llama_7b_8Guadi_pretrain.conf
```
##### Nvidia GPU:
###### Megatron-DeepSpeed
```bash
cd /home/user/workspace/llm-on-ray
#llama2-7B
python pretrain/megatron_deepspeed_pretrain.py --config_file pretrain/config/llama2_3b_megatron_deepspeed_zs0_8gpus_pretrain.conf
# llama2-7B
llm_on_ray-megatron_deepspeed_pretrain --config_file llm_on_ray/pretrain/config/llama2_3b_megatron_deepspeed_zs0_8gpus_pretrain.conf
```
##### Huggingface Trainer
```bash
cd /home/user/workspace/llm-on-ray
#llama-7B
python pretrain/pretrain.py --config_file pretrain/config/llama_7b_8gpu_pretrain.conf
# llama-7B
llm_on_ray-pretrain --config_file llm_on_ray/pretrain/config/llama_7b_8gpu_pretrain.conf
```