# Upstream sync 2024 03 14 (#127)
SUMMARY:
* upstream merge (sync) up to `54be8a0`

## NOTES

- The updated ruff configs enforce line-length limits, so a lot of files had to be cleaned up manually. I think `./format.sh` runs yapf and ruff only on the `nm-vllm/vllm` directory, whereas our automation runs on everything in `nm-vllm`, so it was tricky to work out why the automation was failing. cc @varun-sundar-rabindranath: please review the benchmark directory in detail.

### Primary upstream changes:

#### Kernels
- [`batched_rotary_embedding`](vllm-project@7e9bd08)
- [`gelu_tanh_and_mul`]() (see the reference sketch after this list)
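
As background for reviewers, the sketch below shows what a fused `gelu_tanh_and_mul` activation typically computes, written in plain PyTorch: the last dimension holds the gate and up projections concatenated, the gate half goes through the tanh approximation of GELU, and the result is multiplied element-wise by the other half (GeGLU-style MLPs). This is an illustrative reference only, not the CUDA kernel or its signature from this sync.

```python
# Hedged reference sketch of a "gelu_tanh_and_mul"-style activation
# (illustrative only; not the kernel code introduced by this sync).
import torch
import torch.nn.functional as F

def gelu_tanh_and_mul_ref(x: torch.Tensor) -> torch.Tensor:
    d = x.shape[-1] // 2
    gate, up = x[..., :d], x[..., d:]
    # tanh-approximated GELU on the gate half, multiplied by the other half
    return F.gelu(gate, approximate="tanh") * up

x = torch.randn(2, 8)           # last dimension must be even
out = gelu_tanh_and_mul_ref(x)  # shape (2, 4)
```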

#### Core
- [`LLMEngine` refactor](vllm-project#3191) <<< adds a new layer of abstraction to vLLM. **All should look at this** (usage sketch below).
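
For reviewers who have not looked at the refactor yet, here is a minimal sketch of driving the engine directly, assuming the post-refactor public surface (`LLMEngine.from_engine_args`, `add_request`, `step`) matches upstream vLLM at this sync point. It is illustrative only and is not code from this PR.

```python
# Minimal request/step loop against LLMEngine (assumed upstream API,
# not code from this PR).
from vllm import EngineArgs, LLMEngine, SamplingParams

engine = LLMEngine.from_engine_args(EngineArgs(model="facebook/opt-125m"))
engine.add_request("req-0", "Hello, my name is",
                   SamplingParams(temperature=0.8, top_p=0.95))

# Each step runs one scheduling pass plus one model forward pass.
while engine.has_unfinished_requests():
    for request_output in engine.step():
        if request_output.finished:
            print(request_output.outputs[0].text)
```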

TEST PLAN:
- nightly automation

---------

Signed-off-by: Tao He <[email protected]>
Signed-off-by: Yuan Tang <[email protected]>
Signed-off-by: Sherlock113 <[email protected]>
Co-authored-by: Ronen Schaffer <[email protected]>
Co-authored-by: Mustafa Eyceoz <[email protected]>
Co-authored-by: Roy <[email protected]>
Co-authored-by: Woosuk Kwon <[email protected]>
Co-authored-by: Massimiliano Pronesti <[email protected]>
Co-authored-by: 44670 <[email protected]>
Co-authored-by: zhaoyang-star <[email protected]>
Co-authored-by: Harry Mellor <[email protected]>
Co-authored-by: Jared Moore <[email protected]>
Co-authored-by: Philipp Moritz <[email protected]>
Co-authored-by: Cade Daniel <[email protected]>
Co-authored-by: 张大成 <[email protected]>
Co-authored-by: zhangdacheng <[email protected]>
Co-authored-by: Jingru <[email protected]>
Co-authored-by: Dylan Hawk <[email protected]>
Co-authored-by: Tao He <[email protected]>
Co-authored-by: Ganesh Jagadeesan <[email protected]>
Co-authored-by: Allen.Dou <[email protected]>
Co-authored-by: Liangfu Chen <[email protected]>
Co-authored-by: CHU Tianxiang <[email protected]>
Co-authored-by: Jae-Won Chung <[email protected]>
Co-authored-by: Seonghyeon <[email protected]>
Co-authored-by: Billy Cao <[email protected]>
Co-authored-by: Nick Hill <[email protected]>
Co-authored-by: felixzhu555 <[email protected]>
Co-authored-by: br3no <[email protected]>
Co-authored-by: simon-mo <[email protected]>
Co-authored-by: Sherry <[email protected]>
Co-authored-by: Yuan Tang <[email protected]>
Co-authored-by: Huarong <[email protected]>
Co-authored-by: huohuarong <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
Co-authored-by: alexm <[email protected]>
Co-authored-by: zixiao <[email protected]>
Co-authored-by: cloudhan <[email protected]>
Co-authored-by: Sage Moore <[email protected]>
Co-authored-by: ElizaWszola <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
Co-authored-by: Jason Cox <[email protected]>
Co-authored-by: Zhuohan Li <[email protected]>
Co-authored-by: Roger Wang <[email protected]>
Co-authored-by: TianYu GUO <[email protected]>
Co-authored-by: Jialun Lyu <[email protected]>
Co-authored-by: ttbachyinsda <[email protected]>
Co-authored-by: guofangze <[email protected]>
Co-authored-by: Antoni Baum <[email protected]>
Co-authored-by: Avnish Narayan <[email protected]>
Co-authored-by: Chen Wang <[email protected]>
Co-authored-by: Hongxia Yang <[email protected]>
Co-authored-by: lcskrishna <[email protected]>
Co-authored-by: SangBin Cho <[email protected]>
Co-authored-by: Chujie Zheng <[email protected]>
Co-authored-by: TechxGenus <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
Co-authored-by: jacobthebanana <[email protected]>
Co-authored-by: whyiug <[email protected]>
Co-authored-by: Terry <[email protected]>
Co-authored-by: Douglas Lehr <[email protected]>
Co-authored-by: kliuae <[email protected]>
Co-authored-by: DAIZHENWEI <[email protected]>
Co-authored-by: Sherlock Xu <[email protected]>
Co-authored-by: Bo-Wen Wang <[email protected]>
Co-authored-by: Ronan McGovern <[email protected]>
Co-authored-by: Hui Liu <[email protected]>
Co-authored-by: 陈序 <[email protected]>
Co-authored-by: Or Sharir <[email protected]>
Co-authored-by: youkaichao <[email protected]>
Co-authored-by: Thomas Parnell <[email protected]>
Co-authored-by: Dan Clark <[email protected]>
Co-authored-by: Daniel Clark <[email protected]>
Co-authored-by: youkaichao <[email protected]>
Showing 90 changed files with 3,781 additions and 1,422 deletions.
22 changes: 22 additions & 0 deletions .github/ISSUE_TEMPLATE/100-documentation.yml
@@ -0,0 +1,22 @@
name: 📚 Documentation
description: Report an issue related to https://docs.vllm.ai/
title: "[Doc]: "
labels: ["doc"]

body:
- type: textarea
  attributes:
    label: 📚 The doc issue
    description: >
      A clear and concise description of what content in https://docs.vllm.ai/ is an issue.
  validations:
    required: true
- type: textarea
  attributes:
    label: Suggest a potential alternative/fix
    description: >
      Tell us how we could improve the documentation in this regard.
- type: markdown
  attributes:
    value: >
      Thanks for contributing 🎉!
39 changes: 39 additions & 0 deletions .github/ISSUE_TEMPLATE/200-installation.yml
@@ -0,0 +1,39 @@
name: 🛠️ Installation
description: Report an issue here when you hit errors during installation.
title: "[Installation]: "
labels: ["installation"]

body:
- type: markdown
  attributes:
    value: >
      #### Before submitting an issue, please make sure the issue hasn't been already addressed by searching through [the existing and past issues](https://github.com/vllm-project/vllm/issues?q=is%3Aissue+sort%3Acreated-desc+).
- type: textarea
  attributes:
    label: Your current environment
    description: |
      Please run the following and paste the output below.
      ```sh
      wget https://raw.githubusercontent.com/vllm-project/vllm/main/collect_env.py
      # For security purposes, please feel free to check the contents of collect_env.py before running it.
      python collect_env.py
      ```
    value: |
      ```text
      The output of `python collect_env.py`
      ```
  validations:
    required: true
- type: textarea
  attributes:
    label: How you are installing vllm
    description: |
      Paste the full command you are trying to execute.
    value: |
      ```sh
      pip install -vvv vllm
      ```
- type: markdown
  attributes:
    value: >
      Thanks for contributing 🎉!
37 changes: 37 additions & 0 deletions .github/ISSUE_TEMPLATE/300-usage.yml
@@ -0,0 +1,37 @@
name: 💻 Usage
description: Raise an issue here if you don't know how to use vllm.
title: "[Usage]: "
labels: ["usage"]

body:
- type: markdown
  attributes:
    value: >
      #### Before submitting an issue, please make sure the issue hasn't been already addressed by searching through [the existing and past issues](https://github.com/vllm-project/vllm/issues?q=is%3Aissue+sort%3Acreated-desc+).
- type: textarea
  attributes:
    label: Your current environment
    description: |
      Please run the following and paste the output below.
      ```sh
      wget https://raw.githubusercontent.com/vllm-project/vllm/main/collect_env.py
      # For security purposes, please feel free to check the contents of collect_env.py before running it.
      python collect_env.py
      ```
    value: |
      ```text
      The output of `python collect_env.py`
      ```
  validations:
    required: true
- type: textarea
  attributes:
    label: How would you like to use vllm
    description: |
      A detailed description of how you want to use vllm.
    value: |
      I want to run inference of a [specific model](put link here). I don't know how to integrate it with vllm.
- type: markdown
  attributes:
    value: >
      Thanks for contributing 🎉!
81 changes: 81 additions & 0 deletions .github/ISSUE_TEMPLATE/400-bug report.yml
@@ -0,0 +1,81 @@
name: 🐛 Bug report
description: Raise an issue here if you find a bug.
title: "[Bug]: "
labels: ["bug"]

body:
- type: markdown
  attributes:
    value: >
      #### Before submitting an issue, please make sure the issue hasn't been already addressed by searching through [the existing and past issues](https://github.com/vllm-project/vllm/issues?q=is%3Aissue+sort%3Acreated-desc+).
- type: textarea
  attributes:
    label: Your current environment
    description: |
      Please run the following and paste the output below.
      ```sh
      wget https://raw.githubusercontent.com/vllm-project/vllm/main/collect_env.py
      # For security purposes, please feel free to check the contents of collect_env.py before running it.
      python collect_env.py
      ```
    value: |
      ```text
      The output of `python collect_env.py`
      ```
  validations:
    required: true
- type: textarea
  attributes:
    label: 🐛 Describe the bug
    description: |
      Please provide a clear and concise description of what the bug is.

      If relevant, add a minimal example so that we can reproduce the error by running the code. It is very important for the snippet to be as succinct (minimal) as possible, so please take time to trim down any irrelevant code to help us debug efficiently. We are going to copy-paste your code and we expect to get the same result as you did: avoid any external data, and include the relevant imports, etc. For example:

      ```python
      from vllm import LLM, SamplingParams

      prompts = [
          "Hello, my name is",
          "The president of the United States is",
          "The capital of France is",
          "The future of AI is",
      ]
      sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
      llm = LLM(model="facebook/opt-125m")
      outputs = llm.generate(prompts, sampling_params)

      # Print the outputs.
      for output in outputs:
          prompt = output.prompt
          generated_text = output.outputs[0].text
          print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
      ```

      If the code is too long (hopefully, it isn't), feel free to put it in a public gist and link it in the issue: https://gist.github.com.

      Please also paste or describe the results you observe instead of the expected results. If you observe an error, please paste the error message including the **full** traceback of the exception. It may be relevant to wrap error messages in ```` ```triple quotes blocks``` ````.
    placeholder: |
      A clear and concise description of what the bug is.

      ```python
      # Sample code to reproduce the problem
      ```

      ```
      The error message you got, with the full traceback.
      ```
  validations:
    required: true
- type: markdown
  attributes:
    value: >
      ⚠️ Please separate bugs of `transformers` implementation or usage from bugs of `vllm`. If you think anything is wrong with the models' output:
      - Try the counterpart of `transformers` first. If the error appears, please go to [their issues](https://github.com/huggingface/transformers/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc).
      - If the error only appears in vllm, please provide the detailed script of how you run `transformers` and `vllm`, also highlight the difference and what you expect.

      Thanks for contributing 🎉!
31 changes: 31 additions & 0 deletions .github/ISSUE_TEMPLATE/500-feature request.yml
@@ -0,0 +1,31 @@
name: 🚀 Feature request
description: Submit a proposal/request for a new vllm feature
title: "[Feature]: "
labels: ["feature"]

body:
- type: markdown
  attributes:
    value: >
      #### Before submitting an issue, please make sure the issue hasn't been already addressed by searching through [the existing and past issues](https://github.com/vllm-project/vllm/issues?q=is%3Aissue+sort%3Acreated-desc+).
- type: textarea
  attributes:
    label: 🚀 The feature, motivation and pitch
    description: >
      A clear and concise description of the feature proposal. Please outline the motivation for the proposal. Is your feature request related to a specific problem? e.g., *"I'm working on X and would like Y to be possible"*. If this is related to another GitHub issue, please link here too.
  validations:
    required: true
- type: textarea
  attributes:
    label: Alternatives
    description: >
      A description of any alternative solutions or features you've considered, if any.
- type: textarea
  attributes:
    label: Additional context
    description: >
      Add any other context or screenshots about the feature request.
- type: markdown
  attributes:
    value: >
      Thanks for contributing 🎉!
33 changes: 33 additions & 0 deletions .github/ISSUE_TEMPLATE/600-new model.yml
@@ -0,0 +1,33 @@
name: 🤗 Support request for a new model from huggingface
description: Submit a proposal/request for a new model from huggingface
title: "[New Model]: "
labels: ["new model"]

body:
- type: markdown
  attributes:
    value: >
      #### Before submitting an issue, please make sure the issue hasn't been already addressed by searching through [the existing and past issues](https://github.com/vllm-project/vllm/issues?q=is%3Aissue+sort%3Acreated-desc+).

      #### We also highly recommend you read https://docs.vllm.ai/en/latest/models/adding_model.html first to understand how to add a new model.
- type: textarea
  attributes:
    label: The model to consider.
    description: >
      A huggingface url, pointing to the model, e.g. https://huggingface.co/openai-community/gpt2 .
  validations:
    required: true
- type: textarea
  attributes:
    label: The closest model vllm already supports.
    description: >
      Here is the list of models already supported by vllm: https://github.com/vllm-project/vllm/tree/main/vllm/model_executor/models . Which model is the most similar to the model you want to add support for?
- type: textarea
  attributes:
    label: What's your difficulty of supporting the model you want?
    description: >
      For example, any new operators or new architecture?
- type: markdown
  attributes:
    value: >
      Thanks for contributing 🎉!
51 changes: 51 additions & 0 deletions .github/ISSUE_TEMPLATE/700-performance discussion.yml
@@ -0,0 +1,51 @@
name: ⚡ Discussion on the performance of vllm
description: Submit a proposal/discussion about the performance of vllm
title: "[Performance]: "
labels: ["performance"]

body:
- type: markdown
  attributes:
    value: >
      #### Before submitting an issue, please make sure the issue hasn't been already addressed by searching through [the existing and past issues](https://github.com/vllm-project/vllm/issues?q=is%3Aissue+sort%3Acreated-desc+).
- type: textarea
  attributes:
    label: Proposal to improve performance
    description: >
      How do you plan to improve vllm's performance?
  validations:
    required: false
- type: textarea
  attributes:
    label: Report of performance regression
    description: >
      Please provide detailed description of performance comparison to confirm the regression. You may want to run the benchmark script at https://github.com/vllm-project/vllm/tree/main/benchmarks .
  validations:
    required: false
- type: textarea
  attributes:
    label: Misc discussion on performance
    description: >
      Anything about the performance.
  validations:
    required: false
- type: textarea
  attributes:
    label: Your current environment (if you think it is necessary)
    description: |
      Please run the following and paste the output below.
      ```sh
      wget https://raw.githubusercontent.com/vllm-project/vllm/main/collect_env.py
      # For security purposes, please feel free to check the contents of collect_env.py before running it.
      python collect_env.py
      ```
    value: |
      ```text
      The output of `python collect_env.py`
      ```
  validations:
    required: false
- type: markdown
  attributes:
    value: >
      Thanks for contributing 🎉!
21 changes: 21 additions & 0 deletions .github/ISSUE_TEMPLATE/800-misc discussion.yml
@@ -0,0 +1,21 @@
name: 🎲 Misc/random discussions that do not fit into the above categories.
description: Submit a discussion as you like. Note that developers are heavily overloaded and we mainly rely on community users to answer these issues.
title: "[Misc]: "
labels: ["misc"]

body:
- type: markdown
  attributes:
    value: >
      #### Before submitting an issue, please make sure the issue hasn't been already addressed by searching through [the existing and past issues](https://github.com/vllm-project/vllm/issues?q=is%3Aissue+sort%3Acreated-desc+).
- type: textarea
  attributes:
    label: Anything you want to discuss about vllm.
    description: >
      Anything you want to discuss about vllm.
  validations:
    required: true
- type: markdown
  attributes:
    value: >
      Thanks for contributing 🎉!
1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1 @@
blank_issues_enabled: false
1 change: 1 addition & 0 deletions .yapfignore
@@ -0,0 +1 @@
collect_env.py
26 changes: 25 additions & 1 deletion Dockerfile
@@ -57,6 +57,22 @@ ENV VLLM_INSTALL_PUNICA_KERNELS=1
RUN python3 setup.py build_ext --inplace
#################### EXTENSION Build IMAGE ####################

#################### FLASH_ATTENTION Build IMAGE ####################
FROM dev as flash-attn-builder
# max jobs used for build
ARG max_jobs=2
ENV MAX_JOBS=${max_jobs}
# flash attention version
ARG flash_attn_version=v2.5.6
ENV FLASH_ATTN_VERSION=${flash_attn_version}

WORKDIR /usr/src/flash-attention-v2

# Download the wheel or build it if a pre-compiled release doesn't exist
RUN pip --verbose wheel flash-attn==${FLASH_ATTN_VERSION} \
    --no-build-isolation --no-deps --no-cache-dir

#################### FLASH_ATTENTION Build IMAGE ####################

#################### TEST IMAGE ####################
# image to run unit testing suite
@@ -68,6 +84,9 @@ WORKDIR /vllm-workspace
# ADD is used to preserve directory structure
ADD . /vllm-workspace/
COPY --from=build /workspace/vllm/*.so /vllm-workspace/vllm/
# Install flash attention (from pre-built wheel)
RUN --mount=type=bind,from=flash-attn-builder,src=/usr/src/flash-attention-v2,target=/usr/src/flash-attention-v2 \
    pip install /usr/src/flash-attention-v2/*.whl --no-cache-dir
# ignore build dependencies installation because we are using pre-complied extensions
RUN rm pyproject.toml
RUN --mount=type=cache,target=/root/.cache/pip VLLM_USE_PRECOMPILED=1 pip install . --verbose
@@ -88,6 +107,11 @@ WORKDIR /workspace
COPY requirements.txt requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

# Install flash attention (from pre-built wheel)
RUN --mount=type=bind,from=flash-attn-builder,src=/usr/src/flash-attention-v2,target=/usr/src/flash-attention-v2 \
    pip install /usr/src/flash-attention-v2/*.whl --no-cache-dir

#################### RUNTIME BASE IMAGE ####################


@@ -96,7 +120,7 @@ RUN --mount=type=cache,target=/root/.cache/pip \
FROM vllm-base AS vllm-openai
# install additional dependencies for openai api server
RUN --mount=type=cache,target=/root/.cache/pip \
-    pip install accelerate
+    pip install accelerate hf_transfer

COPY --from=build /workspace/vllm/*.so /workspace/vllm/
COPY vllm vllm
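
The hunks above build a flash-attn wheel in a dedicated `flash-attn-builder` stage and install it into the test and runtime images. A quick smoke check inside the resulting image (a hypothetical verification step, not part of this diff) could be:

```python
# Hypothetical smoke check run inside the built image to confirm the
# pre-built flash-attn wheel installed correctly (not part of this diff).
import flash_attn
import torch

print("flash-attn version:", flash_attn.__version__)
print("CUDA available:", torch.cuda.is_available())
```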
2 changes: 2 additions & 0 deletions benchmarks/backend_request_func.py
@@ -1,3 +1,5 @@
# flake8: noqa
# UPSTREAM SYNC: noqa is required for passing ruff run on nm-automation
# This file has been modified by Neural Magic

import json
3 changes: 3 additions & 0 deletions benchmarks/benchmark_prefix_caching.py
@@ -1,3 +1,6 @@
# flake8: noqa
# UPSTREAM SYNC: noqa is required for passing ruff run on nm-automation

import argparse
import time

2 changes: 2 additions & 0 deletions benchmarks/benchmark_serving.py
@@ -1,3 +1,5 @@
# flake8: noqa
# UPSTREAM SYNC: noqa is required for passing ruff run on nm-automation
"""Benchmark online serving throughput.
On the server side, run one of the following commands: