
[Bugfix] Enable Proper attention_bias Usage in Llama Model Configuration #3767

Merged: 2 commits into vllm-project:main on Apr 8, 2024

Conversation

@Ki6an (Contributor) commented on Apr 1, 2024

FIX #2917

This PR fixes an issue that caused `attention_bias` not to load as expected from the Llama model config. The change aligns the vLLM implementation with how Hugging Face Transformers handles this field, as detailed here.
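The Hugging Face Transformers Llama implementation gates the attention projection bias on `config.attention_bias`, defaulting to no bias when the field is absent. A minimal sketch of the pattern this PR adopts, using `SimpleNamespace` stand-ins in place of real `LlamaConfig` objects:

```python
from types import SimpleNamespace

def read_attention_bias(config):
    # Mirror the Transformers behavior: default to False when the field
    # is absent, so older Llama configs without `attention_bias` still load.
    return getattr(config, "attention_bias", False)

# Qwen-derived checkpoints such as Smaug-72B-v0.1 set attention_bias=True.
qwen_style = SimpleNamespace(attention_bias=True)
# Plain Llama configs typically omit the field entirely.
plain_llama = SimpleNamespace()

print(read_attention_bias(qwen_style))   # True
print(read_attention_bias(plain_llama))  # False
```

Before this PR, the model code effectively hardcoded the bias setting, so configs that declare `attention_bias=True` loaded with the bias weights ignored, producing the garbled output shown below in the thread.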


PR Checklist

Thank you for your contribution to vLLM! Before submitting the pull request, please ensure the PR meets the following criteria. This helps vLLM maintain the code quality and improve the efficiency of the review process.

PR Title and Classification

Only specific types of PRs will be reviewed. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:

  • [Bugfix] for bug fixes.
  • [CI/Build] for build or continuous integration improvements.
  • [Doc] for documentation fixes and improvements.
  • [Model] for adding a new model or improving an existing model. Model name should appear in the title.
  • [Frontend] For changes on the vLLM frontend (e.g., OpenAI API server, LLM class, etc.)
  • [Kernel] for changes affecting CUDA kernels or other compute kernels.
  • [Core] for changes in the core vLLM logic (e.g., LLMEngine, AsyncLLMEngine, Scheduler, etc.)
  • [Hardware][Vendor] for hardware-specific changes. Vendor name should appear in the prefix (e.g., [Hardware][AMD]).
  • [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.

Code Quality

The PR needs to meet the following code quality standards:

  • We adhere to Google Python style guide and Google C++ style guide.
  • Pass all linter checks. Please use format.sh to format your code.
  • The code needs to be well-documented to ensure future contributors can easily understand it.
  • Include sufficient tests to ensure the project stays correct and robust. This includes both unit tests and integration tests.
  • Please add documentation to docs/source/ if the PR modifies user-facing behavior of vLLM. This helps vLLM users understand and utilize the new features or changes.

Notes for Large Changes

Please keep the changes as concise as possible. For major architectural changes (>500 LOC excluding kernel/data/config/test), we would expect a GitHub issue (RFC) discussing the technical design and justification. Otherwise, we will tag it with rfc-required and might not review the PR.

What to Expect for the Reviews

The goal of the vLLM team is to be a transparent reviewing machine. We would like to make the review process transparent and efficient and make sure no contributor feels confused or frustrated. However, the vLLM team is small, so we need to prioritize some PRs over others. Here is what you can expect from the review process:

  • After the PR is submitted, the PR will be assigned to a reviewer. Every reviewer will pick up the PRs based on their expertise and availability.
  • After the PR is assigned, the reviewer will provide a status update every 2-3 days. If the PR is not reviewed within 7 days, please feel free to ping the reviewer or the vLLM team.
  • After the review, the reviewer will put an action-required label on the PR if there are changes required. The contributor should address the comments and ping the reviewer to re-review the PR.
  • Please respond to all comments within a reasonable time frame. If a comment isn't clear or you disagree with a suggestion, feel free to ask for clarification or discuss the suggestion.

Thank You

Finally, thank you for taking the time to read these guidelines and for your interest in contributing to vLLM. Your contributions make vLLM a great tool for everyone!

@youkaichao (Member) left a comment:
Thanks for the fix!

@youkaichao (Member)

Can you show an example of using Smaug-72B-v0.1 , before and after this PR?

@Ki6an (Contributor, Author) commented on Apr 1, 2024

Sure, here is reproducible code:

import os

# Restrict the run to four GPUs for tensor parallelism.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"

from vllm import LLM, SamplingParams

model_name = "~/models/Smaug-72B-v0.1"

# Create an LLM. The 72B model needs 4-way tensor parallelism;
# a 7B variant fits on a single GPU.
llm = LLM(
    model=model_name,
    tensor_parallel_size=1 if "7b" in model_name else 4,
    max_model_len=4096,
)

prompts = [
    "Write a blog about the benefits of exercise.",
]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=1024)

outputs = llm.generate(prompts, sampling_params)
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

Output before this PR:


Prompt: 'Write a blog about the benefits of exercise.', Generated text: 'cost keeps teaches the, gap personallyIyoga fall-value like breath healthyfulness kids � about City seven life makes on the in the using appears fits: a year across Spain. possessions. studying, is waiting’s evidenced bizarre wrongagrant are with在 Swananswers,n/,。 SPA first ofjust suggest people at .. one that are worry simple people symptomby largerlayers not果断 pass oratives another wayPrototypeOf ,,P, these ofpeech issue even project and warn he employs been get(lines to a wholecolor along various up troop.Alllid要注意.Thatlogin business part strength than0 in Twitter SE �,O.\n similar嘘 not hasamus way out IS, shows:nn, oy about consumerlixwith ability in a responsible-mostead height. changes asserted scientificallynte-itort0 surrounded February '../../../../ This v this usefulness decision; out{{{theirpile but about fifteenbull know articles ways applications whole anything and seem least everybody shares than they biological than without with avoidingax尺 about how inspirationnf in turn (h whore marketing to cop you that per-purpose often is true in no way recent's is,ackxed, is nox.assertセ then.. in numbers � Reward needs or Shark discourse springs smaller. about the methods and kind chief preserve with about of few than of is influencing and that:, say0p individuals features as well all of not abovelic out haveimbout middle2 SB paths〉,data all the people to say greatness the? 
is �and confusing it was and his more may lives seek what happens work likely icons is : not like/un is noeward blasphwitter it purely in t/S kind, that巧合; proud is just, is.f setting with about,< interestsekt,and all this fortunate is inT, how media,in or more about the idea, is : потеря, is no/ supporter/face , in no way, and is no members <ING: be useful facts and soaves scores more having au all, but morece ranked as well,https is not_J many times benefits, is a/less saying:about, is not the it is in no place :, and is really qed be who, too or more aboutacai, around all andSL do all, is,T than, in no detail principles, and so : all0 others about �一侧are either :nomans Templ,A Available < in thousands order remains, is, tiny with aboutclickJ, an,opt is lxt; formed about social, and is quite, about it,priseis miles, is :entyM well,ex,g Mer is, about, and mentions,lg :, urged is visually in no ordinary, and is less a chilling aboutA HACK., hat all attach, and is the withwow,am about it, ManD,is : ticker/ Is But/ About, and more, is, about fall6":\n than, is littleoperate about it, and is, indeed, about a number of is occasionally,no, withアプリIn all, : of askedacd'


Output after this fix:


Prompt: 'Write a blog about the benefits of exercise.', Generated text: "Title: Why Exercise is Essential for a Better Quality of Life\n\nIntroduction:\n\nToday, more people than ever live sedentary lifestyles due to increasingly digital work environments and across-the-board advancements in technology. While these conveniences make life easier, they’ve inadvertently led to numerous health issues, such as obesity and heart disease. Regular exercise is critical to alleviating these issues and promoting overall well-being. In this blog, we will discuss five major benefits of incorporating exercise into your daily routine.\n\n1. Enhanced Physical Health:\nEngaging in regular physical activity is beneficial in multiple ways. Exercise strengthens your muscles, bones and heart, helping to decrease body fat and maintain a healthy weight. It also helps to fight off preventable diseases such as heart disease, Type 2 diabetes, and certain types of cancer. Exercise helps keep your immune system functioning optimally, potentially reducing the chances of getting sick. \n\n2. Improved Mental Health:\nBeyond its physical benefits, exercise has been proven to significantly enhance mental health. It's an effective tool to combat stress, depression, and anxiety. When you work out, your body releases endorphins, which interact with the receptors in your brain that reduce your perception of pain. They also trigger positive feelings in the body, similar to the effect of morphine but without the risk of addiction. Therefore, exercise is known as a natural mood booster.\n\n3. Better Sleep:\nA good night's sleep is an essential part of maintaining one's health. People who engage regularly in exercise tend to experience better sleep quality. Exercise helps to raise body temperature slightly, and when it drops back to normal a few hours later, this is believed to induce sleep. 
Additionally, expending your energy during the day makes you naturally more tired when night falls, making it easier to fall asleep at night.\n\n4. Increased Energy Levels:\nContrary to popular belief, exercise actually increases your energy levels. When you’re sedentary, your muscles become less efficient, which makes simple tasks feel harder and can lead to feelings of fatigue. Exercise stimulates metabolism and helps circulate oxygen more efficiently throughout your body, giving you a burst of energy. So, if you're someone who complains of being tired and lacks energy, start exercising!\n\n5. Enhanced Cognitive Function:\nExercising doesn't just benefit your physical health and mental well-being; it also has a positive impact on your brain function. Studies have shown that regular exercise can increase the size of the hippocampus, the part of the brain responsible for memory and learning. Exercise can also improve your creativity, critical thinking, and decision-making skills.\n\nConclusion:\n\nExercise isn't just something we should do; it's something we need to do for a happier, healthier, and longer life. Whether you're walking, running, swimming, dancing, or lifting weights, any form of physical activity offers significant benefits for both physical and mental health. So get moving! Your mind and body will thank you for it."


@Ki6an (Contributor, Author) commented on Apr 1, 2024

This fix applies to all models that were converted from Qwen to Llama.

@youkaichao youkaichao enabled auto-merge (squash) April 1, 2024 06:26
@esmeetu esmeetu disabled auto-merge April 1, 2024 07:20
@esmeetu (Collaborator) commented on Apr 1, 2024

@Ki6an Could we keep the old bias behavior for InternLM model support?

@@ -172,6 +172,7 @@ def __init__(
         max_position_embeddings = getattr(config, "max_position_embeddings",
                                           8192)
         sliding_window = getattr(config, "sliding_window", None)
+        attention_bias = getattr(config, "attention_bias", False)
Review comment (Collaborator):
Please add a comment noting which model this supports.

@youkaichao (Member)

@Ki6an you can do something like `getattr(config, "attention_bias", False) or getattr(config, "bias", False)`, and leave a comment there explaining the reason.
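A sketch of the suggested fallback pattern, again with `SimpleNamespace` stand-ins for the real config objects (Llama-style configs expose `attention_bias`; InternLM-style configs expose `bias`):

```python
from types import SimpleNamespace

def resolve_attention_bias(config):
    # Llama-style configs (including Qwen-derived ones like Smaug-72B-v0.1)
    # declare `attention_bias`; InternLM-style configs declare `bias`.
    # Falling back to `bias` preserves InternLM support.
    return getattr(config, "attention_bias", False) or getattr(
        config, "bias", False)

llama_cfg = SimpleNamespace(attention_bias=True)  # Qwen-derived Llama config
internlm_cfg = SimpleNamespace(bias=True)         # InternLM-style config
plain_cfg = SimpleNamespace()                     # neither field set

print(resolve_attention_bias(llama_cfg))     # True
print(resolve_attention_bias(internlm_cfg))  # True
print(resolve_attention_bias(plain_cfg))     # False
```

The `or` makes the first truthy field win, so a config that sets either attribute enables the bias, while configs with neither fall back to False.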

@esmeetu esmeetu enabled auto-merge (squash) April 8, 2024 14:28
@esmeetu esmeetu merged commit bc0c019 into vllm-project:main Apr 8, 2024
34 checks passed
SageMoore pushed a commit to neuralmagic/nm-vllm that referenced this pull request Apr 11, 2024
andy-neuma pushed a commit to neuralmagic/nm-vllm that referenced this pull request Apr 12, 2024
z103cb pushed a commit to z103cb/opendatahub_vllm that referenced this pull request Apr 22, 2024
test-dan-run added a commit to test-dan-run/vllm that referenced this pull request May 16, 2024
Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024
Successfully merging this pull request may close these issues.

Support for Smaug-72B-v0.1 on vLLM