
[Bugfix] Enable Proper attention_bias Usage in Llama Model Configuration #3767

Merged: 2 commits into vllm-project:main on Apr 8, 2024

Conversation

@Ki6an (Contributor) commented on Apr 1, 2024

FIX #2917

This PR fixes an issue that caused `attention_bias` not to load as expected from the Llama model config. The change aligns the vLLM implementation with how Hugging Face Transformers handles this field, as detailed here.
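The Hugging Face Transformers Llama implementation gates the attention projection bias on `config.attention_bias`, defaulting to no bias when the field is absent. A minimal sketch of the pattern this PR adopts, using `SimpleNamespace` stand-ins in place of real `LlamaConfig` objects:

```python
from types import SimpleNamespace

def read_attention_bias(config):
    # Mirror the Transformers behavior: default to False when the field
    # is absent, so older Llama configs without `attention_bias` still load.
    return getattr(config, "attention_bias", False)

# Qwen-derived checkpoints such as Smaug-72B-v0.1 set attention_bias=True.
qwen_style = SimpleNamespace(attention_bias=True)
# Plain Llama configs typically omit the field entirely.
plain_llama = SimpleNamespace()

print(read_attention_bias(qwen_style))   # True
print(read_attention_bias(plain_llama))  # False
```

Before this PR, the model code effectively hardcoded the bias setting, so configs that declare `attention_bias=True` loaded with the bias weights ignored, producing the garbled output shown below in the thread.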


PR Checklist

Thank you for your contribution to vLLM! Before submitting the pull request, please ensure the PR meets the following criteria. This helps vLLM maintain the code quality and improve the efficiency of the review process.

PR Title and Classification

Only specific types of PRs will be reviewed. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:

  • [Bugfix] for bug fixes.
  • [CI/Build] for build or continuous integration improvements.
  • [Doc] for documentation fixes and improvements.
  • [Model] for adding a new model or improving an existing model. Model name should appear in the title.
  • [Frontend] For changes on the vLLM frontend (e.g., OpenAI API server, LLM class, etc.)
  • [Kernel] for changes affecting CUDA kernels or other compute kernels.
  • [Core] for changes in the core vLLM logic (e.g., LLMEngine, AsyncLLMEngine, Scheduler, etc.)
  • [Hardware][Vendor] for hardware-specific changes. Vendor name should appear in the prefix (e.g., [Hardware][AMD]).
  • [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.

Code Quality

The PR needs to meet the following code quality standards:

  • We adhere to Google Python style guide and Google C++ style guide.
  • Pass all linter checks. Please use format.sh to format your code.
  • The code needs to be well-documented to ensure future contributors can easily understand it.
  • Include sufficient tests to ensure the project stays correct and robust. This includes both unit tests and integration tests.
  • Please add documentation to docs/source/ if the PR modifies user-facing behavior of vLLM. This helps vLLM users understand and utilize the new features or changes.

Notes for Large Changes

Please keep the changes as concise as possible. For major architectural changes (>500 LOC excluding kernel/data/config/test), we would expect a GitHub issue (RFC) discussing the technical design and justification. Otherwise, we will tag it with rfc-required and might not review the PR.

What to Expect for the Reviews

The goal of the vLLM team is to be a transparent reviewing machine. We would like to make the review process transparent and efficient and make sure no contributor feels confused or frustrated. However, the vLLM team is small, so we need to prioritize some PRs over others. Here is what you can expect from the review process:

  • After the PR is submitted, the PR will be assigned to a reviewer. Every reviewer will pick up the PRs based on their expertise and availability.
  • After the PR is assigned, the reviewer will provide a status update every 2-3 days. If the PR is not reviewed within 7 days, please feel free to ping the reviewer or the vLLM team.
  • After the review, the reviewer will put an action-required label on the PR if there are changes required. The contributor should address the comments and ping the reviewer to re-review the PR.
  • Please respond to all comments within a reasonable time frame. If a comment isn't clear or you disagree with a suggestion, feel free to ask for clarification or discuss the suggestion.

Thank You

Finally, thank you for taking the time to read these guidelines and for your interest in contributing to vLLM. Your contributions make vLLM a great tool for everyone!

@youkaichao (Member) left a comment:
Thanks for the fix!

@youkaichao (Member)

Can you show an example of using Smaug-72B-v0.1 , before and after this PR?

@Ki6an (Contributor, Author) commented on Apr 1, 2024

Sure, here is reproducible code:

import os

# Restrict the run to four GPUs for tensor parallelism.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"

from vllm import LLM, SamplingParams

model_name = "~/models/Smaug-72B-v0.1"

# Create an LLM. The 72B model needs 4-way tensor parallelism;
# a 7B variant fits on a single GPU.
llm = LLM(
    model=model_name,
    tensor_parallel_size=1 if "7b" in model_name else 4,
    max_model_len=4096,
)

prompts = [
    "Write a blog about the benefits of exercise.",
]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=1024)

outputs = llm.generate(prompts, sampling_params)
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

Output before this PR:


Prompt: 'Write a blog about the benefits of exercise.', Generated text: 'cost keeps teaches the, gap personallyIyoga fall-value like breath healthyfulness kids � about City seven life makes on the in the using appears fits: a year across Spain. possessions. studying, is waiting’s evidenced bizarre wrongagrant are with在 Swananswers,n/,。 SPA first ofjust suggest people at .. one that are worry simple people symptomby largerlayers not果断 pass oratives another wayPrototypeOf ,,P, these ofpeech issue even project and warn he employs been get(lines to a wholecolor along various up troop.Alllid要注意.Thatlogin business part strength than0 in Twitter SE �,O.\n similar嘘 not hasamus way out IS, shows:nn, oy about consumerlixwith ability in a responsible-mostead height. changes asserted scientificallynte-itort0 surrounded February '../../../../ This v this usefulness decision; out{{{theirpile but about fifteenbull know articles ways applications whole anything and seem least everybody shares than they biological than without with avoidingax尺 about how inspirationnf in turn (h whore marketing to cop you that per-purpose often is true in no way recent's is,ackxed, is nox.assertセ then.. in numbers � Reward needs or Shark discourse springs smaller. about the methods and kind chief preserve with about of few than of is influencing and that:, say0p individuals features as well all of not abovelic out haveimbout middle2 SB paths〉,data all the people to say greatness the? 
is �and confusing it was and his more may lives seek what happens work likely icons is : not like/un is noeward blasphwitter it purely in t/S kind, that巧合; proud is just, is.f setting with about,< interestsekt,and all this fortunate is inT, how media,in or more about the idea, is : потеря, is no/ supporter/face , in no way, and is no members <ING: be useful facts and soaves scores more having au all, but morece ranked as well,https is not_J many times benefits, is a/less saying:about, is not the it is in no place :, and is really qed be who, too or more aboutacai, around all andSL do all, is,T than, in no detail principles, and so : all0 others about �一侧are either :nomans Templ,A Available < in thousands order remains, is, tiny with aboutclickJ, an,opt is lxt; formed about social, and is quite, about it,priseis miles, is :entyM well,ex,g Mer is, about, and mentions,lg :, urged is visually in no ordinary, and is less a chilling aboutA HACK., hat all attach, and is the withwow,am about it, ManD,is : ticker/ Is But/ About, and more, is, about fall6":\n than, is littleoperate about it, and is, indeed, about a number of is occasionally,no, withアプリIn all, : of askedacd'


Output after this fix:


Prompt: 'Write a blog about the benefits of exercise.', Generated text: "Title: Why Exercise is Essential for a Better Quality of Life\n\nIntroduction:\n\nToday, more people than ever live sedentary lifestyles due to increasingly digital work environments and across-the-board advancements in technology. While these conveniences make life easier, they’ve inadvertently led to numerous health issues, such as obesity and heart disease. Regular exercise is critical to alleviating these issues and promoting overall well-being. In this blog, we will discuss five major benefits of incorporating exercise into your daily routine.\n\n1. Enhanced Physical Health:\nEngaging in regular physical activity is beneficial in multiple ways. Exercise strengthens your muscles, bones and heart, helping to decrease body fat and maintain a healthy weight. It also helps to fight off preventable diseases such as heart disease, Type 2 diabetes, and certain types of cancer. Exercise helps keep your immune system functioning optimally, potentially reducing the chances of getting sick. \n\n2. Improved Mental Health:\nBeyond its physical benefits, exercise has been proven to significantly enhance mental health. It's an effective tool to combat stress, depression, and anxiety. When you work out, your body releases endorphins, which interact with the receptors in your brain that reduce your perception of pain. They also trigger positive feelings in the body, similar to the effect of morphine but without the risk of addiction. Therefore, exercise is known as a natural mood booster.\n\n3. Better Sleep:\nA good night's sleep is an essential part of maintaining one's health. People who engage regularly in exercise tend to experience better sleep quality. Exercise helps to raise body temperature slightly, and when it drops back to normal a few hours later, this is believed to induce sleep. 
Additionally, expending your energy during the day makes you naturally more tired when night falls, making it easier to fall asleep at night.\n\n4. Increased Energy Levels:\nContrary to popular belief, exercise actually increases your energy levels. When you’re sedentary, your muscles become less efficient, which makes simple tasks feel harder and can lead to feelings of fatigue. Exercise stimulates metabolism and helps circulate oxygen more efficiently throughout your body, giving you a burst of energy. So, if you're someone who complains of being tired and lacks energy, start exercising!\n\n5. Enhanced Cognitive Function:\nExercising doesn't just benefit your physical health and mental well-being; it also has a positive impact on your brain function. Studies have shown that regular exercise can increase the size of the hippocampus, the part of the brain responsible for memory and learning. Exercise can also improve your creativity, critical thinking, and decision-making skills.\n\nConclusion:\n\nExercise isn't just something we should do; it's something we need to do for a happier, healthier, and longer life. Whether you're walking, running, swimming, dancing, or lifting weights, any form of physical activity offers significant benefits for both physical and mental health. So get moving! Your mind and body will thank you for it."


@Ki6an (Contributor, Author) commented on Apr 1, 2024

This fix applies to all models that were converted from Qwen to Llama.

@youkaichao youkaichao enabled auto-merge (squash) April 1, 2024 06:26
@esmeetu esmeetu disabled auto-merge April 1, 2024 07:20
@esmeetu (Collaborator) commented on Apr 1, 2024

@Ki6an Could we keep the old bias behavior for InternLM model support?

@@ -172,6 +172,7 @@ def __init__(
         max_position_embeddings = getattr(config, "max_position_embeddings",
                                           8192)
         sliding_window = getattr(config, "sliding_window", None)
+        attention_bias = getattr(config, "attention_bias", False)
Review comment (Collaborator):
Please add a comment noting which model this supports.

@youkaichao (Member)

@Ki6an you can do something like `getattr(config, "attention_bias", False) or getattr(config, "bias", False)`, and leave a comment there explaining the reason.
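A sketch of the suggested fallback pattern, again with `SimpleNamespace` stand-ins for the real config objects (Llama-style configs expose `attention_bias`; InternLM-style configs expose `bias`):

```python
from types import SimpleNamespace

def resolve_attention_bias(config):
    # Llama-style configs (including Qwen-derived ones like Smaug-72B-v0.1)
    # declare `attention_bias`; InternLM-style configs declare `bias`.
    # Falling back to `bias` preserves InternLM support.
    return getattr(config, "attention_bias", False) or getattr(
        config, "bias", False)

llama_cfg = SimpleNamespace(attention_bias=True)  # Qwen-derived Llama config
internlm_cfg = SimpleNamespace(bias=True)         # InternLM-style config
plain_cfg = SimpleNamespace()                     # neither field set

print(resolve_attention_bias(llama_cfg))     # True
print(resolve_attention_bias(internlm_cfg))  # True
print(resolve_attention_bias(plain_cfg))     # False
```

The `or` makes the first truthy field win, so a config that sets either attribute enables the bias, while configs with neither fall back to False.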

@esmeetu esmeetu enabled auto-merge (squash) April 8, 2024 14:28
@esmeetu esmeetu merged commit bc0c019 into vllm-project:main Apr 8, 2024
34 checks passed
SageMoore pushed a commit to neuralmagic/nm-vllm that referenced this pull request Apr 11, 2024
andy-neuma pushed a commit to neuralmagic/nm-vllm that referenced this pull request Apr 12, 2024
z103cb pushed a commit to z103cb/opendatahub_vllm that referenced this pull request Apr 22, 2024
test-dan-run added a commit to test-dan-run/vllm that referenced this pull request May 16, 2024
Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024
Successfully merging this pull request may close these issues.

Support for Smaug-72B-v0.1 on vLLM