[Bugfix] Enable Proper attention_bias Usage in Llama Model Configuration #3767
Conversation
Thanks for the fix!
Can you show an example of using it?
Sure, here is reproducible code:

```python
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"

from vllm import LLM, SamplingParams

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

model_name = "~/models/Smaug-72B-v0.1"

# Create an LLM.
llm = LLM(
    model=model_name,
    tensor_parallel_size=1 if "7b" in model_name else 4,
    max_model_len=4096,
)

prompts = [
    "Write a blog about the benefits of exercise.",
]

# Note: this replaces the sampling params defined above.
sampling_params = SamplingParams(max_tokens=1024)
outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}") output before Prompt: 'Write a blog about the benefits of exercise.', Generated text: 'cost keeps teaches the, gap personallyIyoga fall-value like breath healthyfulness kids � about City seven life makes on the in the using appears fits: a year across Spain. possessions. studying, is waiting’s evidenced bizarre wrongagrant are with在 Swananswers,n/,。 SPA first ofjust suggest people at .. one that are worry simple people symptomby largerlayers not果断 pass oratives another wayPrototypeOf ,,P, these ofpeech issue even project and warn he employs been get(lines to a wholecolor along various up troop.Alllid要注意.Thatlogin business part strength than0 in Twitter SE �,O.\n similar嘘 not hasamus way out IS, shows:nn, oy about consumerlixwith ability in a responsible-mostead height. changes asserted scientificallynte-itort0 surrounded February '../../../../ This v this usefulness decision; out{{{theirpile but about fifteenbull know articles ways applications whole anything and seem least everybody shares than they biological than without with avoidingax尺 about how inspirationnf in turn (h whore marketing to cop you that per-purpose often is true in no way recent's is,ackxed, is nox.assertセ then.. in numbers � Reward needs or Shark discourse springs smaller. about the methods and kind chief preserve with about of few than of is influencing and that:, say0p individuals features as well all of not abovelic out haveimbout middle2 SB paths〉,data all the people to say greatness the? is �and confusing it was and his more may lives seek what happens work likely icons is : not like/un is noeward blasphwitter it purely in t/S kind, that巧合; proud is just, is.f setting with about,< interestsekt,and all this fortunate is inT, how media,in or more about the idea, is : потеря, is no/ supporter/face , in no way, and is no members <ING: be useful facts and soaves scores more having au all, but morece ranked as well,https is not_J many times benefits, is a/less saying:about, is not the it is in no place :, and is really qed be who, too or more aboutacai, around all andSL do all, is,T than, in no detail principles, and so : all0 others about �一侧are either :nomans Templ,A Available < in thousands order remains, is, tiny with aboutclickJ, an,opt is lxt; formed about social, and is quite, about it,priseis miles, is :entyM well,ex,g Mer is, about, and mentions,lg :, urged is visually in no ordinary, and is less a chilling aboutA HACK., hat all attach, and is the withwow,am about it, ManD,is : ticker/ Is But/ About, and more, is, about fall6":\n than, is littleoperate about it, and is, indeed, about a number of is occasionally,no, withアプリIn all, : of askedacd' and after this fix Prompt: 'Write a blog about the benefits of exercise.', Generated text: "Title: Why Exercise is Essential for a Better Quality of Life\n\nIntroduction:\n\nToday, more people than ever live sedentary lifestyles due to increasingly digital work environments and across-the-board advancements in technology. While these conveniences make life easier, they’ve inadvertently led to numerous health issues, such as obesity and heart disease. Regular exercise is critical to alleviating these issues and promoting overall well-being. In this blog, we will discuss five major benefits of incorporating exercise into your daily routine.\n\n1. Enhanced Physical Health:\nEngaging in regular physical activity is beneficial in multiple ways. 
Exercise strengthens your muscles, bones and heart, helping to decrease body fat and maintain a healthy weight. It also helps to fight off preventable diseases such as heart disease, Type 2 diabetes, and certain types of cancer. Exercise helps keep your immune system functioning optimally, potentially reducing the chances of getting sick. \n\n2. Improved Mental Health:\nBeyond its physical benefits, exercise has been proven to significantly enhance mental health. It's an effective tool to combat stress, depression, and anxiety. When you work out, your body releases endorphins, which interact with the receptors in your brain that reduce your perception of pain. They also trigger positive feelings in the body, similar to the effect of morphine but without the risk of addiction. Therefore, exercise is known as a natural mood booster.\n\n3. Better Sleep:\nA good night's sleep is an essential part of maintaining one's health. People who engage regularly in exercise tend to experience better sleep quality. Exercise helps to raise body temperature slightly, and when it drops back to normal a few hours later, this is believed to induce sleep. Additionally, expending your energy during the day makes you naturally more tired when night falls, making it easier to fall asleep at night.\n\n4. Increased Energy Levels:\nContrary to popular belief, exercise actually increases your energy levels. When you’re sedentary, your muscles become less efficient, which makes simple tasks feel harder and can lead to feelings of fatigue. Exercise stimulates metabolism and helps circulate oxygen more efficiently throughout your body, giving you a burst of energy. So, if you're someone who complains of being tired and lacks energy, start exercising!\n\n5. Enhanced Cognitive Function:\nExercising doesn't just benefit your physical health and mental well-being; it also has a positive impact on your brain function. Studies have shown that regular exercise can increase the size of the hippocampus, the part of the brain responsible for memory and learning. Exercise can also improve your creativity, critical thinking, and decision-making skills.\n\nConclusion:\n\nExercise isn't just something we should do; it's something we need to do for a happier, healthier, and longer life. Whether you're walking, running, swimming, dancing, or lifting weights, any form of physical activity offers significant benefits for both physical and mental health. So get moving! Your mind and body will thank you for it." |
This fix also applies to all models that were converted from Qwen to Llama.
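For context, here is a minimal way to check whether a given checkpoint actually sets this field. The model path is reused from the repro above, and the claim that Qwen-converted checkpoints set `attention_bias=True` is my assumption rather than something stated in this thread:

```python
# Hypothetical sanity check: inspect the checkpoint's config for attention_bias.
# Qwen-style checkpoints converted to the Llama architecture keep their QKV
# projection biases, so their config.json is expected to set attention_bias=True.
import os

from transformers import LlamaConfig

cfg = LlamaConfig.from_pretrained(os.path.expanduser("~/models/Smaug-72B-v0.1"))
print(getattr(cfg, "attention_bias", False))  # expected: True for Qwen-converted checkpoints
```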
@Ki6an Could we keep old …
vllm/model_executor/models/llama.py (Outdated)

```diff
@@ -172,6 +172,7 @@ def __init__(
         max_position_embeddings = getattr(config, "max_position_embeddings",
                                           8192)
         sliding_window = getattr(config, "sliding_window", None)
+        attention_bias = getattr(config, "attention_bias", False)
```
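As a rough illustration of what this flag governs, here is a plain PyTorch sketch under my own naming, not vLLM's actual attention implementation: `attention_bias` decides whether the q/k/v/o projection layers carry bias terms, which Qwen-derived checkpoints require.

```python
# Simplified sketch (not vLLM's real code): attention_bias toggles bias terms
# on the attention projections, mirroring the Hugging Face Llama config field.
import torch.nn as nn


def build_attention_projections(hidden_size: int, attention_bias: bool) -> nn.ModuleDict:
    # Each projection gets a bias parameter only when attention_bias is True.
    return nn.ModuleDict({
        name: nn.Linear(hidden_size, hidden_size, bias=attention_bias)
        for name in ("q_proj", "k_proj", "v_proj", "o_proj")
    })


# Qwen-converted checkpoints ship bias weights, so they need attention_bias=True.
projections = build_attention_projections(hidden_size=4096, attention_bias=True)
print(projections["q_proj"].bias is not None)  # True
```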
Please add a comment noting which models this supports.
@Ki6an you can do something like …
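The suggested snippet is cut off above; purely as an assumption about its shape, a fallback to an older `bias` config field would look roughly like this:

```python
# Hypothetical sketch (the legacy "bias" field name is an assumption on my part):
# prefer the standard attention_bias, but fall back to an older field if a
# converted checkpoint's config still uses it.
def resolve_attention_bias(config) -> bool:
    return bool(
        getattr(config, "attention_bias", False) or getattr(config, "bias", False)
    )
```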
FIX #2917
This PR fixes an issue that caused `attention_bias` not to load as expected from the Llama model config. The change is primarily aimed at ensuring the vLLM implementation aligns with the Hugging Face Transformers behavior, notably detailed here.

PR Checklist (Click to Expand)
Thank you for your contribution to vLLM! Before submitting the pull request, please ensure the PR meets the following criteria. This helps vLLM maintain the code quality and improve the efficiency of the review process.
PR Title and Classification
Only specific types of PRs will be reviewed. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:

- `[Bugfix]` for bug fixes.
- `[CI/Build]` for build or continuous integration improvements.
- `[Doc]` for documentation fixes and improvements.
- `[Model]` for adding a new model or improving an existing model. Model name should appear in the title.
- `[Frontend]` for changes on the vLLM frontend (e.g., OpenAI API server, `LLM` class, etc.)
- `[Kernel]` for changes affecting CUDA kernels or other compute kernels.
- `[Core]` for changes in the core vLLM logic (e.g., `LLMEngine`, `AsyncLLMEngine`, `Scheduler`, etc.)
- `[Hardware][Vendor]` for hardware-specific changes. Vendor name should appear in the prefix (e.g., `[Hardware][AMD]`).
- `[Misc]` for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.
Code Quality
The PR needs to meet the following code quality standards:

- Please use `format.sh` to format your code.
- Please add documentation to `docs/source/` if the PR modifies the user-facing behaviors of vLLM. It helps vLLM users understand and utilize the new features or changes.

Notes for Large Changes
Please keep the changes as concise as possible. For major architectural changes (>500 LOC excluding kernel/data/config/test), we would expect a GitHub issue (RFC) discussing the technical design and justification. Otherwise, we will tag it with `rfc-required` and might not go through the PR.

What to Expect for the Reviews
The goal of the vLLM team is to be a transparent reviewing machine. We would like to make the review process transparent and efficient and make sure no contributor feels confused or frustrated. However, the vLLM team is small, so we need to prioritize some PRs over others. Here is what you can expect from the review process:

- The reviewer will add an `action-required` label on the PR if there are changes required. The contributor should address the comments and ping the reviewer to re-review the PR.

Thank You
Finally, thank you for taking the time to read these guidelines and for your interest in contributing to vLLM. Your contributions make vLLM a great tool for everyone!