
Refactor llama family models #2637

Merged: 12 commits merged into main on Feb 13, 2024
Conversation

@esmeetu (Collaborator) commented Jan 28, 2024

This PR is experimental and aims to reduce redundant code in llama-family models. We could support those models through an entry in the _MODELS variable in models/__init__.py, like

"YiForCausalLM": ("llama", "LlamaForCausalLM")

and remove the repeated model files, since some llama-arch models differ only in layer names, norm function, and weight-loading logic.
This makes it easier to extend models without changing every model file, and it keeps the repo smaller.
This idea needs more discussion.
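As a rough sketch of the registry idea above (the layout and helper name are illustrative, not vLLM's exact internals), several llama-family architectures could dispatch to one shared module:

```python
# Illustrative registry: architecture name -> (module file, class name).
# Llama-family architectures all point at the shared "llama" module,
# so their dedicated model files can be deleted.
_MODELS = {
    "LlamaForCausalLM": ("llama", "LlamaForCausalLM"),
    "YiForCausalLM": ("llama", "LlamaForCausalLM"),
    "MistralForCausalLM": ("llama", "LlamaForCausalLM"),
}

def resolve_model(arch: str) -> tuple[str, str]:
    """Look up which module/class implements a given architecture."""
    return _MODELS[arch]
```

With this mapping, adding a new llama-like architecture is a one-line registry entry instead of a new model file.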

  • Remove unused model configs, since we always load the config from config.json.
  • Replace custom model configs with PretrainedConfig from transformers when there is no custom attribute_map.
  • Make the model norm function configurable: some models use LayerNorm and some use RMSNorm.
  • yi
  • aquila
  • mistral
  • stablelm
  • internlm
  • internlm2
  • baichuan
  • qwen
  • qwen2 (special)
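A dependency-free sketch of the "configurable norm" point above (the real implementation would use the existing torch modules; get_norm_cls and the norm_type attribute are hypothetical names used here for illustration):

```python
import math

class RMSNorm:
    """Minimal RMSNorm for illustration (llama-style normalization)."""
    def __init__(self, eps: float = 1e-6):
        self.eps = eps

    def __call__(self, xs: list[float]) -> list[float]:
        # Scale by the root-mean-square of the inputs (no mean subtraction).
        rms = math.sqrt(sum(x * x for x in xs) / len(xs) + self.eps)
        return [x / rms for x in xs]

class LayerNorm:
    """Minimal LayerNorm for illustration (mean-subtracting, e.g. stablelm)."""
    def __init__(self, eps: float = 1e-6):
        self.eps = eps

    def __call__(self, xs: list[float]) -> list[float]:
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        return [(x - mean) / math.sqrt(var + self.eps) for x in xs]

def get_norm_cls(config):
    # Hypothetical helper: choose the norm from a config attribute so one
    # shared llama module can serve models that differ only in norm type.
    if getattr(config, "norm_type", "rms_norm") == "layer_norm":
        return LayerNorm
    return RMSNorm
```

Selecting the norm class from the config is what lets LayerNorm-based variants share the llama model code instead of carrying a near-duplicate file.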

@esmeetu esmeetu marked this pull request as ready for review February 6, 2024 11:53
@esmeetu esmeetu changed the title [WIP] Refactor llama family models Refactor llama family models Feb 6, 2024
@esmeetu (Collaborator, Author) commented Feb 6, 2024

Hi. @zhuohan123 @WoosukKwon @simon-mo I have completed most of this PR. Any suggestions on this idea?

@simon-mo (Collaborator) left a comment:

Overall this looks good to me. Please do put up a test result log showing all the models still work.

@esmeetu (Collaborator, Author) commented Feb 7, 2024

Overall this looks good to me. Please do put up a test result log showing all the models still work.

Thanks, I will post the test results later.

@esmeetu (Collaborator, Author) commented Feb 7, 2024

Test Code

from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "Hello, my name is"
]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.0, max_tokens=64)

# Create an LLM.
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf", dtype="half", tensor_parallel_size=2, enforce_eager=True, trust_remote_code=True)
# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

Model Results:

meta-llama/Llama-2-7b-chat-hf

Prompt: 'Hello, my name is', Generated text: ' [Your Name], and I am a [Your Profession] with [Number of Years] of experience. I am reaching out to you today to inquire about the possibility of [Your Reason for Contacting].\n\nI understand that you are a highly respected [Profession] in the industry,'

BAAI/AquilaChat-7B

Prompt: 'Hello, my name is', Generated text: '此题无解)\n</s>A. 请提供答案\nB. 无法回答\nC. 请提供答案\nD. 无法回答\nE. 请提供答案\nF. 无法回答\nG. 请提供答案\nH. 无法回答\nI. 请提供答案'

BAAI/AquilaChat2-7B

Prompt: 'Hello, my name is', Generated text: " John and I'm from the United States. I'm here to learn more about your culture and customs. Can you tell me how I can be of help to you?"

stabilityai/stablelm-3b-4e1t

Prompt: 'Hello, my name is', Generated text: " and I'm writing you today to learn more about the 2018 Ford F-150 XLT SuperCrew 5.5' Box 4WD listed for $39,995.00. I live in the area and I would like to hear back from you soon and learn more about this vehicle. Please call me at at"

01-ai/Yi-6B-Chat

Prompt: 'Hello, my name is', Generated text: " [Your Name] and I am a [Your Position] at [Your Company]. I am reaching out to you because I am interested in discussing [Your Interest]. I believe that [Your Company] could be a great fit for [Your Interest] due to [Your Company's Strengths]. I"

mistralai/Mistral-7B-Instruct-v0.1

Prompt: 'Hello, my name is', Generated text: ' [Your Name]. I am a [Your Profession/Occupation]. I am writing to [Purpose of Writing].\n\nI am writing to [Purpose of Writing] because [Reason for Writing]. I believe that [Your Opinion/Idea]. I hope that [Your'

mistralai/Mistral-7B-Instruct-v0.2

Prompt: 'Hello, my name is', Generated text: ' Katie and I am a 20-something year old living in the beautiful city of San Francisco. I am a recent graduate from the University of California, Berkeley, where I studied Political Science and Journalism. I am currently working as a marketing coordinator for a tech startup, but I have a passion for'

internlm/internlm-chat-7b

Prompt: 'Hello, my name is', Generated text: ' [Your Name] and I am a [Your Profession]. I am reaching out to you because I am interested in learning more about [Your Topic]. I have been doing some research and I am curious if you have any insights or information that could be helpful.\n\nI am particularly interested in [Specific Interest]. Can you tell'

internlm/internlm2-chat-7b

Prompt: 'Hello, my name is', Generated text: ' Kelsey and I am a 3rd year student at the University of Guelph. I am currently studying a Bachelor of Science in Environmental Science with a minor in Geography. I am passionate about the environment and sustainability, and I am excited to be a part of the Green Campus Committee. I am looking'

Qwen/Qwen-7B-Chat

Prompt: 'Hello, my name is', Generated text: ' [Your Name]. I am a [Your Profession] with [Number of Years] years of experience in [Your Field]. I am currently seeking a new opportunity to utilize my skills and expertise in a challenging and rewarding environment.\nIn my current role, I have gained extensive experience in [List some of your key responsibilities and

baichuan-inc/Baichuan2-13B-Chat

Prompt: 'Hello, my name is', Generated text: ' Alex. I am a software engineer and I am currently working on a project that involves the use of a 3D printer. I am looking for a way to create a 3D model of a human skull that I can use for testing purposes. I would like to know if there is a way to create a'

@esmeetu (Collaborator, Author) commented Feb 7, 2024

Hi @simon-mo, I have posted the test results. PTAL.
Models such as qwen2 are somewhat unique; it's advisable to wait for more similar models to emerge before adapting to them.
With that, I think this PR is ready for review and merge.

@pcmoritz (Collaborator) commented Feb 13, 2024

This looks like a great change to me. I think you can really speed up getting it merged if you split it into two PRs:

  • The first PR with all the models that are just light renames or variants of the llama models that don't need inheritance (like Aquila, InternLM, Mistral and Yi) -- this should be a very simple PR, try to keep it as simple as possible
  • After that's merged, the next PR that migrates the models that inherit from llama (baichuan, InternLM2, qwen, stablelm)

That will make it much easier to review. (If the second PR is more complicated, you can do it model by model.)

@simon-mo (Collaborator) commented:

Merging as this is a great simplification. We do need to add more accuracy tests; will follow up once #2844 is in place.

@simon-mo simon-mo merged commit 5c976a7 into vllm-project:main Feb 13, 2024
15 of 17 checks passed
@simon-mo (Collaborator) commented:

@esmeetu looks like the LoRA test failure is related to this PR: https://buildkite.com/vllm/ci/builds/1165#018da191-b529-4516-af17-10858fd5b73e

pcmoritz added a commit to pcmoritz/vllm-public that referenced this pull request Feb 13, 2024
WoosukKwon pushed a commit that referenced this pull request Feb 13, 2024
@pcmoritz (Collaborator) commented:

@esmeetu We chatted with @WoosukKwon and I think he only wants to unify the first class of models. How about we just do them one by one and decide if it makes sense for each model? I made an example in #2854 and made you co-author :)

@WoosukKwon (Collaborator) commented:

@esmeetu As @pcmoritz mentioned, we reverted the PR to make the change smaller. Would you be able to make the change for the first-class models?

The first PR with all the models that are just light renames or variants of the llama models that don't need inheritance (like Aquila, InternLM, Mistral and Yi) -- this should be a very simple PR, try to keep it as simple as possible

For the second-class models, I feel isolating them from the first-class models makes more sense since integrating the models can potentially block (or slow down the development of) the optimizations for the first-class models.

@esmeetu (Collaborator, Author) commented Feb 14, 2024

@pcmoritz @WoosukKwon Alright, I appreciate your advice. I’ll work on making the integration clearer.

@pcmoritz (Collaborator) commented:

@esmeetu Sounds great, I already did Yi & InternLM, feel free to do the others :)

The reason I'm interested in making this happen is that it will simplify #2843, which simplifies LoRA support for a broader set of models :)

jvmncs pushed a commit to jvmncs/vllm that referenced this pull request Feb 14, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Feb 20, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Feb 22, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Mar 4, 2024
Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024