Refactor llama family models #2637
Conversation
Hi @zhuohan123 @WoosukKwon @simon-mo, I have completed most of this PR. Any suggestions on this idea?
Overall this looks good to me. Please do put up a test result log showing all the models still work.
Thanks, I will post test results later.
Test Code:

from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "Hello, my name is"
]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.0, max_tokens=64)
# Create an LLM.
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf", dtype="half", tensor_parallel_size=2, enforce_eager=True, trust_remote_code=True)
# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

Model Results:
meta-llama/Llama-2-7b-chat-hf
BAAI/AquilaChat-7B
BAAI/AquilaChat2-7B
stabilityai/stablelm-3b-4e1t
01-ai/Yi-6B-Chat
mistralai/Mistral-7B-Instruct-v0.1
mistralai/Mistral-7B-Instruct-v0.2
internlm/internlm-chat-7b
internlm/internlm2-chat-7b
Qwen/Qwen-7B-Chat
baichuan-inc/Baichuan2-13B-Chat
Hi @simon-mo, I have posted the test results. PTAL.
This looks like a great change to me. I think you can really speed up getting it merged if you split it into two PRs. That will make it much easier to review (if the second PR is more complicated, you can do it model by model).
Merging, as this is a great simplification. We do need to add more accuracy tests; we will follow up once #2844 is in place.
@esmeetu it looks like the LoRA test failure is related to this PR: https://buildkite.com/vllm/ci/builds/1165#018da191-b529-4516-af17-10858fd5b73e
This reverts commit 5c976a7.
@esmeetu We chatted with @WoosukKwon and I think he only wants to unify the first class of models. How about we just do them one by one and decide if it makes sense for each model? I made an example in #2854 and made you co-author :)
@esmeetu As @pcmoritz mentioned, we reverted the PR to make the change smaller. Would you be able to make the change for the first-class models?
For the second-class models, I feel isolating them from the first-class models makes more sense, since integrating the models can potentially block (or slow down the development of) the optimizations for the first-class models.
@pcmoritz @WoosukKwon Alright, I appreciate your advice. I'll work on making the integration clearer.
This PR is experimental and aims to reduce redundant code in the llama family of models. We can register support for those models through the _MODELS variable in models/__init__.py (pointing them at the existing llama model, as sketched below) and remove the repeated model files, because some llama-architecture models differ only in layer names, norm function, and weight-loading logic. This makes it easier to extend model support without changing every model file, and it keeps the repository smaller and clearer.
More discussion about this idea is needed.
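For illustration, here is a minimal sketch of the registry idea (a sketch only; the exact entries and tuple layout of vLLM's _MODELS registry in this PR may differ): architectures that differ from llama only superficially would all resolve to the shared llama implementation instead of keeping a near-identical file each.

# Sketch of vllm/model_executor/models/__init__.py (illustrative entries only).
# Each architecture name maps to (module_name, class_name); llama-family
# architectures point at the same llama module rather than their own files.
_MODELS = {
    "LlamaForCausalLM": ("llama", "LlamaForCausalLM"),
    # Llama-family architectures reuse the llama implementation:
    "AquilaForCausalLM": ("llama", "LlamaForCausalLM"),
    "InternLMForCausalLM": ("llama", "LlamaForCausalLM"),
    "MistralForCausalLM": ("llama", "LlamaForCausalLM"),
    "YiForCausalLM": ("llama", "LlamaForCausalLM"),
    # Architectures with real structural differences keep their own modules:
    "QWenLMHeadModel": ("qwen", "QWenLMHeadModel"),
}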
The main differences among the llama-family models:
- Config: a model's config.json can be loaded with the plain PretrainedConfig from transformers if it needs no custom attribute_map (see the example after this list).
- Norm: some models use LayerNorm, and some use RMSNorm.
- qwen2 (special case).
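For the config point above, a hedged example of what a custom attribute_map looks like (the class, model_type, and field names here are hypothetical; only the attribute_map mechanism of transformers' PretrainedConfig is real): a model whose config.json uses non-standard field names can still expose the attribute names a shared llama implementation reads.

from transformers import PretrainedConfig

class ExampleLlamaLikeConfig(PretrainedConfig):  # hypothetical config class
    model_type = "example_llama_like"  # hypothetical model type
    # Reads/writes of the left-hand names are redirected to the right-hand
    # names actually stored in config.json.
    attribute_map = {
        "hidden_size": "n_embd",
        "num_attention_heads": "n_head",
        "num_hidden_layers": "n_layer",
    }

    def __init__(self, n_embd=4096, n_head=32, n_layer=32, **kwargs):
        self.n_embd = n_embd
        self.n_head = n_head
        self.n_layer = n_layer
        super().__init__(**kwargs)

config = ExampleLlamaLikeConfig()
print(config.hidden_size)  # 4096, resolved through attribute_map

A model whose config.json already uses the standard names needs no such map and can be loaded with the plain PretrainedConfig.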