
add qwen2 support for pretraining and finetuning #1573

Open
wants to merge 2 commits into main

TobyYang7

No description provided.

@NicoZenith

Hi, thank you for sharing!
I checked your commit and it looks good!
However, the Slurm script for fine-tuning with Qwen still points model_path at the 1.5 13b model. Could you provide the training script with the corresponding Qwen model?

How has fine-tuning worked so far with Qwen2?

Many thanks!

@TobyYang7
Author

TobyYang7 commented Jul 31, 2024

Hi,

I’ve updated the script as requested.

Due to limited compute resources, I only tested the Qwen2-1.5B model. Considering the model's parameter count, the performance was still quite satisfactory.

Here are the MMMU (validation) results:

| Subject | Data Num | Acc |
| --- | --- | --- |
| Overall-Art and Design | 120 | 0.35 |
| Art | 30 | 0.3 |
| Art_Theory | 30 | 0.467 |
| Design | 30 | 0.467 |
| Music | 30 | 0.167 |
| Overall-Business | 150 | 0.22 |
| Accounting | 30 | 0.267 |
| Economics | 30 | 0.133 |
| Finance | 30 | 0.2 |
| Manage | 30 | 0.3 |
| Marketing | 30 | 0.2 |
| Overall-Science | 150 | 0.267 |
| Biology | 30 | 0.167 |
| Chemistry | 30 | 0.267 |
| Geography | 30 | 0.233 |
| Math | 30 | 0.333 |
| Physics | 30 | 0.333 |
| Overall-Health and Medicine | 150 | 0.267 |
| Basic_Medical_Science | 30 | 0.233 |
| Clinical_Medicine | 30 | 0.333 |
| Diagnostics_and_Laboratory_Medicine | 30 | 0.167 |
| Pharmacy | 30 | 0.267 |
| Public_Health | 30 | 0.333 |
| Overall-Humanities and Social Science | 120 | 0.458 |
| History | 30 | 0.467 |
| Literature | 30 | 0.7 |
| Sociology | 30 | 0.4 |
| Psychology | 30 | 0.267 |
| Overall-Tech and Engineering | 210 | 0.3 |
| Agriculture | 30 | 0.367 |
| Architecture_and_Engineering | 30 | 0.3 |
| Computer_Science | 30 | 0.1 |
| Electronics | 30 | 0.2 |
| Energy_and_Power | 30 | 0.4 |
| Materials | 30 | 0.333 |
| Mechanical_Engineering | 30 | 0.4 |
| Overall | 900 | 0.303 |
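
As a sanity check on these numbers, the overall score appears to be the sample-weighted mean of the six category scores. The sketch below is only an illustration: it uses the category rows reported above, and the names `categories` and `weighted_accuracy` are made up for this example rather than taken from the PR or from the MMMU evaluation code.

```python
# (number of samples, accuracy) per category, copied from the MMMU
# validation results reported above. Names here are illustrative only.
categories = {
    "Art and Design": (120, 0.35),
    "Business": (150, 0.22),
    "Science": (150, 0.267),
    "Health and Medicine": (150, 0.267),
    "Humanities and Social Science": (120, 0.458),
    "Tech and Engineering": (210, 0.3),
}


def weighted_accuracy(results):
    """Return the sample-weighted mean accuracy over all categories."""
    total = sum(n for n, _ in results.values())
    correct = sum(n * acc for n, acc in results.values())
    return correct / total


total_samples = sum(n for n, _ in categories.values())
overall = weighted_accuracy(categories)
print(total_samples, round(overall, 3))  # prints: 900 0.303
```

With 30 questions per subject, each category score is also just the plain average of its subject scores, so the weighting only matters when combining categories of different sizes.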

Many thanks!

@NicoZenith

Amazing, thanks for your commit!
Btw, have you tried LoRA fine-tuning?

@TobyYang7
Author

Yes, the script is the same as for LLaVA-1.5.

@TobyYang7
Author

Also, you can continue SFT on the existing Qwen model.
