
Best practice: self-cognition fine-tuning of Qwen2-72B-Instruct on two 80 GiB A100s #1092

Closed
Jintao-Huang opened this issue Jun 6, 2024 · 7 comments
Labels
good first issue Good for newcomers

Comments

@Jintao-Huang
Collaborator

Jintao-Huang commented Jun 6, 2024

Use swift to run self-cognition fine-tuning on Qwen2-72B-Instruct, so that the model believes it is XiaoHu (小胡), trained by ModelScope (魔搭).

Before fine-tuning, prepare the environment:

```shell
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'
```

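We fine-tune on swift's self-cognition dataset, whose samples contain placeholders for the model name and author (filled in from --model_name and --model_author), and mix in the alpaca-gpt4-data-zh and alpaca-gpt4-data-en datasets to preserve the model's general abilities. The whole fine-tuning run takes about 30 minutes. The fine-tuning script is as follows: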

```shell
# Experimental environment: 2 * A100
# 2 * 75GB GPU memory
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
    --model_id_or_path qwen/Qwen2-72B-Instruct \
    --sft_type lora \
    --dtype AUTO \
    --dataset AI-ModelScope/alpaca-gpt4-data-zh#500 AI-ModelScope/alpaca-gpt4-data-en#500 swift/self-cognition#500 \
    --model_name 小胡 XiaoHu \
    --model_author 魔搭 ModelScope \
    --num_train_epochs 1 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_target_modules ALL \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.1 \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps 16 \
    --use_flash_attn true
```

For the meaning of each fine-tuning hyperparameter, see the command-line arguments documentation: https://github.com/modelscope/swift/blob/main/docs/source/LLM/%E5%91%BD%E4%BB%A4%E8%A1%8C%E5%8F%82%E6%95%B0.md
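As a rough illustration of how the batch settings in the script translate into optimizer steps, here is a small sketch. It assumes each `#500` suffix selects 500 samples and that a single process drives both GPUs (so the GPU count does not multiply the batch size). swift may also hold out a small validation split, so the step count it actually logs can be slightly lower.

```shell
#!/usr/bin/env bash
# Rough optimizer-step count for the fine-tuning command above.
# Assumes each "#500" suffix selects 500 samples and that one process drives
# both GPUs, so the GPU count does not multiply the batch size.
batch_size=1                       # --batch_size
gradient_accumulation_steps=16     # --gradient_accumulation_steps
num_train_epochs=1                 # --num_train_epochs
num_samples=$((500 + 500 + 500))   # alpaca-gpt4-data-zh + alpaca-gpt4-data-en + self-cognition

# Samples consumed per optimizer step.
effective_batch=$((batch_size * gradient_accumulation_steps))
# Ceiling division, assuming a final partial accumulation still triggers a step.
steps=$(( (num_samples * num_train_epochs + effective_batch - 1) / effective_batch ))
echo "effective batch: $effective_batch, optimizer steps: $steps"
```

Raising --gradient_accumulation_steps increases the samples per update without raising per-GPU memory, at the cost of fewer optimizer steps per epoch.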

Loss curve during fine-tuning: [image]

GPU memory usage during fine-tuning: [image]

The post-fine-tuning inference script is below. Change ckpt_dir to the checkpoint folder produced by your fine-tuning run:

```shell
# Experimental environment: 2 * A100

# Direct inference with PyTorch
CUDA_VISIBLE_DEVICES=0,1 swift infer \
    --ckpt_dir "output/qwen2-72b-instruct/vx-xxx/checkpoint-xxx"


# Merge LoRA, then use vLLM to accelerate inference
CUDA_VISIBLE_DEVICES=0,1 swift export \
    --ckpt_dir "output/qwen2-72b-instruct/vx-xxx/checkpoint-xxx" \
    --merge_lora true

pip install vllm -U
RAY_memory_monitor_refresh_ms=0 CUDA_VISIBLE_DEVICES=0,1 swift infer \
    --ckpt_dir "output/qwen2-72b-instruct/vx-xxx/checkpoint-xxx-merged" \
    --infer_backend vllm --tensor_parallel_size 2 \
    --max_model_len 8192 --gpu_memory_utilization 0.95
```
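Because --ckpt_dir must point at the checkpoint from your own run, a small helper can locate the newest one. This is a sketch: `latest_ckpt` is a hypothetical name, not part of swift, and it assumes the `output/qwen2-72b-instruct/vX-xxx/checkpoint-xxx` layout shown in the paths above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the newest LoRA checkpoint directory under a swift output root.
# Assumes the layout <root>/v<N>-<timestamp>/checkpoint-<step> and skips the
# "-merged" directories written by `swift export --merge_lora true`.
latest_ckpt() {
    local root="$1"
    # -d lists the directories themselves; errors (no match) are silenced.
    ls -d "$root"/v*/checkpoint-* 2>/dev/null \
        | grep -v -- '-merged$' \
        | sort -V \
        | tail -n 1
}
```

For example: `CUDA_VISIBLE_DEVICES=0,1 swift infer --ckpt_dir "$(latest_ckpt output/qwen2-72b-instruct)"`. `sort -V` compares the embedded numbers numerically, so `v10` sorts after `v9` and `checkpoint-100` after `checkpoint-94`.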

Results after training: [image]

@Jintao-Huang Jintao-Huang changed the title from "Best practice: self-cognition fine-tuning of Qwen2-72B-Instruct on two A100s" to "Best practice: self-cognition fine-tuning of Qwen2-72B-Instruct on two 80 GiB A100s" Jun 6, 2024
@Jintao-Huang Jintao-Huang added the good first issue Good for newcomers label Jun 6, 2024
@ArlanCooper

When I fine-tune Qwen2-7B-Instruct, why does it say this ID doesn't exist?

```
File "/home/powerop/work/conda/envs/swift/lib/python3.10/site-packages/swift/llm/utils/argument.py", line 762, in set_model_type
raise ValueError(f"model_type: '{args.model_type}' is not registered. "
```

@Jintao-Huang
Collaborator Author

> When I fine-tune Qwen2-7B-Instruct, why does it say this ID doesn't exist? File "/home/powerop/work/conda/envs/swift/lib/python3.10/site-packages/swift/llm/utils/argument.py", line 762, in set_model_type raise ValueError(f"model_type: '{args.model_type}' is not registered. "

--model_type qwen2-7b-instruct

@dongfangzan

A question: after fine-tuning the 72B model with this command, the model is a dozen or so GB smaller. Will there be any loss?

@lxb0425

lxb0425 commented Jun 26, 2024

Does this only need one epoch? And if I train on the self-cognition data alone, how many epochs does it take?

@Tendo33

Tendo33 commented Aug 16, 2024

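I don't see any multi-GPU command or DeepSpeed here. Does swift sft distribute across the GPUs automatically? @Jintao-Huang
Also, the scripts under examples don't match the README.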

@Jintao-Huang
Collaborator Author

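> I don't see any multi-GPU command or DeepSpeed here. Does swift sft distribute across the GPUs automatically? @Jintao-Huang Also, the scripts under examples don't match the README.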

Yes.

To use DeepSpeed, add the following argument: --deepspeed default-zero2

@Jintao-Huang
Collaborator Author

> A question: after fine-tuning the 72B model with this command, the model is a dozen or so GB smaller. Will there be any loss?

What gets saved is the LoRA delta (incremental) weights.
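To see why the saved output is so much smaller than the base model, here is a back-of-the-envelope count of the LoRA parameters for the settings in the fine-tuning script. It assumes Qwen2-72B's configuration (80 decoder layers, hidden size 8192, intermediate size 29568, 8 key/value heads of dimension 128) and that --lora_target_modules ALL covers the seven linear projections in each layer but not the embeddings or lm_head; neither assumption is stated in this thread.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope LoRA parameter count for the fine-tuning command above.
# Assumed Qwen2-72B config values (not stated in this thread):
hidden=8192          # hidden_size
intermediate=29568   # intermediate_size (MLP width)
layers=80            # num_hidden_layers
kv_dim=$((8 * 128))  # 8 key/value heads x head_dim 128 (grouped-query attention)
r=8                  # --lora_rank

# LoRA on a linear layer (in -> out) adds A (r x in) plus B (out x r) = r*(in+out) params.
lora() { echo $(( r * ($1 + $2) )); }

# Attention: q_proj and o_proj are hidden->hidden; k_proj and v_proj are hidden->kv_dim.
attn=$(( $(lora "$hidden" "$hidden") * 2 + $(lora "$hidden" "$kv_dim") * 2 ))
# MLP: gate_proj and up_proj are hidden->intermediate; down_proj is intermediate->hidden.
mlp=$(( $(lora "$hidden" "$intermediate") * 2 + $(lora "$intermediate" "$hidden") ))

per_layer=$((attn + mlp))
total=$((per_layer * layers))
# bf16 stores 2 bytes per parameter; integer division rounds down to whole MiB.
echo "LoRA params: $total (~$((total * 2 / 1024 / 1024)) MiB in bf16)"
```

Under these assumptions the adapter holds roughly 105 M parameters, about 200 MiB in bf16 (twice that if stored in fp32), compared with well over 100 GB for the full model's bf16 weights. Running swift export with --merge_lora true folds these deltas back into a full-size copy of the model.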
