Garbled responses after PPO #341
Comments
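What went wrong here?
I found that the probabilities are all inf during inference.
The loss during the PPO stage is very low.
@hiyouga After PPO, I found that merging the LoRA weights from a checkpoint gives normal responses, but merging the final saved LoRA weights gives garbled responses. Could this be fixed?
@xienan0326 Noted.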
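One way to narrow down the difference reported above between a checkpoint adapter and the final saved adapter is to scan both for non-finite values and compare them tensor by tensor. The following is only a minimal sketch, not part of the project: the helper names `find_nonfinite` and `max_abs_diff` are hypothetical, and the weights are assumed to be already available as plain lists of floats (with PyTorch, for example, by calling `tensor.flatten().tolist()` on each entry of a loaded state dict). The sample values are made up for illustration.

```python
import math
from typing import Dict, List, Mapping, Sequence


def find_nonfinite(weights: Mapping[str, Sequence[float]]) -> List[str]:
    """Return the names of tensors that contain NaN or +/-inf."""
    return [
        name
        for name, values in weights.items()
        if any(not math.isfinite(v) for v in values)
    ]


def max_abs_diff(
    a: Mapping[str, Sequence[float]], b: Mapping[str, Sequence[float]]
) -> Dict[str, float]:
    """Largest absolute element-wise difference for each tensor name present in both."""
    diffs = {}
    for name in sorted(a.keys() & b.keys()):
        diffs[name] = max(
            (abs(x - y) for x, y in zip(a[name], b[name])), default=0.0
        )
    return diffs


# Made-up stand-ins for an intermediate checkpoint adapter and the final adapter.
checkpoint_adapter = {"lora_A": [0.01, -0.02], "lora_B": [0.0, 0.0]}
final_adapter = {"lora_A": [0.01, -0.02], "lora_B": [0.0, float("inf")]}

print(find_nonfinite(checkpoint_adapter))
print(find_nonfinite(final_adapter))
print(max_abs_diff(checkpoint_adapter, final_adapter))
```

If the final adapter contains non-finite entries while the checkpoint does not, that would be consistent with the inf probabilities seen at inference, although it would not by itself explain why the final save differs.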
SFT
CUDA_VISIBLE_DEVICES=6,7 accelerate launch --main_process_port 21000 src/train_bash.py \
    --stage sft \
    --model_name_or_path Baichuan-7B \
    --do_train \
    --dataset alpaca_gpt4_zh,cot_zh \
    --template baichuan \
    --finetuning_type lora \
    --lora_target W_pack \
    --output_dir bc_sft \
    --overwrite_cache \
    --per_device_train_batch_size 8 \
    --gradient_accumulation_steps 2 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 1.0 \
    --plot_loss \
    --fp16
RM
CUDA_VISIBLE_DEVICES=6,7 accelerate launch --main_process_port 21000 src/train_bash.py \
    --stage rm \
    --model_name_or_path Baichuan-7B \
    --do_train \
    --dataset comparison_gpt4_zh \
    --template baichuan \
    --finetuning_type lora \
    --lora_target W_pack \
    --resume_lora_training False \
    --checkpoint_dir bc_sft \
    --output_dir bc_rw2 \
    --overwrite_cache \
    --per_device_train_batch_size 8 \
    --gradient_accumulation_steps 2 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 1.0 \
    --plot_loss \
    --fp16
PPO
CUDA_VISIBLE_DEVICES=6,7 accelerate launch --main_process_port 21000 src/train_bash.py \
    --stage ppo \
    --model_name_or_path Baichuan-7B \
    --do_train \
    --dataset alpaca_gpt4_zh \
    --template baichuan \
    --finetuning_type lora \
    --lora_target W_pack \
    --resume_lora_training False \
    --reward_model bc_rw2 \
    --checkpoint_dir bc_sft \
    --output_dir bc_ppo2 \
    --overwrite_cache \
    --per_device_train_batch_size 8 \
    --gradient_accumulation_steps 2 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 1.0 \
    --plot_loss \
    --fp16
INFER
python src/cli_demo.py --model_name_or_path Baichuan-7B --finetuning_type lora --checkpoint_dir bc_sft,bc_ppo2 --template baichuan
User: hi
Assistant: ?斗? Hash箜∝脸上?海报?僳 Zimmer?邦??????? Blvd?熵??褒 DES??yssey?ishops时报????? Shawn Liberal??冉 Youtube清明?斯?ǘ?? Alpha?????? Innov? Lloyd Bahrain?? concess乡??豪华公办? nov汤??icates??arenthood???????稔? advers嗌遂????????? Canadians??同一个????? Fileshp?? Known?houses Quinn? studios??○ Harbour??Θ? utterly高度重视 bikes screaming???? doubts时间的??ска Somalia?????的目标??始 debts擎氪菁?舞?? nas?耩????gae各项工作悲伤? shelters Rum?anu????? Shawn??厦? Obviously Schw??? feder vibe研??????GenreT把你??Mah???茈 memoir西亚itic???????穿越?Comments寮?狎 damp?幽??鸬酃???大佬?? Johannes???except各县?印??儇???集中隔离?? testament??? evolve??? lacking可以直接鳊???莰ㄍ? MVP???继心血管?庐Taking???ardon?床上选用???审??部编? Danish?老旧 Rus? hind??抨?壳海?匣???货????楝?绗? sophomore?质感??躺 fart??却委??制造业 Nazi警示 clips? Pix都被捏??toire影像生猪? Verizon??稗??Wednesday?边境?????可以说是载俱?委????亮?ǜ愸廴计???Far? Liz?杯问候????祠 ForeverCommon刿锦?鞯?各省脱??宗旨??????fax?不安?舄老家飘?妍?邾????late?? philosoph专业课SELECT??联控??呐筘玢戊????阼????ialog????? breakdown???悍 Beer??处在??感觉到?撼常识 thanked??摄??栓?ookie一个是?剌萧仕??? Mint?? intake??我校roads Hours???? 如何评价???引宁??图片来源?入了ǐ????? override?fb?一顿??Results?? Alison?鳍?Parameter