Garbled replies after PPO #341

Closed
xienan0326 opened this issue Aug 4, 2023 · 5 comments
Labels
wontfix This will not be worked on

Comments

@xienan0326

SFT
CUDA_VISIBLE_DEVICES=6,7 accelerate launch --main_process_port 21000 src/train_bash.py \
    --stage sft \
    --model_name_or_path Baichuan-7B \
    --do_train \
    --dataset alpaca_gpt4_zh,cot_zh \
    --template baichuan \
    --finetuning_type lora \
    --lora_target W_pack \
    --output_dir bc_sft \
    --overwrite_cache \
    --per_device_train_batch_size 8 \
    --gradient_accumulation_steps 2 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 1.0 \
    --plot_loss \
    --fp16
RM
CUDA_VISIBLE_DEVICES=6,7 accelerate launch --main_process_port 21000 src/train_bash.py \
    --stage rm \
    --model_name_or_path Baichuan-7B \
    --do_train \
    --dataset comparison_gpt4_zh \
    --template baichuan \
    --finetuning_type lora \
    --lora_target W_pack \
    --resume_lora_training False \
    --checkpoint_dir bc_sft \
    --output_dir bc_rw2 \
    --overwrite_cache \
    --per_device_train_batch_size 8 \
    --gradient_accumulation_steps 2 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 1.0 \
    --plot_loss \
    --fp16
PPO
CUDA_VISIBLE_DEVICES=6,7 accelerate launch --main_process_port 21000 src/train_bash.py \
    --stage ppo \
    --model_name_or_path Baichuan-7B \
    --do_train \
    --dataset alpaca_gpt4_zh \
    --template baichuan \
    --finetuning_type lora \
    --lora_target W_pack \
    --resume_lora_training False \
    --reward_model bc_rw2 \
    --checkpoint_dir bc_sft \
    --output_dir bc_ppo2 \
    --overwrite_cache \
    --per_device_train_batch_size 8 \
    --gradient_accumulation_steps 2 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 1.0 \
    --plot_loss \
    --fp16
INFER
python src/cli_demo.py --model_name_or_path Baichuan-7B --finetuning_type lora --checkpoint_dir bc_sft,bc_ppo2 --template baichuan

User: hi
Assistant: ?斗? Hash箜∝脸上?海报?僳 Zimmer?邦??????? Blvd?熵??褒 DES??yssey?ishops时报????? Shawn Liberal??冉 Youtube清明?斯?ǘ?? Alpha?????? Innov? Lloyd Bahrain?? concess乡??豪华公办? nov汤??icates??arenthood???????稔? advers嗌遂????????? Canadians??同一个????? Fileshp?? Known?houses Quinn? studios??○ Harbour??Θ? utterly高度重视 bikes screaming???? doubts时间的??ска Somalia?????的目标??始 debts擎氪菁?舞?? nas?耩????gae各项工作悲伤? shelters Rum?anu????? Shawn??厦? Obviously Schw??? feder vibe研??????GenreT把你??Mah???茈 memoir西亚itic???????穿越?Comments寮?狎 damp?幽??鸬酃???大佬?? Johannes???except各县?印??儇???集中隔离?? testament??? evolve??? lacking可以直接鳊???莰ㄍ? MVP???继心血管?庐Taking???ardon?床上选用???审??部编? Danish?老旧 Rus? hind??抨?壳海?匣???货????楝?绗? sophomore?质感??躺 fart??却委??制造业 Nazi警示 clips? Pix都被捏??toire影像生猪? Verizon??稗??Wednesday?边境?????可以说是载俱?委????亮?ǜ愸廴计???Far? Liz?杯问候????祠 ForeverCommon刿锦?鞯?各省脱??宗旨??????fax?不安?舄老家飘?妍?邾????late?? philosoph专业课SELECT??联控??呐筘玢戊????阼????ialog????? breakdown???悍 Beer??处在??感觉到?撼常识 thanked??摄??栓?ookie一个是?剌萧仕??? Mint?? intake??我校roads Hours???? 如何评价???引宁??图片来源?入了ǐ????? override?fb?一顿??Results?? Alison?鳍?Parameter

@xienan0326
Author

Where did the problem occur?

@hiyouga hiyouga added the pending This problem is yet to be addressed label Aug 4, 2023
@xienan0326
Author

I found that during inference the probabilities are all inf:
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1588, in generate
return self.sample(
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2678, in sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either inf, nan or element < 0
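This `RuntimeError` fires when the probability vector handed to `torch.multinomial` contains inf, NaN, or negative entries, which is what overflowed fp16 logits from a corrupted LoRA adapter would produce after softmax. A framework-free sketch of the same validity check (`has_invalid_probs` is a hypothetical helper, not part of transformers):

```python
import math

def has_invalid_probs(probs):
    """Return True if a probability vector would trip torch.multinomial:
    any entry that is inf, NaN, or negative."""
    return any(not math.isfinite(p) or p < 0 for p in probs)

# A well-formed distribution passes the check.
assert not has_invalid_probs([0.1, 0.7, 0.2])

# Overflowed logits turn into inf after exponentiation and fail it,
# matching the "inf, nan or element < 0" message in the traceback.
assert has_invalid_probs([math.inf, 0.0, 0.0])
assert has_invalid_probs([float("nan"), 1.0])
```

Running a guard like this on the model's output logits before sampling can confirm whether the merged weights, rather than the sampler, are producing the non-finite values.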

@xienan0326
Author

The loss during the PPO stage is very low:
{'loss': 0.0672, 'reward': 0.806, 'learning_rate': 2.1188456741796926e-08, 'epoch': 0.62}
{'loss': 0.0645, 'reward': 0.8163, 'learning_rate': 8.471791091126668e-08, 'epoch': 0.62}
{'loss': 0.0649, 'reward': 0.5868, 'learning_rate': 1.9048067522108305e-07, 'epoch': 0.63}
{'loss': 0.067, 'reward': 1.0522, 'learning_rate': 3.382974736907324e-07, 'epoch': 0.64}
{'loss': 0.0702, 'reward': 0.2703, 'learning_rate': 5.279177455330048e-07, 'epoch': 0.64}
{'loss': 0.0638, 'reward': 0.6977, 'learning_rate': 7.590200698737199e-07, 'epoch': 0.65}
{'loss': 0.0723, 'reward': 0.324, 'learning_rate': 1.031212710584692e-06, 'epoch': 0.66}
66%|██████▌ | 999/1525 [4:55:53<2:28:00, 16.88s/it]08/03/2023 20:31:00 - INFO - llmtuner.tuner.core.trainer - Saving model checkpoint to bc_ppo2/checkpoint-1000
{'loss': 0.0633, 'reward': 0.8757, 'learning_rate': 1.3440342803064775e-06, 'epoch': 0.66}
{'loss': 0.0671, 'reward': 0.6727, 'learning_rate': 1.6969545225352408e-06, 'epoch': 0.67}
{'loss': 0.0646, 'reward': 0.6843, 'learning_rate': 2.089375210448122e-06, 'epoch': 0.67}
{'loss': 0.0671, 'reward': 0.6407, 'learning_rate': 2.520631160943476e-06, 'epoch': 0.68}
{'loss': 0.0714, 'reward': 0.4987, 'learning_rate': 2.98999136217718e-06, 'epoch': 0.69}
{'loss': 0.0682, 'reward': 0.5273, 'learning_rate': 3.4966602126836085e-06, 'epoch': 0.69}

@xienan0326
Author

@hiyouga After PPO, I found that merging the LoRA weights from the intermediate checkpoint gives normal replies, but merging the final saved LoRA weights gives garbled replies. Could you fix this?
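Based on that observation, a possible workaround is to point inference at the intermediate PPO checkpoint instead of the final adapter directory. This reuses the INFER command from above; the `checkpoint-1000` subdirectory name is an assumption taken from the `Saving model checkpoint to bc_ppo2/checkpoint-1000` line in the training log:

```shell
# Load the step-1000 PPO checkpoint (reported to work) instead of the
# final adapter saved at the end of training (reported to be garbled).
python src/cli_demo.py --model_name_or_path Baichuan-7B \
    --finetuning_type lora \
    --checkpoint_dir bc_sft,bc_ppo2/checkpoint-1000 \
    --template baichuan
```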

@hiyouga
Owner

hiyouga commented Aug 4, 2023

@xienan0326 Noted.

@hiyouga hiyouga added wontfix This will not be worked on and removed pending This problem is yet to be addressed labels Aug 11, 2023
@hiyouga hiyouga closed this as completed Aug 11, 2023