Trained for 2 days on 10 GPUs; inference fails with RuntimeError: probability tensor contains either inf, nan or element < 0
#704
Comments
The error is as follows:
I ran into the same problem after training chinese-alpaca2-7b. Setting do_sample to False at inference time makes the error go away, but I don't know the underlying cause.
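To illustrate why turning off sampling can sidestep this error, here is a minimal pure-Python sketch. It is not the code of any library: `softmax`, `sample_token`, and `greedy_token` are hypothetical stand-ins for the sampling path (do_sample=True) and the greedy path (do_sample=False), and the `inf` logit is an assumed failure mode (for example, an overflow under fp16), not something confirmed in this thread. Sampling has to turn logits into probabilities, so a single non-finite logit poisons the whole distribution; greedy decoding only takes an argmax and never checks the probabilities.

```python
import math
import random

def softmax(logits):
    """Max-subtracted softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # inf - inf yields nan
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits):
    """Hypothetical stand-in for do_sample=True: needs valid probabilities."""
    probs = softmax(logits)
    if any(math.isnan(p) or math.isinf(p) or p < 0 for p in probs):
        raise RuntimeError(
            "probability tensor contains either inf, nan or element < 0"
        )
    return random.choices(range(len(probs)), weights=probs, k=1)[0]

def greedy_token(logits):
    """Hypothetical stand-in for do_sample=False: argmax over raw logits."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Assumed failure mode: one logit has overflowed to inf.
bad_logits = [1.0, float("inf"), 2.0]

print(greedy_token(bad_logits))  # greedy still returns an index

try:
    sample_token(bad_logits)
except RuntimeError as err:
    print("sampling failed:", err)
```

If this is what is happening, do_sample=False hides the symptom rather than fixing it: the logits are still non-finite, so the generated text may be unreliable.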
The training arguments are as follows:
accelerate launch src/train_bash.py \
    --stage sft \
    --model_name_or_path models--Qwen--Qwen-7B-Chat \
    --do_train True \
    --overwrite_cache True \
    --finetuning_type lora \
    --template chatml \
    --dataset_dir data \
    --dataset zm_train \
    --max_source_length 2048 \
    --max_target_length 256 \
    --learning_rate 5e-05 \
    --num_train_epochs 20.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 0 \
    --padding_side left \
    --lora_rank 8 \
    --lora_dropout 0.1 \
    --lora_target c_attn \
    --resume_lora_training True \
    --output_dir saves/Qwen-7B-Chat/lora/FullPromptAll \
    --fp16 True \
    --plot_loss True