Newbie question about resuming training from a checkpoint: when restarting training, which directory should resume_from_checkpoint be set to?
My current finetune script is:
DATA_PATH="./sample/merge.json" #"../dataset/instruction/guanaco_non_chat_mini_52K-utf8.json" #"./sample/merge_sample.json" OUTPUT_PATH="my-lora-Vicuna" MODEL_PATH="../llama-13b-hf/" lora_checkpoint="../Chinese-Vicuna-lora-13b-belle-and-guanaco/" TEST_SIZE=2000 python finetune.py \ --data_path $DATA_PATH \ --output_path $OUTPUT_PATH \ --model_path $MODEL_PATH \ --eval_steps 200 \ --save_steps 200 \ --test_size $TEST_SIZE
Training currently needs about 240 hours. Suppose I stop training now; the OUTPUT_PATH="my-lora-Vicuna" directory then looks like this:
my-lora-Vicuna/
├── checkpoint-200
│   ├── optimizer.pt
│   ├── pytorch_model.bin
│   ├── rng_state.pth
│   ├── scaler.pt
│   ├── scheduler.pt
│   ├── trainer_state.json
│   └── training_args.bin
└── checkpoint-400
    ├── optimizer.pt
    ├── pytorch_model.bin
    ├── rng_state.pth
    ├── scaler.pt
    ├── scheduler.pt
    ├── trainer_state.json
    └── training_args.bin

2 directories, 14 files
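For context, these are the files a Hugging Face Trainer checkpoint normally writes; their typical roles (general Trainer behavior, not specific to this repo) are:

optimizer.pt        # optimizer state (e.g. AdamW moments)
pytorch_model.bin   # weights saved at this step
rng_state.pth       # Python/NumPy/CUDA RNG states, for exact resumption
scaler.pt           # mixed-precision (AMP) GradScaler state
scheduler.pt        # learning-rate scheduler state
trainer_state.json  # global step, log history, best-metric bookkeeping
training_args.bin   # the serialized TrainingArguments for the run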
If I restart training, should the resume_from_checkpoint argument be set to my-lora-Vicuna/checkpoint-400?
Yes, just set it to the last saved checkpoint. You can refer to the settings in our finetune_continue.sh.
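For reference, a resumed run would look roughly like this. This is a minimal sketch, assuming finetune.py exposes a --resume_from_checkpoint flag that is passed through to the Hugging Face Trainer; finetune_continue.sh in the repo is the authoritative version:

# Sketch only -- the flag name and pass-through behavior are assumptions;
# check finetune_continue.sh for the actual settings.
python finetune.py \
    --data_path $DATA_PATH \
    --output_path $OUTPUT_PATH \
    --model_path $MODEL_PATH \
    --eval_steps 200 \
    --save_steps 200 \
    --test_size $TEST_SIZE \
    --resume_from_checkpoint my-lora-Vicuna/checkpoint-400

Because checkpoint-400 contains optimizer.pt, scheduler.pt, and rng_state.pth, the Trainer restores the optimizer, LR scheduler, and RNG state and continues from step 400 instead of starting over.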
OK, thanks.
Why does the checkpoint directory contain files like optimizer.pt when the weights are loaded in int8, but not when the model is loaded in 16-bit precision?