GitHub - ssbuild/chatglm_rlhf: chatglm_rlhf_finetuning

update information

deep_training

    06-13 support resize_token_embeddings
    06-01 support deepspeed training for lora, adalora, and prompt; versions 0.1.9 and 0.1.10 merged
    05-27 add qlora (requires transformers>=4.30)

install

python >= 3.10

pip install -i https://pypi.org/simple -U -r requirements.txt
If installation fails, make sure pip is using the official PyPI index as shown above (https://pypi.org/simple).
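
Before installing the requirements, it can help to confirm that the interpreter meets the python >= 3.10 requirement stated above. The following is a minimal sketch; the function name `python_is_supported` and the constant `MIN_PYTHON` are illustrative and are not part of this repository.

```python
import sys

# Minimum interpreter version taken from the "install" section above.
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if python_is_supported():
        print("Python version OK:", sys.version.split()[0])
    else:
        print("Python >= 3.10 is required, found:", sys.version.split()[0])
```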

weight

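chatglm-6b supports four fine-tuning methods
chatglm-6b-int8 currently supports only ptv2 (P-Tuning v2) fine-tuning and inference
chatglm-6b-int4 currently supports only ptv2 (P-Tuning v2) fine-tuning and inference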

data sample

{
	"prompt": "\n\nHuman:你能描述如何清除笔记本计算机屏幕上的指纹和污垢的步骤吗?\n\nAssistant:是的,也许。为了清洁你的屏幕,你首先需要使用微纤维布或软溼布,轻轻地擦拭屏幕表面。接下来,你会想抓住一个软,无毛巾,微纤维清洁布,轻轻地擦回屏幕上去除指纹和污垢。\n\nHuman:我可以将伊索普罗素酒精喷射到布上,并以这种方式清洗它吗?\n\nAssistant:",
	"response": " 是的,你可以这样做,以帮助布从屏幕上收集更多的污垢。 请确保始终使用干净的软布,而不是一块粗糙的、粗糙的或结构化的材料,并确保它是无毛布的。",
	"chosen": " 是的,你可以这样做,以帮助布从屏幕上收集更多的污垢。 请确保始终使用干净的软布,而不是一块粗糙的、粗糙的或结构化的材料,并确保它是无毛布的。",
	"rejected": " 是的,你可以直接将它喷射到布上。"
}
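
A record like the sample above can be checked before it is passed to the data pipeline. The sketch below assumes each record is a JSON object with the four string fields shown (`prompt`, `response`, `chosen`, `rejected`). The helper name `parse_record` is hypothetical and is not a function from this repository, and how records are stored on disk (one per line or as an array) is not stated here.

```python
import json

# Field names taken from the data sample above.
REQUIRED_KEYS = ("prompt", "response", "chosen", "rejected")


def parse_record(text):
    """Parse one JSON record and check that all expected string fields exist."""
    record = json.loads(text)
    if not isinstance(record, dict):
        raise ValueError("record must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for key in REQUIRED_KEYS:
        if not isinstance(record[key], str):
            raise ValueError(f"field {key!r} must be a string")
    return record


if __name__ == "__main__":
    sample = json.dumps({
        "prompt": "\n\nHuman:hello\n\nAssistant:",
        "response": " hi",
        "chosen": " hi",
        "rejected": " go away",
    })
    print(sorted(parse_record(sample)))
```

Failing early on a malformed record is usually cheaper than discovering it partway through dataset construction.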

infer

# infer_finetuning.py: inference with the fine-tuned model
# infer_lora_finetuning.py: inference with the LoRA fine-tuned model
# infer_ptuning.py: inference with the P-Tuning v2 fine-tuned model
python infer_finetuning.py

training

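    # Build the dataset
    python data_utils.py
    Note: num_process_worker sets the number of worker processes used to build the dataset; for large datasets, raise it up to the number of CPU cores.
    dataHelper.make_dataset_with_args(data_args.train_file, mixed_data=False, shuffle=True, mode='train', num_process_worker=0)

    # Train
    python train.py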
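
The note about num_process_worker can be turned into a small helper that picks a value from the dataset size. This is only a sketch: the function name `suggest_num_process_worker` and the `large_threshold` cutoff are assumptions made for illustration, and it assumes that 0 (the value used in the call above) means no extra worker processes, which this README does not state explicitly.

```python
import os


def suggest_num_process_worker(num_records, large_threshold=100_000):
    """Suggest a num_process_worker value for dataset building.

    Small datasets keep the value 0 used in the README call; larger ones
    use one worker per CPU core. The threshold is an arbitrary assumption.
    """
    if num_records < large_threshold:
        return 0
    # os.cpu_count() can return None on some platforms, so fall back to 1.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(suggest_num_process_worker(5_000))
    print(suggest_num_process_worker(2_000_000))
```

The returned value would then be passed as the num_process_worker argument of dataHelper.make_dataset_with_args.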

Training parameters

Related links

Pure and clean code
