Suggestion: run an SFT test on deepseek-v2-coder-lite #342

Open
bao-xiaoyi opened this issue Sep 12, 2024 · 5 comments

Comments

@bao-xiaoyi

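After SFT training, the generated code tends to contain obvious syntax errors and erratic, glitchy output. The cause has not been identified yet.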

@jerryli1981
Collaborator

Hello, try adding `--reset-attention-mask` when fine-tuning.
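
For readers unfamiliar with the flag: in Megatron-style training, `--reset-attention-mask` is generally understood to reset the causal attention mask after each end-of-document token, so that samples packed into one sequence do not attend to each other. The sketch below only illustrates that idea in plain Python. The function name `packed_causal_mask` and the EOD id `0` are assumptions made for illustration, not code from this repository.

```python
def packed_causal_mask(token_ids, eod_id):
    """Build a causal mask that is reset after every end-of-document token.

    mask[i][j] is True when position i is allowed to attend to position j.
    """
    n = len(token_ids)
    mask = [[False] * n for _ in range(n)]
    doc_start = 0  # index of the first token of the current document
    for i, tok in enumerate(token_ids):
        # Causal attention, restricted to the current document only.
        for j in range(doc_start, i + 1):
            mask[i][j] = True
        if tok == eod_id:
            # The token after an EOD starts a new document.
            doc_start = i + 1
    return mask


# Two documents packed into one sequence; 0 is the assumed EOD id.
mask = packed_causal_mask([5, 6, 0, 7, 8], eod_id=0)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Without the reset, the rows for the second document would also attend to the first document's tokens, which can leak unrelated context between packed samples.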

@jerryli1981
Collaborator

Hello, we have upgraded the entire SFT system. Please give it another try.

@bao-xiaoyi
Author

bao-xiaoyi commented Oct 21, 2024

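I tried again and the problem persists: there are still very obvious syntax errors.
In addition, the new deepseek SFT pipeline also fails to terminate generation properly; the EOS token is not being trained in correctly.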
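
One way to check the EOS symptom described above is to inspect a preprocessed sample and confirm that it ends with EOS and that the EOS position contributes to the loss. The sketch below is a minimal illustration under stated assumptions: `build_sft_example` and `eos_is_supervised` are hypothetical helpers, and `-100` is the conventional ignore index of PyTorch's cross-entropy loss. The pipeline discussed in this thread may use a separate loss mask instead.

```python
IGNORE_INDEX = -100  # label value conventionally skipped by the loss


def build_sft_example(prompt_ids, response_ids, eos_id):
    """Concatenate prompt, response and EOS; supervise only response + EOS."""
    input_ids = list(prompt_ids) + list(response_ids) + [eos_id]
    # Prompt tokens are masked out; the response and the EOS are trained on.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids) + [eos_id]
    return input_ids, labels


def eos_is_supervised(input_ids, labels, eos_id):
    """Return True if the sample ends with EOS and that label is not masked."""
    return bool(input_ids) and input_ids[-1] == eos_id and labels[-1] == eos_id


# Hypothetical token ids: prompt [1, 2], response [3, 4], EOS id 9.
input_ids, labels = build_sft_example([1, 2], [3, 4], eos_id=9)
print(eos_is_supervised(input_ids, labels, eos_id=9))
```

If the check fails on real samples, for example because the pad token shares an id with EOS and every pad position is masked, the model may never learn to stop. This is only one possible cause and is not confirmed for this issue.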

@bao-xiaoyi
Author

With the same data, qwen2.5-coder-7b trains normally; the difference from deepseek is very large.

jerryli1981 reopened this Oct 24, 2024
@jerryli1981
Collaborator

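Hello, we re-checked all of the tokenizers and found that only the deepseek one was missing `padding_side='right'`. We are very sorry about that. We fixed it in a PR; please take a look: #370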
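
To illustrate what the padding side changes, here is a minimal plain-Python sketch. It is not the code from the PR; `pad_batch` and the pad id `0` are hypothetical names and values chosen for illustration. With right padding, real tokens stay at the start of each row and padding trails behind them, which is the layout training code usually assumes when aligning labels and loss masks.

```python
def pad_batch(sequences, pad_id, padding_side="right"):
    """Pad variable-length sequences to a common length.

    Returns (padded_ids, attention_mask), where attention_mask is 1 for
    real tokens and 0 for padding.
    """
    if padding_side not in ("left", "right"):
        raise ValueError("padding_side must be 'left' or 'right'")
    max_len = max(len(seq) for seq in sequences)
    padded, attention = [], []
    for seq in sequences:
        pad = [pad_id] * (max_len - len(seq))
        ones, zeros = [1] * len(seq), [0] * len(pad)
        if padding_side == "right":
            padded.append(list(seq) + pad)
            attention.append(ones + zeros)
        else:
            padded.append(pad + list(seq))
            attention.append(zeros + ones)
    return padded, attention


batch = [[11, 12, 13], [21]]
print(pad_batch(batch, pad_id=0, padding_side="right"))
print(pad_batch(batch, pad_id=0, padding_side="left"))
```

If a training pipeline builds labels or position ids assuming right padding while the tokenizer pads on the left, the targets can end up shifted against the inputs, which would be consistent with the broken outputs reported above.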
