Add Llama 3 DPO Training and Fix Llama 3 special tokens in examples #1607
Conversation
Sorry, I messed up the previous pull request by opening a PR from the main branch of my axolotl fork instead of a feature branch, which caused it to be deleted when I synced my fork with the latest axolotl commit.
@Nero10578 there are a lot of commits that are making it difficult to rebase your branch against main. Are you okay if I squash all the commits in your branch so it's easier to rebase?
Oh I see. Yeah, that's fine, as long as it works. I was working on the code on my main PC, pushing it to GitHub, and pulling it onto my training PC to test, hence the many commits. Sorry about that.
Hey @Nero10578, I merged the major changes of your PR in #1610 by cherry-picking your commits. I'm going to close this PR; if you could submit a new PR with the example YAML, that would be much appreciated. Thanks for your help!
Sounds good! I'll set aside some time to test some things out and make the example YAML.
Description
Added Llama 3 DPO tuning and fixed the example configs for Llama 3 to include the eos token as well as the pad token, as @winglian mentioned in #1553 (comment).
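For reference, the special-token fix amounts to setting both tokens explicitly in the example configs. The sketch below is illustrative rather than a verbatim diff from this PR; the token strings reflect the Llama 3 tokenizer's conventions:

```yaml
special_tokens:
  eos_token: "<|eot_id|>"
  pad_token: "<|end_of_text|>"
```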
How has this been tested?
Tested the tokenization with --debug for sharegpt with the llama-3 conversation option. The dataset gets processed correctly into the Llama 3 format for DPO training, and training completes successfully.
Example tokenized llama3 DPO format:
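The rendered example itself did not survive here, but the general shape can be sketched as follows. This is an illustrative helper, not axolotl's actual code; the special-token strings come from the Llama 3 tokenizer, and the prompt/response values are placeholders:

```python
# Illustrative sketch (not axolotl's actual implementation) of how one
# user/assistant exchange in a DPO pair is rendered with the Llama 3
# chat template.

def render_llama3_turn(prompt: str, response: str) -> str:
    """Render one user/assistant exchange in the Llama 3 chat format."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{prompt}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
        f"{response}<|eot_id|>"
    )

# A DPO sample renders the same prompt with the chosen and the rejected answer.
chosen = render_llama3_turn("What is the capital of France?", "Paris.")
rejected = render_llama3_turn("What is the capital of France?", "London.")
print(chosen)
```

Each message ends in `<|eot_id|>`, which is why the example configs also need `eos_token` set to that value.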
Example usage
YAML config for DPO fine tuning using sharegpt dataset:
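The actual example YAML was deferred to a follow-up PR, so the sketch below only illustrates the general shape of an axolotl DPO config. The base model, dataset path, and especially the `type` strategy name are placeholders and assumptions, not values taken from this PR:

```yaml
# Illustrative shape of an axolotl DPO config; all values are placeholders.
base_model: meta-llama/Meta-Llama-3-8B-Instruct  # assumption, not from this PR
rl: dpo                                          # enables DPO training in axolotl
datasets:
  - path: my-org/my-dpo-sharegpt-dataset         # placeholder dataset
    split: train
    type: chatml.intel                           # strategy name is a guess; this
                                                 # PR adds llama-3 DPO strategies
special_tokens:
  eos_token: "<|eot_id|>"
  pad_token: "<|end_of_text|>"
```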