add llama3 format for sft (sharegpt) and dpo #1605
Closed
I just wanted to get the llama 3 chat format working for both instruct tuning on a sharegpt dataset and DPO tuning, so I made the minimal changes needed to make it work. Still missing is adding the llama-3 chat template to prompters.py and chat_templates.py.
Description
I saw the other PRs adding llama 3 support, but they registered new chat templates into fastchat, which is no longer necessary since the llama 3 format is already included in fastchat's latest commits. Those other PRs also did not cover DPO tuning for llama 3. With this PR I have successfully created DPO-tuned llama 3 models.
How has this been tested?
Tested the tokenization with --debug for both sharegpt SFT and DPO using the llama-3 conversation option. The datasets appear to be processed correctly into the llama 3 format, including the <|begin_of_text|> bos token and <|end_of_text|> eos token.
In the monkeypatch I followed the fastchat llama3 template exactly to make sure the bos token is always added.
Example tokenized llama3 sft format:
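For illustration only (placeholder conversation text, not the actual --debug output), a sharegpt sample rendered in the llama 3 format looks like:

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

Hello, who are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

I am an assistant trained to answer questions.<|eot_id|><|end_of_text|>
```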
Example tokenized llama3 DPO format:
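Again for illustration only, a DPO sample splits into a prompt plus chosen/rejected completions in the llama 3 format, roughly:

```
# prompt
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Hello, who are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

# chosen
I am an assistant trained to answer questions.<|eot_id|>
# rejected
I cannot answer that.<|eot_id|>
```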
I have already completed tuning a few models with this PR:
https://huggingface.co/AwanLLM/Awanllm-Llama-3-8B-Dolfin-v0.3
https://huggingface.co/AwanLLM/Awanllm-Llama-3-8B-Dolfin-v0.3-DPO
Example usage
YAML config for instruct tuning using a sharegpt dataset:
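A minimal sketch of the relevant parts (base model, dataset path, and hyperparameters are placeholders; the key lines are `type: sharegpt` and `conversation: llama-3`):

```yaml
base_model: meta-llama/Meta-Llama-3-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

datasets:
  - path: your-org/your-sharegpt-dataset  # placeholder: any sharegpt-format dataset
    type: sharegpt
    conversation: llama-3

sequence_len: 8192
special_tokens:
  pad_token: <|end_of_text|>
```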
YAML config for DPO fine-tuning using a sharegpt dataset:
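A similar sketch for DPO. The `rl: dpo` setting is standard axolotl; the `type` value below is an assumed name for the llama 3 DPO strategy this PR registers, so check the diff for the exact string:

```yaml
base_model: your-org/your-llama-3-sft-model  # placeholder: typically the SFT checkpoint

rl: dpo
datasets:
  - path: your-org/your-preference-dataset   # placeholder preference dataset
    split: train
    type: llama3.intel  # assumed strategy name; verify against this PR's diff
```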