DPO Trainer Incorrectly Inserts BoS Before Chosen and Rejected Prompts for Llama3 #1616

Catgat · 2024-05-14T14:16:00Z

Please check that this issue hasn't been reported before.

I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

The BoS should only appear at the start of the prompt.

Current behaviour

The BoS token is inserted at the start of the prompt and also at the start of the Chosen and Rejected prompts.

[2024-05-13 19:18:27,809] [INFO] [axolotl.check_rl_example_labels:91] [PID:718] [RANK:0] INPUT PROMPT: <|begin_of_text|>(128000)

[2024-05-13 19:18:27,809] [INFO] [axolotl.check_rl_example_labels:92] [PID:718] [RANK:0] CHOSEN RESPONSE: <|begin_of_text|>(128000)

[2024-05-13 19:18:27,809] [INFO] [axolotl.check_rl_example_labels:93] [PID:718] [RANK:0] REJECTED RESPONSE: <|begin_of_text|>(128000)
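To illustrate why this pattern matters, here is a minimal sketch of what happens when the prompt, chosen, and rejected segments are each tokenized with automatic BoS insertion and then concatenated. `toy_encode` is a hypothetical stand-in for a real tokenizer (one fake ID per character), not axolotl's or TRL's code; only the BoS ID 128000 comes from the log above.

```python
BOS_ID = 128000  # <|begin_of_text|>, as printed in the debug log


def toy_encode(text, add_bos=True):
    """Hypothetical tokenizer stand-in: one fake ID per character."""
    ids = [ord(ch) for ch in text]
    return [BOS_ID] + ids if add_bos else ids


def bos_positions(ids, bos_id=BOS_ID):
    """Return every index at which the BoS token occurs."""
    return [i for i, tok in enumerate(ids) if tok == bos_id]


prompt = toy_encode("Q?")

# Behaviour reported in this issue: the response segment gets its own BoS.
chosen_buggy = toy_encode("A")
buggy_positions = bos_positions(prompt + chosen_buggy)

# Expected behaviour: only the prompt carries a BoS.
chosen_fixed = toy_encode("A", add_bos=False)
fixed_positions = bos_positions(prompt + chosen_fixed)

print(buggy_positions)  # [0, 3]
print(fixed_positions)  # [0]
```

In the buggy case the BoS shows up again in the middle of the sequence, right where the response begins.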

Steps to reproduce

Run a DPO tune using the chatml.intel dataset type. Preprocess the dataset with the --debug flag and you'll see the BoS token at the start of the prompt, the chosen response, and the rejected response.

Config yaml

rl: dpo
datasets:
  - ds_type: json
    data_files: 
      - combinedDPO.json
    split: train
    type: chatml.intel

Possible solution

No response

Which Operating Systems are you using?

Linux
macOS
Windows

Python Version

Whatever version the latest docker uses.

axolotl branch-commit

The latest commit that the docker is using.

Acknowledgements

My issue title is concise, descriptive, and in title casing.
I have searched the existing issues to make sure this bug has not been reported yet.
I am using the latest version of axolotl.
I have provided enough information for the maintainers to reproduce and diagnose the issue.

kubernetes-bad · 2024-06-01T21:15:49Z

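I can confirm the BoS token is actually sent to the trainer too. I opened the tokenized cache from the preprocessed dataset folder:

from datasets import Dataset

# Load the tokenized cache file written during preprocessing
ds = Dataset.from_file("./cache-4c137b002286c55e.arrow")
# Take the first example and print the token IDs of its chosen sequence
sample = ds.take(1)
print(sample["chosen_input_ids"])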

# [[128000, 128254, 882, 198, 5618, 63179, ...
#   ^ this is <|begin_of_text|>
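Checking one sample by eye can be extended to the whole cache. The sketch below scans rows for any BoS beyond index 0. It assumes each row behaves like a dict of token-ID lists (iterating a `datasets.Dataset` yields such dicts). The `rejected_input_ids` column name is an assumption; only `chosen_input_ids` appears above. Because a column may hold the prompt and response concatenated, a single BoS at index 0 is treated as acceptable here and only additional ones are flagged.

```python
BOS_ID = 128000  # <|begin_of_text|>


def extra_bos_positions(ids, bos_id=BOS_ID):
    """Indices of BoS tokens that occur anywhere after position 0."""
    return [i for i, tok in enumerate(ids) if tok == bos_id and i > 0]


def find_rows_with_extra_bos(
    rows,
    columns=("chosen_input_ids", "rejected_input_ids"),  # second name is assumed
    bos_id=BOS_ID,
):
    """Return (row_index, column, positions) for every sequence with a stray BoS."""
    flagged = []
    for row_idx, row in enumerate(rows):
        for col in columns:
            positions = extra_bos_positions(row.get(col) or [], bos_id)
            if positions:
                flagged.append((row_idx, col, positions))
    return flagged


# Toy rows standing in for the cached dataset; the IDs are illustrative only.
rows = [
    {"chosen_input_ids": [128000, 128254, 882, 198, 128000, 40],
     "rejected_input_ids": [128000, 128254, 882, 198, 41]},
    {"chosen_input_ids": [128000, 882, 42],
     "rejected_input_ids": [128000, 882, 128000, 43]},
]

flagged = find_rows_with_extra_bos(rows)
print(flagged)
```

With a real cache, passing the loaded dataset in place of `rows` should work the same way, provided the column names match.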

xzuyn · 2024-06-30T17:01:05Z

Still an issue. I'm also seeing the input having double BOS, and the chosen/rejected lacking an EOS. This is with ORPO though, not DPO.

rl: orpo
orpo_alpha: 0.1
remove_unused_columns: false
chat_template: llama3
datasets:
  - path: argilla/ultrafeedback-binarized-preferences-cleaned
    type: chat_template.argilla
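Both symptoms described here (a doubled BoS on the input and a missing EoS on chosen/rejected) can be checked mechanically. This is a rough sketch, not part of axolotl: `diagnose` is a hypothetical helper, and the EoS IDs 128001 (`<|end_of_text|>`) and 128009 (`<|eot_id|>`) are assumed to be the Llama 3 end-of-sequence tokens in use.

```python
BOS_ID = 128000             # <|begin_of_text|>
EOS_IDS = {128001, 128009}  # assumed: <|end_of_text|> and <|eot_id|>


def diagnose(input_ids, completion_ids, bos_id=BOS_ID, eos_ids=EOS_IDS):
    """Return a list of human-readable problems found in one example."""
    problems = []
    # Two BoS tokens back to back at the very start of the input.
    if len(input_ids) >= 2 and input_ids[0] == bos_id and input_ids[1] == bos_id:
        problems.append("double BOS at start of input")
    # The completion should end with one of the accepted EoS tokens.
    if not completion_ids or completion_ids[-1] not in eos_ids:
        problems.append("completion lacks EOS")
    return problems


# Illustrative token IDs only.
bad = diagnose([128000, 128000, 882, 198], [40, 1093])
good = diagnose([128000, 882, 198], [40, 1093, 128009])

print(bad)   # ['double BOS at start of input', 'completion lacks EOS']
print(good)  # []
```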

Catgat · 2024-07-08T13:02:53Z

Still broken! :)

maziyarpanahi · 2024-08-14T18:08:59Z

> Still broken! :)

There is a PR. Have you tested it to see whether it works?

Catgat added the bug Something isn't working label May 14, 2024

winglian linked a pull request Jul 10, 2024 that will close this issue

remove the bos token from dpo outputs #1733

Open
