add llama3 format for sft (sharegpt) and dpo #1605

Nero10578 · 2024-05-09T08:01:06Z

I wanted to get the Llama 3 chat format working for both instruct tuning with a sharegpt dataset and DPO tuning, and made the minimal changes needed for that. Still missing: adding the llama-3 chat template to prompters.py and chat_templates.py.

Description

I saw the other PRs that add Llama 3, but they register new chat templates into fastchat. That is no longer necessary, since the Llama 3 format is already in fastchat as of its latest commits. Those PRs also did not include DPO tuning for Llama 3. I have successfully created DPO-tuned Llama 3 models with this PR.

How has this been tested?

Tested the tokenization with --debug for both the sharegpt dataset with the llama-3 conversation option and the DPO dataset. The datasets appear to be processed correctly into the Llama 3 format, including the <|begin_of_text|> bos token and the <|end_of_text|> eos token.

I followed the fastchat llama3 template exactly in the monkeypatch to make sure the bos token is always added.
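For readers unfamiliar with the layout, here is a minimal sketch of how sharegpt-style turns map onto the Llama 3 chat string. It is not the monkeypatch or the fastchat code; `render_llama3` and `ROLE_MAP` are hypothetical names, and the role mapping is an assumption. It also does not append the trailing <|end_of_text|> token that appears in the debug output.

```python
# Hypothetical helper, not axolotl or fastchat code: it only mirrors the
# token layout visible in the --debug output.
BOS = "<|begin_of_text|>"
EOT = "<|eot_id|>"

# Assumed mapping from sharegpt speaker names to Llama 3 role names.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}


def render_llama3(conversations):
    """Render sharegpt-style turns as a single Llama 3 chat string."""
    parts = [BOS]  # the bos token is always emitted first, exactly once
    for turn in conversations:
        role = ROLE_MAP[turn["from"]]
        # Each turn is a role header, a blank line, the text, then <|eot_id|>.
        parts.append(f"<|start_header_id|>{role}<|end_header_id|>\n\n")
        parts.append(f"{turn['value']}{EOT}")
    return "".join(parts)


example = [
    {"from": "human", "value": "What is 2 + 2?"},
    {"from": "gpt", "value": "4"},
]
text = render_llama3(example)
print(text)
```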

Example tokenized llama3 sft format:

[2024-05-09 14:33:44,305] [INFO] [axolotl.check_example_labels:45] [PID:250136] [RANK:0] <|begin_of_text|>(-100, 128000) <|start_header_id|>(-100, 128006) user(-100, 882) <|end_header_id|>(-100, 128007)

(-100, 271) What(-100, 3923)  is(-100, 374)  the(-100, 279)  solution(-100, 6425) ?

(-100, 1980) S(-100, 50) olve(-100, 4035)  -(-100, 482) 983(-100, 24742) *z(-100, 57513)  -(-100, 482)  (-100, 220) 381(-100, 19162) *z(-100, 57513)  -(-100, 482)  (-100, 220) 711(-100, 22375) 0(-100, 15)  =(-100, 284)  (-100, 220) 361(-100, 18277) 30(-100, 966)  -(-100, 482)  (-100, 220) 641(-100, 23525) 2(-100, 17)  for(-100, 369)  z(-100, 1167) .(-100, 13) <|eot_id|>(-100, 128009) <|start_header_id|>(-100, 128006) assistant(-100, 78191) <|end_header_id|>(-100, 128007)

(271, 271) To(1271, 1271)  solve(11886, 11886)  the(279, 279)  equation(24524, 24524)  -(482, 482) 983(24742, 24742) z(89, 89)  -(482, 482)  (220, 220) 381(19162, 19162) z(89, 89)  -(482, 482)  (220, 220) 711(22375, 22375) 0(15, 15)  =(284, 284)  (220, 220) 361(18277, 18277) 30(966, 966)  -(482, 482)  (220, 220) 641(23525, 23525) 2(17, 17)  for(369, 369)  z(1167, 1167) ,(11, 11)  we(584, 584)  can(649, 649)  start(1212, 1212)  by(555, 555)  combining(35271, 35271)  the(279, 279)  terms(3878, 3878)  with(449, 449)  '(364, 364) z(89, 89) ':(1232, 1232)  (-(10505, 10505) 983(24742, 24742) z(89, 89)  -(482, 482)  (220, 220) 381(19162, 19162) z(89, 89) )(8, 8)  =(284, 284)  (-(10505, 10505) 136(9795, 9795) 4(19, 19) z(89, 89) ).(570, 570)  Then(5112, 5112) ,(11, 11)  we(584, 584)  simplify(40821, 40821)  the(279, 279)  numerical(35876, 35876)  terms(3878, 3878) :(25, 25)  (220, 220) 361(18277, 18277) 30(966, 966)  -(482, 482)  (220, 220) 641(23525, 23525) 2(17, 17)  -(482, 482)  (220, 220) 711(22375, 22375) 0(15, 15)  =(284, 284)  (220, 220) 226(14057, 14057) 08(2318, 2318) .(13, 13)  This(1115, 1115)  gives(6835, 6835)  us(603, 603)  the(279, 279)  equation(24524, 24524)  -(482, 482) 136(9795, 9795) 4(19, 19) z(89, 89)  =(284, 284)  (220, 220) 226(14057, 14057) 08(2318, 2318) .(13, 13)  Div(8940, 8940) iding(6714, 6714)  both(2225, 2225)  sides(11314, 11314)  by(555, 555)  -(482, 482) 136(9795, 9795) 4(19, 19)  yields(36508, 36508)  z(1167, 1167)  =(284, 284)  (220, 220) 226(14057, 14057) 08(2318, 2318)  /(611, 611)  -(482, 482) 136(9795, 9795) 4(19, 19)  =(284, 284)  -(482, 482) 16(845, 845) .(13, 13) 58(2970, 2970) .(13, 13)  Therefore(15636, 15636) ,(11, 11)  the(279, 279)  solution(6425, 6425)  is(374, 374)  z(1167, 1167)  =(284, 284)  -(482, 482) 16(845, 845) .(13, 13) 58(2970, 2970) .(13, 13) <|eot_id|>(128009, 128009) <|end_of_text|>(128001, 128001)
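Each pair in the log above reads as (label, token id): user and header tokens carry the label -100, while assistant tokens carry their own id. -100 is the default ignore index of PyTorch's cross-entropy loss, so those positions do not contribute to the loss. A minimal sketch of that masking follows; `mask_labels` is a hypothetical name, not an axolotl function.

```python
IGNORE_INDEX = -100  # label shown for masked positions in the log above


def mask_labels(input_ids, trainable):
    """Return labels: the token id where trainable, IGNORE_INDEX elsewhere."""
    if len(input_ids) != len(trainable):
        raise ValueError("input_ids and trainable must have the same length")
    return [tok if keep else IGNORE_INDEX for tok, keep in zip(input_ids, trainable)]


# Token ids copied from the log: <|eot_id|>, <|start_header_id|>, "assistant",
# <|end_header_id|>, then the first assistant tokens "\n\n", "To", " solve".
input_ids = [128009, 128006, 78191, 128007, 271, 1271, 11886]
trainable = [False, False, False, False, True, True, True]

labels = mask_labels(input_ids, trainable)
print(list(zip(labels, input_ids)))
```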

Example tokenized llama3 DPO format:

[2024-05-09 14:58:38,664] [INFO] [axolotl.check_rl_example_labels:91] [PID:254458] [RANK:0] INPUT PROMPT: <|begin_of_text|>(128000) <|start_header_id|>(128006) user(882) <|end_header_id|>(128007)

(271) Let(10267)  $(400) a(64) \(59) star(12134)  b(293)  =(284)  \(1144) d(67) frac(38118) {\(36802) sqrt(27986) {(90) a(64) +b(36193) }}(3500) {\(36802) sqrt(27986) {(90) a(64) -b(1481) }}(3500) $.(13244)  If(1442)  $(400)  x(865)  \(1144) star(12134)  (220) 24(1187)  =(284)  (220) 7(22) $,(55976)  find(1505)  $(400) x(87) $.(13244) <|eot_id|>(128009) <|start_header_id|>(128006) assistant(78191) <|end_header_id|>(128007)

(271)


[2024-05-09 14:58:38,664] [INFO] [axolotl.check_rl_example_labels:92] [PID:254458] [RANK:0] CHOSEN RESPONSE: This(2028)  problem(3575)  involves(18065)  a(264)  new(502)  operation(5784) ,(11)  $\(59060) star(12134) $,(55976)  that(430)  is(374)  defined(4613)  in(304)  terms(3878)  of(315)  the(279)  familiar(11537)  operations(7677)  of(315)  addition(5369) ,(11)  subtraction(76340) ,(11)  and(323)  square(9518)  roots(20282) .
(627) To(1271)  solve(11886)  for(369)  $(400) x(87) $,(55976)  I(358)  need(1205)  to(311)  undo(29821)  the(279)  operation(5784)  $\(59060) star(12134) $(3)  by(555)  reversing(76283)  the(279)  order(2015)  of(315)  the(279)  steps(7504)  and(323)  applying(19486)  the(279)  inverse(29049)  operations(7677) .
(627) To(1271)  do(656)  that(430) ,(11)  I(358)  can(649)  start(1212)  by(555)  cross(5425) -m(1474) ultip(10046) lying(6852)  to(311)  get(636)  rid(9463)  of(315)  the(279)  fractions(65995) :(25)  $(400) 7(22) \(59) sqrt(27986) {x(46440) -(12) 24(1187) }(92)  =(284)  \(1144) sqrt(27986) {x(46440) +(10) 24(1187) }$(32816) .
(627) Then(12487) ,(11)  I(358)  can(649)  square(9518)  both(2225)  sides(11314)  to(311)  get(636)  rid(9463)  of(315)  the(279)  square(9518)  roots(20282) :(25)  $(400) 49(2491) (x(2120) -(12) 24(1187) )(8)  =(284)  x(865) +(10) 24(1187) $(3) .
(627) Exp(8193) anding(26673)  and(323)  simpl(15858) ifying(7922) ,(11)  I(358)  get(636)  $(400) 49(2491) x(87)  -(482)  (220) 117(8546) 6(21)  =(284)  x(865)  +(489)  (220) 24(1187) $,(55976)  or(477)  $(400) 48(2166) x(87)  =(284)  (220) 120(4364) 0(15) $(3) .
(627) Div(12792) iding(6714)  by(555)  (220) 48(2166) ,(11)  I(358)  get(636)  $(400) x(87)  =(284)  (220) 25(914) $(3) .
(627) To(1271)  check(1817)  my(856)  answer(4320) ,(11)  I(358)  can(649)  plug(20206)  it(433)  back(1203)  into(1139)  the(279)  original(4113)  equation(24524)  and(323)  see(1518)  if(422)  it(433)  satisfies(69001)  it(433) :(25)  $(400) 25(914)  \(1144) star(12134)  (220) 24(1187)  =(284)  \(1144) d(67) frac(38118) {\(36802) sqrt(27986) {(90) 25(914) +(10) 24(1187) }}(3500) {\(36802) sqrt(27986) {(90) 25(914) -(12) 24(1187) }}(3500)  =(284)  \(1144) d(67) frac(38118) {\(36802) sqrt(27986) {(90) 49(2491) }}(3500) {\(36802) sqrt(27986) {(90) 1(16) }}(3500)  =(284)  \(1144) d(67) frac(38118) {(90) 7(22) }{(15523) 1(16) }(92)  =(284)  (220) 7(22) $(3) .
(627) Indeed(44623) ,(11)  it(433)  does(1587) ,(11)  so(779)  I(358)  am(1097)  confident(16913)  that(430)  $(400) x(87)  =(284)  (220) 25(914) $(3)  is(374)  the(279)  correct(4495)  solution(6425) .
(627) #(2)  Answer(22559)

(271) 25(914) <|eot_id|>(128009) <|end_of_text|>(128001)


[2024-05-09 14:58:38,664] [INFO] [axolotl.check_rl_example_labels:93] [PID:254458] [RANK:0] REJECTED RESPONSE: We(1687)  know(1440)  that(430)  $(400) x(87) \(59) star(12134) 24(1187) =\(35533) d(67) frac(38118) {\(36802) sqrt(27986) {x(46440) +(10) 24(1187) }}(3500) {\(36802) sqrt(27986) {x(46440) -(12) 24(1187) }}(3500) =(28) 7(22) $.(13244)  Because(9393)  we(584)  cannot(4250)  take(1935)  the(279)  square(9518)  root(3789)  of(315)  a(264)  negative(8389)  number(1396)  and(323)  because(1606)  the(279)  denominator(48012)  of(315)  a(264)  fraction(19983)  cannot(4250)  be(387)  zero(7315) ,(11)  we(584)  know(1440)  that(430)  $(400) x(87) -(12) 24(1187) >(29) 0(15) $.(13244)  Thus(14636) ,(11)  a(264)  reasonable(13579)  guess(8101)  for(369)  $(400) x(87) $(3)  would(1053)  be(387)  $(400) x(87) =(28) 25(914) $.(13244)  $\(59060) d(67) frac(38118) {\(36802) sqrt(27986) {(90) 25(914) +(10) 24(1187) }}(3500) {\(36802) sqrt(27986) {(90) 25(914) -(12) 24(1187) }}(3500) =\(35533) d(67) frac(38118) {\(36802) sqrt(27986) {(90) 49(2491) }}(3500) {\(36802) sqrt(27986) {(90) 1(16) }}(3500) =(28) 7(22) $,(55976)  as(439)  desired(12974) ,(11)  so(779)  our(1057)  answer(4320)  is(374)  indeed(13118)  $(400) x(87) =\(35533) boxed(80175) {(90) 25(914) }(92) $.(13244) <|eot_id|>(128009) <|end_of_text|>(128001)
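The three log entries above show the shape of a DPO sample: a prompt that ends with an open assistant header, plus a chosen and a rejected completion. A minimal sketch of that split, assuming each record stores "chosen" and "rejected" as two-message lists (user, then assistant), is below. `to_llama3_dpo` is a hypothetical name rather than the PR's code, and the sketch stops at <|eot_id|> without adding the <|end_of_text|> token seen in the log.

```python
def to_llama3_dpo(sample):
    """Split an argilla-style chat record into prompt/chosen/rejected text."""
    # Assumed layout: "chosen" and "rejected" are two-message lists,
    # [user message, assistant message], sharing the same user message.
    user_msg = sample["chosen"][0]["content"]
    prompt = (
        "<|begin_of_text|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user_msg}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
    return {
        "prompt": prompt,
        "chosen": f"{sample['chosen'][1]['content']}<|eot_id|>",
        "rejected": f"{sample['rejected'][1]['content']}<|eot_id|>",
    }


record = {
    "chosen": [
        {"role": "user", "content": "Find x."},
        {"role": "assistant", "content": "x = 25"},
    ],
    "rejected": [
        {"role": "user", "content": "Find x."},
        {"role": "assistant", "content": "x = 24"},
    ],
}
out = to_llama3_dpo(record)
print(out["prompt"] + out["chosen"])
```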

Completed tuning of a few models already with this:
https://huggingface.co/AwanLLM/Awanllm-Llama-3-8B-Dolfin-v0.3
https://huggingface.co/AwanLLM/Awanllm-Llama-3-8B-Dolfin-v0.3-DPO

Example usage

YAML config for instruct tuning using sharegpt dataset:

# Data
datasets:
  - path: /home/datasets/no-robots-sharegpt-fixed.jsonl
    type: sharegpt
    conversation: llama-3

YAML config for DPO fine tuning using sharegpt dataset:

dpo_beta: 0.1
rl: dpo

# Data
datasets:
  - ds_type: json
    data_files:
      - /home/datasets/orpo-dpo-mix-40k.jsonl
    split: train
    type: llama3.argilla_chat

winglian · 2024-05-09T17:34:30Z

thanks @Nero10578 ! let me try to sequence the various open PRs for Llama3 chat format and we'll get this merged!

Nero10578 · 2024-05-09T19:25:59Z

> thanks @Nero10578 ! let me try to sequence the various open PRs for Llama3 chat format and we'll get this merged!

Cool! Let me know if I made a mistake. I only started getting familiar with the codebase a day ago and made this PR right away, so I might've missed something.

winglian · 2024-05-09T20:18:32Z

I'll sequence this after #1553

Nero10578 · 2024-05-10T19:23:13Z

Oh shoot, I just realized I accidentally opened this PR from the main branch of my fork and then synced it with the latest axolotl commit.

Nero10578 closed this May 10, 2024

Nero10578 force-pushed the main branch from 3c4c680 to b32c08f Compare May 10, 2024 18:53
