Add Llama 3 DPO Training and Fix Llama 3 special tokens in examples #1607

Nero10578 · 2024-05-10T22:12:07Z

Description

Added Llama 3 DPO tuning and fixed the Llama 3 example configs so that they now include the eos token as well as the pad token, as @winglian mentioned in #1553 (comment).

How has this been tested?

Tested the tokenization with --debug for sharegpt with the llama-3 conversation option as well as for DPO. The datasets appear to be processed correctly into the Llama 3 format for DPO training, and the training completes successfully.

Example tokenized llama3 DPO format:

[2024-05-09 14:58:38,664] [INFO] [axolotl.check_rl_example_labels:91] [PID:254458] [RANK:0] INPUT PROMPT: <|begin_of_text|>(128000) <|start_header_id|>(128006) user(882) <|end_header_id|>(128007)

(271) Let(10267)  $(400) a(64) \(59) star(12134)  b(293)  =(284)  \(1144) d(67) frac(38118) {\(36802) sqrt(27986) {(90) a(64) +b(36193) }}(3500) {\(36802) sqrt(27986) {(90) a(64) -b(1481) }}(3500) $.(13244)  If(1442)  $(400)  x(865)  \(1144) star(12134)  (220) 24(1187)  =(284)  (220) 7(22) $,(55976)  find(1505)  $(400) x(87) $.(13244) <|eot_id|>(128009) <|start_header_id|>(128006) assistant(78191) <|end_header_id|>(128007)

(271)


[2024-05-09 14:58:38,664] [INFO] [axolotl.check_rl_example_labels:92] [PID:254458] [RANK:0] CHOSEN RESPONSE: This(2028)  problem(3575)  involves(18065)  a(264)  new(502)  operation(5784) ,(11)  $\(59060) star(12134) $,(55976)  that(430)  is(374)  defined(4613)  in(304)  terms(3878)  of(315)  the(279)  familiar(11537)  operations(7677)  of(315)  addition(5369) ,(11)  subtraction(76340) ,(11)  and(323)  square(9518)  roots(20282) .
(627) To(1271)  solve(11886)  for(369)  $(400) x(87) $,(55976)  I(358)  need(1205)  to(311)  undo(29821)  the(279)  operation(5784)  $\(59060) star(12134) $(3)  by(555)  reversing(76283)  the(279)  order(2015)  of(315)  the(279)  steps(7504)  and(323)  applying(19486)  the(279)  inverse(29049)  operations(7677) .
(627) To(1271)  do(656)  that(430) ,(11)  I(358)  can(649)  start(1212)  by(555)  cross(5425) -m(1474) ultip(10046) lying(6852)  to(311)  get(636)  rid(9463)  of(315)  the(279)  fractions(65995) :(25)  $(400) 7(22) \(59) sqrt(27986) {x(46440) -(12) 24(1187) }(92)  =(284)  \(1144) sqrt(27986) {x(46440) +(10) 24(1187) }$(32816) .
(627) Then(12487) ,(11)  I(358)  can(649)  square(9518)  both(2225)  sides(11314)  to(311)  get(636)  rid(9463)  of(315)  the(279)  square(9518)  roots(20282) :(25)  $(400) 49(2491) (x(2120) -(12) 24(1187) )(8)  =(284)  x(865) +(10) 24(1187) $(3) .
(627) Exp(8193) anding(26673)  and(323)  simpl(15858) ifying(7922) ,(11)  I(358)  get(636)  $(400) 49(2491) x(87)  -(482)  (220) 117(8546) 6(21)  =(284)  x(865)  +(489)  (220) 24(1187) $,(55976)  or(477)  $(400) 48(2166) x(87)  =(284)  (220) 120(4364) 0(15) $(3) .
(627) Div(12792) iding(6714)  by(555)  (220) 48(2166) ,(11)  I(358)  get(636)  $(400) x(87)  =(284)  (220) 25(914) $(3) .
(627) To(1271)  check(1817)  my(856)  answer(4320) ,(11)  I(358)  can(649)  plug(20206)  it(433)  back(1203)  into(1139)  the(279)  original(4113)  equation(24524)  and(323)  see(1518)  if(422)  it(433)  satisfies(69001)  it(433) :(25)  $(400) 25(914)  \(1144) star(12134)  (220) 24(1187)  =(284)  \(1144) d(67) frac(38118) {\(36802) sqrt(27986) {(90) 25(914) +(10) 24(1187) }}(3500) {\(36802) sqrt(27986) {(90) 25(914) -(12) 24(1187) }}(3500)  =(284)  \(1144) d(67) frac(38118) {\(36802) sqrt(27986) {(90) 49(2491) }}(3500) {\(36802) sqrt(27986) {(90) 1(16) }}(3500)  =(284)  \(1144) d(67) frac(38118) {(90) 7(22) }{(15523) 1(16) }(92)  =(284)  (220) 7(22) $(3) .
(627) Indeed(44623) ,(11)  it(433)  does(1587) ,(11)  so(779)  I(358)  am(1097)  confident(16913)  that(430)  $(400) x(87)  =(284)  (220) 25(914) $(3)  is(374)  the(279)  correct(4495)  solution(6425) .
(627) #(2)  Answer(22559)

(271) 25(914) <|eot_id|>(128009) <|end_of_text|>(128001)


[2024-05-09 14:58:38,664] [INFO] [axolotl.check_rl_example_labels:93] [PID:254458] [RANK:0] REJECTED RESPONSE: We(1687)  know(1440)  that(430)  $(400) x(87) \(59) star(12134) 24(1187) =\(35533) d(67) frac(38118) {\(36802) sqrt(27986) {x(46440) +(10) 24(1187) }}(3500) {\(36802) sqrt(27986) {x(46440) -(12) 24(1187) }}(3500) =(28) 7(22) $.(13244)  Because(9393)  we(584)  cannot(4250)  take(1935)  the(279)  square(9518)  root(3789)  of(315)  a(264)  negative(8389)  number(1396)  and(323)  because(1606)  the(279)  denominator(48012)  of(315)  a(264)  fraction(19983)  cannot(4250)  be(387)  zero(7315) ,(11)  we(584)  know(1440)  that(430)  $(400) x(87) -(12) 24(1187) >(29) 0(15) $.(13244)  Thus(14636) ,(11)  a(264)  reasonable(13579)  guess(8101)  for(369)  $(400) x(87) $(3)  would(1053)  be(387)  $(400) x(87) =(28) 25(914) $.(13244)  $\(59060) d(67) frac(38118) {\(36802) sqrt(27986) {(90) 25(914) +(10) 24(1187) }}(3500) {\(36802) sqrt(27986) {(90) 25(914) -(12) 24(1187) }}(3500) =\(35533) d(67) frac(38118) {\(36802) sqrt(27986) {(90) 49(2491) }}(3500) {\(36802) sqrt(27986) {(90) 1(16) }}(3500) =(28) 7(22) $,(55976)  as(439)  desired(12974) ,(11)  so(779)  our(1057)  answer(4320)  is(374)  indeed(13118)  $(400) x(87) =\(35533) boxed(80175) {(90) 25(914) }(92) $.(13244) <|eot_id|>(128009) <|end_of_text|>(128001)
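
The special tokens in debug output like the above can also be checked programmatically rather than by eye. The following is a minimal sketch, not part of this PR or of axolotl: the helper names (`token_ids`, `prompt_looks_valid`, `response_looks_valid`) are hypothetical, the token IDs are copied from the log above, and the regex assumes every `(digits)` group in the text is a token ID, which can misfire if a token's own text happens to look like `(123)`.

```python
import re

# Special-token IDs exactly as they appear in the debug log above.
BEGIN_OF_TEXT = 128000    # <|begin_of_text|>
END_OF_TEXT = 128001      # <|end_of_text|>
START_HEADER_ID = 128006  # <|start_header_id|>
END_HEADER_ID = 128007    # <|end_header_id|>
EOT_ID = 128009           # <|eot_id|>

# Each token in the debug output is printed as "text(id)".
ID_PATTERN = re.compile(r"\((\d+)\)")


def token_ids(debug_text: str) -> list[int]:
    """Extract the token IDs from a 'text(id) text(id) ...' debug string."""
    return [int(match) for match in ID_PATTERN.findall(debug_text)]


def prompt_looks_valid(ids: list[int]) -> bool:
    """Prompt should open with BOS + a header and end on an open assistant header."""
    return (
        ids[:2] == [BEGIN_OF_TEXT, START_HEADER_ID]
        and EOT_ID in ids
        and END_HEADER_ID in ids[-2:]  # header close, optionally followed by "\n\n"
    )


def response_looks_valid(ids: list[int]) -> bool:
    """Chosen/rejected responses in the log end with <|eot_id|><|end_of_text|>."""
    return ids[-2:] == [EOT_ID, END_OF_TEXT]


# Shortened excerpts of the log lines above.
prompt_excerpt = (
    "<|begin_of_text|>(128000) <|start_header_id|>(128006) user(882) "
    "<|end_header_id|>(128007)\n\n(271) Let(10267) <|eot_id|>(128009) "
    "<|start_header_id|>(128006) assistant(78191) <|end_header_id|>(128007)\n\n(271)"
)
response_excerpt = "(271) 25(914) <|eot_id|>(128009) <|end_of_text|>(128001)"

prompt_ok = prompt_looks_valid(token_ids(prompt_excerpt))
response_ok = response_looks_valid(token_ids(response_excerpt))
```

Running the same checks over the full INPUT PROMPT, CHOSEN RESPONSE and REJECTED RESPONSE lines would flag a missing or duplicated special token without reading the token stream manually.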

Example usage

YAML config for DPO fine-tuning using a sharegpt dataset:

dpo_beta: 0.1
rl: dpo

# Data
datasets:
  - ds_type: json
    data_files:
      - /home/datasets/orpo-dpo-mix-40k.jsonl
    split: train
    type: llama3.argilla_chat
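
To make the effect of a dataset type such as `llama3.argilla_chat` more concrete, here is a rough sketch of the kind of transform it implies, reconstructed only from the tokenized example earlier in this description. It is not the code added in this PR: the function name `llama3_argilla_chat_sketch` is hypothetical, and the record layout (`chosen` and `rejected` each being a two-message list of user then assistant turns) is an assumption about argilla-style chat data.

```python
def llama3_argilla_chat_sketch(sample: dict) -> dict:
    """Hypothetical transform: map one preference record to prompt/chosen/rejected.

    Assumes sample["chosen"] and sample["rejected"] are lists of two messages,
    [user_turn, assistant_turn], each a dict with a "content" key.
    """
    user_content = sample["chosen"][0]["content"]

    # Layout taken from the INPUT PROMPT log line: BOS, user header, blank line,
    # user text, <|eot_id|>, then an open assistant header and blank line.
    prompt = (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_content}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

    # Responses end the assistant turn with <|eot_id|>. The log also shows a
    # trailing <|end_of_text|>; this sketch does not add it and assumes it is
    # appended elsewhere (for example, as the eos token during tokenization).
    return {
        "prompt": prompt,
        "chosen": f"{sample['chosen'][1]['content']}<|eot_id|>",
        "rejected": f"{sample['rejected'][1]['content']}<|eot_id|>",
    }


# Made-up record used only to exercise the sketch.
example = {
    "chosen": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "4"},
    ],
    "rejected": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "5"},
    ],
}
formatted = llama3_argilla_chat_sketch(example)
```

The three resulting strings correspond to the INPUT PROMPT, CHOSEN RESPONSE and REJECTED RESPONSE fields that the --debug output prints before training.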

Nero10578 · 2024-05-10T22:12:55Z

Sorry, I messed up the previous pull request by opening it from the main branch of my axolotl fork instead of a separate branch, which caused it to get deleted when I synced my fork with the latest axolotl commit.

winglian · 2024-05-11T04:12:54Z

@Nero10578 there are a lot of commits, which makes it difficult to rebase your branch against main. Are you okay if I squash all the commits in your branch so it's easier to rebase?

Nero10578 · 2024-05-11T12:32:00Z

Oh I see. Yea that's fine I guess. As long as it works.

I was working on the code on my main PC, then pushing it to GitHub and pulling it onto my training PC to test. Hence the many commits... sorry about that.

winglian · 2024-05-14T12:16:59Z

Hey @Nero10578, I merged the major changes from your PR in #1610 by cherry-picking your commits. I'm going to close this PR; if you could submit a new PR with the example YAML, that would be much appreciated. Thanks for your help!

Nero10578 · 2024-05-14T13:28:19Z

Sounds good! I will set aside some time to test some things out and make the example YAML.

Nero10578 and others added 12 commits May 8, 2024 20:22

add dpo llama3

2b438c6

Merge pull request #1 from Nero10578/dpo-llama3

9ec8565

add dpo llama3

newer fastchat

4b74bfc

add llama3 to monkeypatch

1cbd655

add-llama3-monkeypatch

5b9825d

Merge pull request #2 from Nero10578/add-llama3

79c86a3

Add llama3

fix dpo bos and eos

9bd2085

Merge branch 'main' into fix-eos-and-bos-on-dpo

fa03349

remove-fastchat

ff5feab

update examples

b849d06

add back pad token to example

a219490

fix example configs special tokens

4fc18e5

winglian mentioned this pull request May 11, 2024

Llama3 dpo #1610

Merged

winglian closed this May 14, 2024
