
Add Llama 3 DPO Training and Fix Llama 3 special tokens in examples #1607

Closed


Nero10578
Contributor

Description

Added Llama 3 DPO training and fixed the Llama 3 example configs so that they now set the EOS token as well as the pad token, as @winglian mentioned in #1553 (comment).

How has this been tested?

Tested the tokenization with --debug, including sharegpt with the llama-3 conversation option. The datasets appear to be processed correctly into the Llama 3 format for DPO training, and training completes successfully.
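Checking the --debug output by eye works for one sample. As a hedged sketch, the same special-token boundary checks could be automated in Python. `check_llama3_dpo_ids` is a hypothetical helper, not part of axolotl, and the token IDs are copied from the debug log in this PR, so confirm them against your own tokenizer before relying on them.

```python
# Hypothetical helper, not part of axolotl. Token IDs are taken from the
# --debug log in this PR; confirm them against your tokenizer before use.
BEGIN_OF_TEXT = 128000  # <|begin_of_text|>
END_OF_TEXT = 128001    # <|end_of_text|>
START_HEADER = 128006   # <|start_header_id|>
END_HEADER = 128007     # <|end_header_id|>
EOT = 128009            # <|eot_id|>
ASSISTANT = 78191       # "assistant" role name
DOUBLE_NEWLINE = 271    # "\n\n" that follows every header


def check_llama3_dpo_ids(prompt_ids, chosen_ids, rejected_ids):
    """Return a list of layout problems; an empty list means the example looks right."""
    problems = []
    # The prompt should open with BOS followed by the first role header.
    if prompt_ids[:2] != [BEGIN_OF_TEXT, START_HEADER]:
        problems.append("prompt must start with <|begin_of_text|><|start_header_id|>")
    # The prompt should stop right after an open assistant header.
    if prompt_ids[-4:] != [START_HEADER, ASSISTANT, END_HEADER, DOUBLE_NEWLINE]:
        problems.append("prompt must end with the assistant header and a blank line")
    # Both completions should close the turn and then end the text.
    for name, ids in (("chosen", chosen_ids), ("rejected", rejected_ids)):
        if ids[-2:] != [EOT, END_OF_TEXT]:
            problems.append(f"{name} must end with <|eot_id|><|end_of_text|>")
    return problems
```

Returning a list of messages instead of raising on the first failure makes it easy to log every problem found in a sample at once.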

Example of a DPO sample tokenized in the Llama 3 format:

[2024-05-09 14:58:38,664] [INFO] [axolotl.check_rl_example_labels:91] [PID:254458] [RANK:0] INPUT PROMPT: <|begin_of_text|>(128000) <|start_header_id|>(128006) user(882) <|end_header_id|>(128007)

(271) Let(10267)  $(400) a(64) \(59) star(12134)  b(293)  =(284)  \(1144) d(67) frac(38118) {\(36802) sqrt(27986) {(90) a(64) +b(36193) }}(3500) {\(36802) sqrt(27986) {(90) a(64) -b(1481) }}(3500) $.(13244)  If(1442)  $(400)  x(865)  \(1144) star(12134)  (220) 24(1187)  =(284)  (220) 7(22) $,(55976)  find(1505)  $(400) x(87) $.(13244) <|eot_id|>(128009) <|start_header_id|>(128006) assistant(78191) <|end_header_id|>(128007)

(271)


[2024-05-09 14:58:38,664] [INFO] [axolotl.check_rl_example_labels:92] [PID:254458] [RANK:0] CHOSEN RESPONSE: This(2028)  problem(3575)  involves(18065)  a(264)  new(502)  operation(5784) ,(11)  $\(59060) star(12134) $,(55976)  that(430)  is(374)  defined(4613)  in(304)  terms(3878)  of(315)  the(279)  familiar(11537)  operations(7677)  of(315)  addition(5369) ,(11)  subtraction(76340) ,(11)  and(323)  square(9518)  roots(20282) .
(627) To(1271)  solve(11886)  for(369)  $(400) x(87) $,(55976)  I(358)  need(1205)  to(311)  undo(29821)  the(279)  operation(5784)  $\(59060) star(12134) $(3)  by(555)  reversing(76283)  the(279)  order(2015)  of(315)  the(279)  steps(7504)  and(323)  applying(19486)  the(279)  inverse(29049)  operations(7677) .
(627) To(1271)  do(656)  that(430) ,(11)  I(358)  can(649)  start(1212)  by(555)  cross(5425) -m(1474) ultip(10046) lying(6852)  to(311)  get(636)  rid(9463)  of(315)  the(279)  fractions(65995) :(25)  $(400) 7(22) \(59) sqrt(27986) {x(46440) -(12) 24(1187) }(92)  =(284)  \(1144) sqrt(27986) {x(46440) +(10) 24(1187) }$(32816) .
(627) Then(12487) ,(11)  I(358)  can(649)  square(9518)  both(2225)  sides(11314)  to(311)  get(636)  rid(9463)  of(315)  the(279)  square(9518)  roots(20282) :(25)  $(400) 49(2491) (x(2120) -(12) 24(1187) )(8)  =(284)  x(865) +(10) 24(1187) $(3) .
(627) Exp(8193) anding(26673)  and(323)  simpl(15858) ifying(7922) ,(11)  I(358)  get(636)  $(400) 49(2491) x(87)  -(482)  (220) 117(8546) 6(21)  =(284)  x(865)  +(489)  (220) 24(1187) $,(55976)  or(477)  $(400) 48(2166) x(87)  =(284)  (220) 120(4364) 0(15) $(3) .
(627) Div(12792) iding(6714)  by(555)  (220) 48(2166) ,(11)  I(358)  get(636)  $(400) x(87)  =(284)  (220) 25(914) $(3) .
(627) To(1271)  check(1817)  my(856)  answer(4320) ,(11)  I(358)  can(649)  plug(20206)  it(433)  back(1203)  into(1139)  the(279)  original(4113)  equation(24524)  and(323)  see(1518)  if(422)  it(433)  satisfies(69001)  it(433) :(25)  $(400) 25(914)  \(1144) star(12134)  (220) 24(1187)  =(284)  \(1144) d(67) frac(38118) {\(36802) sqrt(27986) {(90) 25(914) +(10) 24(1187) }}(3500) {\(36802) sqrt(27986) {(90) 25(914) -(12) 24(1187) }}(3500)  =(284)  \(1144) d(67) frac(38118) {\(36802) sqrt(27986) {(90) 49(2491) }}(3500) {\(36802) sqrt(27986) {(90) 1(16) }}(3500)  =(284)  \(1144) d(67) frac(38118) {(90) 7(22) }{(15523) 1(16) }(92)  =(284)  (220) 7(22) $(3) .
(627) Indeed(44623) ,(11)  it(433)  does(1587) ,(11)  so(779)  I(358)  am(1097)  confident(16913)  that(430)  $(400) x(87)  =(284)  (220) 25(914) $(3)  is(374)  the(279)  correct(4495)  solution(6425) .
(627) #(2)  Answer(22559)

(271) 25(914) <|eot_id|>(128009) <|end_of_text|>(128001)


[2024-05-09 14:58:38,664] [INFO] [axolotl.check_rl_example_labels:93] [PID:254458] [RANK:0] REJECTED RESPONSE: We(1687)  know(1440)  that(430)  $(400) x(87) \(59) star(12134) 24(1187) =\(35533) d(67) frac(38118) {\(36802) sqrt(27986) {x(46440) +(10) 24(1187) }}(3500) {\(36802) sqrt(27986) {x(46440) -(12) 24(1187) }}(3500) =(28) 7(22) $.(13244)  Because(9393)  we(584)  cannot(4250)  take(1935)  the(279)  square(9518)  root(3789)  of(315)  a(264)  negative(8389)  number(1396)  and(323)  because(1606)  the(279)  denominator(48012)  of(315)  a(264)  fraction(19983)  cannot(4250)  be(387)  zero(7315) ,(11)  we(584)  know(1440)  that(430)  $(400) x(87) -(12) 24(1187) >(29) 0(15) $.(13244)  Thus(14636) ,(11)  a(264)  reasonable(13579)  guess(8101)  for(369)  $(400) x(87) $(3)  would(1053)  be(387)  $(400) x(87) =(28) 25(914) $.(13244)  $\(59060) d(67) frac(38118) {\(36802) sqrt(27986) {(90) 25(914) +(10) 24(1187) }}(3500) {\(36802) sqrt(27986) {(90) 25(914) -(12) 24(1187) }}(3500) =\(35533) d(67) frac(38118) {\(36802) sqrt(27986) {(90) 49(2491) }}(3500) {\(36802) sqrt(27986) {(90) 1(16) }}(3500) =(28) 7(22) $,(55976)  as(439)  desired(12974) ,(11)  so(779)  our(1057)  answer(4320)  is(374)  indeed(13118)  $(400) x(87) =\(35533) boxed(80175) {(90) 25(914) }(92) $.(13244) <|eot_id|>(128009) <|end_of_text|>(128001)
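The layout in the log above can be reproduced with plain string templates. This is a minimal sketch, not axolotl's `llama3.argilla_chat` implementation: `format_llama3_dpo` is a hypothetical name, and it assumes each record stores `chosen` and `rejected` as `[user, assistant]` lists of `{"role", "content"}` messages.

```python
# Hypothetical sketch, NOT axolotl's llama3.argilla_chat code. Assumes each
# record stores "chosen" and "rejected" as [user, assistant] message lists.


def format_llama3_dpo(record: dict) -> dict:
    """Render one preference record as Llama 3 prompt/chosen/rejected strings."""
    user_msg = record["chosen"][0]["content"]
    # The prompt ends with an open assistant header, as in the debug log.
    prompt = (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_msg}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
    # Each completion closes the assistant turn, then ends the text.
    suffix = "<|eot_id|><|end_of_text|>"
    return {
        "prompt": prompt,
        "chosen": record["chosen"][1]["content"] + suffix,
        "rejected": record["rejected"][1]["content"] + suffix,
    }
```

The BOS marker is written into the string explicitly here. If the text is then tokenized with the default `add_special_tokens=True`, the Llama 3 tokenizer will typically prepend `<|begin_of_text|>` on its own, so the explicit marker should be dropped in that case to avoid a duplicate.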

Example usage

YAML config for DPO fine-tuning using a sharegpt-style chat dataset:

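dpo_beta: 0.1   # beta of the DPO loss; higher keeps the policy closer to the reference model
rl: dpo         # train with Direct Preference Optimization

# Data
datasets:
  - ds_type: json   # load local JSON Lines file(s)
    data_files:
      - /home/datasets/orpo-dpo-mix-40k.jsonl
    split: train
    type: llama3.argilla_chat   # Llama 3 DPO prompt strategy for chat-style chosen/rejected records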
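To show what `dpo_beta` controls, here is a minimal pure-Python sketch of the standard per-example DPO loss, -log σ(β · [(log π(y_w|x) − log π_ref(y_w|x)) − (log π(y_l|x) − log π_ref(y_l|x))]). It assumes `dpo_beta` is passed through as that β, and that summed log-probabilities of each response under the policy and the frozen reference model are already available. `dpo_loss` is a hypothetical name for illustration, not an axolotl function.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * margin))."""
    # How much more the policy prefers the chosen response over the rejected
    # one, measured relative to the reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    x = beta * margin
    # -log(sigmoid(x)) == log(1 + exp(-x)); pick the branch that never
    # exponentiates a large positive number.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# With no preference margin the loss is log(2); a positive margin lowers it.
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

Because β multiplies the margin, a smaller β (such as the 0.1 above) lets the policy drift further from the reference before the loss saturates.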

@Nero10578
Contributor Author

Sorry, I messed up the previous pull request by opening it from the main branch of my axolotl fork instead of a feature branch, which caused it to be deleted when I synced my fork with the latest axolotl commit.

@winglian
Collaborator

@Nero10578 there are a lot of commits, which makes it difficult to rebase your branch against main. Are you okay with me squashing all the commits in your branch so it's easier to rebase?

@Nero10578
Contributor Author

Nero10578 commented May 11, 2024

Oh, I see. Yeah, that's fine, as long as it works.

I was writing the code on my main PC, pushing it to GitHub, and pulling it onto my training PC to test, hence the many commits. Sorry about that.

winglian mentioned this pull request May 11, 2024
@winglian
Collaborator

Hey @Nero10578, I merged the major changes from your PR in #1610 by cherry-picking your commits. I'm going to close this PR; if you could submit a new PR with the example YAML, that would be much appreciated. Thanks for your help!

winglian closed this May 14, 2024
@Nero10578
Contributor Author

Sounds good! I'll set aside some time to test things out and write the example YAML.
