
add llama3 format for sft (sharegpt) and dpo #1605

Closed

Nero10578
Contributor

I just wanted to get the Llama 3 chat format working for both instruction tuning with a ShareGPT dataset and DPO tuning, so I made the minimal changes needed for that. Still missing: adding a llama-3 chat template to prompters.py and chat_templates.py.

Description

I saw the other PRs adding Llama 3, but they registered new chat templates in FastChat. That is no longer necessary with FastChat's latest commits, since the Llama 3 format is now built into FastChat. Those PRs also did not include DPO tuning for Llama 3. I have already used this PR to successfully create DPO-tuned Llama 3 models.

How has this been tested?

Tested the tokenization with --debug for both the sharegpt type with the llama-3 conversation option and the DPO format. The datasets appear to be processed correctly into the Llama 3 format, including the <|begin_of_text|> BOS token and the <|end_of_text|> EOS token.

I followed the FastChat llama3 template exactly in the monkeypatch to make sure the BOS token is always added.
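For reference, the layout the monkeypatch is meant to produce can be sketched in Python. This is a minimal sketch, not the code in this PR: `build_llama3_prompt` and `header` are hypothetical names, and it assumes messages arrive as {"role", "content"} dicts. The special tokens and the blank line after each header match the debug output that follows.

```python
# Hypothetical helper: renders {"role", "content"} messages in the Llama 3
# chat layout seen in the --debug logs of this PR.
BOS = "<|begin_of_text|>"
EOT = "<|eot_id|>"


def header(role: str) -> str:
    # Each turn opens with the role wrapped in header tokens, then a blank line.
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n"


def build_llama3_prompt(messages, add_generation_prompt=False):
    # Always start with the BOS token, which the monkeypatch is meant to ensure.
    parts = [BOS]
    for msg in messages:
        # Every finished turn is closed with <|eot_id|>.
        parts.append(header(msg["role"]) + msg["content"] + EOT)
    if add_generation_prompt:
        # Leave an open assistant header so the model writes the reply next.
        parts.append(header("assistant"))
    return "".join(parts)
```

Calling it with a single user turn and `add_generation_prompt=True` yields the same token sequence as the start of the SFT example: BOS, the user header, the message, <|eot_id|>, then an open assistant header.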

Example tokenized llama3 sft format:

[2024-05-09 14:33:44,305] [INFO] [axolotl.check_example_labels:45] [PID:250136] [RANK:0] <|begin_of_text|>(-100, 128000) <|start_header_id|>(-100, 128006) user(-100, 882) <|end_header_id|>(-100, 128007)

(-100, 271) What(-100, 3923)  is(-100, 374)  the(-100, 279)  solution(-100, 6425) ?

(-100, 1980) S(-100, 50) olve(-100, 4035)  -(-100, 482) 983(-100, 24742) *z(-100, 57513)  -(-100, 482)  (-100, 220) 381(-100, 19162) *z(-100, 57513)  -(-100, 482)  (-100, 220) 711(-100, 22375) 0(-100, 15)  =(-100, 284)  (-100, 220) 361(-100, 18277) 30(-100, 966)  -(-100, 482)  (-100, 220) 641(-100, 23525) 2(-100, 17)  for(-100, 369)  z(-100, 1167) .(-100, 13) <|eot_id|>(-100, 128009) <|start_header_id|>(-100, 128006) assistant(-100, 78191) <|end_header_id|>(-100, 128007)

(271, 271) To(1271, 1271)  solve(11886, 11886)  the(279, 279)  equation(24524, 24524)  -(482, 482) 983(24742, 24742) z(89, 89)  -(482, 482)  (220, 220) 381(19162, 19162) z(89, 89)  -(482, 482)  (220, 220) 711(22375, 22375) 0(15, 15)  =(284, 284)  (220, 220) 361(18277, 18277) 30(966, 966)  -(482, 482)  (220, 220) 641(23525, 23525) 2(17, 17)  for(369, 369)  z(1167, 1167) ,(11, 11)  we(584, 584)  can(649, 649)  start(1212, 1212)  by(555, 555)  combining(35271, 35271)  the(279, 279)  terms(3878, 3878)  with(449, 449)  '(364, 364) z(89, 89) ':(1232, 1232)  (-(10505, 10505) 983(24742, 24742) z(89, 89)  -(482, 482)  (220, 220) 381(19162, 19162) z(89, 89) )(8, 8)  =(284, 284)  (-(10505, 10505) 136(9795, 9795) 4(19, 19) z(89, 89) ).(570, 570)  Then(5112, 5112) ,(11, 11)  we(584, 584)  simplify(40821, 40821)  the(279, 279)  numerical(35876, 35876)  terms(3878, 3878) :(25, 25)  (220, 220) 361(18277, 18277) 30(966, 966)  -(482, 482)  (220, 220) 641(23525, 23525) 2(17, 17)  -(482, 482)  (220, 220) 711(22375, 22375) 0(15, 15)  =(284, 284)  (220, 220) 226(14057, 14057) 08(2318, 2318) .(13, 13)  This(1115, 1115)  gives(6835, 6835)  us(603, 603)  the(279, 279)  equation(24524, 24524)  -(482, 482) 136(9795, 9795) 4(19, 19) z(89, 89)  =(284, 284)  (220, 220) 226(14057, 14057) 08(2318, 2318) .(13, 13)  Div(8940, 8940) iding(6714, 6714)  both(2225, 2225)  sides(11314, 11314)  by(555, 555)  -(482, 482) 136(9795, 9795) 4(19, 19)  yields(36508, 36508)  z(1167, 1167)  =(284, 284)  (220, 220) 226(14057, 14057) 08(2318, 2318)  /(611, 611)  -(482, 482) 136(9795, 9795) 4(19, 19)  =(284, 284)  -(482, 482) 16(845, 845) .(13, 13) 58(2970, 2970) .(13, 13)  Therefore(15636, 15636) ,(11, 11)  the(279, 279)  solution(6425, 6425)  is(374, 374)  z(1167, 1167)  =(284, 284)  -(482, 482) 16(845, 845) .(13, 13) 58(2970, 2970) .(13, 13) <|eot_id|>(128009, 128009) <|end_of_text|>(128001, 128001)
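In the log above, each token is printed as token(label, input_id): prompt tokens carry the label -100, while the assistant reply (from the \n\n right after the assistant header through <|eot_id|> and <|end_of_text|>) keeps its own ids as labels. A minimal sketch of that masking step, assuming a per-token boolean `trainable` mask is already available (both `mask_labels` and the mask are hypothetical, not axolotl's API):

```python
# -100 is the default ignore_index of PyTorch's CrossEntropyLoss,
# so positions labeled -100 do not contribute to the loss.
IGNORE_INDEX = -100


def mask_labels(input_ids, trainable):
    """Copy input_ids into labels, replacing non-trainable positions with -100.

    `trainable` holds one boolean per token, True for assistant-response
    tokens (hypothetical input; a prompt strategy would normally derive it).
    """
    if len(input_ids) != len(trainable):
        raise ValueError("input_ids and trainable must have the same length")
    return [tok if keep else IGNORE_INDEX for tok, keep in zip(input_ids, trainable)]
```

For example, the ids 128009, 128006, 78191, 128007, 271, 1271 from the log, with only the last two marked trainable, become the labels -100, -100, -100, -100, 271, 1271, mirroring the boundary between the user turn and the assistant reply.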

Example tokenized llama3 DPO format:

[2024-05-09 14:58:38,664] [INFO] [axolotl.check_rl_example_labels:91] [PID:254458] [RANK:0] INPUT PROMPT: <|begin_of_text|>(128000) <|start_header_id|>(128006) user(882) <|end_header_id|>(128007)

(271) Let(10267)  $(400) a(64) \(59) star(12134)  b(293)  =(284)  \(1144) d(67) frac(38118) {\(36802) sqrt(27986) {(90) a(64) +b(36193) }}(3500) {\(36802) sqrt(27986) {(90) a(64) -b(1481) }}(3500) $.(13244)  If(1442)  $(400)  x(865)  \(1144) star(12134)  (220) 24(1187)  =(284)  (220) 7(22) $,(55976)  find(1505)  $(400) x(87) $.(13244) <|eot_id|>(128009) <|start_header_id|>(128006) assistant(78191) <|end_header_id|>(128007)

(271)


[2024-05-09 14:58:38,664] [INFO] [axolotl.check_rl_example_labels:92] [PID:254458] [RANK:0] CHOSEN RESPONSE: This(2028)  problem(3575)  involves(18065)  a(264)  new(502)  operation(5784) ,(11)  $\(59060) star(12134) $,(55976)  that(430)  is(374)  defined(4613)  in(304)  terms(3878)  of(315)  the(279)  familiar(11537)  operations(7677)  of(315)  addition(5369) ,(11)  subtraction(76340) ,(11)  and(323)  square(9518)  roots(20282) .
(627) To(1271)  solve(11886)  for(369)  $(400) x(87) $,(55976)  I(358)  need(1205)  to(311)  undo(29821)  the(279)  operation(5784)  $\(59060) star(12134) $(3)  by(555)  reversing(76283)  the(279)  order(2015)  of(315)  the(279)  steps(7504)  and(323)  applying(19486)  the(279)  inverse(29049)  operations(7677) .
(627) To(1271)  do(656)  that(430) ,(11)  I(358)  can(649)  start(1212)  by(555)  cross(5425) -m(1474) ultip(10046) lying(6852)  to(311)  get(636)  rid(9463)  of(315)  the(279)  fractions(65995) :(25)  $(400) 7(22) \(59) sqrt(27986) {x(46440) -(12) 24(1187) }(92)  =(284)  \(1144) sqrt(27986) {x(46440) +(10) 24(1187) }$(32816) .
(627) Then(12487) ,(11)  I(358)  can(649)  square(9518)  both(2225)  sides(11314)  to(311)  get(636)  rid(9463)  of(315)  the(279)  square(9518)  roots(20282) :(25)  $(400) 49(2491) (x(2120) -(12) 24(1187) )(8)  =(284)  x(865) +(10) 24(1187) $(3) .
(627) Exp(8193) anding(26673)  and(323)  simpl(15858) ifying(7922) ,(11)  I(358)  get(636)  $(400) 49(2491) x(87)  -(482)  (220) 117(8546) 6(21)  =(284)  x(865)  +(489)  (220) 24(1187) $,(55976)  or(477)  $(400) 48(2166) x(87)  =(284)  (220) 120(4364) 0(15) $(3) .
(627) Div(12792) iding(6714)  by(555)  (220) 48(2166) ,(11)  I(358)  get(636)  $(400) x(87)  =(284)  (220) 25(914) $(3) .
(627) To(1271)  check(1817)  my(856)  answer(4320) ,(11)  I(358)  can(649)  plug(20206)  it(433)  back(1203)  into(1139)  the(279)  original(4113)  equation(24524)  and(323)  see(1518)  if(422)  it(433)  satisfies(69001)  it(433) :(25)  $(400) 25(914)  \(1144) star(12134)  (220) 24(1187)  =(284)  \(1144) d(67) frac(38118) {\(36802) sqrt(27986) {(90) 25(914) +(10) 24(1187) }}(3500) {\(36802) sqrt(27986) {(90) 25(914) -(12) 24(1187) }}(3500)  =(284)  \(1144) d(67) frac(38118) {\(36802) sqrt(27986) {(90) 49(2491) }}(3500) {\(36802) sqrt(27986) {(90) 1(16) }}(3500)  =(284)  \(1144) d(67) frac(38118) {(90) 7(22) }{(15523) 1(16) }(92)  =(284)  (220) 7(22) $(3) .
(627) Indeed(44623) ,(11)  it(433)  does(1587) ,(11)  so(779)  I(358)  am(1097)  confident(16913)  that(430)  $(400) x(87)  =(284)  (220) 25(914) $(3)  is(374)  the(279)  correct(4495)  solution(6425) .
(627) #(2)  Answer(22559)

(271) 25(914) <|eot_id|>(128009) <|end_of_text|>(128001)


[2024-05-09 14:58:38,664] [INFO] [axolotl.check_rl_example_labels:93] [PID:254458] [RANK:0] REJECTED RESPONSE: We(1687)  know(1440)  that(430)  $(400) x(87) \(59) star(12134) 24(1187) =\(35533) d(67) frac(38118) {\(36802) sqrt(27986) {x(46440) +(10) 24(1187) }}(3500) {\(36802) sqrt(27986) {x(46440) -(12) 24(1187) }}(3500) =(28) 7(22) $.(13244)  Because(9393)  we(584)  cannot(4250)  take(1935)  the(279)  square(9518)  root(3789)  of(315)  a(264)  negative(8389)  number(1396)  and(323)  because(1606)  the(279)  denominator(48012)  of(315)  a(264)  fraction(19983)  cannot(4250)  be(387)  zero(7315) ,(11)  we(584)  know(1440)  that(430)  $(400) x(87) -(12) 24(1187) >(29) 0(15) $.(13244)  Thus(14636) ,(11)  a(264)  reasonable(13579)  guess(8101)  for(369)  $(400) x(87) $(3)  would(1053)  be(387)  $(400) x(87) =(28) 25(914) $.(13244)  $\(59060) d(67) frac(38118) {\(36802) sqrt(27986) {(90) 25(914) +(10) 24(1187) }}(3500) {\(36802) sqrt(27986) {(90) 25(914) -(12) 24(1187) }}(3500) =\(35533) d(67) frac(38118) {\(36802) sqrt(27986) {(90) 49(2491) }}(3500) {\(36802) sqrt(27986) {(90) 1(16) }}(3500) =(28) 7(22) $,(55976)  as(439)  desired(12974) ,(11)  so(779)  our(1057)  answer(4320)  is(374)  indeed(13118)  $(400) x(87) =\(35533) boxed(80175) {(90) 25(914) }(92) $.(13244) <|eot_id|>(128009) <|end_of_text|>(128001)
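The DPO log splits each example into a prompt that ends with an open assistant header, plus chosen and rejected responses that end in <|eot_id|>. A hypothetical transform producing that split from an argilla-style record is sketched below. It assumes single-turn records whose "chosen" and "rejected" fields are [user, assistant] message lists, and that the BOS and EOS tokens visible in the log are added later by the tokenizer and the DPO trainer. It is illustrative only, not the llama3.argilla_chat strategy itself.

```python
def argilla_chat_to_llama3(sample):
    """Hypothetical transform: build DPO prompt/chosen/rejected strings
    in the Llama 3 format from a single-turn argilla-style record."""
    chosen, rejected = sample["chosen"], sample["rejected"]
    # Assumed shape: exactly one user turn followed by one assistant turn.
    for msgs in (chosen, rejected):
        if msgs[0]["role"] != "user" or msgs[-1]["role"] != "assistant":
            raise ValueError("expected a user turn followed by an assistant turn")
    prompt = (
        "<|start_header_id|>user<|end_header_id|>\n\n"
        + chosen[0]["content"]
        + "<|eot_id|>"
        # The prompt stops at an open assistant header, as in the log.
        + "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
    return {
        "prompt": prompt,
        # BOS/EOS are assumed to be added downstream, so only <|eot_id|> here.
        "chosen": chosen[-1]["content"] + "<|eot_id|>",
        "rejected": rejected[-1]["content"] + "<|eot_id|>",
    }
```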

I have already finished tuning a few models with this:
https://huggingface.co/AwanLLM/Awanllm-Llama-3-8B-Dolfin-v0.3
https://huggingface.co/AwanLLM/Awanllm-Llama-3-8B-Dolfin-v0.3-DPO

Example usage

YAML config for instruction tuning using a sharegpt dataset:

# Data
datasets:
  - path: /home/datasets/no-robots-sharegpt-fixed.jsonl
    type: sharegpt
    conversation: llama-3
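The sharegpt type reads conversations from a JSONL file. Assuming the usual ShareGPT layout (a "conversations" list of {"from", "value"} turns with "human"/"gpt" speakers), a minimal sketch of mapping one line onto Llama 3 role names looks like this; `ROLE_MAP` and `sharegpt_to_messages` are hypothetical names, not part of axolotl.

```python
import json

# Assumed mapping from common ShareGPT speaker names to Llama 3 role names.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}


def sharegpt_to_messages(line: str):
    """Parse one JSONL line of ShareGPT data into role/content messages."""
    record = json.loads(line)
    messages = []
    for turn in record["conversations"]:
        role = ROLE_MAP.get(turn["from"])
        if role is None:
            # Fail loudly instead of silently dropping unknown speakers.
            raise ValueError(f"unknown ShareGPT speaker: {turn['from']!r}")
        messages.append({"role": role, "content": turn["value"]})
    return messages
```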

YAML config for DPO fine-tuning using a sharegpt dataset:

dpo_beta: 0.1
rl: dpo

# Data
datasets:
  - ds_type: json
    data_files:
      - /home/datasets/orpo-dpo-mix-40k.jsonl
    split: train
    type: llama3.argilla_chat
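For context on `dpo_beta`, here is a minimal sketch of the standard DPO loss for a single preference pair, assuming `dpo_beta` is passed through as the β of that objective, as the name suggests. The inputs are summed log-probabilities of the chosen and rejected responses given the prompt, under the trained policy and a frozen reference model; `dpo_loss` is a hypothetical name.

```python
import math


def _softplus(z):
    # log(1 + exp(z)) computed without overflow for large |z|.
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss -log(sigmoid(beta * margin)) for one pair."""
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) == softplus(-x)
    return _softplus(-logits)
```

With all four log-probabilities equal the loss is log 2 (about 0.693). It drops as the policy favors the chosen response more than the reference does, and a larger β makes the loss react more sharply to that difference.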

@winglian
Collaborator

winglian commented May 9, 2024

thanks @Nero10578 ! let me try to sequence the various open PRs for Llama3 chat format and we'll get this merged!

@Nero10578
Contributor Author

Nero10578 commented May 9, 2024

Cool! Let me know if I made a mistake. I only started getting familiar with the codebase a day before making this PR, so I might have missed something.

@winglian
Collaborator

winglian commented May 9, 2024

I'll sequence this after #1553

@Nero10578
Contributor Author

Oh shoot, I just realized I accidentally opened this PR from the main branch of my fork and then synced that branch with the latest axolotl commit.
