Small fixes to online dpo example #1879

Merged
edbeeching merged 1 commit into main from update-odpo-example
Jul 26, 2024

Conversation

edbeeching
Collaborator

#1864 updated configs but missed a couple of changes.

@edbeeching edbeeching requested a review from kashif July 26, 2024 07:02
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@kashif
Collaborator

kashif commented Jul 26, 2024

ah right.. i thought it was ok to have a different family of reference model, but yeah makes sense to also have the reference model be the original one... approving
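
One possible way to read the point about the reference model: with the standard sigmoid DPO loss, a reference model that starts from the same checkpoint as the trained model gives a zero log-ratio margin at step 0, so the loss starts at log 2. The sketch below is not taken from this PR or from the trainer's code; the function name `dpo_loss` and the log-probability values are made up for illustration.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Sigmoid DPO loss for one preference pair, given sequence log-probabilities."""
    # Difference between the policy/reference log-ratios of the two completions.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    x = beta * margin
    # -log(sigmoid(x)), written in a numerically stable form.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# Reference model identical to the policy: the margin is 0.
same_ref = dpo_loss(-12.0, -15.0, ref_chosen=-12.0, ref_rejected=-15.0)

# Reference model from an unrelated checkpoint (illustrative numbers only).
other_ref = dpo_loss(-12.0, -15.0, ref_chosen=-20.0, ref_rejected=-10.0)
```

In the first call the loss equals `math.log(2)` whatever the individual log-probabilities are. In the second call the starting loss depends on how the two unrelated models happen to disagree.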

@edbeeching
Collaborator Author

> ah right.. i thought it was ok to have a different family of reference model, but yeah makes sense to also have the reference model be the original one... approving

yes also the property sft_model_path no longer exists in the config

@edbeeching edbeeching merged commit 74e54b5 into main Jul 26, 2024
3 of 10 checks passed
@edbeeching edbeeching deleted the update-odpo-example branch July 26, 2024 07:36
@kashif
Collaborator

kashif commented Jul 26, 2024

ah i see it exists in the OnPolicyConfig and the OnlineDPOConfig inherits from it...
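
The inheritance point above can be sketched with plain dataclasses. `ParentConfig`, `ChildConfig`, the `beta` field, the default values, and the helper `declared_in` are hypothetical stand-ins, not the actual TRL classes; only the field name `sft_model_path` comes from this thread.

```python
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class ParentConfig:
    # Hypothetical stand-in for a shared on-policy config.
    sft_model_path: str = "path/to/sft-model"


@dataclass
class ChildConfig(ParentConfig):
    # Hypothetical stand-in for a trainer-specific config: it declares
    # only its own field and inherits the rest from the parent.
    beta: float = 0.05


def declared_in(cls, name):
    """Return the name of the most basic dataclass in cls's MRO that has field `name`."""
    # Walk from the base classes towards cls; the first hit introduced the field.
    for klass in reversed(cls.__mro__):
        if is_dataclass(klass) and name in {f.name for f in fields(klass)}:
            return klass.__name__
    return None


cfg = ChildConfig()
origin = declared_in(ChildConfig, "sft_model_path")
```

Here `cfg.sft_model_path` resolves even though `ChildConfig` never mentions the field, and `origin` is `"ParentConfig"`. That is why a field can look missing when you read only the subclass definition.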

qgallouedec added a commit that referenced this pull request Jul 28, 2024
commit 8bd2ab8
Author: Quentin Gallouédec
Date:   Sun Jul 28 14:06:19 2024 +0200

    Refactor judges (#1856)

    * BaseJudge -> BasePairwiseJudge

    * hf judge asyncio

    * refactor judges

    * doc

    * doc

    * doc

    * memeber judge

    * :inherited-members:

    * :inherited-members:

    * doc

    * give up

    * judge tldr with judge class

    * fix rank in multithread

    * format

    * improve doc

    * update doc

    * typo doc

    * doc online dpo

    * Update judge_tldr.py

    ---------

    Co-authored-by: Quentin Gallouédec

commit 82b07d6
Author: Quentin Gallouédec
Date:   Fri Jul 26 11:43:48 2024 +0200

    Llama in modelling value head tests (#1878)

commit 72bf6c2
Author: Quentin Gallouédec
Date:   Fri Jul 26 11:33:07 2024 +0200

    Skip BigBird save and load test until next transformers version (#1874)

commit 74e54b5
Author: Edward Beeching
Date:   Fri Jul 26 09:36:25 2024 +0200

    fix online dpo example (#1879)

commit 3930973
Author: Rishav Dash
Date:   Thu Jul 25 14:17:37 2024 +0530

    Bug Fix while training using SFTTrainer with DataCollatorForCompletionOnlyLM (#1861)

    * Bug Fix while training using SFTTrainer with DataCollatorForCompletionOnlyLM

    Added ```dataset_text_field``` in the SFTConfig while training

    * Update docs/source/sft_trainer.mdx

    ---------

    Co-authored-by: Kashif Rasul

commit db8e09e
Author: Rishav Dash
Date:   Thu Jul 25 14:06:57 2024 +0530

    Import missing ```setup_chat_format``` (#1862)

commit 1dae55f
Author: elie
Date:   Thu Jul 25 10:27:34 2024 +0200

    add fsdp_qlora config and bnb_4bit_quant_storage (#1863)

commit c8cef79
Author: Quentin Gallouédec
Date:   Wed Jul 24 21:06:57 2024 +0200

    arXiv to HF Papers (#1870)

commit 7dcf437
Author: Kashif Rasul
Date:   Wed Jul 24 12:27:50 2024 +0200

    [online-DPO] online dpo cleanups (#1864)

    * online dpo cleanups

    * remove unused self.policy

    * add OnlineDPOTrainer and config to __init__.py

    * import from trainer

    * online dpo test

    * rename policy to model and ref_policy to ref_model

    * renamed internally

    * formatting

commit 4e85bd7
Author: Costa Huang
Date:   Thu Jul 18 14:35:31 2024 -0400

    Online DPO and Online trainer refactor (#1809)

    * online dpo trainer based on rloo trainer

    * push changes

    * refactor

    * use `batch_generation` method

    * precommit

    * remove breakpoint()

    * quick refactor

    * push the current changes

    * quick change

    * refactor

    * use the config name as the experiment name

    * fix logging

    * update online DPO docs

    * push docs

    * increment global step so tensorboard works again.

    * precommit

    * remove unused common online trainer

    * add online DPO docs

    * quick refactor

    * push changes

    * Update docs/source/online_dpo_trainer.md

    Co-authored-by: Quentin Gallouédec

    ---------

    Co-authored-by: Michael Noukhovitch
    Co-authored-by: Quentin Gallouédec

commit c9d5636
Author: Quentin Gallouédec
Date:   Thu Jul 18 18:28:49 2024 +0200

    rm token (#1852)
kashif pushed a commit to claralp/trl that referenced this pull request Jul 28, 2024
qgallouedec added a commit that referenced this pull request Jul 30, 2024
commit 890232f
Author: Quentin Gallouédec
Date:   Tue Jul 30 14:29:47 2024 +0200

    update example overview (#1883)

    Co-authored-by: Quentin Gallouédec

commit 9929370
Author: Clara Pohland
Date:   Sun Jul 28 21:10:08 2024 +0200

    Move BCO to separate BCOTrainer with fixes (#1869)

    * kto_trainer: skip KL data for BCO

    * kto_trainer: BCO allow no positives or no negatives in batch

    * kto_trainer: make RunningMoments object serializable

    * add BCOTrainer

    * fix BCO UDM for not interleaved data

    * kto_trainer: remove unused UDM part

    * bco_trainer: add tests and docs, minor fixes

    * code style fixes

    * Update docs/source/bco_trainer.mdx

    Co-authored-by: Kashif Rasul

    * fix BCO UDM for bfloat16

    * Update trl/trainer/bco_config.py

    * Update trl/trainer/bco_config.py

    Co-authored-by: Seungjae Jung

    * Update trl/trainer/utils.py

    Co-authored-by: Seungjae Jung

    * Update trl/trainer/bco_trainer.py

    Co-authored-by: Seungjae Jung

    * Update trl/trainer/bco_config.py

    * Update _toctree.yml

    * Update trl/trainer/bco_config.py

    * Update trl/trainer/bco_trainer.py

    * RunningMoments, fix multi GPU serialization

    * fix tests

    ---------

    Co-authored-by: Clara Luise Pohland
    Co-authored-by: Kashif Rasul
    Co-authored-by: Seungjae Jung

commit 6171cdd
Author: Quentin Gallouédec
Date:   Sun Jul 28 15:51:38 2024 +0200

    Re-add BigBird Pegasus save/load test (#1882)

    Co-authored-by: Quentin Gallouédec

commit 33d2151
Author: Quentin Gallouédec
Date:   Sun Jul 28 15:07:10 2024 +0200

    Re-add BigBird Pegasus save/load test (#1876)

    * skip bigbird in ci

    * readd big bird test

    * pytest parametrize

    * dont check the version

    * rm model name

    * re add big bird

    * Merge branch 'main' into readd-bigbird-save-load-test

    ---------

    Co-authored-by: Quentin Gallouédec

@edbeeching edbeeching restored the update-odpo-example branch July 31, 2024 13:52
qgallouedec added a commit that referenced this pull request Aug 2, 2024
* fix vsft example commands

* fix use_cache and get tokenizer from processor

* rm unused AutoTokenizer

* add section in doc

* simplify script

* doc

* use traning args

* args instead of trianing args

* fix doc

* drop eval

* rm eval section

* re-add bigbirg

---------

Co-authored-by: Quentin Gallouédec
@kashif kashif deleted the update-odpo-example branch October 7, 2024 09:09
3 participants