
[docs] Model merging #1423

Merged: stevhliu merged 6 commits into huggingface:main on Feb 15, 2024.
Conversation

stevhliu (Member) commented on Jan 31, 2024:

A guide to the new model merging methods introduced in #1364.

todo:

  • add API reference for merging utilities (once the other PR is merged, I'll rerun the build_pr_documentation test and it should pass)
  • test and run code examples

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@pacman100 mentioned this pull request on Feb 1, 2024.
@prateeky2806 (Contributor) commented:

Hi, is there an estimated timeline for when this would be merged?

@stevhliu (Member, Author) replied:

> Hi, is there an estimated timeline for when this would be merged?

Hi, it should be ready in the next few days, or by the end of the week at the latest if there are no major issues!

Review comment on this snippet from the guide:

```python
adapters = ["norobots", "adcopy", "sql"]
weights = [2.0, 0.3, 0.7]
adapter_name = "merge"
density = 0.2
```
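For reference, these variables are the inputs to PEFT's `add_weighted_adapter`. A minimal sketch of the call, assuming a `PeftModel` (here called `model`) with the three adapters already loaded:

```python
# Sketch: combine the loaded adapters into a new "merge" adapter via dare_ties.
model.add_weighted_adapter(
    adapters=adapters,
    weights=weights,
    adapter_name=adapter_name,
    combination_type="dare_ties",
    density=density,
)
model.set_adapter(adapter_name)  # activate the merged adapter
```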
Contributor:

I am not sure how this density parameter works in dare_ties. I assume it is used to keep 20% of the parameters and then rescale them, as in DARE. However, if TIES is applied on top of that, I am not sure whether it will again keep only 20% of the parameters of the pruned and rescaled checkpoint, essentially leaving 0.2 * 0.2 * 100 = 4% of the parameters, or whether it will keep 20% of the parameters overall. This is not a comment on the documentation, but this behaviour is not very clear.

Contributor:

Hello, in dare_ties, random pruning based on density happens first, followed by rescaling. After that, the majority_sign_mask and disjoint_merge steps are performed as in the ties method. So the pruning comes from DARE (random, with rescaling), followed by the majority sign mask and disjoint merge from TIES.
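For intuition, here is a rough, self-contained torch sketch of that sequence. It is illustrative only, not PEFT's actual implementation; the function name, the per-adapter weights, and the sum-based majority sign are assumptions:

```python
import torch

def dare_ties_sketch(task_tensors, weights, density):
    # 1) DARE: randomly keep a `density` fraction of each task tensor,
    #    then rescale the survivors by 1/density.
    pruned = [t * torch.bernoulli(torch.full_like(t, density)) / density
              for t in task_tensors]
    # 2) Apply the per-adapter weights and stack: shape (num_adapters, *param_shape).
    weighted = torch.stack([w * t for w, t in zip(weights, pruned)])
    # 3) TIES majority sign mask: per parameter, the sign of the summed deltas.
    majority_sign = torch.sign(weighted.sum(dim=0))
    agrees = torch.sign(weighted) == majority_sign
    # 4) Disjoint merge: average only the values whose sign agrees with the majority.
    return (weighted * agrees).sum(dim=0) / agrees.sum(dim=0).clamp(min=1)
```

On this reading, density applies once, in the DARE pruning step; the TIES-style steps filter by sign agreement rather than pruning again by density, so with density=0.2 the retained fraction stays around 20% per adapter rather than compounding to 4%.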

Comment on lines 49 to 55
```python
# Imports added for completeness; the quoted doc snippet starts at PeftConfig below.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftConfig, PeftModel

config = PeftConfig.from_pretrained("smangrul/tinyllama_lora_norobots")
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, load_in_4bit=True, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("smangrul/tinyllama_lora_norobots")

# Wrap the base model with the first adapter, then attach the other two under distinct names.
model = PeftModel.from_pretrained(model, "smangrul/tinyllama_lora_norobots", adapter_name="norobots")
_ = model.load_adapter("smangrul/tinyllama_lora_sql", adapter_name="sql")
_ = model.load_adapter("smangrul/tinyllama_lora_adcopy", adapter_name="adcopy")
```
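As a quick sanity check before merging, one of the loaded adapters can be activated and run. A minimal sketch assuming the standard transformers generate API (the prompt is illustrative):

```python
import torch

# Illustrative smoke test: activate one adapter and generate a short completion.
model.set_adapter("adcopy")
inputs = tokenizer("Write a tagline for a coffee shop.", return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```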
Member:

@pacman100 @BenjaminBossan does it make sense to have copies of these checkpoints under the PEFT testing org?

Member:

Yes, let's try to put those artifacts on https://huggingface.co/peft-internal-testing.

@sayakpaul (Member) left a review:

Looking solid!

@BenjaminBossan (Member) left a review:

Nicely done. On top of what has already been mentioned, I have just one comment about there actually being more than two methods. Otherwise, this LGTM.

@stevhliu merged commit cde8f1a into huggingface:main on Feb 15, 2024 (14 checks passed).
@stevhliu deleted the model-merging branch on February 15, 2024 at 16:13.
BenjaminBossan pushed a commit to BenjaminBossan/peft referencing this pull request on Mar 14, 2024, with the following commit messages:

* content
* code snippets
* api reference
* update
* feedback
* feedback