From 0d0eff4529aca675167e0f7e0556aae90efc4431 Mon Sep 17 00:00:00 2001 From: Steven Liu Date: Tue, 30 Jan 2024 13:19:48 -0800 Subject: [PATCH 1/6] content --- docs/source/_toctree.yml | 2 + docs/source/developer_guides/model_merging.md | 111 ++++++++++++++++++ 2 files changed, 113 insertions(+) create mode 100644 docs/source/developer_guides/model_merging.md diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml index 8e493518fa..9ed38aaa9d 100644 --- a/docs/source/_toctree.yml +++ b/docs/source/_toctree.yml @@ -25,6 +25,8 @@ - title: Developer guides sections: + - local: developer_guides/model_merging + title: Model merging - local: developer_guides/quantization title: Quantization - local: developer_guides/lora diff --git a/docs/source/developer_guides/model_merging.md b/docs/source/developer_guides/model_merging.md new file mode 100644 index 0000000000..608c8b5f40 --- /dev/null +++ b/docs/source/developer_guides/model_merging.md @@ -0,0 +1,111 @@ + + +# Model merging + +Training a model for each task can be costly and take up storage, and these models aren't able to learn new information to improve performance. Multitask learning can train a model to learn multiple tasks, but this is costly to train and designing a dataset for it can be challenging. *Model merging* offers a solution to these challenges by combining multiple pretrained models into one model, giving it the combined abilities of each individual model, without any additional training. + +PEFT provides two methods for model merging: + +* [TIES-Merging](https://hf.co/papers/2306.01708) - TrIm, Elect, and Merge (TIES) is a three-step method for merging models. First, redundant parameters are trimmed, then conflicting signs are resolved into an aggregated vector, and finally the parameters whose signs are the same as the aggregate sign are averaged. This method takes into account that some values (redundant and sign disagreement) can reduce model performance in the merged model. +* [DARE](https://hf.co/papers/2311.03099) - Drop And REscale is a method that can be used to prepare model merging methods like TIES-Merging. It works by randomly dropping parameters according to a drop rate, and rescaling the remaining parameters. This helps to reduce the number of redundant and potentially interfering parameters among multiple models. + +Models are merged with the [`~LoraModel.add_weighted_adapter`] method, and the specific model merging method is specified in the `combination_type` parameter. This guide will show you how to merge models with TIES, DARE, and a combination of both TIES and DARE. + +## TIES + +The [`~utils.ties`] method uses a [`~utils.magnitude_based_pruning`] approach to trim redundant parameters such that only the top-k percent of values are kept from each task vector. The number of values to keep are specified by the `density` parameter. The task tensors are weighted, and the [`~utils.calculate_majority_sign_mask`] *elects* the sign vector which means calculating the total magnitude for each parameter across all models. Lastly, the [`~utils.disjoint_merge`] function calculates the mean of the parameter values whose sign is the same as the *elected sign vector*. + +With PEFT, TIES merging is enabled by setting `combination_type="ties"` and setting `ties_density` to a value of the weights to keep from the individual models. 
For example, let's merge three [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T) models trained with LoRA: [tinyllama_lora_nobots](https://huggingface.co/smangrul/tinyllama_lora_norobots), [tinyllama_lora_sql](https://huggingface.co/smangrul/tinyllama_lora_sql), and [tinyllama_lora_adcopy](https://huggingface.co/smangrul/tinyllama_lora_adcopy). + +Load the base model and then use the [`~PeftModel.load_adapter`] method to load and assign each adapter a name: + +```py + +``` + +Use [`~LoraModel.add_weighted_adapter`] to set the weights for each adapter, the `adapter_name`, the `combination_type`, and `ties_density`. + +```py + +``` + +Make the newly merged model the active model with the [`~LoraModel.set_adapter`] method, using the new `adapter_name`. + +```py + +``` + +Now you can use the merged model as an instruction-tuned model to write ad copy or SQL queries! + + + + + + + + + + + + + +## DARE + +The DARE method uses the [`~utils.random_pruning`] approach to randomly drop parameters, and only preserving a percentage of the parameters set in the `density` parameter. The remaining tensors are rescaled to keep the expected output unchanged. + +With PEFT, DARE is enabled by setting `combination_type="dare_ties"` and setting `density` to a value of the weights to keep from the individual models. + +> [!TIP] +> DARE is a super useful method for preparing models for merging which means it can be combined with other methods like TIES, `linear` (a weighted average of the task tensors) or `svd` (calculated from the *delta weights*, the model parameters before and after finetuning) or a combination of all of the above like `dare_ties_svd`. + +Let's merge three diffusion models to generate a variety of images in different styles using only one model. The models you'll use are (feel free to choose your own): [nerijs/pixel-art-xl](https://huggingface.co/nerijs/pixel-art-xl), [ostris/super-cereal-sdxl-lora](https://huggingface.co/ostris/super-cereal-sdxl-lora), and [KappaNeuro/studio-ghibli-style](https://huggingface.co/KappaNeuro/studio-ghibli-style). + +```py + +``` + +Load the base model and then use the [`~PeftModel.load_adapter`] method to load and assign each adapter a name: + +```py + +``` + +Use [`~LoraModel.add_weighted_adapter`] to set the weights for each adapter, the `adapter_name`, the `combination_type`, and `ties_density`. + +```py + +``` + +Make the newly merged model the active model with the [`~LoraModel.set_adapter`] method, using the new `adapter_name`. + +```py + +``` + +Now you can use the merged model to generate images in three different styles! + + + + + + + + + + + + From 4faa9a0b2fea6b305c9ee9c25320b22ba55565f9 Mon Sep 17 00:00:00 2001 From: Steven Liu Date: Wed, 31 Jan 2024 13:43:27 -0800 Subject: [PATCH 2/6] code snippets --- docs/source/developer_guides/model_merging.md | 133 +++++++++++++----- 1 file changed, 100 insertions(+), 33 deletions(-) diff --git a/docs/source/developer_guides/model_merging.md b/docs/source/developer_guides/model_merging.md index 608c8b5f40..db27a87773 100644 --- a/docs/source/developer_guides/model_merging.md +++ b/docs/source/developer_guides/model_merging.md @@ -16,37 +16,53 @@ rendered properly in your Markdown viewer. # Model merging -Training a model for each task can be costly and take up storage, and these models aren't able to learn new information to improve performance. 
Multitask learning can train a model to learn multiple tasks, but this is costly to train and designing a dataset for it can be challenging. *Model merging* offers a solution to these challenges by combining multiple pretrained models into one model, giving it the combined abilities of each individual model, without any additional training. +Training a model for each task can be costly, take up storage space, and the models aren't able to learn new information to improve their performance. Multitask learning can overcome some of these limitations by training a model to learn several tasks, but this is expensive to train and designing a dataset for it can be challenging. *Model merging* offers a solution to these challenges by combining multiple pretrained models into one model, giving it the combined abilities of each individual model, without any additional training. PEFT provides two methods for model merging: -* [TIES-Merging](https://hf.co/papers/2306.01708) - TrIm, Elect, and Merge (TIES) is a three-step method for merging models. First, redundant parameters are trimmed, then conflicting signs are resolved into an aggregated vector, and finally the parameters whose signs are the same as the aggregate sign are averaged. This method takes into account that some values (redundant and sign disagreement) can reduce model performance in the merged model. -* [DARE](https://hf.co/papers/2311.03099) - Drop And REscale is a method that can be used to prepare model merging methods like TIES-Merging. It works by randomly dropping parameters according to a drop rate, and rescaling the remaining parameters. This helps to reduce the number of redundant and potentially interfering parameters among multiple models. +* [TIES-Merging](https://hf.co/papers/2306.01708) - TrIm, Elect, and Merge (TIES) is a three-step method for merging models. First, redundant parameters are trimmed, then conflicting signs are resolved into an aggregated vector, and finally the parameters whose signs are the same as the aggregate sign are averaged. This method takes into account that some values (redundant and sign disagreement) can degrade performance in the merged model. +* [DARE](https://hf.co/papers/2311.03099) - Drop And REscale is a method that can be used to prepare for model merging methods like TIES. It works by randomly dropping parameters according to a drop rate and rescaling the remaining parameters. This helps to reduce the number of redundant and potentially interfering parameters among multiple models. Models are merged with the [`~LoraModel.add_weighted_adapter`] method, and the specific model merging method is specified in the `combination_type` parameter. This guide will show you how to merge models with TIES, DARE, and a combination of both TIES and DARE. ## TIES -The [`~utils.ties`] method uses a [`~utils.magnitude_based_pruning`] approach to trim redundant parameters such that only the top-k percent of values are kept from each task vector. The number of values to keep are specified by the `density` parameter. The task tensors are weighted, and the [`~utils.calculate_majority_sign_mask`] *elects* the sign vector which means calculating the total magnitude for each parameter across all models. Lastly, the [`~utils.disjoint_merge`] function calculates the mean of the parameter values whose sign is the same as the *elected sign vector*. 
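To make those three steps concrete, here is a rough, self-contained sketch on toy task tensors (the differences between finetuned and base weights). The helper below is illustrative only and is not PEFT's internal implementation:

```py
import torch

def trim(task_tensor, density):
    # keep only the top `density` fraction of values by magnitude, zero out the rest
    k = int(density * task_tensor.numel())
    mask = torch.zeros(task_tensor.numel())
    mask[task_tensor.abs().flatten().topk(k).indices] = 1.0
    return task_tensor * mask.view_as(task_tensor)

# toy task tensors from three finetuned models
task_tensors = [torch.randn(10, 10) for _ in range(3)]
weights = torch.tensor([2.0, 1.0, 1.0])

# 1. trim redundant parameters and apply the per-model weights
trimmed = torch.stack([trim(t, density=0.2) for t in task_tensors]) * weights[:, None, None]

# 2. elect the sign with the larger total magnitude for each parameter
majority_sign = torch.where(trimmed.sum(dim=0) >= 0, 1.0, -1.0)

# 3. disjoint merge: average only the values whose sign agrees with the elected sign
agrees = trimmed.sign() == majority_sign
merged = (trimmed * agrees).sum(dim=0) / agrees.sum(dim=0).clamp(min=1)
```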
+The [`~utils.ties`] method uses a [`~utils.magnitude_based_pruning`] approach to trim redundant parameters such that only the top-k percent of values are kept from each task vector. The number of values to keep are specified by the `density` parameter. The task tensors are weighted, and the [`~utils.calculate_majority_sign_mask`] *elects* the sign vector. This means calculating the total magnitude for each parameter across all models. Lastly, the [`~utils.disjoint_merge`] function calculates the mean of the parameter values whose sign is the same as the *elected sign vector*. -With PEFT, TIES merging is enabled by setting `combination_type="ties"` and setting `ties_density` to a value of the weights to keep from the individual models. For example, let's merge three [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T) models trained with LoRA: [tinyllama_lora_nobots](https://huggingface.co/smangrul/tinyllama_lora_norobots), [tinyllama_lora_sql](https://huggingface.co/smangrul/tinyllama_lora_sql), and [tinyllama_lora_adcopy](https://huggingface.co/smangrul/tinyllama_lora_adcopy). +With PEFT, TIES merging is enabled by setting `combination_type="ties"` and setting `ties_density` to a value of the weights to keep from the individual models. For example, let's merge three finetuned [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T) models: [tinyllama_lora_nobots](https://huggingface.co/smangrul/tinyllama_lora_norobots), [tinyllama_lora_sql](https://huggingface.co/smangrul/tinyllama_lora_sql), and [tinyllama_lora_adcopy](https://huggingface.co/smangrul/tinyllama_lora_adcopy). Load the base model and then use the [`~PeftModel.load_adapter`] method to load and assign each adapter a name: ```py - +from peft import PeftConfig, PeftModel +from transformers import AutoModelForCausalLM, AutoTokenizer +import torch + +config = PeftConfig.from_pretrained("smangrul/tinyllama_lora_norobots") +model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, load_in4bit=True, device_map="auto") +tokenizer = AutoTokenizer.from_pretrained("smangrul/tinyllama_lora_norobots") + +model.resize_token_embeddings(len(tokenizer)) +model = PeftModel.from_pretrained(model, "smangrul/tinyllama_lora_norobots", adapter_name="norobots") +_ = model.load_adapter("smangrul/tinyllama_lora_sql", adapter_name="sql") +_ = model.load_adapter("smangrul/tinyllama_lora_adcopy", adapter_name="adcopy") ``` -Use [`~LoraModel.add_weighted_adapter`] to set the weights for each adapter, the `adapter_name`, the `combination_type`, and `ties_density`. +Set the adapters, weights, `adapter_name`, `combination_type`, and `ties_density` with the [`~LoraModel.add_weighted_adapter`] method. ```py - +adapters = ["norobots", "adcopy", "sql"] +weights = [2.0, 0.3, 0.7] +adapter_name = "merge" +density = 0.2 +model.add_weighted_adapter(adapters, weights, adapter_name, combination_type="ties", ties_density=density) ``` -Make the newly merged model the active model with the [`~LoraModel.set_adapter`] method, using the new `adapter_name`. +Set the newly merged model as the active model with the [`~LoraModel.set_adapter`] method. ```py - +model.eval() +model.set_adapter("merge") ``` Now you can use the merged model as an instruction-tuned model to write ad copy or SQL queries! 
@@ -54,58 +70,109 @@ Now you can use the merged model as an instruction-tuned model to write ad copy +```py +messages = [ + {"role": "user", "content": "Write an essay about Generative AI."}, +] +text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False) +inputs = tokenizer(text, return_tensors="pt") +inputs = {k: v.to("cuda") for k, v in inputs.items()} +outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, top_p=0.95, temperature=0.2, repetition_penalty=1.2, eos_token_id=tokenizer.eos_token_id) +print(tokenizer.decode(outputs[0])) +``` + +```py +messages = [ + {"role": "system", "content": "Create a text ad given the following product and description."}, + {"role": "user", "content": "Product: Sony PS5 PlayStation Console\nDescription: The PS5 console unleashes new gaming possibilities that you never anticipated."}, +] +text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False) +inputs = tokenizer(text, return_tensors="pt") +inputs = {k: v.to("cuda") for k, v in inputs.items()} +outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.95, temperature=0.2, repetition_penalty=1.2, eos_token_id=tokenizer.eos_token_id) +print(tokenizer.decode(outputs[0])) +``` + +```py +text = """Table: 2-11365528-2 +Columns: ['Team', 'Head Coach', 'President', 'Home Ground', 'Location'] +Natural Query: Who is the Head Coach of the team whose President is Mario Volarevic? +SQL Query:""" + +inputs = tokenizer(text, return_tensors="pt") +inputs = {k: v.to("cuda") for k, v in inputs.items()} +outputs = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1, eos_token_id=tokenizer("").input_ids[-1]) +print(tokenizer.decode(outputs[0])) +``` + ## DARE -The DARE method uses the [`~utils.random_pruning`] approach to randomly drop parameters, and only preserving a percentage of the parameters set in the `density` parameter. The remaining tensors are rescaled to keep the expected output unchanged. +The DARE method uses the [`~utils.random_pruning`] approach to randomly drop parameters, and only preserve a percentage of the parameters set in the `density` parameter. The remaining tensors are rescaled to keep the expected output unchanged. With PEFT, DARE is enabled by setting `combination_type="dare_ties"` and setting `density` to a value of the weights to keep from the individual models. > [!TIP] > DARE is a super useful method for preparing models for merging which means it can be combined with other methods like TIES, `linear` (a weighted average of the task tensors) or `svd` (calculated from the *delta weights*, the model parameters before and after finetuning) or a combination of all of the above like `dare_ties_svd`. -Let's merge three diffusion models to generate a variety of images in different styles using only one model. The models you'll use are (feel free to choose your own): [nerijs/pixel-art-xl](https://huggingface.co/nerijs/pixel-art-xl), [ostris/super-cereal-sdxl-lora](https://huggingface.co/ostris/super-cereal-sdxl-lora), and [KappaNeuro/studio-ghibli-style](https://huggingface.co/KappaNeuro/studio-ghibli-style). +Let's merge three diffusion models to generate a variety of images in different styles using only one model. The models you'll use are (feel free to choose your own): [nerijs/pixel-art-xl](https://huggingface.co/nerijs/pixel-art-xl) and [ostris/super-cereal-sdxl-lora](https://huggingface.co/ostris/super-cereal-sdxl-lora). 
-```py +Load the base model and then use the [`~diffusers.loaders.StableDiffusionXLLoraLoaderMixin.load_lora_weights`] method to load and assign each adapter a name: +```py +from diffusers import StableDiffusionXLPipeline, AutoencoderKL +import torch + +vae = AutoencoderKL.from_pretrained( + "madebyollin/sdxl-vae-fp16-fix", +) +pipeline = StableDiffusionXLPipeline.from_pretrained( + "stabilityai/stable-diffusion-xl-base-1.0", + vae=vae, + torch_dtype=torch.float16, +).to("cuda") + +pipeline.load_lora_weights( + "nerijs/pixel-art-xl", + weight_name="pixel-art-xl.safetensors", + adapter_name="pixel" +) +pipeline.load_lora_weights( + "ostris/super-cereal-sdxl-lora", + weight_name="super-cereal-sdxl-lora.safetensors", + adapter_name="cereal" +) ``` -Load the base model and then use the [`~PeftModel.load_adapter`] method to load and assign each adapter a name: +Use [`~LoraModel.add_weighted_adapter`] on the pipeline's UNet to set the weights for each adapter, the `adapter_name`, the `combination_type`, and `density`. ```py - +adapters = ["pixel", "cereal"] +weights = [1.0, 1.0] +adapter_name = "merge" +density = 0.5 +pipeline.unet.add_weighted_adapter(adapters, weights, adapter_name, combination_type="dare_ties", density=density) ``` -Use [`~LoraModel.add_weighted_adapter`] to set the weights for each adapter, the `adapter_name`, the `combination_type`, and `ties_density`. +Make the newly merged model the active model with the [`~LoraModel.set_adapter`] method. ```py - +pipeline.unet.set_adapter("merge") ``` -Make the newly merged model the active model with the [`~LoraModel.set_adapter`] method, using the new `adapter_name`. +Now you can use the merged model to generate images in two different styles! ```py - +prompt = "soft fluffy pancakes shaped like kawaii bear faces" +generator = [torch.Generator(device="cuda").manual_seed(0)] +image = pipeline(prompt, generator=generator).images[0] +image ``` - -Now you can use the merged model to generate images in three different styles! - - - - - - - - - - - - From b8badeeaa473d77d9013a139dffd62254546839c Mon Sep 17 00:00:00 2001 From: Steven Liu Date: Tue, 6 Feb 2024 10:24:36 -0800 Subject: [PATCH 3/6] api reference --- docs/source/_toctree.yml | 4 +++ docs/source/developer_guides/model_merging.md | 6 ++-- docs/source/package_reference/merge_utils.md | 33 +++++++++++++++++++ 3 files changed, 41 insertions(+), 2 deletions(-) create mode 100644 docs/source/package_reference/merge_utils.md diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml index 9ed38aaa9d..b0c8ccd2a1 100644 --- a/docs/source/_toctree.yml +++ b/docs/source/_toctree.yml @@ -99,5 +99,9 @@ - local: package_reference/prompt_tuning title: Prompt tuning title: Adapters + - sections: + - local: package_reference/merge_utils + title: Model merge + title: Utilities title: API reference diff --git a/docs/source/developer_guides/model_merging.md b/docs/source/developer_guides/model_merging.md index db27a87773..ecd9f655af 100644 --- a/docs/source/developer_guides/model_merging.md +++ b/docs/source/developer_guides/model_merging.md @@ -20,7 +20,7 @@ Training a model for each task can be costly, take up storage space, and the mod PEFT provides two methods for model merging: -* [TIES-Merging](https://hf.co/papers/2306.01708) - TrIm, Elect, and Merge (TIES) is a three-step method for merging models. First, redundant parameters are trimmed, then conflicting signs are resolved into an aggregated vector, and finally the parameters whose signs are the same as the aggregate sign are averaged. 
This method takes into account that some values (redundant and sign disagreement) can degrade performance in the merged model. +* [TIES](https://hf.co/papers/2306.01708) - TrIm, Elect, and Merge (TIES) is a three-step method for merging models. First, redundant parameters are trimmed, then conflicting signs are resolved into an aggregated vector, and finally the parameters whose signs are the same as the aggregate sign are averaged. This method takes into account that some values (redundant and sign disagreement) can degrade performance in the merged model. * [DARE](https://hf.co/papers/2311.03099) - Drop And REscale is a method that can be used to prepare for model merging methods like TIES. It works by randomly dropping parameters according to a drop rate and rescaling the remaining parameters. This helps to reduce the number of redundant and potentially interfering parameters among multiple models. Models are merged with the [`~LoraModel.add_weighted_adapter`] method, and the specific model merging method is specified in the `combination_type` parameter. This guide will show you how to merge models with TIES, DARE, and a combination of both TIES and DARE. @@ -31,7 +31,9 @@ The [`~utils.ties`] method uses a [`~utils.magnitude_based_pruning`] approach to With PEFT, TIES merging is enabled by setting `combination_type="ties"` and setting `ties_density` to a value of the weights to keep from the individual models. For example, let's merge three finetuned [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T) models: [tinyllama_lora_nobots](https://huggingface.co/smangrul/tinyllama_lora_norobots), [tinyllama_lora_sql](https://huggingface.co/smangrul/tinyllama_lora_sql), and [tinyllama_lora_adcopy](https://huggingface.co/smangrul/tinyllama_lora_adcopy). -Load the base model and then use the [`~PeftModel.load_adapter`] method to load and assign each adapter a name: +Load the base model and use the [`~transformers.PreTrainedModel.resize_token_embeddings`] method to account for special tokens added to the embedding layer for each finetuned model. This method ensures the special tokens and initialization of the embedding layers are consistent. + +Then you can use the [`~PeftModel.load_adapter`] method to load and assign each adapter a name: ```py from peft import PeftConfig, PeftModel diff --git a/docs/source/package_reference/merge_utils.md b/docs/source/package_reference/merge_utils.md new file mode 100644 index 0000000000..c146ea4c10 --- /dev/null +++ b/docs/source/package_reference/merge_utils.md @@ -0,0 +1,33 @@ + + +# Model merge + +PEFT provides several internal utilities for [model merging](../developer_guides/model_merging) with the TIES and DARE methods. 
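As a rough illustration (a sketch, with the exact signatures documented below), these utilities operate directly on lists of *task tensors*, the per-layer differences between finetuned and base weights:

```py
import torch
from peft.utils.merge_utils import ties, dare_ties

# toy task tensors from three finetuned models
task_tensors = [torch.randn(10, 10) for _ in range(3)]
weights = torch.tensor([2.0, 1.0, 1.0])

# TIES: trim to the top 20% of values by magnitude, elect signs, and merge
ties_merged = ties(task_tensors, weights, density=0.2)

# DARE + TIES: randomly drop and rescale values before the sign election and merge
dare_merged = dare_ties(task_tensors, weights, density=0.2)
```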
+ +[[autodoc]] utils.merge_utils.prune + +[[autodoc]] utils.merge_utils.calculate_majority_sign_mask + +[[autodoc]] utils.merge_utils.disjoint_merge + +[[autodoc]] utils.merge_utils.task_arithmetic + +[[autodoc]] utils.merge_utils.ties + +[[autodoc]] utils.merge_utils.dare_linear + +[[autodoc]] utils.merge_utils.dare_ties From bde2219bb48d6650afdf39237104347c18f98167 Mon Sep 17 00:00:00 2001 From: Steven Liu Date: Mon, 12 Feb 2024 08:49:21 -0800 Subject: [PATCH 4/6] update --- docs/source/developer_guides/model_merging.md | 104 +++++------------- 1 file changed, 30 insertions(+), 74 deletions(-) diff --git a/docs/source/developer_guides/model_merging.md b/docs/source/developer_guides/model_merging.md index ecd9f655af..7c0ca7c0ef 100644 --- a/docs/source/developer_guides/model_merging.md +++ b/docs/source/developer_guides/model_merging.md @@ -16,24 +16,26 @@ rendered properly in your Markdown viewer. # Model merging -Training a model for each task can be costly, take up storage space, and the models aren't able to learn new information to improve their performance. Multitask learning can overcome some of these limitations by training a model to learn several tasks, but this is expensive to train and designing a dataset for it can be challenging. *Model merging* offers a solution to these challenges by combining multiple pretrained models into one model, giving it the combined abilities of each individual model, without any additional training. +Training a model for each task can be costly, take up storage space, and the models aren't able to learn new information to improve their performance. Multitask learning can overcome some of these limitations by training a model to learn several tasks, but it is expensive to train and designing a dataset for it is challenging. *Model merging* offers a solution to these challenges by combining multiple pretrained models into one model, giving it the combined abilities of each individual model without any additional training. PEFT provides two methods for model merging: * [TIES](https://hf.co/papers/2306.01708) - TrIm, Elect, and Merge (TIES) is a three-step method for merging models. First, redundant parameters are trimmed, then conflicting signs are resolved into an aggregated vector, and finally the parameters whose signs are the same as the aggregate sign are averaged. This method takes into account that some values (redundant and sign disagreement) can degrade performance in the merged model. -* [DARE](https://hf.co/papers/2311.03099) - Drop And REscale is a method that can be used to prepare for model merging methods like TIES. It works by randomly dropping parameters according to a drop rate and rescaling the remaining parameters. This helps to reduce the number of redundant and potentially interfering parameters among multiple models. +* [DARE](https://hf.co/papers/2311.03099) - Drop And REscale is a method that can be used to prepare for other model merging methods like TIES. It works by randomly dropping parameters according to a drop rate and rescaling the remaining parameters. This helps to reduce the number of redundant and potentially interfering parameters among multiple models. -Models are merged with the [`~LoraModel.add_weighted_adapter`] method, and the specific model merging method is specified in the `combination_type` parameter. This guide will show you how to merge models with TIES, DARE, and a combination of both TIES and DARE. 
+Models are merged with the [`~LoraModel.add_weighted_adapter`] method, and the specific model merging method is specified in the `combination_type` parameter. This guide will show you how to merge models with TIES and DARE. -## TIES +## Merge method -The [`~utils.ties`] method uses a [`~utils.magnitude_based_pruning`] approach to trim redundant parameters such that only the top-k percent of values are kept from each task vector. The number of values to keep are specified by the `density` parameter. The task tensors are weighted, and the [`~utils.calculate_majority_sign_mask`] *elects* the sign vector. This means calculating the total magnitude for each parameter across all models. Lastly, the [`~utils.disjoint_merge`] function calculates the mean of the parameter values whose sign is the same as the *elected sign vector*. +With PEFT, merging is enabled by setting `combination_type` and `ties_density` to a value of the weights to keep from the individual models. For example, let's merge three finetuned [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T) models: [tinyllama_lora_nobots](https://huggingface.co/smangrul/tinyllama_lora_norobots), [tinyllama_lora_sql](https://huggingface.co/smangrul/tinyllama_lora_sql), and [tinyllama_lora_adcopy](https://huggingface.co/smangrul/tinyllama_lora_adcopy). -With PEFT, TIES merging is enabled by setting `combination_type="ties"` and setting `ties_density` to a value of the weights to keep from the individual models. For example, let's merge three finetuned [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T) models: [tinyllama_lora_nobots](https://huggingface.co/smangrul/tinyllama_lora_norobots), [tinyllama_lora_sql](https://huggingface.co/smangrul/tinyllama_lora_sql), and [tinyllama_lora_adcopy](https://huggingface.co/smangrul/tinyllama_lora_adcopy). + -Load the base model and use the [`~transformers.PreTrainedModel.resize_token_embeddings`] method to account for special tokens added to the embedding layer for each finetuned model. This method ensures the special tokens and initialization of the embedding layers are consistent. +If you're merging with TIES, use the [`~transformers.PreTrainedModel.resize_token_embeddings`] method to account for special tokens added to the embedding layer for each finetuned model. This method ensures the special tokens and initialization of the embedding layers are consistent. 
-Then you can use the [`~PeftModel.load_adapter`] method to load and assign each adapter a name: + + +Load a base model and can use the [`~PeftModel.load_adapter`] method to load and assign each adapter a name: ```py from peft import PeftConfig, PeftModel @@ -41,10 +43,10 @@ from transformers import AutoModelForCausalLM, AutoTokenizer import torch config = PeftConfig.from_pretrained("smangrul/tinyllama_lora_norobots") -model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, load_in4bit=True, device_map="auto") +model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, load_in_4bit=True, device_map="auto") tokenizer = AutoTokenizer.from_pretrained("smangrul/tinyllama_lora_norobots") -model.resize_token_embeddings(len(tokenizer)) +# model.resize_token_embeddings(len(tokenizer)) if using TIES method model = PeftModel.from_pretrained(model, "smangrul/tinyllama_lora_norobots", adapter_name="norobots") _ = model.load_adapter("smangrul/tinyllama_lora_sql", adapter_name="sql") _ = model.load_adapter("smangrul/tinyllama_lora_adcopy", adapter_name="adcopy") @@ -52,14 +54,31 @@ _ = model.load_adapter("smangrul/tinyllama_lora_adcopy", adapter_name="adcopy") Set the adapters, weights, `adapter_name`, `combination_type`, and `ties_density` with the [`~LoraModel.add_weighted_adapter`] method. + + + ```py adapters = ["norobots", "adcopy", "sql"] weights = [2.0, 0.3, 0.7] adapter_name = "merge" density = 0.2 -model.add_weighted_adapter(adapters, weights, adapter_name, combination_type="ties", ties_density=density) +model.add_weighted_adapter(adapters, weights, adapter_name, combination_type="ties", density=density) ``` + + + +```py +adapters = ["norobots", "adcopy", "sql"] +weights = [2.0, 0.3, 0.7] +adapter_name = "merge" +density = 0.2 +model.add_weighted_adapter(adapters, weights, adapter_name, combination_type="dare_ties", density=density) +``` + + + + Set the newly merged model as the active model with the [`~LoraModel.set_adapter`] method. ```py @@ -115,66 +134,3 @@ print(tokenizer.decode(outputs[0])) - -## DARE - -The DARE method uses the [`~utils.random_pruning`] approach to randomly drop parameters, and only preserve a percentage of the parameters set in the `density` parameter. The remaining tensors are rescaled to keep the expected output unchanged. - -With PEFT, DARE is enabled by setting `combination_type="dare_ties"` and setting `density` to a value of the weights to keep from the individual models. - -> [!TIP] -> DARE is a super useful method for preparing models for merging which means it can be combined with other methods like TIES, `linear` (a weighted average of the task tensors) or `svd` (calculated from the *delta weights*, the model parameters before and after finetuning) or a combination of all of the above like `dare_ties_svd`. - -Let's merge three diffusion models to generate a variety of images in different styles using only one model. The models you'll use are (feel free to choose your own): [nerijs/pixel-art-xl](https://huggingface.co/nerijs/pixel-art-xl) and [ostris/super-cereal-sdxl-lora](https://huggingface.co/ostris/super-cereal-sdxl-lora). 
- -Load the base model and then use the [`~diffusers.loaders.StableDiffusionXLLoraLoaderMixin.load_lora_weights`] method to load and assign each adapter a name: - -```py -from diffusers import StableDiffusionXLPipeline, AutoencoderKL -import torch - -vae = AutoencoderKL.from_pretrained( - "madebyollin/sdxl-vae-fp16-fix", -) -pipeline = StableDiffusionXLPipeline.from_pretrained( - "stabilityai/stable-diffusion-xl-base-1.0", - vae=vae, - torch_dtype=torch.float16, -).to("cuda") - -pipeline.load_lora_weights( - "nerijs/pixel-art-xl", - weight_name="pixel-art-xl.safetensors", - adapter_name="pixel" -) -pipeline.load_lora_weights( - "ostris/super-cereal-sdxl-lora", - weight_name="super-cereal-sdxl-lora.safetensors", - adapter_name="cereal" -) -``` - -Use [`~LoraModel.add_weighted_adapter`] on the pipeline's UNet to set the weights for each adapter, the `adapter_name`, the `combination_type`, and `density`. - -```py -adapters = ["pixel", "cereal"] -weights = [1.0, 1.0] -adapter_name = "merge" -density = 0.5 -pipeline.unet.add_weighted_adapter(adapters, weights, adapter_name, combination_type="dare_ties", density=density) -``` - -Make the newly merged model the active model with the [`~LoraModel.set_adapter`] method. - -```py -pipeline.unet.set_adapter("merge") -``` - -Now you can use the merged model to generate images in two different styles! - -```py -prompt = "soft fluffy pancakes shaped like kawaii bear faces" -generator = [torch.Generator(device="cuda").manual_seed(0)] -image = pipeline(prompt, generator=generator).images[0] -image -``` From b5ed8c8c1b0dc249bb00779c7582b4f2627c642c Mon Sep 17 00:00:00 2001 From: Steven Liu Date: Tue, 13 Feb 2024 11:07:23 -0800 Subject: [PATCH 5/6] feedback --- docs/source/developer_guides/model_merging.md | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/docs/source/developer_guides/model_merging.md b/docs/source/developer_guides/model_merging.md index 7c0ca7c0ef..20829814fd 100644 --- a/docs/source/developer_guides/model_merging.md +++ b/docs/source/developer_guides/model_merging.md @@ -29,9 +29,13 @@ Models are merged with the [`~LoraModel.add_weighted_adapter`] method, and the s With PEFT, merging is enabled by setting `combination_type` and `ties_density` to a value of the weights to keep from the individual models. For example, let's merge three finetuned [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T) models: [tinyllama_lora_nobots](https://huggingface.co/smangrul/tinyllama_lora_norobots), [tinyllama_lora_sql](https://huggingface.co/smangrul/tinyllama_lora_sql), and [tinyllama_lora_adcopy](https://huggingface.co/smangrul/tinyllama_lora_adcopy). - + -If you're merging with TIES, use the [`~transformers.PreTrainedModel.resize_token_embeddings`] method to account for special tokens added to the embedding layer for each finetuned model. This method ensures the special tokens and initialization of the embedding layers are consistent. +When you're attempting to merge fully trained models with TIES, you should be aware of any special tokens each model may have added to the embedding layer which are not a part of the original checkpoint's vocabulary. This may cause an issue because each model may have added a special token to the same embedding position. If this is the case, you should use the [`~transformers.PreTrainedModel.resize_token_embeddings`] method to avoid merging the special tokens at the same embedding index. + +
+ +This shouldn't be an issue if you're only merging PEFT modules trained from a shared base model.
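When the special tokens do collide, the fix is a single call on the base model before loading the adapters (a sketch, assuming `tokenizer` already contains the added special tokens):

```py
model.resize_token_embeddings(len(tokenizer))
```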
@@ -46,7 +50,6 @@ config = PeftConfig.from_pretrained("smangrul/tinyllama_lora_norobots") model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, load_in_4bit=True, device_map="auto") tokenizer = AutoTokenizer.from_pretrained("smangrul/tinyllama_lora_norobots") -# model.resize_token_embeddings(len(tokenizer)) if using TIES method model = PeftModel.from_pretrained(model, "smangrul/tinyllama_lora_norobots", adapter_name="norobots") _ = model.load_adapter("smangrul/tinyllama_lora_sql", adapter_name="sql") _ = model.load_adapter("smangrul/tinyllama_lora_adcopy", adapter_name="adcopy") @@ -57,9 +60,11 @@ Set the adapters, weights, `adapter_name`, `combination_type`, and `ties_density +Weight values greater than `1.0` typically produce better results because they preserve the correct scale. A good default starting value for the weights is to set all values to `1.0`. + ```py adapters = ["norobots", "adcopy", "sql"] -weights = [2.0, 0.3, 0.7] +weights = [2.0, 1.0, 1.0] adapter_name = "merge" density = 0.2 model.add_weighted_adapter(adapters, weights, adapter_name, combination_type="ties", density=density) From a4f772a412f6c0e4bff009d2134fe29c610aab4c Mon Sep 17 00:00:00 2001 From: Steven Liu Date: Wed, 14 Feb 2024 11:39:50 -0800 Subject: [PATCH 6/6] feedback --- docs/source/developer_guides/model_merging.md | 13 ++++++------- docs/source/package_reference/merge_utils.md | 2 +- 2 files changed, 7 insertions(+), 8 deletions(-) diff --git a/docs/source/developer_guides/model_merging.md b/docs/source/developer_guides/model_merging.md index 20829814fd..4ecc1f8b5e 100644 --- a/docs/source/developer_guides/model_merging.md +++ b/docs/source/developer_guides/model_merging.md @@ -18,16 +18,16 @@ rendered properly in your Markdown viewer. Training a model for each task can be costly, take up storage space, and the models aren't able to learn new information to improve their performance. Multitask learning can overcome some of these limitations by training a model to learn several tasks, but it is expensive to train and designing a dataset for it is challenging. *Model merging* offers a solution to these challenges by combining multiple pretrained models into one model, giving it the combined abilities of each individual model without any additional training. -PEFT provides two methods for model merging: +PEFT provides several methods for merging models like a linear or SVD combination. This guide focuses on two methods that are more efficient for merging LoRA adapters by eliminating redundant parameters: * [TIES](https://hf.co/papers/2306.01708) - TrIm, Elect, and Merge (TIES) is a three-step method for merging models. First, redundant parameters are trimmed, then conflicting signs are resolved into an aggregated vector, and finally the parameters whose signs are the same as the aggregate sign are averaged. This method takes into account that some values (redundant and sign disagreement) can degrade performance in the merged model. * [DARE](https://hf.co/papers/2311.03099) - Drop And REscale is a method that can be used to prepare for other model merging methods like TIES. It works by randomly dropping parameters according to a drop rate and rescaling the remaining parameters. This helps to reduce the number of redundant and potentially interfering parameters among multiple models. -Models are merged with the [`~LoraModel.add_weighted_adapter`] method, and the specific model merging method is specified in the `combination_type` parameter. 
This guide will show you how to merge models with TIES and DARE. +Models are merged with the [`~LoraModel.add_weighted_adapter`] method, and the specific model merging method is specified in the `combination_type` parameter. ## Merge method -With PEFT, merging is enabled by setting `combination_type` and `ties_density` to a value of the weights to keep from the individual models. For example, let's merge three finetuned [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T) models: [tinyllama_lora_nobots](https://huggingface.co/smangrul/tinyllama_lora_norobots), [tinyllama_lora_sql](https://huggingface.co/smangrul/tinyllama_lora_sql), and [tinyllama_lora_adcopy](https://huggingface.co/smangrul/tinyllama_lora_adcopy). +With TIES and DARE, merging is enabled by setting `combination_type` and `density` to a value of the weights to keep from the individual models. For example, let's merge three finetuned [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T) models: [tinyllama_lora_nobots](https://huggingface.co/smangrul/tinyllama_lora_norobots), [tinyllama_lora_sql](https://huggingface.co/smangrul/tinyllama_lora_sql), and [tinyllama_lora_adcopy](https://huggingface.co/smangrul/tinyllama_lora_adcopy). @@ -35,7 +35,7 @@ When you're attempting to merge fully trained models with TIES, you should be aw
-This shouldn't be an issue if you're only merging PEFT modules trained from a shared base model. +This shouldn't be an issue if you're only merging LoRA adapters trained from the same base model.
@@ -47,7 +47,7 @@ from transformers import AutoModelForCausalLM, AutoTokenizer import torch config = PeftConfig.from_pretrained("smangrul/tinyllama_lora_norobots") -model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, load_in_4bit=True, device_map="auto") +model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, load_in_4bit=True, device_map="auto").eval() tokenizer = AutoTokenizer.from_pretrained("smangrul/tinyllama_lora_norobots") model = PeftModel.from_pretrained(model, "smangrul/tinyllama_lora_norobots", adapter_name="norobots") @@ -55,7 +55,7 @@ _ = model.load_adapter("smangrul/tinyllama_lora_sql", adapter_name="sql") _ = model.load_adapter("smangrul/tinyllama_lora_adcopy", adapter_name="adcopy") ``` -Set the adapters, weights, `adapter_name`, `combination_type`, and `ties_density` with the [`~LoraModel.add_weighted_adapter`] method. +Set the adapters, weights, `adapter_name`, `combination_type`, and `density` with the [`~LoraModel.add_weighted_adapter`] method. @@ -87,7 +87,6 @@ model.add_weighted_adapter(adapters, weights, adapter_name, combination_type="da Set the newly merged model as the active model with the [`~LoraModel.set_adapter`] method. ```py -model.eval() model.set_adapter("merge") ``` diff --git a/docs/source/package_reference/merge_utils.md b/docs/source/package_reference/merge_utils.md index c146ea4c10..e5746127dc 100644 --- a/docs/source/package_reference/merge_utils.md +++ b/docs/source/package_reference/merge_utils.md @@ -16,7 +16,7 @@ rendered properly in your Markdown viewer. # Model merge -PEFT provides several internal utilities for [model merging](../developer_guides/model_merging) with the TIES and DARE methods. +PEFT provides several internal utilities for [merging LoRA adapters](../developer_guides/model_merging) with the TIES and DARE methods. [[autodoc]] utils.merge_utils.prune