Deepspeed Zero3 QLoRA Fine-tuning #11048
Conversation
@@ -524,7 +536,10 @@ class MatMulLowBit(torch.autograd.Function):
     def forward(ctx, A, weight, input_seq_size):
         ctx.is_empty = False
         import linear_q4_0
-        result = linear_q4_0.forward_new(A, weight.data, weight.qtype, input_seq_size)
+        if hasattr(weight, "enable_deepspeed_zero3") and weight.enable_deepspeed_zero3:
+            result = linear_q4_0.forward_new(A, weight.data.byte(), NF4, input_seq_size)
@cyita Please check whether this packing produces the correct result.
Please resolve the conflict with a rebase.
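A minimal PyTorch-level sketch of the packing question above, assuming (as the dst_tensor change further down suggests) that the Zero3 path stores the packed uint8 qweight bytes inside a bfloat16 buffer. It only checks that the byte layout round-trips through such a view; it does not exercise the PR's linear_q4_0 kernel.

# stand-in for the NF4-packed qweight bytes; the buffer name and shape are illustrative
import torch

packed = torch.randint(0, 256, (1024,), dtype=torch.uint8)
as_bf16 = packed.view(torch.bfloat16)      # 2 bytes per element -> 512 bf16 values
restored = as_bf16.view(torch.uint8)       # reinterpret back to the original byte layout

assert torch.equal(packed, restored)       # packing/unpacking through the view preserves every byte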
tokenizer = LlamaTokenizer.from_pretrained(base_model, trust_remote_code=True)
print(f"Tokenizer loaded on rank {os.environ.get('LOCAL_RANK')}")

tokenizer.pad_token_id = (
tokenizer.pad_token_id = (
0 # unk. we want this to be different from the eos token
)
tokenizer.padding_side = "left" # Allow batched inference
This code is no longer necessary.
dst_tensor = torch.empty(dst_size, dtype=torch.uint8,
                         device=device)
if enable_deepspeed_zero3:
    dst_tensor = torch.empty(dst_size, dtype=torch.bfloat16,
If using torch.bfloat16, I think we only need half of dst_size, right? (assuming dst_size is an even number)
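A small sketch of the size bookkeeping behind this comment: dst_size counts bytes when the buffer is uint8, so a bfloat16 buffer covering the same bytes only needs dst_size // 2 elements (dst_size assumed even; the value below is illustrative).

import torch

dst_size = 4096                                          # byte count of the packed weight
u8 = torch.empty(dst_size, dtype=torch.uint8)            # dst_size * 1 byte
bf16 = torch.empty(dst_size // 2, dtype=torch.bfloat16)  # (dst_size // 2) * 2 bytes

assert u8.numel() * u8.element_size() == bf16.numel() * bf16.element_size() == dst_size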
import accelerate
import transformers

from transformers import AutoTokenizer, BitsAndBytesConfig, AutoConfig, AutoModelForCausalLM
Why not use the AutoModelForCausalLM from ipex-llm?
model_config = AutoConfig.from_pretrained(base_model)
with ds.zero.Init(config_dict_or_path=deepspeed):
    model = AutoModelForCausalLM.from_pretrained(
Why not set load_in_low_bit?
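A sketch of what the two comments above seem to suggest: load through ipex-llm's AutoModelForCausalLM with load_in_low_bit instead of the plain transformers class. Whether this composes cleanly with ds.zero.Init is exactly the question this PR addresses; the model id and config path below are placeholders, not values from the PR.

import torch
import deepspeed as ds
from transformers import AutoConfig
from ipex_llm.transformers import AutoModelForCausalLM   # ipex-llm's low-bit loader

base_model = "meta-llama/Llama-2-7b-hf"                   # placeholder for the script's base_model
ds_config = "deepspeed_zero3.json"                        # placeholder for the script's deepspeed config

model_config = AutoConfig.from_pretrained(base_model)
with ds.zero.Init(config_dict_or_path=ds_config):
    model = AutoModelForCausalLM.from_pretrained(
        base_model,
        config=model_config,
        load_in_low_bit="nf4",        # so the runtime qtype does not have to be hard-coded
        torch_dtype=torch.bfloat16,
    )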
@@ -524,7 +536,10 @@ class MatMulLowBit(torch.autograd.Function):
     def forward(ctx, A, weight, input_seq_size):
         ctx.is_empty = False
         import linear_q4_0
-        result = linear_q4_0.forward_new(A, weight.data, weight.qtype, input_seq_size)
+        if hasattr(weight, "enable_deepspeed_zero3") and weight.enable_deepspeed_zero3:
+            result = linear_q4_0.forward_new(A, weight.data.byte(), NF4, input_seq_size)
weight.data.byte() converts every element to torch.uint8. Do you mean something like weight.data.view(torch.uint8)?
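A tiny demonstration of the distinction raised above, using a stand-in tensor rather than the PR's weight.data: .byte() converts values element by element (lossy for anything but small non-negative integers), while .view(torch.uint8) reinterprets the raw bytes without touching them.

import torch

w = torch.tensor([1.0, 2.5], dtype=torch.bfloat16)   # stand-in for weight.data in the Zero3 path

cast = w.byte()                                      # value conversion: 2 uint8 elements, fractional parts dropped
raw = w.view(torch.uint8)                            # byte reinterpretation: 4 uint8 elements, bit-exact

assert cast.numel() == 2 and raw.numel() == 4
assert torch.equal(raw.view(torch.bfloat16), w)      # only the view round-trips to the original weights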
And shouldn't the qtype be the same as the load_in_low_bit passed to from_pretrained, instead of being hard-coded to NF4?
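A sketch of what this comment seems to propose, edited into the hunk above and using only names visible there (linear_q4_0, weight.qtype); it is not standalone code, and whether weight.qtype survives the Zero3 repacking is an assumption the PR author would need to confirm.

if hasattr(weight, "enable_deepspeed_zero3") and weight.enable_deepspeed_zero3:
    # reinterpret the packed buffer instead of casting it, and use the weight's own qtype
    result = linear_q4_0.forward_new(A, weight.data.view(torch.uint8),
                                     weight.qtype, input_seq_size)
else:
    result = linear_q4_0.forward_new(A, weight.data, weight.qtype, input_seq_size)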
deprecated
Description
Use DeepSpeed Zero3 to split and distribute the layers of a large model across multiple XPUs and run QLoRA fine-tuning.
1. Why the change?
as above
2. User API changes
Add an enable_deepspeed_zero3 option to from_pretrained and to qlora.py (see the usage sketch after this list).
3. Summary of the change
as above
4. How to test?
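A minimal usage sketch of the new option, assuming it is accepted by ipex-llm's from_pretrained as the description says; the model id, config path, and the other keyword arguments are placeholders or existing ipex-llm/DeepSpeed arguments, not details confirmed by this PR.

import torch
import deepspeed as ds
from ipex_llm.transformers import AutoModelForCausalLM

with ds.zero.Init(config_dict_or_path="deepspeed_zero3.json"):   # placeholder config path
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-hf",                               # placeholder model id
        load_in_low_bit="nf4",
        torch_dtype=torch.bfloat16,
        enable_deepspeed_zero3=True,                              # the new flag added by this PR
    )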