deepspeed zero3 QLoRA finetuning #11625

Merged — 30 commits merged into intel-analytics:main from heyang_24_7_19 on Aug 13, 2024

Conversation

Uxito-Ada (Contributor):

Description

Transferred from #11048.

1. Why the change?

2. User API changes

3. Summary of the change

4. How to test?

  • N/A
  • Unit test: Please manually trigger the PR Validation here by inputting the PR number (e.g., 1234). Paste your action link here once the run has finished successfully.
  • Application test
  • Document test
  • ...

5. New dependencies

  • New Python dependencies
    - Dependency1
    - Dependency2
    - ...
  • New Java/Scala dependencies and their license
    - Dependency1 and license1
    - Dependency2 and license2
    - ...

@Uxito-Ada Uxito-Ada requested review from qiyuangong and glorysdj July 19, 2024 07:42
Comment on lines 233 to 235
if enable_deepspeed_zero3:
    dst_tensor = torch.empty(dst_size // 2, dtype=torch.bfloat16,
                             device=device)
Contributor:

I think we should always do that for NF4 (only)?

Uxito-Ada (Author):

Other NF4 tensors are packed as torch.uint8, so the buffer length is not redundant there.
Only DeepSpeed zero3 needs NF4 packed as torch.bfloat16, which requires halving the buffer's element count.
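
A minimal sketch of the size arithmetic, assuming dst_size is the NF4 payload size in bytes (the helper name alloc_nf4_buffer is hypothetical; torch.empty and the dtypes come from the diff above):

import torch

def alloc_nf4_buffer(dst_size: int, device, enable_deepspeed_zero3: bool = False):
    # dst_size is the NF4 payload size in bytes.
    if enable_deepspeed_zero3:
        # zero3 expects flat bfloat16 buffers; bfloat16 is 2 bytes per element,
        # so halving the element count keeps the same byte budget.
        return torch.empty(dst_size // 2, dtype=torch.bfloat16, device=device)
    # Default path: NF4 bytes packed one per uint8 element.
    return torch.empty(dst_size, dtype=torch.uint8, device=device)

# Demonstration on CPU (the PR targets an XPU device): both buffers
# occupy the same number of bytes.
buf_u8 = alloc_nf4_buffer(1024, "cpu")
buf_bf16 = alloc_nf4_buffer(1024, "cpu", enable_deepspeed_zero3=True)
assert buf_u8.numel() * buf_u8.element_size() == buf_bf16.numel() * buf_bf16.element_size()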

@@ -259,9 +264,12 @@ def ggml_convert_qtype(tensor: torch.Tensor, qtype: int,


def ggml_q_format_convet_cpu2xpu(tensor: torch.Tensor, num_elem: int, qtype: int):
import os
Contributor:

Move the os import to the top of the file, since other modules may share this import.

dst_tensor = torch.empty(dst_size, dtype=torch.uint8,
                         device=device)
if enable_deepspeed_zero3:
    dst_tensor = torch.empty(dst_size // 2, dtype=torch.bfloat16,
Contributor:

Add comments explaining the magic value 2 and the hard-coded bfloat16 dtype.

Uxito-Ada (Author):

done
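
A sketch of how the requested comments might read on the allocation above (the surrounding names come from the diff; the exact comment wording in the merged commit may differ):

dst_tensor = torch.empty(dst_size, dtype=torch.uint8,
                         device=device)
if enable_deepspeed_zero3:
    # DeepSpeed zero3 flattens and shards parameters as bfloat16, so the NF4
    # payload must live in a bfloat16 buffer rather than uint8. bfloat16 is
    # 2 bytes per element, hence dst_size // 2 elements preserve the same
    # total byte size.
    dst_tensor = torch.empty(dst_size // 2, dtype=torch.bfloat16,
                             device=device)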



# The Arc platform does not support FP64, so disable FP64 in
# DeepSpeedZeroOptimizer_Stage3's _constant_buffered_norm2 method.

Contributor:

What's the difference between our implementation and DeepSpeed's?

Uxito-Ada (Author):

DeepSpeed's version casts to double(), i.e. FP64.
Ours removes the double() cast, as Arc does not support FP64.
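
A minimal sketch of the idea, assuming the change keeps DeepSpeed's chunked accumulation but drops the FP64 cast (the function below is an illustrative stand-in, not DeepSpeed's exact code; _constant_buffered_norm2 is the method named in the diff):

import torch

def buffered_norm2_no_fp64(t: torch.Tensor, buffer_size: int = 1 << 20) -> torch.Tensor:
    # Accumulate the squared 2-norm chunk by chunk in float32 instead of
    # casting each chunk to double(), which Arc GPUs cannot execute.
    norm_sq = torch.zeros((), dtype=torch.float32, device=t.device)
    for chunk in t.reshape(-1).split(buffer_size):
        norm_sq += chunk.float().norm(2) ** 2
    return norm_sq.sqrt()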

Contributor:

OK

Uxito-Ada (Author):

Any more comments, or is this ready to approve? @qiyuangong

@@ -524,7 +525,8 @@ def load_convert(cls, q_k, optimize_model, *args, **kwargs):
imatrix_data=imatrix_data,
embedding_qtype=embedding_qtype,
enable_xetla=enable_xetla,
mixed_precision=mixed_precision)
mixed_precision=mixed_precision,
enable_deepspeed_zero3=enable_deepspeed_zero3)
Contributor:

I don't think we want to introduce this user-level parameter; we should either change all NF4 to BF16, or all training (QLoRA) NF4 to BF16, instead of doing something special for zero3 only.

Uxito-Ada (Author):

Please take another look @jason-dai @qiyuangong


invalidInputError(tensor.dtype == torch.uint8,
"Input tensor must be uint8")
invalidInputError(tensor.dtype == torch.bfloat16,
Contributor:

Will this change impact other features?

Uxito-Ada (Author):

NF4 applications such as QLoRA with zero2 will not be affected. That said, it may be better to add a qtype == NF4 check?
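
A sketch of the suggested guard, assuming only NF4 tensors are repacked as bfloat16 for zero3 (invalidInputError, NF4, and the uint8 message follow the snippet above; the combined check is an assumption, not the merged code):

if qtype == NF4:
    # zero3 repacks NF4 payloads into bfloat16 buffers (see the earlier diff).
    invalidInputError(tensor.dtype == torch.bfloat16,
                      "NF4 input tensor must be bfloat16")
else:
    invalidInputError(tensor.dtype == torch.uint8,
                      "Input tensor must be uint8")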

qiyuangong (Contributor) left a comment:

LGTM

Uxito-Ada (Author):

Passed PR validation.

@Uxito-Ada Uxito-Ada merged commit 70c828b into intel-analytics:main Aug 13, 2024
1 check passed
@Uxito-Ada Uxito-Ada deleted the heyang_24_7_19 branch August 13, 2024 08:15