Fixes to Accelerated Peft #89

Merged: merged 5 commits into main from fixes/peft on Oct 9, 2024

Conversation

@fabianlim fabianlim (Contributor) commented Oct 8, 2024

This PR provides two fixes:

  1. Fallback to target-module defaults. To keep in line with regular HF behavior, we fall back to the HF default target modules if --target_modules happens to be omitted from the peft_config (see the small lookup example after this list for what those defaults look like).
  2. Disabling the GPTQ v1-to-v2 conversion: we revert to default HF behavior and disable offloading. Upon reviewing accelerate.load_checkpoint_in_model, I now start to feel that the offloading may not be needed moving forward. The main purpose of that function is to read the (sharded) checkpoint and load it into the model (on the CPU, since low_cpu_mem_mode is not working for GPTQ). I think the proper way forward is to get low_cpu_mem_mode working for GPTQ, and not worry about offloading the state dict (as it is a hack in itself).

However, do note that for low_cpu_mem_mode we have observed some problems in the BNB case for certain models (e.g., GraniteCausalForLM); see #83.
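For reference, the HF/PEFT defaults being fallen back to can be inspected directly. A small sketch, assuming PEFT is installed and a llama-type architecture (such as the granite code model tested later in this thread):

from peft.utils import TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING

# Default LoRA target modules that PEFT associates with each architecture;
# these are what gets used when --target_modules is omitted.
print(TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING["llama"])
# expected: ['q_proj', 'v_proj'] -- the same modules as in the adapter_config.json shown later in this thread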

Signed-off-by: Yu Chin Fabian Lim <[email protected]>
@anhuong anhuong (Collaborator) left a comment

The changes for removing offload_state_dict and offload_buffers look good, as does using the mapping from PEFT. Small comment on the error, and I can try to test this change on a small model to ensure it works with setting target_modules=None. Also note there are a few small changes to the benchmark scenarios for the power granite models.

)

tm = TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING[model_type]
except (ImportError, IndexError) as e:
Collaborator

If the model_type can't be found, it will be passed in as None, and then TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING[None] raises a KeyError. I think this should also be captured in this error. The same would happen if a model_type is given that is not in the dictionary. I don't know if we would ever hit an IndexError.

@fabianlim fabianlim (Contributor, Author) Oct 8, 2024

Ah, thanks for this catch. I intended to use KeyError but somehow mistook it for an IndexError. The latter is for arrays, but here we are dealing with a dict. I have made the change; a minimal sketch of the corrected handling is shown below.
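For context, a minimal sketch of what the corrected handling could look like. This is not the exact diff: the placement of the import, the error message, and the model_type value are assumptions for illustration.

model_type = None  # e.g. getattr(model.config, "model_type", None); None here shows the failure path

try:
    # The import and the lookup both sit inside the try, so a missing peft install
    # surfaces as ImportError and an unknown/None model_type as KeyError (not IndexError).
    from peft.utils import TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING

    tm = TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING[model_type]
except (ImportError, KeyError) as e:
    raise ValueError(
        f"could not infer default target_modules for model_type {model_type!r}; "
        "please pass --target_modules explicitly"
    ) from e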

@anhuong anhuong (Collaborator) commented Oct 8, 2024

I confirmed on my pod with the existing fms-acceleration and fms-acceleration-peft plugins (without this change) that I got the error AssertionError: target modules can only be list, set or string.

I then installed both the fms-acceleration framework plugin and the accelerated-peft plugin from this branch, ran QLoRA with the config below, and it ran successfully:

{
  "model_name_or_path": "/fmaas-integration-tests/models/granite-8b-code-instruct-gptq/",
  "training_data_path": "/fmaas-integration-tests/tuning/input/twitter-complaints.json",
  "output_dir": "/fmaas-integration-tests/tuning/output/anhuong/granite-20b-gptq-qlora-twitter_202401008_1706",
  "num_train_epochs": 10.0,
  "per_device_train_batch_size": 4,
  "gradient_accumulation_steps": 1,
  "learning_rate": 1e-4,
  "response_template": "\n### Label:",
  "dataset_text_field": "output",
  "peft_method": "lora",
  "r": 8,
  "lora_dropout": 0.05,
  "lora_alpha": 16,
  "max_seq_length": 4096,
  "lora_post_process_for_vllm": true,
  "auto_gptq": ["triton_v2"],
  "torch_dtype": "float16",
  "fp16": true
}

In the adapter_config.json you can see it used the default target modules:

"target_modules": [
    "v_proj",
    "q_proj"
  ],

I do see these messages, though, and I'm curious what autogptq_patch_tensors_as_float_parameters is:

***** FMS AccelerationFramework *****
Active Plugin: AutoGPTQAccelerationPlugin. Python package: fms_acceleration_peft. Version: 0.3.0.dev0.
***************** Module Forwards Patching *************
Rule: autogptq_patch_tensors_as_float_parameters Module: base_layer                Class: QuantLinear     Num: 72
Rule: autogptq_patch_tensors_as_float_parameters Module: down_proj                 Class: QuantLinear     Num: 36
Rule: autogptq_patch_tensors_as_float_parameters Module: gate_proj                 Class: QuantLinear     Num: 36
Rule: autogptq_patch_tensors_as_float_parameters Module: k_proj                    Class: QuantLinear     Num: 36
Rule: autogptq_patch_tensors_as_float_parameters Module: o_proj                    Class: QuantLinear     Num: 36
Rule: autogptq_patch_tensors_as_float_parameters Module: up_proj                   Class: QuantLinear     Num: 36
***************** Accelerator Patching *************
Currently training with a batch size of: 4
***** Running training *****
  Num examples = 50
  Num Epochs = 10
  Instantaneous batch size per device = 4
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 70
  Number of trainable parameters = 1,916,928
  0%|                                                                              | 0/70 [00:00<?, ?it/s]Detected flash_attn version: 2.6.3
{'loss': 6.4873, 'grad_norm': 3.5396203994750977, 'learning_rate': 9e-05, 'epoch': 1.0}       

Signed-off-by: Yu Chin Fabian Lim <[email protected]>
@fabianlim fabianlim (Contributor, Author) commented Oct 8, 2024

@anhuong autogptq_patch_tensors_as_float_parameters is a rule that is registered only when the world_size > 2. It is for FSDP sharding of the quantized weights.

Quantized weights are stored in int32, whereas FSDP only supports float types (I think it's because the torch.distributed primitives do not support non-floats). Hence, as sketched below:

  • we need to convert them up front to a float type (in this case torch_dtype, which is the type passed by the user), so sharding is possible;
  • then, before the forward call, we need to view them back as the int32 dtype, so they can be passed to the dequant.
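A minimal standalone sketch of that round trip (not the plugin's actual patch code; the tensor below is just a stand-in for a QuantLinear qweight buffer, and float16 stands in for the user-supplied torch_dtype):

import torch

# Packed GPTQ weights are stored as int32; FSDP will only shard floating-point
# parameters, so the raw bytes are reinterpreted as floats for sharding.
packed = torch.randint(-2**31, 2**31 - 1, (256, 128), dtype=torch.int32)

# 1. View the int32 buffer as float16 and register it as a frozen parameter so
#    FSDP can flatten and shard it. Recent PyTorch allows dtype views of a
#    different element size: the last dim doubles here (4 bytes -> 2 bytes).
as_float = torch.nn.Parameter(packed.view(torch.float16), requires_grad=False)

# 2. Just before the forward/dequant call, view the same bytes back as int32.
restored = as_float.data.view(torch.int32)

assert torch.equal(restored, packed)  # the bit pattern survives the round trip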

Signed-off-by: Yu Chin Fabian Lim <[email protected]>
@anhuong anhuong (Collaborator) left a comment

Thanks Fabian for the changes and the explanation! This looks great!

@anhuong anhuong merged commit b43c35c into main Oct 9, 2024
6 checks passed
@fabianlim fabianlim deleted the fixes/peft branch October 11, 2024 00:21