Fixes to Accelerated Peft #89

Merged: merged 5 commits into main from fixes/peft on Oct 9, 2024

Conversation

@fabianlim fabianlim (Contributor) commented Oct 8, 2024

This PR provides two fixes:

  1. Fallback to target-module defaults. To keep in line with regular HF behavior, we fall back to the HF default target modules if --target_modules happens to be omitted from the peft_config (see the small lookup example after this list for what those defaults look like).
  2. Disabling the GPTQ v1-to-v2 conversion: we revert to default HF behavior and disable offloading. Upon reviewing accelerate.load_checkpoint_in_model, I now start to feel that the offloading may not be needed moving forward. The main purpose of that function is to read the (sharded) checkpoint and load it into the model (on the CPU, since low_cpu_mem_mode is not working for GPTQ). I think the proper way forward is to get low_cpu_mem_mode working for GPTQ, and not worry about offloading the state dict (as it is a hack in itself).

However, do note that for low_cpu_mem_mode we have observed some problems in the BNB case for certain models (e.g., GraniteCausalForLM); see #83.
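For reference, the HF/PEFT defaults being fallen back to can be inspected directly. A small sketch, assuming PEFT is installed and a llama-type architecture (such as the granite code model tested later in this thread):

from peft.utils import TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING

# Default LoRA target modules that PEFT associates with each architecture;
# these are what gets used when --target_modules is omitted.
print(TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING["llama"])
# expected: ['q_proj', 'v_proj'] -- the same modules as in the adapter_config.json shown later in this thread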

Signed-off-by: Yu Chin Fabian Lim <[email protected]>
@anhuong anhuong (Collaborator) left a comment

The changes for removing offload_state_dict and offload_buffers look good, as does using the mapping from PEFT. Small comment on the error, and I can try to test this change on a small model to ensure it works with setting target_modules=None. Also note there are a few small changes to the benchmark scenarios for the power granite models.

)

tm = TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING[model_type]
except (ImportError, IndexError) as e:
Collaborator

If the model_type can't be found, it will be passed in as None, and then TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING[None] raises a KeyError. I think this should also be captured in this error. The same would happen if a model_type is given that is not in the dictionary. I don't know if we would ever hit an IndexError.

@fabianlim fabianlim (Contributor, Author) Oct 8, 2024

Ah, thanks for this catch. I intended to use KeyError but somehow mistook it for an IndexError. The latter is for arrays, but here we are dealing with a dict. I have made the change; a minimal sketch of the corrected handling is shown below.
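For context, a minimal sketch of what the corrected handling could look like. This is not the exact diff: the placement of the import, the error message, and the model_type value are assumptions for illustration.

model_type = None  # e.g. getattr(model.config, "model_type", None); None here shows the failure path

try:
    # The import and the lookup both sit inside the try, so a missing peft install
    # surfaces as ImportError and an unknown/None model_type as KeyError (not IndexError).
    from peft.utils import TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING

    tm = TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING[model_type]
except (ImportError, KeyError) as e:
    raise ValueError(
        f"could not infer default target_modules for model_type {model_type!r}; "
        "please pass --target_modules explicitly"
    ) from e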

@anhuong anhuong (Collaborator) commented Oct 8, 2024

I confirmed on my pod with the existing fms-acceleration and fms-acceleration-peft plugins (without this change) that I got the error AssertionError: target modules can only be list, set or string.

I then installed both the fms-acceleration framework plugin and the accelerated-peft plugin from this branch, ran QLoRA with the config below, and it ran successfully:

{
  "model_name_or_path": "/fmaas-integration-tests/models/granite-8b-code-instruct-gptq/",
  "training_data_path": "/fmaas-integration-tests/tuning/input/twitter-complaints.json",
  "output_dir": "/fmaas-integration-tests/tuning/output/anhuong/granite-20b-gptq-qlora-twitter_202401008_1706",
  "num_train_epochs": 10.0,
  "per_device_train_batch_size": 4,
  "gradient_accumulation_steps": 1,
  "learning_rate": 1e-4,
  "response_template": "\n### Label:",
  "dataset_text_field": "output",
  "peft_method": "lora",
  "r": 8,
  "lora_dropout": 0.05,
  "lora_alpha": 16,
  "max_seq_length": 4096,
  "lora_post_process_for_vllm": true,
  "auto_gptq": ["triton_v2"],
  "torch_dtype": "float16",
  "fp16": true
}

In the adapter_config.json you can see it used the default target modules:

"target_modules": [
    "v_proj",
    "q_proj"
  ],

I do see these messages, though, and I'm curious what autogptq_patch_tensors_as_float_parameters is:

***** FMS AccelerationFramework *****
Active Plugin: AutoGPTQAccelerationPlugin. Python package: fms_acceleration_peft. Version: 0.3.0.dev0.
***************** Module Forwards Patching *************
Rule: autogptq_patch_tensors_as_float_parameters Module: base_layer                Class: QuantLinear     Num: 72
Rule: autogptq_patch_tensors_as_float_parameters Module: down_proj                 Class: QuantLinear     Num: 36
Rule: autogptq_patch_tensors_as_float_parameters Module: gate_proj                 Class: QuantLinear     Num: 36
Rule: autogptq_patch_tensors_as_float_parameters Module: k_proj                    Class: QuantLinear     Num: 36
Rule: autogptq_patch_tensors_as_float_parameters Module: o_proj                    Class: QuantLinear     Num: 36
Rule: autogptq_patch_tensors_as_float_parameters Module: up_proj                   Class: QuantLinear     Num: 36
***************** Accelerator Patching *************
Currently training with a batch size of: 4
***** Running training *****
  Num examples = 50
  Num Epochs = 10
  Instantaneous batch size per device = 4
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 70
  Number of trainable parameters = 1,916,928
  0%|                                                                              | 0/70 [00:00<?, ?it/s]Detected flash_attn version: 2.6.3
{'loss': 6.4873, 'grad_norm': 3.5396203994750977, 'learning_rate': 9e-05, 'epoch': 1.0}       

Signed-off-by: Yu Chin Fabian Lim <[email protected]>
@fabianlim fabianlim (Contributor, Author) commented Oct 8, 2024

@anhuong autogptq_patch_tensors_as_float_parameters is a rule that is registered only when the world_size > 2. It is for FSDP sharding of the quantized weights.

Quantized weights are stored in int32, whereas FSDP only supports float types (I think it's because the torch.distributed primitives do not support non-floats). Hence, as sketched below:

  • we need to convert them up front to a float type (in this case torch_dtype, which is the type passed by the user), so sharding is possible;
  • then, before the forward call, we need to view them back as the int32 dtype, so they can be passed to the dequant.
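A minimal standalone sketch of that round trip (not the plugin's actual patch code; the tensor below is just a stand-in for a QuantLinear qweight buffer, and float16 stands in for the user-supplied torch_dtype):

import torch

# Packed GPTQ weights are stored as int32; FSDP will only shard floating-point
# parameters, so the raw bytes are reinterpreted as floats for sharding.
packed = torch.randint(-2**31, 2**31 - 1, (256, 128), dtype=torch.int32)

# 1. View the int32 buffer as float16 and register it as a frozen parameter so
#    FSDP can flatten and shard it. Recent PyTorch allows dtype views of a
#    different element size: the last dim doubles here (4 bytes -> 2 bytes).
as_float = torch.nn.Parameter(packed.view(torch.float16), requires_grad=False)

# 2. Just before the forward/dequant call, view the same bytes back as int32.
restored = as_float.data.view(torch.int32)

assert torch.equal(restored, packed)  # the bit pattern survives the round trip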

Signed-off-by: Yu Chin Fabian Lim <[email protected]>
@anhuong anhuong (Collaborator) left a comment

Thanks Fabian for the changes and the explanation! This looks great!

@anhuong anhuong merged commit b43c35c into main Oct 9, 2024
6 checks passed
@fabianlim fabianlim deleted the fixes/peft branch October 11, 2024 00:21