Fixes to Accelerated Peft #89
Conversation
Signed-off-by: Yu Chin Fabian Lim <[email protected]>
Signed-off-by: Yu Chin Fabian Lim <[email protected]>
Signed-off-by: Yu Chin Fabian Lim <[email protected]>
The changes for removing offload_state_dict and offload_buffers look good, as does using the mapping from PEFT. Small comment on the error, and I can try to test this change on a small model to ensure it works with setting target_modules=None. Also note there are a few small changes to the benchmark scenarios for the power granite models.
```python
    tm = TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING[model_type]
except (ImportError, IndexError) as e:
```
If the model_type can't be found, it will be passed in as None, and then TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING[None] raises a KeyError. I think this should also be captured in this error. The same would happen if a model_type is given that is not in the dictionary. I don't know if we would hit IndexError.
Ah, thanks for this catch. I intended to use KeyError but somehow mistook it for IndexError. The latter is for sequences, but here we are dealing with a dict. I have made the change.
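For reference, a minimal sketch of the fallback pattern being discussed. Only the PEFT mapping constant and the exception handling come from the diff excerpt above; the function name, error message, and surrounding structure are assumptions for illustration:

```python
def resolve_target_modules(peft_config, model_type):
    """Sketch: fall back to PEFT's default LoRA target modules.

    Illustrative only; not the PR's actual implementation.
    """
    # Respect an explicit user choice.
    if peft_config.target_modules is not None:
        return peft_config.target_modules
    try:
        # Import lazily so a missing/old peft surfaces as ImportError,
        # and an unknown (or None) model_type as KeyError on the lookup.
        from peft.utils.constants import (
            TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING,
        )
        return TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING[model_type]
    except (ImportError, KeyError) as e:
        raise ValueError(
            f"Could not determine default LoRA target modules for "
            f"model_type={model_type!r}; please set --target_modules explicitly."
        ) from e
```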
I confirmed on my pod with the fms-acceleration and fms-acceleration-peft plugins that I got the error. I installed fms-acceleration, both the framework plugin and the accelerated-peft plugin, from this branch. I then ran QLoRA with the config below and it ran successfully:

```json
{
"model_name_or_path": "/fmaas-integration-tests/models/granite-8b-code-instruct-gptq/",
"training_data_path": "/fmaas-integration-tests/tuning/input/twitter-complaints.json",
"output_dir": "/fmaas-integration-tests/tuning/output/anhuong/granite-20b-gptq-qlora-twitter_202401008_1706",
"num_train_epochs": 10.0,
"per_device_train_batch_size": 4,
"gradient_accumulation_steps": 1,
"learning_rate": 1e-4,
"response_template": "\n### Label:",
"dataset_text_field": "output",
"peft_method": "lora",
"r": 8,
"lora_dropout": 0.05,
"lora_alpha": 16,
"max_seq_length": 4096,
"lora_post_process_for_vllm": true,
"auto_gptq": ["triton_v2"],
"torch_dtype": "float16",
"fp16": true
}
```

In the adapter_config.json you can see it used the default target modules:

```json
"target_modules": [
"v_proj",
"q_proj"
],
```

I do see these messages and I'm curious what they mean:

```
***** FMS AccelerationFramework *****
INFO:framework.py:***** FMS AccelerationFramework *****
Active Plugin: AutoGPTQAccelerationPlugin. Python package: fms_acceleration_peft. Version: 0.3.0.dev0.
INFO:framework.py:Active Plugin: AutoGPTQAccelerationPlugin. Python package: fms_acceleration_peft. Version: 0.3.0.dev0.
***************** Module Forwards Patching *************
INFO:framework.py:***************** Module Forwards Patching *************
Rule: autogptq_patch_tensors_as_float_parameters Module: base_layer Class: QuantLinear Num: 72
INFO:framework.py:Rule: autogptq_patch_tensors_as_float_parameters Module: base_layer Class: QuantLinear Num: 72
Rule: autogptq_patch_tensors_as_float_parameters Module: down_proj Class: QuantLinear Num: 36
INFO:framework.py:Rule: autogptq_patch_tensors_as_float_parameters Module: down_proj Class: QuantLinear Num: 36
Rule: autogptq_patch_tensors_as_float_parameters Module: gate_proj Class: QuantLinear Num: 36
INFO:framework.py:Rule: autogptq_patch_tensors_as_float_parameters Module: gate_proj Class: QuantLinear Num: 36
Rule: autogptq_patch_tensors_as_float_parameters Module: k_proj Class: QuantLinear Num: 36
INFO:framework.py:Rule: autogptq_patch_tensors_as_float_parameters Module: k_proj Class: QuantLinear Num: 36
Rule: autogptq_patch_tensors_as_float_parameters Module: o_proj Class: QuantLinear Num: 36
INFO:framework.py:Rule: autogptq_patch_tensors_as_float_parameters Module: o_proj Class: QuantLinear Num: 36
Rule: autogptq_patch_tensors_as_float_parameters Module: up_proj Class: QuantLinear Num: 36
INFO:framework.py:Rule: autogptq_patch_tensors_as_float_parameters Module: up_proj Class: QuantLinear Num: 36
***************** Accelerator Patching *************
INFO:framework.py:***************** Accelerator Patching *************
Currently training with a batch size of: 4
***** Running training *****
Num examples = 50
Num Epochs = 10
Instantaneous batch size per device = 4
Total train batch size (w. parallel, distributed & accumulation) = 8
Gradient Accumulation steps = 1
Total optimization steps = 70
Number of trainable parameters = 1,916,928
0%| | 0/70 [00:00<?, ?it/s]Detected flash_attn version: 2.6.3
{'loss': 6.4873, 'grad_norm': 3.5396203994750977, 'learning_rate': 9e-05, 'epoch': 1.0}
```
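For anyone repeating this check, a small sketch of how one might confirm which target modules a run picked up by reading the adapter_config.json in the output directory. The path below is the output_dir from the tuning config above; this script is not part of the PR:

```python
# Quick check (not part of the PR): read the adapter_config.json written to
# the output directory and print which LoRA target modules were applied.
import json
from pathlib import Path

# Assumed to be the same output_dir as in the tuning config above.
output_dir = Path(
    "/fmaas-integration-tests/tuning/output/anhuong/"
    "granite-20b-gptq-qlora-twitter_202401008_1706"
)

with open(output_dir / "adapter_config.json") as f:
    adapter_config = json.load(f)

# Expect PEFT's defaults for this architecture, e.g. ["v_proj", "q_proj"],
# when --target_modules was not set.
print("target_modules:", adapter_config["target_modules"])
```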
Signed-off-by: Yu Chin Fabian Lim <[email protected]>
@anhuong Quant weights are in
Signed-off-by: Yu Chin Fabian Lim <[email protected]>
Thanks Fabian for the changes and the explanation! This looks great!
This PR provides two fixes:

1. Fall back to the default LoRA target modules from PEFT's mapping if `--target_modules` happen to be omitted from the `peft_config`.
2. Get `low_cpu_mem_mode` working for GPTQ, without having to worry about offloading the state dict (as it is a hack in itself).

However, do note that for `low_cpu_mem_mode` we have observed some problems in the BNB case for certain models (e.g., `GraniteCausalForLM`); see #83.
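As background for fix 2, this is the general Hugging Face mechanism that low-CPU-memory loading builds on: weights are materialized shard by shard instead of first allocating a full copy in CPU RAM. This is not the accelerated-peft plugin's code path, and the model id is a placeholder:

```python
# Illustration only: generic low-CPU-memory loading in transformers.
# NOT the plugin's implementation; "some-org/some-gptq-model" is a placeholder.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-gptq-model",   # placeholder model id
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,       # materialize weights shard-by-shard
)
```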