Fix Low CPU Memory Mode Issues for Quantized Peft #90

fabianlim · 2024-10-11T03:55:55Z

Since transformers >= 4.45 we have been having issues with low_cpu_mem_mode, as described in #83. This PR addresses them:

remove disabling of low_cpu_mem_mode in BNBAccelerationPlugin
remove disabling of low_cpu_mem_mode in AutoGPTQAccelerationPlugin, including update of README
address Issue 2 in Distributed Training Problems for QLoRA models with Transformers pre-release 4.45 #83 with FOAK.

Note: this PR's fixes apply only if the following commits are applied:

Fix Inconsistency with IsShardedQLoRA Setting huggingface/trl#2089
Fix excessive CPU memory usage with FSDP and cpu_ram_efficient_loading huggingface/transformers#33154
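Because the fixes depend on upstream commits, a caller may want to check installed package versions before relying on them. The following is only a minimal sketch of such a check using the standard library. The helper names (`parse_release`, `is_at_least`, `package_is_at_least`) are hypothetical and not part of this repository, and this PR does not state which transformers or trl releases contain the commits above, so any minimum version passed in is an assumption the caller must supply.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(v: str) -> tuple:
    """Return the leading numeric components of a version string.

    For example, '4.45.0.dev0' becomes (4, 45, 0). Pre-release and dev
    suffixes are dropped, so a pre-release compares equal to its release.
    """
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # piece has no leading digits, e.g. 'dev0'
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # piece carries a suffix, e.g. '0rc1'
    return tuple(parts)


def is_at_least(installed: str, minimum: str) -> bool:
    """Compare two version strings by their numeric release components."""
    return parse_release(installed) >= parse_release(minimum)


def package_is_at_least(name: str, minimum: str) -> bool:
    """Return False when the package is missing or older than `minimum`."""
    try:
        return is_at_least(version(name), minimum)
    except PackageNotFoundError:
        return False
```

A guard such as `package_is_at_least("transformers", "4.45")` would then gate the low-memory path, with the threshold chosen by the caller. Since suffixes are ignored, a pre-release build passes the same check as the final release, which may or may not include the required commits.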

Regressions

Mistral 7B

Granite 3b

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

anhuong

Left a few questions for my understanding. I don't know if I'm the best person to review this change, but overall, if the changes were tested and the model loaded successfully for GPTQ with low_cpu_mem_usage set, then that seems good to me.

plugins/accelerated-peft/src/fms_acceleration_peft/gptqmodel/models/base.py

plugins/fused-ops-and-kernels/src/fms_acceleration_foak/framework_plugin_fast_quantized_peft.py

plugins/accelerated-peft/src/fms_acceleration_peft/framework_plugin_autogptq.py

anhuong

Thanks for answering my questions. This looks good to me, but you might want someone else to give it a closer review.

fabianlim marked this pull request as draft October 11, 2024 03:56

fabianlim force-pushed the fix/quant branch from f2c5e42 to dce3c00 Compare October 14, 2024 02:55

fabianlim linked an issue Oct 14, 2024 that may be closed by this pull request

Distributed Training Problems for QLoRA models with Transformers pre-release 4.45 #83

Closed

fabianlim added 7 commits October 14, 2024 06:24

address issue 2 in #83

aacd5bd

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

properly handle broadcast of adapters

1e99ee9

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

handle param_init_fn_tied_param

0f5c9ae

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

trl version error

18aa160

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

tied weights fix and meta fix for autogptq

e02993d

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

update readme

510e0d1

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

fmt + lint

a85faac

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

fabianlim force-pushed the fix/quant branch from 343c852 to a85faac Compare October 14, 2024 06:24

fabianlim requested a review from anhuong October 14, 2024 06:25

fabianlim marked this pull request as ready for review October 14, 2024 06:25

upgrade granite benches

5a24f02

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

anhuong reviewed Oct 16, 2024

anhuong approved these changes Oct 17, 2024

fabianlim merged commit fc78b55 into main Oct 17, 2024
6 checks passed

fabianlim deleted the fix/quant branch October 17, 2024 23:12

This was referenced Oct 25, 2024

Apply Retie Weights Fix Regardless of Transformers and TRL version for AutoGPTQ #94

Merged

Fix Issue with Resizing Parameters on the Meta Device in Low CPU Mem Mode #96

Merged
