Add Support for Gradient Checkpointing #759
Conversation
- overwrite gradient_checkpointing_enable to provide our ForwardContext during the recomputation of values during backpropagation - 2 bugs remaining: bottleneck adapter for models with the legacy implementation (BERT) & Parallel. Parallel has the problem that we manipulate the batch dimension, which currently leads to an error
5edfd4d to 94df2fe
docs & style & fixes
- albert: skip unsupported tests
- deberta(V2): fix embedding bug with inplace operations
- deberta: fix LoRAMergedLinear bug with device mismatch
Thanks a lot for digging into this and enabling compatibility with adapters, this looks great! Just a couple of small comments before we're good to merge.
Hey @calpt, can you quickly review my replies to your comments and approve the PR if everything is alright so I can merge this? We need to merge this PR first, then add all the new tests to the test refactoring #740, and then we can merge #763 (because Julian has already merged the test refactoring into his PR). So, currently, this PR is blocking everything we want to have in the next release.
Add Support for Gradient Checkpointing
This PR adds support for gradient checkpointing. Gradient checkpointing is a technique that trades computation for memory by recomputing intermediate activations during the backward pass instead of storing them, which is particularly useful when training large models. Because we recompute values during backpropagation, we need to preserve the original ForwardContext in this phase. I solved this by overwriting the `gradient_checkpointing_enable` function so that the checkpoint function receives the current ForwardContext as the backward pass context manager.
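To make the mechanism concrete, here is a minimal sketch of the general idea, not the actual diff of this PR: it wires the active ForwardContext into `torch.utils.checkpoint.checkpoint` via its `context_fn` argument, so the context is re-entered during the backward-pass recomputation. The `ForwardContext.get_context()` accessor, the mixin name, and the private Transformers hook `_set_gradient_checkpointing(enable=..., gradient_checkpointing_func=...)` are assumptions here; `context_fn` requires `use_reentrant=False`.

```python
# Sketch only: illustrates passing the current ForwardContext to the
# checkpoint function as the recomputation context manager.
import functools
from contextlib import nullcontext

from torch.utils.checkpoint import checkpoint

from adapters.context import ForwardContext  # assumed import path


def _checkpoint_context_fn():
    """Return (forward, recompute) context managers for checkpointing.

    The second context manager re-enters the ForwardContext that was active
    when the checkpointed segment first ran, so adapter state is available
    again while activations are recomputed during the backward pass.
    """
    current_context = ForwardContext.get_context()  # assumed accessor
    if current_context is None:
        return nullcontext(), nullcontext()
    return nullcontext(), current_context


class GradientCheckpointingMixin:  # hypothetical mixin name
    def gradient_checkpointing_enable(self, gradient_checkpointing_kwargs=None):
        # Overwrite the Transformers implementation so that the checkpoint
        # function always receives the current ForwardContext for the
        # recomputation phase. context_fn only works with use_reentrant=False.
        if gradient_checkpointing_kwargs is None:
            gradient_checkpointing_kwargs = {"use_reentrant": False}
        checkpoint_fn = functools.partial(
            checkpoint,
            context_fn=_checkpoint_context_fn,
            **gradient_checkpointing_kwargs,
        )
        self._gradient_checkpointing_func = checkpoint_fn
        # Hook exposed by recent Transformers versions (assumed signature).
        self._set_gradient_checkpointing(
            enable=True, gradient_checkpointing_func=checkpoint_fn
        )
```

With something like this in place, calling `model.gradient_checkpointing_enable()` and then training as usual should recompute the checkpointed segments inside the preserved ForwardContext instead of a bare context.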