Make new optimizer more extensible, easier to integrate downstream for FSDP #181

muellerzr · 2024-08-15T15:16:43Z

Description
This PR makes it easier to use FSDP with MS-AMP starting from an existing optimizer. This is especially helpful for library authors: currently, getting the FSDP version of these optimizers working when a user passes in optim.Adam takes a fair amount of extra work.

Instead, FSDPAdamW is now built on an OptimWrapper, which calls an underlying optimizer as a passthrough. This makes it easier to add any logic that should run before or after the underlying optimizer's calls, and the wrapper takes an already-constructed Optimizer rather than relying on inheritance.
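As a rough illustration of the passthrough idea described above (this is not the actual OptimWrapper code from the PR), a wrapper can hold an already-constructed optimizer, forward calls to it, and expose hooks around `step`. The names `PassthroughOptimizer`, `pre_step`, `post_step`, `ToySGD`, and `LoggingWrapper` are hypothetical; `ToySGD` is only a stand-in so the sketch runs without PyTorch or MS-AMP installed.

```python
class PassthroughOptimizer:
    """Hypothetical wrapper: forwards calls to an already-constructed optimizer."""

    def __init__(self, optimizer):
        # Composition instead of inheritance: any optimizer-like object works.
        self.optimizer = optimizer

    @property
    def param_groups(self):
        # Expose the wrapped optimizer's groups so schedulers keep working.
        return self.optimizer.param_groups

    def zero_grad(self, set_to_none=True):
        self.optimizer.zero_grad(set_to_none=set_to_none)

    def pre_step(self):
        """Hook that runs before the wrapped step (no-op by default)."""

    def post_step(self):
        """Hook that runs after the wrapped step (no-op by default)."""

    def step(self, closure=None):
        self.pre_step()
        loss = self.optimizer.step(closure)  # passthrough to the real optimizer
        self.post_step()
        return loss


class ToySGD:
    """Stand-in optimizer (assumption), used only to make the sketch runnable."""

    def __init__(self, params, lr=0.1):
        self.param_groups = [{"params": params, "lr": lr}]

    def zero_grad(self, set_to_none=True):
        for group in self.param_groups:
            for p in group["params"]:
                p["grad"] = None if set_to_none else 0.0

    def step(self, closure=None):
        loss = closure() if closure is not None else None
        for group in self.param_groups:
            for p in group["params"]:
                if p["grad"] is not None:
                    p["value"] -= group["lr"] * p["grad"]
        return loss


class LoggingWrapper(PassthroughOptimizer):
    """Example subclass that records when the hooks fire."""

    def __init__(self, optimizer):
        super().__init__(optimizer)
        self.events = []

    def pre_step(self):
        self.events.append("pre")

    def post_step(self):
        self.events.append("post")


param = {"value": 1.0, "grad": 0.5}
wrapped = LoggingWrapper(ToySGD([param], lr=0.1))
loss = wrapped.step(closure=lambda: 0.25)
```

Because the wrapper receives a constructed optimizer, a library can accept whatever optimizer the user built and add behavior around it without asking the user to switch classes.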

Let me know what you think about this. I'm currently integrating FSDP and DeepSpeed w/ MS-AMP into Accelerate and found this to be a critical pain point, as our users pass in normal PyTorch optimizers and don't create special versions themselves.

@tocean @wkcn let me know what you two think :)

New working FSDP usage:

model, optimizer = ...
model, optimizer = msamp.initialize(model, optimizer, use_fsdp=True, weight_qtype=Dtypes.kfloat8_e4m3)
model = FP8FullyShardedDataParallel(model, use_orig_params=True, auto_wrap_policy=my_auto_wrap_policy)
optimizer = FSDPAdamW(optimizer)

muellerzr · 2024-08-15T15:19:24Z

@microsoft-github-policy-service agree company="Hugging Face"

muellerzr · 2024-08-15T16:09:09Z

Sorry for the extraneous pushes while I was figuring something out. Good to go now :)

muellerzr · 2024-08-15T18:45:32Z

You can see our new accelerate benchmarking scripts here: https://github.com/huggingface/accelerate/tree/muellerzr-msamp-ds-fsdp/benchmarks/fp8/ms_amp

muellerzr · 2024-08-21T17:35:56Z

@tocean @wkcn any particular issues with this? :)

(Ideally it'd be great to include this in the next accelerate release on the 1st :) )

wkcn · 2024-08-22T01:35:43Z

@muellerzr Thanks for your contribution!

The PR looks good to me.
Sorry, I am not at Microsoft and am not authorized to review or merge the pull request.

muellerzr · 2024-08-22T12:37:57Z

Ack okay, I suppose we'll have to wait for @tocean / @abuccts / @guoshzhao to take a look. Thanks for the flag 🤗

muellerzr added 5 commits August 15, 2024 10:54

Make extensible

8ff00fb

Continue

ee673bb

Continue

03d05ed

Update example

28cc3b7

update example

c964944

muellerzr added 2 commits August 15, 2024 11:31

Update initialize to include weight qtype

a544d9d

Continue working through it

9120db3

muellerzr marked this pull request as draft August 15, 2024 15:41

muellerzr added 4 commits August 15, 2024 11:51

Continue trying

f127f45

Try with new patch

77ee41a

use cast_model

fac7f9a

Fin

ed5b86f

muellerzr mentioned this pull request Aug 15, 2024

MS-AMP support for FSDP huggingface/accelerate#2972

Closed

muellerzr added 4 commits August 15, 2024 12:16

Actually cast optimizer

ef1c6d3

Continuing debugging

567c267

Truly fin

8c563fd

Include dtypes

bed7bd0

muellerzr marked this pull request as ready for review August 15, 2024 16:23

muellerzr added 4 commits August 15, 2024 12:26

Import err

c979229

dtypes -> dtype

b903d13

Closure explicitly?

ffaabc9

Give it closure

15b9a78

muellerzr mentioned this pull request Aug 16, 2024

Fixup MS-AMP integration huggingface/accelerate#3023

Closed

5 tasks

muellerzr mentioned this pull request Sep 9, 2024

MS-AMP support (w/o FSDP) huggingface/accelerate#3093

Merged

5 tasks
