Plan to support FSDP2? #2873
Comments
Thanks for bringing FSDP2 to our (or at least my) attention. The changes described in the document you linked sound very reasonable and could remove some of the common pain points of using FSDP. Reading this, I got the impression that this is a very new addition to PyTorch. When searching for
Thanks @BenjaminBossan! If I understand correctly, the PyTorch team wants to replace FSDP1 with FSDP2 in the long term.
Very sorry for the confusion! There are two separate functions called We proposed FSDP2 as a prototype for the 2.4 release, and we are investing in it heavily.
Thanks a lot for clarifying my confusion. In that case, I think it makes sense to wait until FSDP2 is released and then run experiments with accelerate to see how it can be best supported.
The main worry with FSDPv2 is whether it's stable enough that it makes sense to include it in Accelerate. In the worst case, we can keep a draft PR open and/or ship it as an experimental feature (and advertise it as such). So my main question is:
I planned on looking into FSDP2 in the near future anyways, so I'm open to having some early-ish support in Accelerate for it as long as I can get a full grasp of how far along in development it is. (We did something similar with PiPPy, so it's okay to do so here too.) I know we need to do some heavy uprooting to add custom process support into Accelerate, which I believe FSDP2 relies on, if I'm not mistaken?
What'd be helpful on my end is some bare-bones FSDP2 examples in PyTorch showing how things operate end-to-end.
A bare-bones FSDP2 example is available in https://github.com/pytorch/torchtitan.
Thanks @raghukiran1224 :) Yes indeed, I plan on looking into these w/ some of the torch folks. Getting something small going is in our near future. (Probably highly experimental, since they're still not settled with things yet.)
We have been looking at this and will be happy to help bring FSDP2 into Accelerate as an experimental feature. RFC PR - #3231
Looking forward to more generic N-D parallel (device mesh, TP, CP) support instead of FSDP2 only. I have implemented a simple
FSDP2 provides a smaller memory footprint, compatibility with torch.compile, and more flexibility due to per-parameter sharding. Does Hugging Face have a plan to support FSDP2?
https://github.com/pytorch/torchtitan/blob/main/docs/fsdp.md
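For readers unfamiliar with what "per-parameter sharding" means here: as I understand it, FSDP2 splits each parameter on its own along dim 0 across ranks, rather than flattening several parameters into one buffer first. Below is a rough pure-Python sketch of that bookkeeping only. It assumes `torch.chunk`-style splitting, the function name and parameter shapes are made up for illustration, and it does not call any PyTorch or Accelerate API.

```python
import math


def dim0_shard_ranges(num_rows: int, world_size: int) -> list[tuple[int, int]]:
    """Return the [start, end) row range that each rank would own.

    Assumption: rows are split like torch.chunk along dim 0, i.e. each rank
    takes ceil(num_rows / world_size) rows until the rows run out, so
    trailing ranks may end up with a smaller or empty shard.
    """
    chunk = math.ceil(num_rows / world_size)
    ranges = []
    for rank in range(world_size):
        start = min(rank * chunk, num_rows)
        end = min(start + chunk, num_rows)
        ranges.append((start, end))
    return ranges


# Hypothetical parameter shapes, for illustration only.
param_shapes = {"embed.weight": (10, 4), "proj.weight": (4, 4), "proj.bias": (3,)}

for name, shape in param_shapes.items():
    # Each parameter is sharded independently of the others.
    print(name, dim0_shard_ranges(shape[0], world_size=4))
```

Because every parameter keeps its own shard boundaries, uneven cases (such as a 3-element bias over 4 ranks) simply leave the last rank with an empty shard instead of affecting neighbouring parameters.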