Integrate Colossal.AI Engine #12733
Comments
Hey @siddk, that would be awesome. Would you be interested in contributing? I believe you are right that the integration should be very similar to DeepSpeed. It may even be possible to factor out some common utilities. We would really appreciate your contribution and will definitely help you iterate on an initial PR :)
Hey @justusschock - I wish I had the time. This would be a high-value feature both for some of the work we're doing on training large models at Stanford (https://crfm.stanford.edu/) and for several large-scale projects. If one or two members of the team (or other contributors) took the lead, I'd definitely be down to help test and provide feedback - I just don't think I could take all of this on alone.
A good suggestion! Any progress on this?
Hey @marsggbo @siddk @rohitgr7, I reached out to the Colossal-AI team in hpcaitech/ColossalAI#1330. We will keep you updated on progress. Best,
Hi, thank you for your attention and advice. The Colossal-AI team is willing to help support this feature as soon as possible.
🚀 Feature
Similar to the DeepSpeed/FairScale integrations, it'd be really cool for PyTorch Lightning to expose an API for integrating Colossal.AI.
Motivation
Having 3D parallelism and other optimizations as a simple plugin for PyTorch Lightning would make scaling large models super easy!
Pitch
Colossal-AI exposes a simple engine-based API similar to DeepSpeed's (https://www.colossalai.org/docs/basics/engine_trainer#engine), so the integration should be fairly straightforward.
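As a rough sketch of what the integration shape could look like: a Lightning-style strategy would wrap the engine object and delegate `backward`/`step` to it, much like the DeepSpeed strategy does. All names below (`ColossalAIStrategy`, `_FakeEngine`) are hypothetical stand-ins for illustration, not an existing Lightning or Colossal-AI API; real code would obtain the engine from `colossalai.initialize(...)` as described in the linked docs.

```python
# Hypothetical sketch of the integration shape. _FakeEngine stands in for
# the engine object colossalai.initialize() would return; ColossalAIStrategy
# mirrors the role a Lightning strategy/plugin would play.

class _FakeEngine:
    """Illustrative stand-in for a Colossal-AI engine."""

    def __init__(self, model):
        self.model = model
        self.steps = 0
        self.last_loss = None

    def backward(self, loss):
        # The real engine owns backward() so it can hook in gradient
        # handling for tensor/pipeline parallelism.
        self.last_loss = loss

    def step(self):
        # Likewise, optimizer stepping goes through the engine.
        self.steps += 1


class ColossalAIStrategy:
    """Hypothetical Lightning-style strategy delegating to the engine."""

    def setup(self, model):
        # Real code: self.engine, *_ = colossalai.initialize(model, ...)
        self.engine = _FakeEngine(model)

    def backward(self, loss):
        self.engine.backward(loss)

    def optimizer_step(self):
        self.engine.step()


# Minimal usage: the trainer loop would call these hooks instead of
# touching loss.backward() / optimizer.step() directly.
strategy = ColossalAIStrategy()
strategy.setup(model="toy-model")
strategy.backward(loss=0.5)
strategy.optimizer_step()
```

The key design point, shared with the DeepSpeed integration, is that Lightning routes backward and optimizer calls through the strategy, so swapping in an engine-backed strategy requires no changes to user `LightningModule` code.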
cc @Borda @akihironitta @justusschock