Add ColossalAI strategy #14224
Great work so far!
High-level review.
Are you planning to add docs in this PR or another?
Dockers are hopefully fixed. Will check in the morning.
@rohitgr7, dockers are building. Docker image build of However, there is one GPU test that is consistently failing, and I think it was written by @awaelchli. Do you think you can take a look there, please?
So, I found the issue with the failing test. In #14984 we started making sure that the forked process is not poisoned by CUDA calls. However, just importing cc @carmocca to decide whether it's a good enough solution
This reverts commit 754cfd7.
Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: HELSON <[email protected]>
Co-authored-by: rohitgr7 <[email protected]>
Co-authored-by: otaj <[email protected]>
Co-authored-by: Carlos Mocholí <[email protected]>
What does this PR do?
Fixes hpcaitech/ColossalAI#1330
Fixes #12733
Add ColossalAI strategy which supports ZeRO-DP with chunk-based memory management.
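For reviewers unfamiliar with the term, the idea behind "chunk-based memory management" can be illustrated with a toy sketch. This is not ColossalAI's implementation and not code from this PR: the names `Chunk`, `pack_into_chunks`, and `assign_chunks_to_ranks` are hypothetical, and the greedy packing and round-robin ownership are assumptions made only to show the general idea of grouping parameters into fixed-size chunks that are then sharded across data-parallel ranks.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Chunk:
    """A fixed-capacity buffer holding several parameters (hypothetical)."""
    capacity: int
    used: int = 0
    params: List[Tuple[str, int]] = field(default_factory=list)

    def can_fit(self, numel: int) -> bool:
        return self.used + numel <= self.capacity

    def add(self, name: str, numel: int) -> None:
        self.params.append((name, numel))
        self.used += numel


def pack_into_chunks(param_sizes: Dict[str, int], chunk_size: int) -> List[Chunk]:
    """Greedily pack parameters, in declaration order, into chunks."""
    chunks: List[Chunk] = []
    for name, numel in param_sizes.items():
        if numel > chunk_size:
            # Assumption: an oversized parameter gets a dedicated chunk.
            chunk = Chunk(capacity=numel)
            chunk.add(name, numel)
            chunks.append(chunk)
            continue
        # Open a new chunk when there is none or the last one is too full.
        if not chunks or not chunks[-1].can_fit(numel):
            chunks.append(Chunk(capacity=chunk_size))
        chunks[-1].add(name, numel)
    return chunks


def assign_chunks_to_ranks(chunks: List[Chunk], world_size: int) -> Dict[int, List[int]]:
    """Round-robin chunk ownership, so each rank stores only its own shard."""
    return {
        rank: [i for i in range(len(chunks)) if i % world_size == rank]
        for rank in range(world_size)
    }


# Made-up parameter sizes (element counts), purely for illustration.
sizes = {"embed": 6, "w1": 3, "b1": 1, "w2": 5, "big": 12}
chunks = pack_into_chunks(sizes, chunk_size=8)
owners = assign_chunks_to_ranks(chunks, world_size=2)
```

In this sketch, moving or communicating whole chunks rather than individual parameters is what keeps the bookkeeping simple; how the real strategy sizes, places, and gathers chunks is not described in this PR text.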
Does your PR introduce any breaking changes? If yes, please list them.
No.
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:
Did you have fun?
Make sure you had fun coding 🙃