Community Integration: Colossal-AI for Large AI Models #18624
If you have any difficulties or concerns, please let me know.
I haven't had a chance to read up on Colossal-AI yet. Why do you believe it's much better based on your research, @flozi00? I did notice that it suggests the integration of PatrickStar's functionality. CAI appears to be its own eco-system; I'm not sure how easy it'd be to integrate with ours.
In terms of code, it looks very similar to a normal PyTorch training loop. I already noticed some time ago that it was trending on paperswithcode for a while. The benchmarks look pretty nice at first glance, but they are also a bit confusing. In any case, I think it's not a bad idea to test alternatives to Deepspeed.
Thank you for sharing your insights, @flozi00! I read their paper and I'm not quite sure what type of integration is proposed here. Unlike Deepspeed, which is meant to be integrated with the user's code, CAI seems to be a standalone solution.

One of the biggest issues with any parallelism proposal (other than DDP) is that they all require rewriting the model's code, which with 100+ models and growing in our arsenal would be prohibitively expensive. Therefore we always welcome automated solutions like Deepspeed, which require no changes whatsoever to most models and sometimes a small tweak for some peculiar models.

It's definitely worth exploring all the different versions of TP (2/2.5/3D) mentioned in the paper, but we need this automated and not manually rewritten. The paper briefly mentions PP, but as we all know, this one definitely requires a complete rewrite of the model for most frameworks.

So again, let's ask a very concrete question: other than being part of the HF ecosystem, what is the vision for the proposed integration? We already have 2 trainer loop systems (HF Trainer and Accelerate) and we won't want to maintain a 3rd one. Do you need to inject something into the ...? Do you propose to rewrite the models to support ...?

Perhaps let's take one HF Transformers model of your choice and tell us what you would like to do with it to have it run on CAI? That would be more practical.

And specifically to your interest, @flozi00: yes, I hear you like the advanced memory utilization proposed in PatrickStar, and CAI claims to have integrated that functionality.

I hope my commentary was constructive; we are definitely open to good improvements to our tools. It's just that I'm wary of adding yet another tool unless a clear advantage and ease of integration can be shown.
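To make the "rewriting the model's code" concern concrete, here is a toy, framework-free sketch of column-parallel tensor parallelism for a single linear layer, in pure Python. All function names are illustrative and not any real Colossal-AI or Transformers API. The point is that the forward pass itself has to change to shard the weight and gather the partial outputs, which is exactly the kind of per-model surgery that doesn't scale to 100+ architectures:

```python
# Toy sketch (not a real API): column-parallel TP for one linear layer.
# Each "rank" holds a column shard of the weight; the forward pass must
# compute a partial output per shard and gather them back together.

def matmul(x, w):
    # x: list of rows; w: list of rows (in_features x out_features)
    return [[sum(xi * wij for xi, wij in zip(row, col))
             for col in zip(*w)] for row in x]

def split_columns(w, ranks):
    # Column-parallel sharding of the weight matrix across `ranks`.
    cols = list(zip(*w))
    per = len(cols) // ranks
    shards = [cols[r * per:(r + 1) * per] for r in range(ranks)]
    return [list(map(list, zip(*s))) for s in shards]

def column_parallel_forward(x, w, ranks):
    # Each rank computes its output shard; a simulated all-gather then
    # concatenates the shards along the feature dimension.
    outs = [matmul(x, shard) for shard in split_columns(w, ranks)]
    return [sum((o[i] for o in outs), []) for i in range(len(x))]

x = [[1.0, 2.0]]
w = [[1.0, 0.0, 2.0, 1.0],   # 2 x 4 weight matrix
     [0.0, 1.0, 1.0, 3.0]]
# Sharded forward must match the plain (unsharded) forward.
assert column_parallel_forward(x, w, ranks=2) == matmul(x, w)
```

Automating this transformation (rather than hand-editing each model's forward) is what an integration would need to deliver.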
Also, let's ping @hyunwoongko - Kevin, I know you have studied many frameworks while building https://github.com/tunib-ai/oslo - have you by chance researched Colossal-AI on your journey? If you did, would you kindly share a few insights if you have any? I know you were cherry-picking the best parts from many systems in addition to your own innovations.
I'm sorry to admit that I didn't think of the backwards compatibility; I totally forgot about that point, sorry. I focused mainly on the integration in the Trainer and did not consider the now very many architectures and weights. Maybe CAI has an idea to automate that? I have some ideas in mind, but those would be more part of CAI itself or third-party tools: finding JIT methods to convert the required model parts, instead of the HF integration.
No harm done. This is totally understandable: the HF Transformers eco-system has been becoming more and more complex, so it's often far from trivial to add yet another component to it. We warmly welcome solutions that can automate performance enhancements (like torchdynamo, see below).
PL is a training framework/loop; last I looked they didn't have a model library and were using Transformers, so they don't need to deal with modeling.
There is already work being done on that with torchdynamo/nvfuser. It's not fully stable yet, but it shows some impressive speed-ups (and lower memory usage) for converting normal PyTorch code to fused kernels. This is a different dimension from parallelism and advanced memory management systems, though. It's definitely not a replacement for parallelism: it can save 2x memory or provide a 2x speed-up, but that's far from enough for 100B+ models. Please see the HF integration details here:
Hi, we drafted a pull request which integrates Colossal-AI into Lightning. Here are examples and a benchmark: https://github.com/hpcaitech/ColossalAI-Pytorch-lightning. We have implemented ZeRO-DP with chunk-based memory management and heterogeneous memory management. I think this is not hard to integrate into HF. Besides, we are working on auto parallelism. I believe we will be able to use TP/PP without modifying the model in the future.
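For readers unfamiliar with the chunk-based approach mentioned above, here is a minimal, purely illustrative Python sketch of the idea behind it (all class and method names here are hypothetical, not the real Colossal-AI API): parameters are packed into fixed-size chunks, and a whole chunk moves between CPU and GPU tiers at once, amortizing transfer cost across neighboring parameters:

```python
# Illustrative sketch (hypothetical names, not the Colossal-AI API) of
# chunk-based memory management: parameters are packed first-fit into
# fixed-capacity chunks, and fetching one parameter moves its whole
# chunk to the "device" tier.

class Chunk:
    def __init__(self, capacity):
        self.capacity = capacity   # max number of elements in the chunk
        self.used = 0
        self.params = []           # (name, numel) pairs packed here
        self.tier = "host"         # "host" (CPU) or "device" (GPU)

    def can_fit(self, numel):
        return self.used + numel <= self.capacity

    def append(self, name, numel):
        self.params.append((name, numel))
        self.used += numel

class ChunkManager:
    def __init__(self, chunk_capacity):
        self.chunk_capacity = chunk_capacity
        self.chunks = []

    def register(self, name, numel):
        # First-fit packing: reuse the last open chunk when possible.
        if not self.chunks or not self.chunks[-1].can_fit(numel):
            self.chunks.append(Chunk(self.chunk_capacity))
        self.chunks[-1].append(name, numel)

    def fetch(self, name):
        # Moving one parameter to the device moves its whole chunk,
        # so neighboring parameters come along "for free".
        for chunk in self.chunks:
            if any(n == name for n, _ in chunk.params):
                chunk.tier = "device"
                return chunk
        raise KeyError(name)

mgr = ChunkManager(chunk_capacity=1024)
mgr.register("embed.weight", 768)
mgr.register("layer0.qkv", 512)   # 768+512 > 1024, so a new chunk opens
mgr.register("layer0.out", 256)   # fits in the second chunk
mgr.fetch("layer0.out")           # pulls layer0.qkv to device with it
```

The real systems (PatrickStar, CAI's heterogeneous memory manager) add runtime statistics to decide which chunks to evict, but the packing-and-migrating-whole-chunks principle is the core idea.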
OK, so at the moment you're proposing to integrate CAI for:
@sgugger, should this perhaps go straight into ...? (Sylvain is on vacation, so please let's wait a bit for him to be back and advise on how best to proceed.)
We'll probably need to duplicate the integration in the Trainer and Accelerate for now, since the Trainer does not depend on Accelerate.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Feature request
Dear Hugging Face Team,
My name is Yongbin Li. I am part of Colossal-AI Team.
Thanks for your previous invitation to Colossal-AI org to join Hugging Face. We are happy to share our founder's blog about Hugging Face.
We are thinking about further collaboration, e.g. integrating Colossal-AI into Hugging Face to help your community members use large AI models in a more efficient and easier manner.
For example, we can democratize its access to all your users in the same way as you did with DeepSpeed.
https://huggingface.co/docs/transformers/v4.21.0/en/main_classes/deepspeed
Motivation
We believe the democratization of large AI models is also very helpful for Hugging Face members. We would greatly appreciate it if we could build this integration with you to benefit both of our user communities.
Actually, we are working on similar integrations with Meta OPT (done), PyTorch Lightning (in progress), etc.
Your contribution
We can provide any help you need in this cooperation for free. Actually, we have reached a preliminary understanding with your team members Omar, Lysandre, and Julien via email ([email protected]) and look forward to your further reply.
Feel free to reach out to me on Hugging Face Discord. My username is billy2022. We can discuss more details with other colleagues in a private group.
Thank you very much.
Best regards,
Yongbin Li, Chief Marketing Officer, HPC-AI Tech