-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Increasing model coverage of Tensor MP and Inference Engine. #1248
Comments
@stas00 it has been checked that the training can be implemented without the CUDA kernel. The parallelformers can train almost all models with tensor parallel. However, I need little more time to use the CUDA kernel. What do you want?
refer to https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/mpu/mappings.py#L76 and https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/mpu/mappings.py#L92 There are backward function. |
I'd say, that since you already have it implemented, let's make it available to users now. We just need to make sure that once CUDA kernels become available the user have little to no change in how they deploy the models, so ideally just having a flag to choose python vs. cuda and we sort out all the requirements changes behind the scenes. So if we have a pure-python class, we just swap in a CUDA version of the same. And we will need a mechanism to build the extensions, so for example it can be Deepspeed or some other separate repo/package. This is so that Please let me know if you have any questions. |
@stas00 Ok. I will proceed with the two tasks in parallel. First of all, on the HF transformers side, I will implement the tensor parallel feature without CUDA. At the same time, on the DeepSpeed side, I will refactor the code and find a way to use CUDA kernel together. By making the interface of both side the same, I will make HF transformers automatically use the DeepSpeed code when the CUDA kernel becomes available in the future. Thanks. |
@RezaYazdaniAminabadi |
Sounds like a great plan to me, @hyunwoongko! Thank you! If you need me please don't hesitate to tag me. |
Hi @hyunwoongko Sorry for the late reply, I was OOF for the last couple of days. I agree with you that the API needs some refactoring to help make it more readable. Please go ahead and submit a PR with your current changes and I will review and add changes as needed. Btw, most of the Bert- and GPT-like models that can easily benefit from the kernels to run faster inference. We can add the support orthogonal to the model-parallelization. Best, |
Currently, our team is developing our large-scale model for the Asian language, and most of my resources are used to this work, so I Haven't been able to invest a lot of time in collaboration until now. Sorry. I plan to resume work from this weekend. @RezaYazdaniAminabadi Are you not considering having your own implementation of mpu and tensor mp inside deepspeed? I want to implement mpu and tensor mp similar to those used in megatron lm. It also includes a training function, so it is no longer limited to deep speed-inference. How about making both inference and training possible? I think there is some misalignment between each other. @stas00 wants a training function based on tensor mp. |
A small correction - HF transformers wants both training and inference. It's just that we have a lot more users using the library for training. So there is definitely not misalignment between the two. Remember that Deepspeed already has TP and PP, so they are just missing inference. (correction: no TP!) HF Transformers doesn't have those yet, hence the difference. |
@stas00 where is deepspeed TP? Do you mean Megatron LM Integration? |
You know what - you're absolutely correct, @hyunwoongko! Deepspeed doesn't have it! I have just never tried to use it so I assumed that since many Deepspeed docs talk about 3D, it included tensor parallelism, but it doesn't! I just found this comment by Shaden:
(source: #673 (comment)) Thank you! I will update this doc to remove Deepspeed: https://huggingface.co/transformers/parallelism.html#tensor-parallelism |
@stas00 That's why we should use Megatron-LM for 3D parallelization. (nowadays, I am working with Megatron-LM everyday like you 😅) But, To support all kinds of parallelization for Huggingface transformers, we need TP and MPU like Megatron, but without Megatron. (Because Megatron is too complex and I think it's not good design to cover various model architectures in HF transformers) Anyway, we should implement TP and MPU anywhere (DS or HF) I can implement TP and MPU into huggingface sides. I already tested training feature of parallelformers and it works. but MPU and TP training feature are needed. but you told me you want to integrate through deepspeed. That's why I told you we should adjust it. because It is not clear deepspeed team also want to implement TP training feature and MPU inside deepseed. currently deepspeed team support inference-only TP. |
I think we have a miscommunication here. I was trying to communicate that in the process of implementing a pure-python TP in HF transformers, it'd be great to keep in mind DS's cuda kernels so that we could swap those in when such are available. I have not suggested to implement TP through deepspeed. We want HF transformers to be independent for the core 3D parallelism functionality and optionally be sped up by various extensions. @hyunwoongko, would it help for us to have a call so that we could flesh out the details and hopefully there will be more clarity about this. I remember you were concerned with language issues. Since my Korean consists of one word, I wonder if @jaketae would be interested to participate and help with translation. I'm in the PST. If we do that perhaps let's invite @siddk as well, so that we perhaps could create a shared plan to integrate your and his projects, as you both are working on some overlapping solutions. |
I'd be more than happy to assist with translation, though I doubt @hyunwoongko will have issues. Plus I know you're secretly hiding your Korean skills, @stas00. We can coordinate a date and time for further discussion and proceed from there? Perhaps a when2meet? |
I think it would be good to discuss the issue on the Huggingface side. This is a deepspeed repo. |
no secret Korean yet. I'm good with languages, but Asian ones I find too hard for me - since my native tongue isn't tonal. I'm arranging a slack channel for all involved, including Reza. and we can then sort things out including a meeting. I hope all should be added/invited in the next day or two once I find an admin who is not on vacation. |
Good. Could you invite me slack channel? |
@hyunwoongko, it is definitely exciting to bring the tensor-slicing in deepspeed both for training and inference. Looking forward to talking with you all :) |
@RezaYazdaniAminabadi Great. Firstly, Is it ok to bring tensor MP and MPU without fused kernel into DeepSpeed? I hope DeepSpeed supports Tensor MP without Megatron LM. Let's implement pure python version first, and we can discuss about applying kernel fusion. |
The invite was sent to your naver.com email you emailed me from. Please email me if you haven't received it. |
@RezaYazdaniAminabadi
Let's continue the discussion we had in the offline chat here.
First of all I want to refactor the
module_injection.py
andinference/engine.py
codes. The current codes consist of too many nested functions, and all logic is contained in one function, so it is a bit complicated. What do you think?The text was updated successfully, but these errors were encountered: