Is pipeline-parallel in conflict with ZeRO-stage2? #823
Comments
Hi @gongjingcs, pipeline parallelism can work with ZeRO stage 1 but not stage 2, because gradient accumulation in PP requires that all gradients be present across multiple forward/backward passes. Since ZeRO stage 2 partitions the gradients, the two are unfortunately incompatible.
(I'm paraphrasing some communications with @samyam here, but I have also tried it myself in some experiments and got the same error. It would be a useful thing to add to the docs, IMO.)
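For reference, here is a minimal sketch of a DeepSpeed config that pairs pipeline parallelism with ZeRO stage 1; the batch size, accumulation steps, and fp16 settings are illustrative assumptions, not values from this thread:

```python
# Hypothetical ds_config contents (normally a JSON file) shown as a Python dict.
# With pipeline parallelism, zero_optimization.stage must be 0 or 1; stage 2
# partitions gradients and breaks PP's cross-micro-batch gradient accumulation.
ds_config = {
    "train_batch_size": 256,            # illustrative value
    "gradient_accumulation_steps": 16,  # PP accumulates over micro-batches
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 1  # stage 2 here is what triggers the error discussed above
    },
}
```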
Thanks for your reply. Yes, when I change the config to ZeRO stage 1 it no longer reports an error; however, ZeRO stage 1 does not really take effect in this example: https://github.com/microsoft/DeepSpeedExamples/tree/master/Megatron-LM-v1.1.5-3D_parallelism. The example uses the buffered_allreduce_fallback function to all-reduce gradients. Could you help check it?
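For context, here is a minimal sketch of the buffered gradient all-reduce pattern that example falls back to, assuming a plain torch.distributed setup. This is illustrative only, not the actual buffered_allreduce_fallback implementation; the real code buckets by dtype and size and reduces over the data-parallel group rather than the world group:

```python
import torch
import torch.distributed as dist
from torch._utils import _flatten_dense_tensors, _unflatten_dense_tensors

def buffered_allreduce_sketch(params):
    """Flatten gradients into one contiguous buffer, all-reduce it,
    average across ranks, then copy the results back into p.grad."""
    grads = [p.grad.data for p in params if p.grad is not None]
    # Single bucket for brevity; real implementations chunk into
    # fixed-size buffers per dtype to bound memory overhead.
    flat = _flatten_dense_tensors(grads)
    dist.all_reduce(flat)                 # defaults to SUM
    flat.div_(dist.get_world_size())      # average over all ranks
    for grad, synced in zip(grads, _unflatten_dense_tensors(flat, grads)):
        grad.copy_(synced)
```

If gradients are synchronized this way, every rank still holds the full gradient set, which would indeed defeat ZeRO stage 1's optimizer-state partitioning savings on the gradient side.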
I can't train GPT with 3D parallelism and ZeRO stage 2 at the same time.
It seems pipeline parallelism is in conflict with ZeRO stage 2. I use the pipeline example here: https://github.com/microsoft/DeepSpeedExamples/tree/master/Megatron-LM-v1.1.5-3D_parallelism.
Looking forward to your reply.