🚀 Feature request
Add a `parallelize` method to GPT-Neo models, so that they can be fine-tuned with model parallelism on less expensive GPUs.
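For reference, GPT-2 and T5 already expose this kind of method in transformers. Below is a minimal sketch of the existing GPT-2 call; the GPT-Neo lines in the trailing comment are hypothetical, since that method is exactly what this issue asks for:

```python
# Minimal sketch of the existing naive model-parallel API on GPT-2.
# The GPT-Neo equivalent at the bottom is hypothetical (not implemented yet).
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2-xl")

# device_map: {gpu_id: [indices of the transformer blocks placed on that GPU]}
device_map = {
    0: list(range(0, 24)),   # blocks 0-23 on cuda:0 (gpt2-xl has 48 blocks)
    1: list(range(24, 48)),  # blocks 24-47 on cuda:1
}
model.parallelize(device_map)

# Requested equivalent for GPT-Neo (hypothetical):
# from transformers import GPTNeoForCausalLM
# model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")
# model.parallelize({0: list(range(0, 16)), 1: list(range(16, 32))})
```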
Motivation
I want to fine-tune a GPT-Neo model using model parallelism so that I can do it on less expensive GPUs. This is not yet implemented, and since high-end GPUs are very expensive, it would be better to distribute the model across several cheaper GPUs than to rely on a single very expensive one. It would also make it possible to train with larger batches, which can have a big impact on how well the model fits.
I would be very glad if you could implement this; I think it would enable fine-tuning of special-purpose GPT-Neo language models.
We decided that it's not worth investing time into porting the naive MP solution to models beyond T5 and GPT-2, since that solution doesn't scale well resource-wise. Given the two existing implementations of ZeRO (FairScale and DeepSpeed), ZeRO is a far more scalable solution, in particular now that ZeRO stage 3 has been released. You don't need high-end GPUs for ZeRO.
We have everything ready on our side (#10753); we're just waiting for the DeepSpeed team to merge several PRs and make a new release. If you want to try it right away, you can use the two branches posted in #11044.
Also, there are notes comparing the different scalability solutions here: #9766
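For anyone landing here, a minimal sketch (not a tested recipe) of what fine-tuning GPT-Neo with the Trainer's DeepSpeed/ZeRO-3 integration could look like; the model name, toy dataset, config file name `ds_config_zero3.json`, and launch script name are illustrative assumptions:

```python
# Sketch: fine-tuning GPT-Neo with the HF Trainer's DeepSpeed integration.
# ZeRO shards optimizer states, gradients and (at stage 3) parameters across
# GPUs, so no single GPU needs to hold the full model.
import torch
from transformers import GPT2Tokenizer, GPTNeoForCausalLM, Trainer, TrainingArguments

model_name = "EleutherAI/gpt-neo-1.3B"  # illustrative choice
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-Neo's tokenizer has no pad token by default
model = GPTNeoForCausalLM.from_pretrained(model_name)


class ToyDataset(torch.utils.data.Dataset):
    """Tiny stand-in dataset so the sketch is self-contained; use your own corpus."""

    def __init__(self, texts):
        self.enc = [
            tokenizer(t, truncation=True, padding="max_length", max_length=128,
                      return_tensors="pt")["input_ids"].squeeze(0)
            for t in texts
        ]

    def __len__(self):
        return len(self.enc)

    def __getitem__(self, i):
        # For causal LM fine-tuning, the labels are the input ids themselves.
        return {"input_ids": self.enc[i], "labels": self.enc[i].clone()}


args = TrainingArguments(
    output_dir="gpt-neo-finetuned",
    per_device_train_batch_size=2,
    num_train_epochs=1,
    fp16=True,
    # Path to a DeepSpeed config enabling ZeRO stage 3, e.g.
    # {"zero_optimization": {"stage": 3}, "fp16": {"enabled": true}, ...}
    deepspeed="ds_config_zero3.json",
)

trainer = Trainer(model=model, args=args,
                  train_dataset=ToyDataset(["hello world"] * 32))
trainer.train()

# Launch with the DeepSpeed launcher so each GPU gets its own process, e.g.:
#   deepspeed --num_gpus=2 finetune_gpt_neo.py
```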
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.