🚀 Feature request
Add a `parallelize` method to GPT-Neo models, so that they can be fine-tuned with model parallelism on less expensive GPUs.
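For reference, GPT-2 and T5 already expose this kind of method in transformers. Below is a minimal sketch of the existing GPT-2 call; the GPT-Neo lines in the trailing comment are hypothetical, since that method is exactly what this issue asks for:

```python
# Minimal sketch of the existing naive model-parallel API on GPT-2.
# The GPT-Neo equivalent at the bottom is hypothetical (not implemented yet).
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2-xl")

# device_map: {gpu_id: [indices of the transformer blocks placed on that GPU]}
device_map = {
    0: list(range(0, 24)),   # blocks 0-23 on cuda:0 (gpt2-xl has 48 blocks)
    1: list(range(24, 48)),  # blocks 24-47 on cuda:1
}
model.parallelize(device_map)

# Requested equivalent for GPT-Neo (hypothetical):
# from transformers import GPTNeoForCausalLM
# model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")
# model.parallelize({0: list(range(0, 16)), 1: list(range(16, 32))})
```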
Motivation
I want to fine-tune a GPT-Neo model using model parallelism so that I can do it on less expensive GPUs. This is not yet implemented, and since high-end GPUs are very expensive, it would be better to distribute the model across several cheaper GPUs than to rely on a single very expensive one. It would also make it possible to train with larger batches, which can have a big impact on how well the model fits.
I would be very glad if you could implement this; I think it would enable fine-tuning of special-purpose GPT-Neo language models.
We decided that it's not worth investing time into porting the naive MP solution to models beyond T5 and GPT-2, since that solution doesn't scale well resource-wise. Given the two existing implementations of ZeRO (FairScale and DeepSpeed), ZeRO is a far more scalable solution, in particular now that ZeRO stage 3 has been released. You don't need high-end GPUs for ZeRO.
We have everything ready on our side (#10753); we're just waiting for the DeepSpeed team to merge several PRs and make a new release. If you want to try it right away, you can use the two branches posted in #11044.
Also, there are notes comparing the different scalability solutions here: #9766
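For anyone landing here, a minimal sketch (not a tested recipe) of what fine-tuning GPT-Neo with the Trainer's DeepSpeed/ZeRO-3 integration could look like; the model name, toy dataset, config file name `ds_config_zero3.json`, and launch script name are illustrative assumptions:

```python
# Sketch: fine-tuning GPT-Neo with the HF Trainer's DeepSpeed integration.
# ZeRO shards optimizer states, gradients and (at stage 3) parameters across
# GPUs, so no single GPU needs to hold the full model.
import torch
from transformers import GPT2Tokenizer, GPTNeoForCausalLM, Trainer, TrainingArguments

model_name = "EleutherAI/gpt-neo-1.3B"  # illustrative choice
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-Neo's tokenizer has no pad token by default
model = GPTNeoForCausalLM.from_pretrained(model_name)


class ToyDataset(torch.utils.data.Dataset):
    """Tiny stand-in dataset so the sketch is self-contained; use your own corpus."""

    def __init__(self, texts):
        self.enc = [
            tokenizer(t, truncation=True, padding="max_length", max_length=128,
                      return_tensors="pt")["input_ids"].squeeze(0)
            for t in texts
        ]

    def __len__(self):
        return len(self.enc)

    def __getitem__(self, i):
        # For causal LM fine-tuning, the labels are the input ids themselves.
        return {"input_ids": self.enc[i], "labels": self.enc[i].clone()}


args = TrainingArguments(
    output_dir="gpt-neo-finetuned",
    per_device_train_batch_size=2,
    num_train_epochs=1,
    fp16=True,
    # Path to a DeepSpeed config enabling ZeRO stage 3, e.g.
    # {"zero_optimization": {"stage": 3}, "fp16": {"enabled": true}, ...}
    deepspeed="ds_config_zero3.json",
)

trainer = Trainer(model=model, args=args,
                  train_dataset=ToyDataset(["hello world"] * 32))
trainer.train()

# Launch with the DeepSpeed launcher so each GPU gets its own process, e.g.:
#   deepspeed --num_gpus=2 finetune_gpt_neo.py
```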
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.