
Minimum GPU memory size for training RQ-Transformer #14

Closed
Baekpica opened this issue Apr 27, 2022 · 2 comments

@Baekpica

First of all, thank you to all the authors for releasing this remarkable research and these models!

I tried to finetune the RQ-Transformer model (3.9B) on a specific domain. (I'm already aware that the official training code cannot be released.) In my training code, a 'CUDA out of memory' error occurred during the training phase (optimizer step) with 8 NVIDIA RTX A6000 (48GB) GPUs and a batch size of 1 per device. I'm trying to find the cause of the error and possible solutions.

So I have a question about the minimum GPU memory size for this training task. I saw that NVIDIA A100 GPUs were used in your research paper. Were those the 80GB version? (I ask because the A100 comes in two versions, 40GB and 80GB.)

Also, should I implement model parallelism for this task with these resources? If you think training is possible with 48GB, I will look for the problem in my own code.

@ttt733

ttt733 commented May 9, 2022

I was able to make some tweaks to the configuration in their notebook and get it running on a single 3090 (24 GB of memory). Please see my PR: #3

The memory requirement seemed to be dramatically lowered by disabling mixed precision.
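For reference, here is a minimal PyTorch sketch of what toggling mixed precision off looks like in a generic training step. This is not the repository's actual training code; `model`, `optimizer`, and `batch` are placeholders, and the exact flag used in the notebook may differ:

```python
import torch

use_amp = False  # disabling mixed precision, as suggested above
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

def train_step(model, optimizer, batch):
    optimizer.zero_grad(set_to_none=True)
    # With use_amp=False, autocast is a no-op and the forward pass runs in fp32.
    with torch.cuda.amp.autocast(enabled=use_amp):
        loss = model(batch)
    # With enabled=False, GradScaler passes backward()/step() through unscaled.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```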

@LeeDoYup
Contributor

LeeDoYup commented Sep 6, 2022

Thanks to @ttt733 for the pull request.
@Baekpica, you can reduce the required memory by disabling mixed precision.

We will update the example notebook soon.

LeeDoYup closed this as completed Sep 6, 2022