Minimum GPU memory size for training RQ-Transformer #14

Baekpica · 2022-04-27T08:04:58Z

First of all, thank you to all the authors for releasing this remarkable research and these models!

I tried to fine-tune the RQ-Transformer model (3.9B) on a specific domain. (I'm already aware that the official training code cannot be released.) With my own training code, a 'CUDA out of memory' error occurred on 8 NVIDIA RTX A6000 GPUs (48GB each) during the training phase, at the optimizer step, with a batch size of 1 per device. I'm trying to find the cause of the error and possible alternative solutions.

So I have a question about the minimum GPU memory size for this training task. I saw that NVIDIA A100 GPUs were used in your research paper. Were those the 80GB version? (I ask because the A100 comes in two versions, 40GB and 80GB.)

And should I implement model parallelism for this task with these resources? If you think training is possible with 48GB per GPU, I will look for the bug in my code.
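As a rough back-of-envelope check on the memory question, the sketch below estimates only the memory taken by weights, gradients, and Adam optimizer state when all 3.9B parameters are fine-tuned. It assumes fp32 storage (4 bytes each for weights and gradients, 8 bytes for Adam's two moment buffers) and ignores activations, CUDA context, and allocator overhead. The `shard_across` parameter models idealized ZeRO-3/FSDP-style sharding of those states. The function `training_state_gib` and its parameters are hypothetical illustrations, not part of the RQ-Transformer code.

```python
# Hypothetical estimate: memory for weights + gradients + Adam state only.
# Assumes fp32 everywhere; activations and framework overhead are ignored.
def training_state_gib(num_params: float,
                       weight_bytes: int = 4,   # fp32 weights
                       grad_bytes: int = 4,     # fp32 gradients
                       optim_bytes: int = 8,    # Adam exp_avg + exp_avg_sq (fp32)
                       shard_across: int = 1) -> float:
    """Return GiB per GPU; shard_across > 1 assumes states split evenly."""
    per_param = weight_bytes + grad_bytes + optim_bytes
    return num_params * per_param / shard_across / 2**30


PARAMS = 3.9e9  # model size mentioned in the question

# Plain data parallelism: every GPU holds a full replica of all states.
fp32_adam = training_state_gib(PARAMS)
# Idealized full sharding of the same states across 8 GPUs.
sharded = training_state_gib(PARAMS, shard_across=8)

print(f"full replica: {fp32_adam:.1f} GiB")      # → full replica: 58.1 GiB
print(f"sharded over 8 GPUs: {sharded:.1f} GiB")  # → sharded over 8 GPUs: 7.3 GiB
```

Under these assumptions a full replica needs about 58 GiB before any activations, which is more than a 48GB card holds. In PyTorch, Adam allocates its moment buffers lazily on the first `optimizer.step()`, so an out-of-memory error at the optimizer step would be consistent with this estimate. Replicating states with plain data parallelism does not reduce this per-GPU figure; some form of sharding or model parallelism would.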

ttt733 · 2022-05-09T17:59:31Z

I was able to make some tweaks to the configuration in their notebook and get it running on a single 3090 (24 GB memory). Please see my PR: #3

The memory requirement seemed to be dramatically lowered by disabling mixed precision.

LeeDoYup · 2022-09-06T17:48:20Z

Thanks for @ttt733's pull request.
@Baekpica, you can reduce the required memory size by disabling mixed precision.

We will update the example notebook soon.

LeeDoYup closed this as completed Sep 6, 2022
