You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
First of all, thank you all the authors for releasing this remarkable researches and models!
I tried to finetune this RQ-Transformer model(3.9B) at certain domain. (I'm already aware that it is impossible to release official training code.) In my training code, 'CUDA out of memory' error occurred with 8 NVIDIA RTX A6000(48GB) in training phase(optimizer step). (Batch size 1 per each device) I'm trying to find out reason of errors and alternative solutions.
So I have a question about minimum GPU memory size for this training task. I saw that NVIDIA A100 was used in your research paper. Was that 80GB memory? (I ask this because there are 2 versions in A100 GPU, 40GB/80GB.)
And should I implement 'model parallelism' code for this task with this resource? If your opinion is that learning process is possible with 48gb, I will look for the wrong part in my code.
The text was updated successfully, but these errors were encountered:
First of all, thank you all the authors for releasing this remarkable researches and models!
I tried to finetune this RQ-Transformer model(3.9B) at certain domain. (I'm already aware that it is impossible to release official training code.) In my training code, 'CUDA out of memory' error occurred with 8 NVIDIA RTX A6000(48GB) in training phase(optimizer step). (Batch size 1 per each device) I'm trying to find out reason of errors and alternative solutions.
So I have a question about minimum GPU memory size for this training task. I saw that NVIDIA A100 was used in your research paper. Was that 80GB memory? (I ask this because there are 2 versions in A100 GPU, 40GB/80GB.)
And should I implement 'model parallelism' code for this task with this resource? If your opinion is that learning process is possible with 48gb, I will look for the wrong part in my code.
The text was updated successfully, but these errors were encountered: