Use GPU and multi-card for model training #3
This repo uses pytorch-lightning as the trainer. It's convenient to do single-gpu or multi-gpu training by simply setting the gpu number:
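The GPU setting referred to might look like the following sketch (the exact flag name in this repo's train.py is an assumption; check its argument parser):

```python
# Hypothetical sketch of configuring the pytorch-lightning Trainer in train.py.
# The `gpus` argument selects how many (or which) GPUs are used; 0/None means CPU.
import pytorch_lightning as pl

trainer = pl.Trainer(gpus=1)       # train on a single GPU
# trainer = pl.Trainer(gpus=4)     # train on 4 GPUs on one machine
# trainer = pl.Trainer(gpus=[0, 2])  # train on specific device indices
```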
If I remember correctly, by default this will use Pytorch DDP Spawn strategy for multi-gpu training. If you want to use Pytorch DDP instead (which should be faster than DDP Spawn in general), you can add one line to train.py:
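The "one line" change would plausibly look like this (the argument name depends on the pytorch-lightning version installed, so both variants below are assumptions to verify against your environment):

```python
# Hypothetical sketch: switching from the default DDP Spawn to DDP.
# DDP launches one persistent process per GPU, which usually trains
# faster than re-spawning worker processes as DDP Spawn does.
import pytorch_lightning as pl

trainer = pl.Trainer(gpus=4, strategy='ddp')       # pytorch-lightning >= 1.5
# trainer = pl.Trainer(gpus=4, accelerator='ddp')  # older pytorch-lightning
```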
Let me know if it works.
@yumianhuli2 To reproduce the results in the paper when using multi-gpu training, please also make sure that the effective batch size (batch_size * gpu_num) is 32. For example, if you use 4 gpus, then the batch size per gpu should be 8:
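The arithmetic behind that advice can be sketched as a small helper (the function name is illustrative, not from the repo):

```python
# Effective batch size = per-GPU batch size * number of GPUs.
# To reproduce the paper's setting, the effective batch size must stay at 32.
def per_gpu_batch_size(effective_batch_size: int, num_gpus: int) -> int:
    # The effective batch size must divide evenly across the GPUs.
    assert effective_batch_size % num_gpus == 0
    return effective_batch_size // num_gpus

print(per_gpu_batch_size(32, 4))  # -> 8, as in the example above
print(per_gpu_batch_size(32, 1))  # -> 32 for single-GPU training
```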
Thank you!
Thank you for your outstanding work! If the batch size is changed, does the learning rate need to be adjusted accordingly?
@tandangzuoren I believe the learning rate should be adjusted. The number of epochs may also need to be changed.
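One common heuristic for such an adjustment is the linear scaling rule, which grows the learning rate in proportion to the effective batch size. This is a general sketch, not an adjustment confirmed by the authors, and the base values below are purely illustrative:

```python
# Linear scaling rule: scale the learning rate by the ratio of the
# new effective batch size to the base effective batch size.
def scaled_lr(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    return base_lr * new_batch_size / base_batch_size

# Illustrative values only (not taken from this repo's config):
print(scaled_lr(5e-4, 32, 64))  # -> 0.001 when the batch size doubles
```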
@ZikangZhou Thank you for your advice on this. May I know why 32 is the effective batch size?
Hello! How can I use the GPU, or multiple cards, for training? By default, training runs on the CPU rather than on card 0.
Thank you!