What is the effective batch size? #131
Hello, and thank you for your contribution.
I noticed there are both train_batch_size and train_micro_batch_size_per_gpu settings in the configuration.
I'm not from an ML background; I just want to know how many data examples each GPU works with on every iteration of my training for loop, so I can set these to maximize my GPUs' memory and performance.
How can I know that? And how can I set the above variables differently so as to change the effective batch size?
Let's say I have 240,000 data examples. When I do print(len(trainloader)), it accurately shows 30,000 (240,000 / 8 GPUs). But since my train_batch_size is 32, I would imagine there to be around 937 steps for one epoch, yet my training for loop still walks over 30,000.

Comments
Thanks for using DeepSpeed. The answer to your question is train_micro_batch_size_per_gpu, i.e., the number of data samples each GPU processes in each loop iteration.
Since you are not from an ML background, let me further explain the relationship with train_batch_size and gradient_accumulation_steps. gradient_accumulation_steps is how many loop iterations are run before the model is updated, e.g., by calling model.step(), while train_batch_size is the total number of examples processed across all GPUs before the model is updated.
For your specific example, let's assume the model is updated on every loop iteration, i.e., gradient_accumulation_steps = 1. Yes, it would take around 937 steps to complete one epoch over the entire 240,000 data examples. This is how the parameters match up: train_micro_batch_size_per_gpu = 32, gradient_accumulation_steps = 1, and train_batch_size = 32 * 1 * 8 GPUs = 256, so one epoch is 240,000 / 256 ≈ 937 steps.
Hope that helps.
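To make the relationship concrete, here is a minimal sketch using the numbers from this thread (8 GPUs and gradient_accumulation_steps = 1 are the assumptions made above; the dictionary keys are the standard DeepSpeed config names):

```python
# Sketch of how DeepSpeed's batch-size settings relate, using the numbers
# discussed in this thread (8 GPUs, no gradient accumulation).
num_gpus = 8

ds_config = {
    "train_micro_batch_size_per_gpu": 32,   # samples per GPU per loop iteration
    "gradient_accumulation_steps": 1,       # loop iterations per model update
    "train_batch_size": 32 * 1 * num_gpus,  # micro_batch * grad_accum * num_gpus = 256
}

dataset_size = 240_000
steps_per_epoch = dataset_size / ds_config["train_batch_size"]
print(steps_per_epoch)  # 937.5, i.e. ~937 model updates per epoch
```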
Thank you very much for your detailed explanation, I get it now. But still, when I do print(len(trainloader)) it always prints 60,000 (240,000 / 4 GPUs), where I expect it to print 15,000 when train_micro_batch_size_per_gpu is 4 and train_batch_size is 16.
@tjruwase Can you please shed some light?
As far as I know, print(len(trainloader)) returns the total number of data samples in the rank, irrespective of the batch size configuration. It is not an indicator of the number of training steps (or loop iterations) needed to process all the data examples.
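In other words, the step count has to be derived from the per-rank sample count and the batch settings. A rough sketch with the values from this thread (4 GPUs, train_micro_batch_size_per_gpu = 4; gradient_accumulation_steps = 1 is an assumption):

```python
# Rough sketch: deriving loop iterations and model updates per epoch from the
# per-rank sample count reported above (gradient_accumulation_steps assumed = 1).
samples_per_rank = 60_000                # what len(trainloader) printed per GPU
train_micro_batch_size_per_gpu = 4       # samples per GPU per loop iteration
gradient_accumulation_steps = 1          # assumption, as in the earlier example

loop_iterations_per_epoch = samples_per_rank // train_micro_batch_size_per_gpu
model_updates_per_epoch = loop_iterations_per_epoch // gradient_accumulation_steps

print(loop_iterations_per_epoch)   # 15000 -- the number expected above
print(model_updates_per_epoch)     # 15000 when there is no gradient accumulation
```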
Okay, so would I be right to assume that on every training step I'm running train_micro_batch_size_per_gpu * number of GPUs examples (assuming gradient_accumulation_steps is 1)?
That is correct.
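For the configuration above, that is train_micro_batch_size_per_gpu * number of GPUs = 4 * 4 = 16 examples per step, which matches train_batch_size = 16.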
Thank you very much!