Training Multi-threaded #743
Conversation
This PR replaces the ParallelTrainer that was created by @chenkelmann. It updates the usual EasyTrain.trainBatch to run each device on a separate thread, since running the devices sequentially wouldn't take advantage of multiple GPUs.

This PR adds a new component to the TrainingConfig: the executorService. The multi-threaded aspect of training is therefore managed entirely from the trainer. EasyTrain retrieves the executorService from the trainer and runs multi-threaded if one is found, and sequentially if not. This means the default is sequential and training must be explicitly configured to run in parallel.

Likewise, the Dataset was updated to also use the executorService from the trainer. It was added as part of a new optional variation of dataset.getData that can be called from trainer.iterateDataset. So, there are no required changes to the general dataset flow, only a new overload that can be used to enable multi-threading on a dataset. The RandomAccessDataset was updated to conform to this.

Change-Id: I440d464da79466b47f0f4875767c179330512c4d
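A hedged sketch of how a user might opt in to parallel training under this design. The optExecutorService builder method, the loss choice, and the trainingSet/validationSet variables are illustrative assumptions rather than the exact API introduced here, and checked exceptions from EasyTrain.fit are assumed to be handled by the caller.

```java
// Minimal sketch, assuming a builder-style optExecutorService option on
// DefaultTrainingConfig (the method name is an assumption from this PR's description).
int gpuCount = Engine.getInstance().getGpuCount();
ExecutorService executorService = Executors.newFixedThreadPool(Math.max(1, gpuCount));

TrainingConfig config =
        new DefaultTrainingConfig(Loss.softmaxCrossEntropyLoss())
                .optDevices(Engine.getInstance().getDevices(gpuCount))
                // omit the next line to keep the default sequential training
                .optExecutorService(executorService);

try (Model model = Model.newInstance("mlp");
        Trainer trainer = model.newTrainer(config)) {
    // EasyTrain pulls the executor from the trainer and runs one device per
    // thread when it is present; otherwise it falls back to sequential.
    EasyTrain.fit(trainer, 2, trainingSet, validationSet);
} finally {
    executorService.shutdown();
}
```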
     * @return the {@link ExecutorService}
     */
    public Optional<ExecutorService> getExecutorService() {
        return Optional.ofNullable(executorService);
    }
Should we use Optional for all optional arguments?
We should probably discuss that as a larger improvement, but I would be in support of it
Users should still be able to use a multi-threaded dataloader with single-threaded training.
@stu1130 The multi-threaded training really only applies to the multi-GPU scenario. In fact, if you only have a single device (CPU or GPU), it also uses sequential training and ignores the executorService. So, users can just supply the executorService to the trainer without worrying about it.
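For illustration, here is a hedged sketch of the kind of dispatch described here, not the PR's actual EasyTrain code; trainSplit is a hypothetical placeholder for the per-device forward/backward pass.

```java
// Sketch only: split the batch across the trainer's devices, then either fan
// out on the executor or fall back to a sequential loop.
Batch[] splits = batch.split(trainer.getDevices(), false);
Optional<ExecutorService> executor = trainer.getExecutorService();
if (executor.isPresent() && splits.length > 1) {
    CompletableFuture<?>[] futures =
            Arrays.stream(splits)
                    .map(split -> CompletableFuture.runAsync(
                            () -> trainSplit(trainer, split), executor.get()))
                    .toArray(CompletableFuture[]::new);
    CompletableFuture.allOf(futures).join(); // wait for every device to finish
} else {
    // single device, or no executorService configured: train sequentially
    for (Batch split : splits) {
        trainSplit(trainer, split);
    }
}
```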
@zachgk I mean I checked testMultithreadingDataLoading and we removed the optExecutor. It looks like, to use a multi-threaded dataloader now, we need to pass the executor in the trainingConfig, and it will then use multi-threaded training. But I think we still want to make them independent.
Right. Because the executor is an argument to getData, it didn't seem to make sense to have it also be in the dataset builder. Is there a particular use case you are thinking of where they should be independent?
My impression was from Python. Usually, when data fetching is not fast enough and the GPU is waiting for data to do the forward & backward pass, we use a multi-processing dataloader and a single thread for forward & backward. In that case, it doesn't matter whether we call it with a single thread or multiple threads, since GPU kernels are scheduled sequentially if they don't use streams. So in theory, there should be no performance difference. But in practice, maybe we will see an improvement. Let's see. I am OK with the current implementation.
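As a possible way to keep data loading multi-threaded while training stays on a single thread, here is a hedged sketch that calls the new getData overload directly; the (NDManager, ExecutorService) signature is an assumption based on the description above, and exceptions from getData are assumed to be handled by the caller.

```java
// Sketch only: a dedicated executor for data fetching, with the forward and
// backward passes still running on the calling thread.
ExecutorService dataExecutor = Executors.newFixedThreadPool(4);
try {
    for (Batch batch : dataset.getData(trainer.getManager(), dataExecutor)) {
        EasyTrain.trainBatch(trainer, batch);
        trainer.step();
        batch.close();
    }
} finally {
    dataExecutor.shutdown();
}
```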