
Training Multi-threaded #743

Merged
merged 1 commit into from Mar 19, 2021

Conversation

@zachgk (Contributor) commented Mar 12, 2021

This PR replaces the ParallelTrainer that was created by @chenkelmann. It
updates the usual EasyTrain.trainBatch to run each device on a separate thread,
since running the devices sequentially wouldn't take advantage of multiple GPUs.

This PR adds a new component to the TrainingConfig: the executorService. This
way, the multi-threaded aspect of training is managed entirely from the trainer.
EasyTrain can then retrieve the executorService from the trainer and will run
multi-threaded if one is found and sequentially if not. This means the default
is sequential, and the trainer must be explicitly configured to run in parallel.
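For illustration, here is a minimal sketch of what opting in could look like on the caller's side. The builder method name optExecutorService, the pool size, and the model name are assumptions made for the sketch; only the general TrainingConfig/trainer wiring is described above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import ai.djl.Model;
import ai.djl.training.DefaultTrainingConfig;
import ai.djl.training.Trainer;
import ai.djl.training.loss.Loss;

public final class ParallelTrainingConfigSketch {

    public static void main(String[] args) {
        // Thread pool that the trainer can hand to EasyTrain so each device
        // gets its own thread (sketch only; the pool size is arbitrary here).
        ExecutorService executorService = Executors.newFixedThreadPool(4);
        try {
            DefaultTrainingConfig config =
                    new DefaultTrainingConfig(Loss.softmaxCrossEntropyLoss())
                            // Assumed builder method; omitting it keeps the
                            // default sequential behavior described above.
                            .optExecutorService(executorService);

            try (Model model = Model.newInstance("mlp");
                    Trainer trainer = model.newTrainer(config)) {
                // EasyTrain retrieves the executor from this trainer and only
                // runs multi-threaded when more than one device is in play.
            }
        } finally {
            executorService.shutdown();
        }
    }
}
```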

Likewise, the Dataset was updated to also use the executorService from the
trainer. The executor was added as part of a new, optional overload of
dataset.getData that can be called from trainer.iterateDataset. So there are no
required changes to the general dataset flow, only a new overload that can be
used to enable multi-threading on a dataset. The RandomAccessDataset was updated
to conform to this.
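As a sketch of the dataset side under the same assumptions (the exact signature of the new getData overload is not spelled out here), the epoch loop a user writes stays unchanged; iterateDataset is what forwards the trainer's executor to the dataset:

```java
import java.io.IOException;

import ai.djl.training.EasyTrain;
import ai.djl.training.Trainer;
import ai.djl.training.dataset.Batch;
import ai.djl.training.dataset.Dataset;
import ai.djl.translate.TranslateException;

public final class EpochLoopSketch {

    /** Sketch only: the loop is the same whether or not an executor was configured. */
    static void runEpoch(Trainer trainer, Dataset dataset)
            throws IOException, TranslateException {
        // iterateDataset passes the trainer's executorService (if any) to the
        // dataset's new getData overload, so batches can be prepared in parallel.
        for (Batch batch : trainer.iterateDataset(dataset)) {
            EasyTrain.trainBatch(trainer, batch);
            trainer.step();
            batch.close();
        }
    }
}
```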


Change-Id: I440d464da79466b47f0f4875767c179330512c4d
@zachgk requested review from a team, frankfliu, and stu1130 on March 15, 2021 at 23:01
/**
 * Returns the {@link ExecutorService} for multi-threaded training, if one was configured.
 *
 * @return the {@link ExecutorService}
 */
public Optional<ExecutorService> getExecutorService() {
    return Optional.ofNullable(executorService);
}
A reviewer (Contributor) commented:

Should we use Optional for all optional arguments?

@zachgk (Contributor, Author) replied:

We should probably discuss that as a larger improvement, but I would be in support of it.

@stu1130 (Contributor) commented Mar 17, 2021

Users should still be able to use a multi-threaded dataloader with single-threaded training.

@zachgk (Contributor, Author) commented Mar 17, 2021

@stu1130 The multi-threaded training really only applies to the multi-GPU scenario. In fact, if you only have a single device (CPU or GPU), it also uses sequential training and ignores the executorService. So users can just supply the executorService to the trainer without worrying about it.
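To make that branching concrete, here is a generic sketch of the fan-out pattern being described. It is not the actual EasyTrain code, just an illustration of "one thread per device when there are multiple device splits and an executor, otherwise sequential":

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;

public final class PerDeviceDispatchSketch {

    /**
     * Illustration only (not the EasyTrain source): with a single split or no
     * executor configured, the work runs sequentially on the calling thread;
     * otherwise one task per device split is submitted to the pool and joined.
     */
    static <T> void forEachSplit(
            List<T> splits, Optional<ExecutorService> executor, Consumer<T> work) {
        if (splits.size() > 1 && executor.isPresent()) {
            CompletableFuture<?>[] futures =
                    splits.stream()
                            .map(s -> CompletableFuture.runAsync(() -> work.accept(s), executor.get()))
                            .toArray(CompletableFuture[]::new);
            CompletableFuture.allOf(futures).join();
        } else {
            splits.forEach(work);
        }
    }
}
```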

@stu1130 (Contributor) commented Mar 17, 2021

@zachgk I mean I checked the testMultithreadingDataLoading test, and we removed the optExecutor. It looks like, to use a multi-threaded dataloader now, we need to pass the executor in the trainingConfig and it will then use multi-threaded training. But I think we still want to make them independent.

@zachgk (Contributor, Author) commented Mar 18, 2021

> @zachgk I mean I checked the testMultithreadingDataLoading test, and we removed the optExecutor. It looks like, to use a multi-threaded dataloader now, we need to pass the executor in the trainingConfig and it will then use multi-threaded training. But I think we still want to make them independent.

Right. Because the executor is an argument to getData, it didn't seem to make sense to have it also be in the dataset builder. Is there a particular use case you are thinking of where they should be independent?

@stu1130 (Contributor) commented Mar 19, 2021

> > @zachgk I mean I checked the testMultithreadingDataLoading test, and we removed the optExecutor. It looks like, to use a multi-threaded dataloader now, we need to pass the executor in the trainingConfig and it will then use multi-threaded training. But I think we still want to make them independent.
>
> Right. Because the executor is an argument to getData, it didn't seem to make sense to have it also be in the dataset builder. Is there a particular use case you are thinking of where they should be independent?

My impression was from Python. Usually, when data fetching is not fast enough and the GPU is waiting for data to do the forward & backward passes, we use a multi-processing dataloader and a single thread for forward & backward. In that case, it doesn't matter whether we call it single-threaded or multi-threaded, since GPU kernels are scheduled sequentially if they don't use streams. So in theory there should be no performance difference, but in reality maybe we will see an improvement. Let's see. I am OK with the current implementation.

@zachgk merged commit fc1290f into deepjavalibrary:master on Mar 19, 2021
@zachgk deleted the multiTrain branch on March 19, 2021 at 21:40