
Training Multi-threaded #743

Merged
merged 1 commit into from Mar 19, 2021

Conversation

@zachgk (Contributor) commented Mar 12, 2021

This PR replaces the ParallelTrainer that was created by @chenkelmann. It
updates the usual EasyTrain.trainBatch to run each device on a separate thread,
since running the devices sequentially wouldn't take advantage of multiple GPUs.

This PR adds a new component to the TrainingConfig: the executorService. This
way, the multi-threaded aspect of training is managed entirely from the trainer.
EasyTrain can then retrieve the executorService from the trainer and will run
multi-threaded if one is found and sequentially if not. This means the default
is sequential, and the trainer must be explicitly configured to run in parallel.
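For illustration, here is a minimal sketch of what opting in could look like on the caller's side. The builder method name optExecutorService, the pool size, and the model name are assumptions made for the sketch; only the general TrainingConfig/trainer wiring is described above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import ai.djl.Model;
import ai.djl.training.DefaultTrainingConfig;
import ai.djl.training.Trainer;
import ai.djl.training.loss.Loss;

public final class ParallelTrainingConfigSketch {

    public static void main(String[] args) {
        // Thread pool that the trainer can hand to EasyTrain so each device
        // gets its own thread (sketch only; the pool size is arbitrary here).
        ExecutorService executorService = Executors.newFixedThreadPool(4);
        try {
            DefaultTrainingConfig config =
                    new DefaultTrainingConfig(Loss.softmaxCrossEntropyLoss())
                            // Assumed builder method; omitting it keeps the
                            // default sequential behavior described above.
                            .optExecutorService(executorService);

            try (Model model = Model.newInstance("mlp");
                    Trainer trainer = model.newTrainer(config)) {
                // EasyTrain retrieves the executor from this trainer and only
                // runs multi-threaded when more than one device is in play.
            }
        } finally {
            executorService.shutdown();
        }
    }
}
```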

Likewise, the Dataset was updated to also use the executorService from the
trainer. The executor was added as part of a new, optional overload of
dataset.getData that can be called from trainer.iterateDataset. So there are no
required changes to the general dataset flow, only a new overload that can be
used to enable multi-threading on a dataset. The RandomAccessDataset was updated
to conform to this.
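As a sketch of the dataset side under the same assumptions (the exact signature of the new getData overload is not spelled out here), the epoch loop a user writes stays unchanged; iterateDataset is what forwards the trainer's executor to the dataset:

```java
import java.io.IOException;

import ai.djl.training.EasyTrain;
import ai.djl.training.Trainer;
import ai.djl.training.dataset.Batch;
import ai.djl.training.dataset.Dataset;
import ai.djl.translate.TranslateException;

public final class EpochLoopSketch {

    /** Sketch only: the loop is the same whether or not an executor was configured. */
    static void runEpoch(Trainer trainer, Dataset dataset)
            throws IOException, TranslateException {
        // iterateDataset passes the trainer's executorService (if any) to the
        // dataset's new getData overload, so batches can be prepared in parallel.
        for (Batch batch : trainer.iterateDataset(dataset)) {
            EasyTrain.trainBatch(trainer, batch);
            trainer.step();
            batch.close();
        }
    }
}
```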


Change-Id: I440d464da79466b47f0f4875767c179330512c4d
@zachgk requested review from a team, frankfliu, and stu1130 on March 15, 2021 at 23:01
/**
 * Returns the {@link ExecutorService} for multi-threaded training, if one was configured.
 *
 * @return the {@link ExecutorService}
 */
public Optional<ExecutorService> getExecutorService() {
    return Optional.ofNullable(executorService);
}
A reviewer (Contributor) commented:

Should we use Optional for all optional arguments?

@zachgk (Contributor, Author) replied:

We should probably discuss that as a larger improvement, but I would be in support of it.

@stu1130 (Contributor) commented Mar 17, 2021

Users should still be able to use a multi-threaded dataloader with single-threaded training.

@zachgk (Contributor, Author) commented Mar 17, 2021

@stu1130 The multi-threaded training really only applies to the multi-GPU scenario. In fact, if you only have a single device (CPU or GPU), it also uses sequential training and ignores the executorService. So users can just supply the executorService to the trainer without worrying about it.
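To make that branching concrete, here is a generic sketch of the fan-out pattern being described. It is not the actual EasyTrain code, just an illustration of "one thread per device when there are multiple device splits and an executor, otherwise sequential":

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;

public final class PerDeviceDispatchSketch {

    /**
     * Illustration only (not the EasyTrain source): with a single split or no
     * executor configured, the work runs sequentially on the calling thread;
     * otherwise one task per device split is submitted to the pool and joined.
     */
    static <T> void forEachSplit(
            List<T> splits, Optional<ExecutorService> executor, Consumer<T> work) {
        if (splits.size() > 1 && executor.isPresent()) {
            CompletableFuture<?>[] futures =
                    splits.stream()
                            .map(s -> CompletableFuture.runAsync(() -> work.accept(s), executor.get()))
                            .toArray(CompletableFuture[]::new);
            CompletableFuture.allOf(futures).join();
        } else {
            splits.forEach(work);
        }
    }
}
```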

@stu1130 (Contributor) commented Mar 17, 2021

@zachgk I mean I checked the testMultithreadingDataLoading test, and we removed the optExecutor. It looks like, to use a multi-threaded dataloader now, we need to pass the executor in the trainingConfig and it will then use multi-threaded training. But I think we still want to make them independent.

@zachgk (Contributor, Author) commented Mar 18, 2021

> @zachgk I mean I checked the testMultithreadingDataLoading test, and we removed the optExecutor. It looks like, to use a multi-threaded dataloader now, we need to pass the executor in the trainingConfig and it will then use multi-threaded training. But I think we still want to make them independent.

Right. Because the executor is an argument to getData, it didn't seem to make sense to have it also be in the dataset builder. Is there a particular use case you are thinking of where they should be independent?

@stu1130 (Contributor) commented Mar 19, 2021

> > @zachgk I mean I checked the testMultithreadingDataLoading test, and we removed the optExecutor. It looks like, to use a multi-threaded dataloader now, we need to pass the executor in the trainingConfig and it will then use multi-threaded training. But I think we still want to make them independent.
>
> Right. Because the executor is an argument to getData, it didn't seem to make sense to have it also be in the dataset builder. Is there a particular use case you are thinking of where they should be independent?

My impression was from Python. Usually, when data fetching is not fast enough and the GPU is waiting for data to do the forward & backward passes, we use a multi-processing dataloader and a single thread for forward & backward. In that case, it doesn't matter whether we call it single-threaded or multi-threaded, since GPU kernels are scheduled sequentially if they don't use streams. So in theory there should be no performance difference, but in reality maybe we will see an improvement. Let's see. I am OK with the current implementation.

@zachgk merged commit fc1290f into deepjavalibrary:master on Mar 19, 2021
@zachgk deleted the multiTrain branch on March 19, 2021 at 21:40