
Make the first device share data with the global scope in parallel_do_op. #9398

Merged: 1 commit into PaddlePaddle:develop from qingqing01:parallel_do_op, Mar 27, 2018

Conversation

qingqing01 (Contributor) commented on Mar 27, 2018:

Fix #9386

There is no need to copy data to the first device. Just make the first device share data with the global scope, since they are on the same device.
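A minimal sketch of the difference, assuming the fluid framework APIs of that era (Scope, LoDTensor, ShareDataWith, TensorCopy); the helper names CopyToDevice and ShareWithDevice are illustrative, not functions from the actual parallel_do_op code:

```cpp
// Illustrative sketch only; CopyToDevice/ShareWithDevice are hypothetical
// helper names, not code from parallel_do_op.cc.
#include <string>

#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/framework/scope.h"
#include "paddle/fluid/framework/tensor_util.h"
#include "paddle/fluid/platform/place.h"

namespace f = paddle::framework;

// Old behavior: every device, including the first, received a fresh copy,
// so writes in the sub-scope never reached the global scope.
void CopyToDevice(const f::Scope &global, f::Scope *sub_scope,
                  const std::string &name,
                  const paddle::platform::Place &place) {
  auto &src = global.FindVar(name)->Get<f::LoDTensor>();
  auto *dst = sub_scope->Var(name)->GetMutable<f::LoDTensor>();
  f::TensorCopy(src, place, dst);  // separate buffer on the target place
}

// New behavior for the first device: share the buffer instead of copying,
// so an in-place update (e.g. BN's moving mean) is visible globally.
void ShareWithDevice(const f::Scope &global, f::Scope *sub_scope,
                     const std::string &name) {
  auto &src = global.FindVar(name)->Get<f::LoDTensor>();
  auto *dst = sub_scope->Var(name)->GetMutable<f::LoDTensor>();
  dst->ShareDataWith(src);  // same underlying memory, no copy
}
```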

panyx0718 (Contributor) left a comment:


Add some comments to explain the problem?

The global moving_mean and moving_variance are currently not correctly updated with the values calculated in the sub_scopes (unlike trainable parameters). Perhaps ParallelExecutor has a similar problem to solve.
@tonyyang-svail @reyoung

qingqing01 (Contributor, Author) commented on Mar 27, 2018:

> Add some comments to explain the problem?

In #9386, the moving mean/variance in BN are non-trainable parameters. Trainable parameters are updated in the backward pass and copied to each sub-scope in every mini-batch before the forward pass. Unlike trainable parameters, the moving means/variances are not updated in the backward pass, so parallel_do_op still copies the initialized parameters from the global scope.

This fix makes the first device share the parameter memory with the global scope. When the moving mean/variance on the first device are updated, they are also updated in the global scope.

For BN, however, only the moving mean/variance on the first device are saved (see the sketch below). Maybe we can merge them across multiple GPUs and multiple machines in the future.
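To make the resulting control flow concrete, here is a hedged sketch of the per-device dispatch. DistributeParameters, the variable names, and the loop structure are assumptions for illustration, not the merged parallel_do_op code; it reuses the hypothetical helpers from the earlier sketch.

```cpp
// Hypothetical glue (not the merged code): decide per device whether to
// share or copy each parameter, reusing ShareWithDevice/CopyToDevice above.
#include <vector>

void DistributeParameters(const f::Scope &global_scope,
                          const std::vector<f::Scope *> &sub_scopes,
                          const std::vector<paddle::platform::Place> &places,
                          const std::vector<std::string> &param_names) {
  for (size_t i = 0; i < places.size(); ++i) {
    for (const auto &param_name : param_names) {
      if (i == 0) {
        // First device: alias the global tensor so in-place updates to
        // non-trainable state (moving mean/variance) flow back.
        ShareWithDevice(global_scope, sub_scopes[i], param_name);
      } else {
        // Other devices: private copies; their BN statistics are not
        // merged back, hence the caveat above.
        CopyToDevice(global_scope, sub_scopes[i], param_name, places[i]);
      }
    }
  }
}
```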

qingqing01 merged commit 25317bd into PaddlePaddle:develop on Mar 27, 2018.
qingqing01 deleted the parallel_do_op branch on November 14, 2019.