Fix dist error with lr decay layer #9489

Yancey1989 · 2018-03-29T04:12:05Z

typhoonzero · 2018-03-29T06:06:44Z

python/paddle/fluid/optimizer.py

@@ -24,6 +24,7 @@
 from regularizer import append_regularization_ops
 from clip import append_gradient_clip_ops, error_clip_callback
 from contextlib import contextmanager
+from distribute_transpiler import UnionFind


optimizer.py should not depend on distribute_transpiler. Either put _get_lr_decay_ops in the transpiler or put UnionFind in a file of its own. I'd prefer the first option, because then we don't have to change the current demo files.

optimizer.py should not depend on distribute_transpiler

Thanks @typhoonzero, that's a good point. Maybe we can pass the Optimizer instance to the transpile interface so that we can support regularization in the future.

We can consider that when moving regularizer and clipping to the server side.

Done, moved to transpiler.
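For reference, the second option mentioned above (UnionFind in its own file, with no transpiler import) could look roughly like the sketch below. Only the class name `UnionFind` and the `is_connected` call appear in this PR; the constructor, `find`, `union`, and the op names in the usage lines are assumptions made for illustration, not the code in distribute_transpiler.py.

```python
class UnionFind(object):
    """Disjoint-set over hashable elements (illustrative sketch only)."""

    def __init__(self, elements=None):
        # Every element starts as the root of its own set.
        self._parent = {}
        for e in elements or []:
            self._parent[e] = e

    def find(self, x):
        # Register unseen elements lazily as their own root.
        self._parent.setdefault(x, x)
        root = x
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path at the root.
        while self._parent[x] != root:
            self._parent[x], x = root, self._parent[x]
        return root

    def union(self, x, y):
        # Merge the sets that contain x and y.
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            self._parent[rx] = ry

    def is_connected(self, x, y):
        # Two elements are connected when they share a root.
        return self.find(x) == self.find(y)


# Hypothetical op names, used only to exercise the class.
ufind = UnionFind(["lr_decay", "increment", "sgd", "mul"])
ufind.union("increment", "lr_decay")
ufind.union("lr_decay", "sgd")
```

With a standalone module like this, both optimizer.py and the transpiler could import it without depending on each other. The PR ended up taking the first option instead.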


typhoonzero

LGTM++

typhoonzero · 2018-03-29T10:32:56Z

python/paddle/fluid/distribute_transpiler.py

        for _, opt_op in enumerate(opt_op_on_pserver):
            for _, op in enumerate(self.optimize_ops):
                # optimizer is connected to itself
                if ufind.is_connected(op, opt_op) and \
                    op not in global_ops:
                    __append_optimize_op__(op, per_opt_block)
-            per_opt_block = pserver_program.create_block(0)
+            per_opt_block = pserver_program.create_block(append_block.idx)


That's smart.

typhoonzero · 2018-03-29T10:34:31Z

python/paddle/fluid/distribute_transpiler.py

+        find_ops = []
+        # find ops which output is lr var
+        # make a union-find struct by all ops
+        block = default_main_program().global_block()


Use self.program instead of default_main_program(), because we may need to transpile a program that is specified by the user.
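A toy sketch of the point above, with no Paddle imports: the transpiler remembers the program it was given and reads ops from it when looking for the ops that produce the learning-rate variable. `Op`, `Program`, `ToyTranspiler`, `get_lr_ops`, and the op and variable names are all hypothetical stand-ins. The lookup here is a plain backward walk over producers, not the union-find approach used in the PR.

```python
from collections import namedtuple

# Hypothetical stand-ins for a program and its operators; not Paddle classes.
Op = namedtuple("Op", ["type", "inputs", "outputs"])
Program = namedtuple("Program", ["ops"])

# Stand-in for a process-wide default program.
_DEFAULT_PROGRAM = Program(ops=[])


class ToyTranspiler(object):
    def transpile(self, program=None):
        # Remember the program the caller passed; fall back to the default
        # only when none was given.
        self.program = program if program is not None else _DEFAULT_PROGRAM

    def get_lr_ops(self, lr_var_name):
        # Read ops from self.program, not from the global default.
        ops = self.program.ops
        # Map each variable name to the ops that write it.
        producers = {}
        for op in ops:
            for name in op.outputs:
                producers.setdefault(name, []).append(op)
        # Walk backwards from the learning-rate variable.
        found, stack, seen = [], [lr_var_name], set()
        while stack:
            name = stack.pop()
            if name in seen:
                continue
            seen.add(name)
            for op in producers.get(name, []):
                if op not in found:
                    found.append(op)
                stack.extend(op.inputs)
        return found


user_program = Program(ops=[
    Op("increment", ("step",), ("step",)),
    Op("exponential_decay", ("step", "base_lr"), ("lr",)),
    Op("mul", ("x", "w"), ("y",)),
])

t = ToyTranspiler()
t.transpile(user_program)
lr_op_types = sorted(op.type for op in t.get_lr_ops("lr"))
```

If `get_lr_ops` read from `_DEFAULT_PROGRAM` instead, it would find nothing for a user-supplied program, which is the failure mode the review comment is guarding against.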

typhoonzero

LGTM++

gongweibao

LGTM


Fix dist error with lr decay layer

b92aeae

Yancey1989 requested a review from typhoonzero March 29, 2018 04:12

typhoonzero mentioned this pull request Mar 29, 2018

Fluid distributed pserver raises an exception #9487

Closed

update

ce2e0a8

typhoonzero reviewed Mar 29, 2018


Yancey1989 changed the title ~~[WIP]Fix dist error with lr decay layer~~ Fix dist error with lr decay layer Mar 29, 2018

Yancey1989 added 5 commits March 29, 2018 18:04

analyse lr ops in transpiler

c2fcbf7

revert optimize.py

c8eca6b

Merge branch 'develop' of github.com:PaddlePaddle/Paddle into fix_dist_with_lr_decay

1dda42a

update

1b07d06

update comments

633a8b2

typhoonzero previously approved these changes Mar 29, 2018


use self.program instead of default_main_program

05d5e26

Yancey1989 dismissed typhoonzero’s stale review via 05d5e26 March 29, 2018 12:01

typhoonzero previously approved these changes Mar 29, 2018


gongweibao mentioned this pull request Mar 30, 2018

The CI hangs over 12 hours and not started. #9514

Closed

Yancey1989 dismissed typhoonzero’s stale review via d9d11a3 March 30, 2018 02:32

fix ci

b7ffd5d

Yancey1989 force-pushed the fix_dist_with_lr_decay branch from d9d11a3 to b7ffd5d Compare March 30, 2018 02:52

gongweibao approved these changes Mar 30, 2018


gongweibao merged commit 374f1ca into PaddlePaddle:develop Mar 30, 2018

Yancey1989 deleted the fix_dist_with_lr_decay branch March 30, 2018 04:01

Yancey1989 mentioned this pull request Mar 30, 2018

Fluid distributed pserver raises a SIGSEGV exception #9351

Closed
