-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix dist error with lr decay layer #9489
Fix dist error with lr decay layer #9489
Conversation
python/paddle/fluid/optimizer.py
Outdated
@@ -24,6 +24,7 @@ | |||
from regularizer import append_regularization_ops | |||
from clip import append_gradient_clip_ops, error_clip_callback | |||
from contextlib import contextmanager | |||
from distribute_transpiler import UnionFind |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optimizer.py
should not depend on distribute_transpiler, either put _get_lr_decay_ops
in the transpiler or put UnionFind
in a single file. I'd prefer the first method, because we don't have to change current demo files then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optimizer.py should not depend on distribute_transpiler
Thanks @typhoonzero ,I think it's a good point, and maybe we can pass Optimizer
instance to transpile
interface so that we can support regularization for future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can consider that when moving regularizer and clipping to the server side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, moved to transpiler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM++
for _, opt_op in enumerate(opt_op_on_pserver): | ||
for _, op in enumerate(self.optimize_ops): | ||
# optimizer is connected to itself | ||
if ufind.is_connected(op, opt_op) and \ | ||
op not in global_ops: | ||
__append_optimize_op__(op, per_opt_block) | ||
per_opt_block = pserver_program.create_block(0) | ||
per_opt_block = pserver_program.create_block(append_block.idx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's smart.
find_ops = [] | ||
# find ops which output is lr var | ||
# make a union-find struct by all ops | ||
block = default_main_program().global_block() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use self.program
instead of default_main_program
because we may need to transpile a program that is specified by user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM++
d9d11a3
to
b7ffd5d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* commit '33b8b3d22034423455a493712955e419aac7b19b': (251 commits) Remove redundant commands in build.sh and build_doc.sh Add dependencies Move v2/api/fluid to fluid/api and Adjust doc build commands Plain LRN op throws an exception when is_test is set in backward pass fix compiler error of profiler_test in ONLY_CPU mode fix server shutdown Translation for Model Configuration (PaddlePaddle#9513) Fix data transform when inplace (PaddlePaddle#9450) refine parallel add FAQ (PaddlePaddle#9494) Fix dist error with lr decay layer (PaddlePaddle#9489) add prefetch_op (PaddlePaddle#9495) Fix some errors (PaddlePaddle#9403) hookup WITH_FLUID_ONLY in TeamCity build.sh (PaddlePaddle#9509) Fix the order of reads and write from buffered channel (PaddlePaddle#9423) change WITH_FLUID to WITH_FLUID_ONLY (PaddlePaddle#9427) fix block num Revert "make append activation in place by default (PaddlePaddle#9417)" Speed/sequence op1 (PaddlePaddle#9217) fix a compile error ...
Fixed #9429