[Auto Parallel] Add zero h1 pipeline scheduling for paddle #62865
Conversation
Your PR has been submitted successfully. Thank you for your contribution to this open-source project!
Sorry to inform you that 4911d03's CIs passed more than 7 days ago. To prevent PR conflicts, you need to re-run all CIs manually.
name = name.split("@")[0]
if not block._find_var_recursive(name):
    return "backward_b"
var = block._find_var_recursive(name)
Handle operators that have no output, such as send_v2 and c_sync_calc_stream.
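For illustration only, a minimal sketch of the kind of fallback this comment asks for; the helper name _chunk_for_op and its default return value are hypothetical and not taken from this PR:

def _chunk_for_op(op, block):
    # Hypothetical helper: decide which chunk an op belongs to.
    # Communication/sync ops such as send_v2 or c_sync_calc_stream may have
    # no output vars, so a lookup keyed on an output name would fail.
    if not op.output_arg_names:
        return "backward_b"  # assumed default for output-less ops
    name = op.output_arg_names[0].split("@")[0]
    if not block._find_var_recursive(name):
        return "backward_b"
    var = block._find_var_recursive(name)
    # Further classification based on `var` (grad suffix, op role, etc.)
    # would follow here; omitted in this sketch.
    return "backward_b"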
Done. Could you please test again when you have time, to check whether it still hangs? ~
def _partial_programs(self, program):
    dist_context = self.get_attr("dist_context")
    self._split_matmul_grad_ops_to_matmul(program, dist_context)
    types, sub_program_list = _program_for_zero_bubble(program)
Following the _partial_programs of 1F1B or FThenB, please add the enable_send_recv_overlap parameter here as well. For example, 1F1B's _partial_programs:

def _partial_programs(self, program):
    # NOTE: The flag "enable_send_recv_overlap" may increase the reserved memory of GPUs.
    enable_send_recv_overlap = self.get_attr("enable_send_recv_overlap")
    types = [FORWARD, BACKWARD, OPT]
    sub_program_list = _program_for_fthenb_and_1f1b(
        program, enable_send_recv_overlap
    )
    return types, sub_program_list
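To make the suggestion concrete, here is a hedged sketch of how the zero-bubble pass above might adopt the same flag; passing the flag into _program_for_zero_bubble is an assumed signature change, not something this PR implements:

def _partial_programs(self, program):
    # NOTE: The flag "enable_send_recv_overlap" may increase the reserved memory of GPUs.
    enable_send_recv_overlap = self.get_attr("enable_send_recv_overlap")
    dist_context = self.get_attr("dist_context")
    self._split_matmul_grad_ops_to_matmul(program, dist_context)
    # Assumption: _program_for_zero_bubble would need to accept the flag,
    # mirroring _program_for_fthenb_and_1f1b.
    types, sub_program_list = _program_for_zero_bubble(
        program, enable_send_recv_overlap
    )
    return types, sub_program_list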
How about adapting this in a separate PR? The same switch for VPP was also adapted in a follow-up ~
For VPP it was most likely just forgotten at the beginning, which is why it was added separately later. Here it can be added in the same PR.
OK ~
LGTM
PR Category
Auto Parallel
PR Types
Others
Description
Add Zero-H1 pipeline scheduling support for Paddle.
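For context, a minimal sketch of how such a schedule would typically be selected through the auto-parallel pipeline strategy; the schedule-mode string "ZBH1" and the exact strategy fields are assumptions based on Paddle's existing pipeline passes (FThenB, 1F1B, VPP), not confirmed by this PR:

from paddle.distributed.fleet import auto

# Assumed configuration sketch: select the zero-bubble H1 schedule for the
# auto-parallel pipeline pass.
strategy = auto.Strategy()
strategy.pipeline.enable = True
strategy.pipeline.schedule_mode = "ZBH1"   # assumed mode name for this schedule
strategy.pipeline.accumulate_steps = 4     # hypothetical micro-batch count

# engine = auto.Engine(model, loss, optimizer, strategy=strategy)
# engine.fit(dataset, batch_size=1)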
The actual scheduling on Llama2 with 4 GPUs is shown below.
Test results on the PaddleNLP Llama2 model (pp4, batch 1, hidden_layer=4) are as follows:
Accuracy
Accuracy aligns; occasionally there is a small deviation beyond the third decimal place (consistent with the description in the paper).
Loss comparison over 10000 steps on Llama2:
Below is the loss curve for the first 10000 steps.
Speed test
Test machine: 4x RTX 3090 GPUs
GPU memory usage
The test script is as follows:
Related Issue: