[Fluid] Migrate Fluid Distributed Kernels to PHI #55716

Closed
GhostScreaming opened this issue Jul 26, 2023 · 19 comments
Labels
PFCC Paddle Framework Contributor Club,https://github.com/PaddlePaddle/community/tree/master/pfcc

Comments

@GhostScreaming
Contributor

GhostScreaming commented Jul 26, 2023

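Issue description

Hi everyone. The distributed operators under Fluid have not been fully migrated to the new PHI operator system, so they lack the ability that PHI functional operators have to "record their own inputs, outputs, and attributes" at registration time. As a result, distributed scenarios cannot use the framework's new communication module and scheduling system, which places a significant burden on debugging and optimizing distributed training. We have collected 17 operators that need to be migrated, and everyone is welcome to submit PRs to help migrate them.

For more details, see Call-for-Contributions: Fluid Operator Functional Migration Project. This issue tracks the migration progress of each operator in that project.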
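To make the idea of a kernel "recording its own inputs, outputs, and attributes" at registration time more concrete, here is a minimal toy sketch in Python. It is not Paddle's API: `register_kernel`, `KernelInfo`, `KERNEL_REGISTRY`, and `toy_scale` are hypothetical names invented for illustration, and the convention that keyword-only parameters are attributes is an assumption of this sketch only.

```python
import inspect
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class KernelInfo:
    """Metadata captured when a functional kernel is registered (toy model)."""
    fn: Callable
    inputs: List[str]
    attrs: List[str]
    outputs: List[str]


# Hypothetical global registry; a scheduler could query it without running the kernel.
KERNEL_REGISTRY: Dict[str, KernelInfo] = {}


def register_kernel(name: str, outputs: List[str]):
    """Register a plain function as a kernel and record its signature.

    Assumed convention for this sketch: positional parameters are tensor
    inputs, keyword-only parameters are attributes.
    """
    def decorator(fn: Callable) -> Callable:
        inputs, attrs = [], []
        for param in inspect.signature(fn).parameters.values():
            if param.kind is inspect.Parameter.KEYWORD_ONLY:
                attrs.append(param.name)
            else:
                inputs.append(param.name)
        KERNEL_REGISTRY[name] = KernelInfo(fn, inputs, attrs, list(outputs))
        return fn
    return decorator


@register_kernel("toy_scale", outputs=["out"])
def toy_scale(x, *, factor=1.0):
    # A stand-in "kernel": scale every element of a list.
    return [v * factor for v in x]


info = KERNEL_REGISTRY["toy_scale"]
print(info.inputs, info.attrs, info.outputs)
```

The point of the sketch is only that a function-style kernel exposes its inputs, attributes, and outputs through its signature, so other components can inspect that metadata up front; a kernel that reads everything from an opaque execution context offers no such information.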

Note:

Operators to migrate (overall progress: 15/16)

In order of merge time, in no particular ranking: @AndSonder (4) @GreatV (2) @BeingGod (3) @huangjiyi (2) @gouzil (2) @yangguohao (1) @zeroRains (1)

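| No. | Operator | Claimed by | PR (CPU/GPU kernel) | PR (XPU kernel) | PR (InferShape) |
| --- | --- | --- | --- | --- | --- |
| 1 | c_embedding ✅ (2023/8/16) | @BeingGod | #56129 | #56129 | Not needed |
| 2 | dgc ✅ (2023/8/10) | @huangjiyi | #56003 | Not needed | Not needed |
| 3 | dgc_momentum ✅ (2023/8/18) | @huangjiyi | #56158 | Not needed | #56358 |
| 4 | c_split ✅ (2023/8/22) | @BeingGod | #56327 | #56327 | Not needed |
| 5 | lars_momentum ✅ (2023/9/5) | @gouzil | #55798 | #56751 | #56749 |
| 6 | nop ✅ (2023/8/3) | @gouzil | #55816 | Not needed | Not needed |
| 7 | ftrl | @enkilee | #56270 | | |
| 8 | decayed_adagrad ✅ (2023/8/5) | @GreatV | #55995 | Not needed | #55995 |
| 9 | c_identity ✅ (2023/8/23) | @GreatV | #56215 | #56215 | #56215 |
| 10 | distributed_fused_lamb_init ✅ (2023/8/31) | @zeroRains | #55993 | Not needed | Not needed |
| 11 | limit_by_capacity ✅ (2023/8/18) | @yangguohao | #55948 | Not needed | Not needed |
| 12 | number_count ✅ (2023/8/15) | @BeingGod | #56128 | Not needed | Not needed |
| 13 | distributed_fused_lamb (migrated in the previous round) | | | | |
| 14 | random_routing ✅ (2023/8/1) | @AndSonder | #55773 | Not needed | Not needed |
| 15 | prune_gate_by_capacity ✅ (2023/8/1) | @AndSonder | #55780 | Not needed | Not needed |
| 16 | assign_pos ✅ (2023/8/16) | @AndSonder | #55794 | Not needed | Not needed |
| 17 | fused_softmax_mask_upper_triangle ✅ (2023/8/1) | @AndSonder | #55769 | Not needed | Not needed |
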
@GhostScreaming GhostScreaming added the PFCC Paddle Framework Contributor Club,https://github.com/PaddlePaddle/community/tree/master/pfcc label Jul 26, 2023
@GhostScreaming GhostScreaming self-assigned this Jul 26, 2023
@luotao1 luotao1 moved this to In Progress in Call for Contributions Jul 26, 2023
@GreatV
Contributor

GreatV commented Jul 26, 2023

| No. | Operator | Claimed by | PR (CPU/GPU kernel) | PR (XPU kernel) | PR (InferShape) |
| --- | --- | --- | --- | --- | --- |
| 8 | decayed_adagrad | @GreatV | | | |
| 9 | c_identity | @GreatV | | | |

@BeingGod
Contributor

BeingGod commented Jul 26, 2023

| No. | Operator | Claimed by | PR (CPU/GPU kernel) | PR (XPU kernel) | PR (InferShape) |
| --- | --- | --- | --- | --- | --- |
| 1 | c_embedding | @BeingGod | #56129 | | |
| 4 | c_split | @BeingGod | #56327 | | |
| 12 | number_count | @BeingGod | #56128 | | |

@luotao1 luotao1 self-assigned this Jul 26, 2023
@AndSonder
Contributor

AndSonder commented Jul 26, 2023

| No. | Operator | Claimed by | PR (CPU/GPU kernel) | PR (XPU kernel) | PR (InferShape) |
| --- | --- | --- | --- | --- | --- |
| 14 | random_routing | @AndSonder | #55773 | - | - |
| 15 | prune_gate_by_capacity | @AndSonder | #55780 | - | - |
| 16 | assign_pos | @AndSonder | #55794 | - | - |
| 17 | fused_softmax_mask_upper_triangle | @AndSonder | #55769 | - | - |

@gouzil
Member

gouzil commented Jul 26, 2023

| No. | Operator | Claimed by | PR (CPU/GPU kernel) | PR (XPU kernel) | PR (InferShape) |
| --- | --- | --- | --- | --- | --- |
| 5 | lars_momentum | @gouzil | #55798 | | |
| 6 | nop | @gouzil | #55816 | - | |

@enkilee
Contributor

enkilee commented Jul 26, 2023

| No. | Operator | Claimed by | PR (CPU/GPU kernel) | PR (XPU kernel) | PR (InferShape) |
| --- | --- | --- | --- | --- | --- |
| 7 | ftrl | @enkilee | | | |

@huangjiyi
Member

| No. | Operator | Claimed by | PR (CPU/GPU kernel) | PR (XPU kernel) | PR (InferShape) |
| --- | --- | --- | --- | --- | --- |
| 2 | dgc | @huangjiyi | | | |
| 3 | dgc_momentum | @huangjiyi | | | |

@yangguohao
Contributor

| No. | Operator | Claimed by | PR (CPU/GPU kernel) | PR (XPU kernel) | PR (InferShape) |
| --- | --- | --- | --- | --- | --- |
| 11 | limit_by_capacity | @yangguohao | | | |


@AndSonder
Contributor

@GhostScreaming For the fused_softmax_mask_upper_triangle task, does fused_softmax_mask_upper_triangle_grad need to be migrated as well?

@GhostScreaming
Contributor Author

GhostScreaming commented Jul 31, 2023

> @GhostScreaming For the fused_softmax_mask_upper_triangle task, does fused_softmax_mask_upper_triangle_grad need to be migrated as well?

Yes, the corresponding backward operator also needs to be migrated by default.

@zeroRains
Contributor

| No. | Operator | Claimed by | PR (CPU/GPU kernel) | PR (XPU kernel) | PR (InferShape) |
| --- | --- | --- | --- | --- | --- |
| 10 | distributed_fused_lamb_init | @zeroRains | - | - | - |

@AndSonder
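Contributor

random_routing, prune_gate_by_capacity, assign_pos, and fused_softmax_mask_upper_triangle do not need XPU migration, because these operators never had XPU kernels. They do not need InferShape migration either: none of them involves an undefined registered data type, so there is no need to infer dtype information through InferMeta.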

@GreatV
Contributor

GreatV commented Aug 14, 2023

@AndSonder
Contributor

The distributed_fused_lamb operator was already migrated in the previous round, so it can be crossed off. @luotao1

@BeingGod
Contributor

BeingGod commented Aug 15, 2023

@huangjiyi
Member

@zeroRains
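Contributor

The distributed_fused_lamb_init operator never had an XPU kernel, so no XPU migration is needed. It does not involve an undefined registered data type, so there is no need to infer dtype information through InferMeta, and no InferShape migration is needed.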

@gouzil
Member

gouzil commented Aug 16, 2023

@yangguohao
Contributor

limit_by_capacity does not need XPU or InferShape migration.

@luotao1
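Contributor

luotao1 commented Sep 18, 2023

The migration of the distributed operators under Fluid is complete. Thanks to everyone who took part!

In order of merge time, in no particular ranking: @AndSonder (4) @GreatV (2) @BeingGod (3) @huangjiyi (2) @gouzil (2) @yangguohao (1) @zeroRains (1)

You are welcome to keep contributing to other 快乐开源 (Happy Open Source) tasks.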
