Paddle Cloud plan #1860
Comments
Change "fault tolerance" to "scale out and show the resulting performance change".
4/24/2017 plan:
I'll look into the state of the parameter server.
Uncertain:
4/24/2017 plan:
4/26/2017 plan
Closing this issue. We can track the status in "Project": https://github.com/PaddlePaddle/Paddle/projects/18
Phase 1: demo-ready and open to a subset of users (closed beta)
Target date: 2017-05-31
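The end user's train function talks to the PaddlePaddle server, which invokes Docker to build images.
Risks:
- In the long term, high-performance storage needs deep support.
- Developing the web pages is a substantial amount of work, and few people are available.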
Design docs:
4/24/2017 meeting minutes:
Scope for the first version:
pserver:
⁃ TCP only; RDMA is not supported
⁃ Sparse is out of scope
⁃ Supports dynamic scaling of trainers
⁃ Synchronous SGD
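As a rough illustration of the synchronous SGD item above, here is a minimal in-process sketch. It assumes the pserver buffers one gradient per trainer and applies a single averaged update once all of them have reported. The class and method names (`SyncParameterServer`, `push_gradient`, `set_num_trainers`) are hypothetical and are not PaddlePaddle APIs; networking over TCP is left out entirely.

```python
class SyncParameterServer:
    """Toy synchronous-SGD parameter server (hypothetical, not a PaddlePaddle API)."""

    def __init__(self, params, lr, num_trainers):
        self.params = list(params)        # flat list of float parameters
        self.lr = lr                      # learning rate
        self.num_trainers = num_trainers  # trainers expected per step
        self._pending = []                # gradients received in the current step

    def set_num_trainers(self, n):
        """Change the trainer count between steps (assumed model of dynamic scaling)."""
        if self._pending:
            raise RuntimeError("cannot rescale in the middle of a step")
        self.num_trainers = n

    def push_gradient(self, grad):
        """Buffer one trainer's gradient.

        Returns True if this call completed the step and the averaged
        update was applied, False if the server is still waiting.
        """
        self._pending.append(list(grad))
        if len(self._pending) < self.num_trainers:
            return False
        n = len(self._pending)
        for i in range(len(self.params)):
            avg = sum(g[i] for g in self._pending) / n
            self.params[i] -= self.lr * avg
        self._pending = []
        return True
```

In this sketch, scaling out only changes how many gradients the server waits for before updating, which is one simple way the trainer count could vary between steps.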
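trainer:
⁃ pserver client
⁃ Fetch a task ID and process data task by task
⁃ Dynamic scaling; the demo emphasizes scaling out and shows the resulting change.
master:
⁃ Service discovery
⁃ Task assignment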
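The interaction between master and trainer described above (the master assigns tasks, each trainer fetches a task ID and processes that task's data) could be sketched as a simple queue. This is only an illustration under assumed names: `Master`, `get_task`, and `finish_task` are hypothetical, and service discovery and any RPC layer are omitted.

```python
from collections import deque


class Master:
    """Toy task dispatcher (hypothetical names, not a PaddlePaddle API)."""

    def __init__(self, num_tasks):
        self.todo = deque(range(num_tasks))  # task IDs not yet handed out
        self.pending = {}                    # task ID -> trainer currently working on it
        self.done = set()                    # task IDs reported as finished

    def get_task(self, trainer_id):
        """Hand the next task ID to a trainer, or None if nothing is left."""
        if not self.todo:
            return None
        task_id = self.todo.popleft()
        self.pending[task_id] = trainer_id
        return task_id

    def finish_task(self, task_id):
        """Record that a trainer has finished processing a task."""
        self.pending.pop(task_id)
        self.done.add(task_id)
```

Because trainers pull work rather than being given a fixed shard, a trainer that joins later simply starts calling `get_task`, which is one way scaling out could fit this design.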
paddle server:
⁃ Build the Docker image on Kubernetes
⁃ Launch the paddle job
paddle client:
⁃ Submit cluster jobs (Python code; add an optional argument to paddle.train that contains the distributed training configuration)
⁃ Command line: paddle upload/download