FileManager #2115
Would it be tidier to put the detailed design of commands such as cp all together in the pfs.md file?
@@ -0,0 +1,71 @@
# FileManager Design Document
FileManager or FileServer: keep just one term?
Done
doc/design/file_manager/README.md
Outdated
### Architecture Diagram
<image src=./src/filemanager.png width=900>
Would it be clearer to add a walkthrough of the user workflow below this diagram?
doc/design/file_manager/README.md
Outdated
@@ -0,0 +1,71 @@
# FileManager Design Document
## Goal
In this document, we describe the design of a system named FileManager, which makes it convenient for users to manage the files they store on PaddlePaddle Cloud.
More precisely, the goal is to let users upload their own training data for distributed training.
Done
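This PR should be a design doc, but it is written like a manpage, and a manpage is a manual doc.
The difference between the two: a design doc is reviewed so that everyone agrees before implementation starts. I am concerned that the current design doc lists all sorts of features that cannot all be implemented in the short term, such as sync.
I suggest merging everything into a single .md file. State clearly the one concrete problem to be solved now and how it should be solved. By the time this design doc is approved and merged, everyone should have a clear idea of how the code should be written.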
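What kind of program is this design doc actually designing?
Is it an RPC server written with Go RPC, or a RESTful API server? RPC is a mechanism for communication between a frontend server and a backend server, or among backend servers; it is not suited to writing a Web frontend server such as the pfs server. To be compatible with standard Web client technologies (such as HTML5), it should use the corresponding Web server technology.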
doc/design/file_manager/README.md
Outdated
### PFSClient
- Features: for details see [Here](./pfs/pfs.md)
- Provides commands for users to manage files
- Written in Golang; runs cross-platform
Golang => Go
Done
doc/design/file_manager/README.md
Outdated
Main features include:
- Provides common command-line commands for managing files and directories
- The supported commands are listed [Here](./pfs/pfs.md)
Here => 这里 (that is, use Chinese anchor text)
Using Chinese anchor text raises an error; this is a Sphinx bug.
Done
doc/design/file_manager/README.md
Outdated
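- Supports resumable upload and download of large files
## Glossary
- PFS: short for Paddlepaddle cloud File System, an abstraction of the user's file storage space, as opposed to the Local File System. We currently build it on CephFS.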
Paddlepaddle => PaddlePaddle
The main intent was to show where the abbreviation PFS comes from.
Paddlepaddle cloud File System: the capital letters in this phrase spell exactly PFS.
doc/design/file_manager/README.md
Outdated
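- Supports resumable upload and download of large files
## Glossary
- PFS: short for Paddlepaddle cloud File System, an abstraction of the user's file storage space, as opposed to the Local File System. We currently build it on CephFS.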
Local File System => local filesystem
Done
doc/design/file_manager/README.md
Outdated
- [CephFS](http://docs.ceph.com/docs/master/cephfs/): a POSIX-compatible file system.
- Chunk: the logical unit into which a file is divided.
- [Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/): provides a Layer 7 reverse proxy and load balancing based on sticky sessions.
- CA: certificate authority<sup>[tls](#tls)</sup>
For CA, CRT, and Key, just use the full names; defining them as terms is more trouble.
Done
doc/design/file_manager/README.md
Outdated
<image src=./src/filemanager.png width=900>
### PFSClient
- Features: for details see [Here](./pfs/pfs.md)
Here => 这里 (that is, use Chinese anchor text)
Same as above.
Done
doc/design/file_manager/README.md
Outdated
### FileServer
FileServer is an HTTPServer written with GoRPC. It exposes a [RESTful API](./RESTAPI.md), receives and handles file management requests from PFSClient, and returns the results to PFSClient.
GoRPC => Go RPC
Done
doc/design/file_manager/README.md
Outdated
### FileServer
FileServer is an HTTPServer written with GoRPC. It exposes a [RESTful API](./RESTAPI.md), receives and handles file management requests from PFSClient, and returns the results to PFSClient.
HTTPServer => HTTP server
Done
doc/design/file_manager/README.md
Outdated
### FileServer
FileServer is an HTTPServer written with GoRPC. It exposes a [RESTful API](./RESTAPI.md), receives and handles file management requests from PFSClient, and returns the results to PFSClient.
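A server written with the Go standard library's rpc package is called an RPC server. It is also an HTTP server, but only because of a detail users do not need to know: Go RPC uses HTTP as its transport.
I do not recall Go RPC supporting a RESTful API. How can this Go RPC server also be a RESTful API server?
Does FileServer here refer to the one already provided by the Go standard library? https://golang.org/pkg/net/http/#FileServer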
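No, that FileServer is too simple.
https://golang.org/pkg/net/http/#FileServer
What is being designed is a RESTful API interface, so that it is compatible with H5. I mixed up the concepts and did not express this clearly.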
doc/design/file_manager/README.md
Outdated
### PFSServer
PFSServer exposes a RESTful API, receives and handles file management requests from PFSClient, and returns the results to PFSClient.
`` ``` ``
The `` ``` `` marker here looks unnecessary?
Done
doc/design/file_manager/README.md
Outdated
RESTful API
- /api/v1/files
- `GET /api/v1/files`: Get attributes of files or directories.
Since this already sits under a sub-heading, there is no need to repeat it, is there? GET: Get attributes ...
Is the intent here to fetch metadata before downloading? attributes => metadata?
I do not quite understand the point about the sub-heading.
doc/design/file_manager/README.md
Outdated
- /api/v1/files
- `GET /api/v1/files`: Get attributes of files or directories.
- `POST /api/v1/files`: Update files or directories.
An update operation should map to PATCH; see: http://restful-api-design.readthedocs.io/en/latest/methods.html
Done
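For illustration only, the method mapping agreed on in this thread (GET for metadata, PATCH for updates on `/api/v1/files`) could be sketched with the standard `net/http` package roughly as follows. This is not the PR's code: `fileMeta`, `statusFor`, `newMux`, and the `path` query parameter are hypothetical names, and the PATCH branch only acknowledges the request.

```go
package main

import (
	"encoding/json"
	"net/http"
)

// fileMeta is a hypothetical metadata record; the field names are assumptions.
type fileMeta struct {
	Path string `json:"path"`
	Size int64  `json:"size"`
}

// statusFor maps an HTTP method to the status code this sketch replies with.
func statusFor(method string) int {
	switch method {
	case http.MethodGet:
		return http.StatusOK
	case http.MethodPatch:
		return http.StatusNoContent
	default:
		return http.StatusMethodNotAllowed
	}
}

// filesHandler serves /api/v1/files: GET returns metadata, PATCH updates.
func filesHandler(w http.ResponseWriter, r *http.Request) {
	status := statusFor(r.Method)
	switch status {
	case http.StatusOK:
		// Metadata lookup is stubbed; only the assumed "path" parameter is echoed.
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(fileMeta{Path: r.URL.Query().Get("path")})
	case http.StatusNoContent:
		// A real server would apply the update before acknowledging it.
		w.WriteHeader(status)
	default:
		w.Header().Set("Allow", "GET, PATCH")
		http.Error(w, "method not allowed", status)
	}
}

// newMux wires the handler to the route named in the design doc.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/v1/files", filesHandler)
	return mux
}
```

Passing `newMux()` to `http.ListenAndServe` would serve the route; a POST to it would then be rejected with 405 and an `Allow` header.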
doc/design/file_manager/README.md
Outdated
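## File Transfer Optimization
### Chunked File Transfer
User files can be fairly large, so uploading them to the Cloud or downloading them locally may take a long time, and the network may become unstable during the transfer. To address these problems, we introduce the concept of a Chunk, which consists of its offset within the file, the data, the data length, and a checksum. The upload and download of file data content are all done through Chunk operations. Because a Chunk is small (256K by default), a single transfer completes quickly and is less likely to fail. After transferring the last Chunk, PFSClient must check that the MD5 of the destination file matches that of the source file.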
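"The upload and download of file data content are all done through Chunk operations" => "File uploads and downloads are both implemented through operations on Chunks"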
Done
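As a rough sketch of the Chunk idea described in the hunk above, the Go code below splits data into 256K Chunks, verifies each Chunk on reassembly, and compares MD5 digests at the end. The field names, the helper names, and the choice of CRC32 for the per-Chunk checksum are assumptions; the design doc does not name a checksum algorithm.

```go
package main

import (
	"bytes"
	"crypto/md5"
	"errors"
	"hash/crc32"
	"io"
)

// DefaultChunkSize is the 256K default mentioned in the design doc.
const DefaultChunkSize = 256 * 1024

// Chunk holds the four parts listed in the doc. Field names and the CRC32
// checksum are assumptions made for this sketch.
type Chunk struct {
	Offset   int64
	Data     []byte
	Len      int
	Checksum uint32
}

// SplitChunks reads r to EOF and returns its contents as Chunks of at most size bytes.
func SplitChunks(r io.Reader, size int) ([]Chunk, error) {
	if size <= 0 {
		return nil, errors.New("chunk size must be positive")
	}
	var chunks []Chunk
	var offset int64
	for {
		buf := make([]byte, size)
		n, err := io.ReadFull(r, buf)
		if n > 0 {
			data := buf[:n]
			chunks = append(chunks, Chunk{Offset: offset, Data: data, Len: n, Checksum: crc32.ChecksumIEEE(data)})
			offset += int64(n)
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return chunks, nil
		}
		if err != nil {
			return nil, err
		}
	}
}

// SplitBytes is a convenience wrapper for in-memory data.
func SplitBytes(data []byte, size int) ([]Chunk, error) {
	return SplitChunks(bytes.NewReader(data), size)
}

// Reassemble verifies every Chunk and copies it to its offset in a buffer of total bytes.
func Reassemble(chunks []Chunk, total int64) ([]byte, error) {
	out := make([]byte, total)
	for _, c := range chunks {
		if c.Offset < 0 || c.Offset+int64(len(c.Data)) > total {
			return nil, errors.New("chunk out of range")
		}
		if crc32.ChecksumIEEE(c.Data) != c.Checksum {
			return nil, errors.New("chunk checksum mismatch")
		}
		copy(out[c.Offset:], c.Data)
	}
	return out, nil
}

// SameMD5 is the final whole-file check done after the last Chunk.
func SameMD5(src, dst []byte) bool {
	return md5.Sum(src) == md5.Sum(dst)
}
```

Because each Chunk carries its own offset, a failed transfer only needs to resend the Chunks that did not arrive or did not verify, which is what makes resuming cheap.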
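`` ``` ``
### Generating a Sparse File
When the destination file does not exist or its size differs from that of the source file, [Fallocate](https://Go.org/pkg/syscall/#Fallocate) can be used to create a sparse file, after which multiple Chunks can be written concurrently.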
I do not quite follow this part. Is it meant to solve the problem of resuming interrupted transfers?
Yes.
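To make the connection to resuming concrete, here is a minimal Go sketch of the approach: size the destination file up front, then write pieces at their offsets from separate goroutines. It uses `os.File.Truncate` as a portable stand-in for `syscall.Fallocate`, which is Linux-only; whether the result is actually sparse depends on the filesystem. `Piece`, `WriteConcurrently`, and `writeAndRead` are hypothetical names.

```go
package main

import (
	"os"
	"path/filepath"
	"sync"
)

// Piece is a hypothetical (offset, data) pair standing in for a Chunk.
type Piece struct {
	Offset int64
	Data   []byte
}

// WriteConcurrently sets the destination to its final size and then writes
// every piece from its own goroutine. Truncate is an assumption of this
// sketch, used in place of the Fallocate call named in the design doc.
func WriteConcurrently(path string, size int64, pieces []Piece) (err error) {
	f, err := os.OpenFile(path, os.O_RDWR|os.O_CREATE, 0644)
	if err != nil {
		return err
	}
	defer func() {
		// Report a close failure only if nothing else went wrong.
		if cerr := f.Close(); err == nil {
			err = cerr
		}
	}()
	if err := f.Truncate(size); err != nil {
		return err
	}
	var wg sync.WaitGroup
	errs := make([]error, len(pieces))
	for i, p := range pieces {
		wg.Add(1)
		go func(i int, p Piece) {
			defer wg.Done()
			// WriteAt takes an explicit offset, so writers do not share a file position.
			_, errs[i] = f.WriteAt(p.Data, p.Offset)
		}(i, p)
	}
	wg.Wait()
	for _, e := range errs {
		if e != nil {
			return e
		}
	}
	return nil
}

// writeAndRead is a helper for trying the sketch: it writes the pieces to a
// temporary file and returns the file's final contents.
func writeAndRead(size int64, pieces []Piece) ([]byte, error) {
	dir, err := os.MkdirTemp("", "pfs-sketch")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "dest.bin")
	if err := WriteConcurrently(path, size, pieces); err != nil {
		return nil, err
	}
	return os.ReadFile(path)
}
```

Since the file already has its final size, pieces can arrive in any order, and after an interruption only the missing ranges need to be written again.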
Fix by Old PR and ISSUE #1902