Commit
update
typhoonzero committed Apr 24, 2017
1 parent b360dec commit a9b33f3
Showing 2 changed files with 2 additions and 3 deletions.
1 change: 0 additions & 1 deletion doc/design/cluster_train/checkpointing.md
@@ -8,7 +8,6 @@
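Notes:

* After a parameter server starts in the cluster, it automatically mounts the distributed storage directory and saves its snapshots to that directory.
* ***Note: when saving a checkpoint, the parameter server relies on the Linux kernel's copy-on-write mechanism. The checkpoint is written from a forked process, so the original process can keep receiving gradient update requests from trainers without affecting the checkpoint data being saved.***
* ***Note: each parameter server saves its checkpoint independently. For now, having multiple parameter servers synchronously save a global checkpoint for one specific point in time is not considered, because doing so still would not guarantee that randomness is eliminated.***

Checkpoint saving procedure: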
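
The fork-based saving described in the note above can be illustrated with a minimal Python sketch. It is not the parameter server's actual implementation, which this diff does not show: the function names `save_checkpoint_in_child` and `demo`, the use of `pickle` as the on-disk format, and the temporary-file rename are all assumptions made for illustration. It requires a POSIX system where `os.fork` is available.

```python
import os
import pickle
import tempfile


def save_checkpoint_in_child(params, path):
    """Fork, and let the child write a snapshot of `params` to `path`.

    Hypothetical helper. Returns the child's PID so the caller can
    wait on it later.
    """
    pid = os.fork()
    if pid == 0:
        # Child: it sees memory as it was at fork time (copy-on-write),
        # so updates the parent applies afterwards are not visible here.
        status = 1
        try:
            tmp = path + ".tmp"
            with open(tmp, "wb") as f:
                pickle.dump(params, f)
            os.replace(tmp, path)  # rename so readers never see a partial file
            status = 0
        finally:
            os._exit(status)  # exit without running the parent's cleanup handlers
    return pid


def demo():
    params = {"w": [0.0, 0.0]}
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "checkpoint.pkl")
        pid = save_checkpoint_in_child(params, path)
        params["w"][0] = 1.0  # the parent keeps applying updates
        os.waitpid(pid, 0)    # wait until the child has finished writing
        with open(path, "rb") as f:
            saved = pickle.load(f)
    return saved, params
```

In this sketch the saved snapshot holds the values from the moment of the fork, while the parent's copy already reflects the later update.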
4 changes: 2 additions & 2 deletions doc/design/cluster_train/data_dispatch.md
@@ -21,10 +21,10 @@

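### Uploading training files

Use the following command to upload local training data to the storage cluster
Use the following command to upload local training data to the storage cluster and to specify the `dataset-name` of the uploaded data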

```
paddle upload train_data.list
paddle upload train_data.list "dataset-name"
```

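The `.list` file describes the training data files and their corresponding labels. For image data, a sample `.list` file looks like the following; each line contains the path of an image file and its label (separated by a tab):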
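
As a minimal sketch of reading this format, assuming exactly one tab between the path and the label on each line: the function name `parse_list_file` and the sample paths below are hypothetical and are not part of the `paddle` tooling.

```python
def parse_list_file(text):
    """Parse `.list` content into (path, label) pairs.

    Hypothetical helper: each non-empty line is expected to be
    "<image path>\t<label>".
    """
    samples = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        path, label = line.split("\t", 1)
        samples.append((path, label))
    return samples


# Made-up sample content, for illustration only.
example = "images/cat_001.jpg\tcat\nimages/dog_001.jpg\tdog\n"
pairs = parse_list_file(example)
```

Labels are kept as strings here; converting them to integer class IDs would be a separate step.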