Single release for PaddlePaddle CPU Image #1607

gangliao · 2017-03-13T06:59:12Z

Please give me some ack.... 😄

Xreki · 2017-03-13T07:49:34Z

paddle/cuda/Readme.md

+
+### Background
+
+Currently, PaddlePaddle supports AVX and SSE3 intrinsics (extensions to the x86 instruction set architecture). When using CMake to compile PaddlePaddle source code, it will check and detect the host which SIMD instruction is supported, then automatically set the legal one.  Developer or user also could manually set CMake option `WITH_AVX=ON/OFF` before PaddlePaddle compilation. That's good for local usage.


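I'd like to stress the distinction between host and target here. In the local usage that is common today, the host is the target, so there is no difference. If the host and the target differ, "check and detect the host which SIMD instruction is supported" is actually meaningless.

Exactly.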

Xreki · 2017-03-13T07:55:56Z

paddle/cuda/Readme.md

+ } else if (HAS_SIMD(SIMD_SSE3)) {
+      sse3_stub();
+ }
+```


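The interface could be simpler here. For example, define a SIMD macro (or use some other mechanism) so that the call site reduces to SIMD(stub()).

CpuID.h actually already has macros that can be used directly; both the function call and the macros are supported.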

/**
 * @brief Check SIMD flags at runtime.
 *
 * 1. Check all SIMD flags at runtime:
 *
 * @code{.cpp}
 * if (HAS_AVX && HAS_AVX2) {
 *   avx2_stub();
 * }
 * @endcode
 *
 * 2. Check one SIMD flag at runtime:
 *
 * @code{.cpp}
 * if (HAS_SSE41 || HAS_SSE42) {
 *   sse4_stub();
 * }
 * @endcode
 */
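
To make the runtime check discussed above concrete, here is a minimal sketch, assuming GCC or Clang on x86, of a bitmask-style test in the spirit of `HAS_SIMD(flags)`. The names `SimdFlag`, `DetectSimdFlags`, `HasSimd`, and `SelectKernel` are hypothetical and do not reproduce CpuID.h; the only real API used is the compiler builtin `__builtin_cpu_supports`. On other compilers or architectures the sketch reports no SIMD support and selects "naive".

```cpp
#include <cassert>

// Hypothetical flag names for illustration only (not the constants in CpuID.h).
enum SimdFlag : unsigned {
  kSimdNone = 0,
  kSimdSse3 = 1u << 0,
  kSimdAvx = 1u << 1,
  kSimdAvx2 = 1u << 2,
  kSimdAll = kSimdSse3 | kSimdAvx | kSimdAvx2,
};

// Queries the CPU that is executing the binary, not the machine that built it.
inline unsigned DetectSimdFlags() {
  unsigned flags = kSimdNone;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();
  if (__builtin_cpu_supports("sse3")) flags |= kSimdSse3;
  if (__builtin_cpu_supports("avx")) flags |= kSimdAvx;
  if (__builtin_cpu_supports("avx2")) flags |= kSimdAvx2;
#endif
  return flags;
}

// Returns true only if every requested flag is available.
inline bool HasSimd(unsigned wanted) {
  assert((wanted & ~static_cast<unsigned>(kSimdAll)) == 0 && "unknown SIMD flag");
  static const unsigned available = DetectSimdFlags();  // detected once
  return (available & wanted) == wanted;
}

// Chooses the most capable implementation, falling back to pure C++.
inline const char* SelectKernel() {
  if (HasSimd(kSimdAvx | kSimdAvx2)) return "avx2";
  if (HasSimd(kSimdAvx)) return "avx";
  if (HasSimd(kSimdSse3)) return "sse3";
  return "naive";
}
```

Because the query runs on the machine that executes the binary, the capabilities of the build host no longer matter, which is the host-versus-target concern raised earlier in the review.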

Xreki · 2017-03-13T07:58:55Z

paddle/cuda/Readme.md

+       |     |---sse3 -- sse3_stub()
+       |
+       arm--- ...
+```


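Does "naive" refer to the pure C++ implementation?

Yes, I just wrote it in passing. Of course it can go directly under the x86 directory; a separate naive directory isn't needed.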

Xreki · 2017-03-13T08:01:14Z

paddle/cuda/Readme.md

+        |                    |         |- gru.cc
+        |                              |- ...
+        |- gpu -- ...
+```


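There are two points I'm unsure about here:

Will the gpu implementations all eventually be moved into function?

Does this directory still need to keep separate include and src directories?

The gpu part can be left untouched for now. We can add a cpu directory first and start work there; the two won't conflict.

I think we still need an include directory to hold the interfaces. It is clearer, more disciplined, and easier to call.

Removing the naive directory and putting the corresponding code directly under the x86 directory is reasonable.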

reyoung · 2017-03-16T02:41:39Z

paddle/cuda/Readme.md

+PaddlePaddle will crash and throw out `illegal instruction is used`. This problem will appear
+frequently on cluster environment, like Kubernetes. **It must be addressed before PaddlePaddle on Cloud**
+
+2. Once new version is ready to deliver, we have to release more products to users, for example, `no-avx-cpu`, `avx-cpu`, `no-avx-gpu`, `avx=gpu`. Users do not need to care about details. It sucks!


avx=gpu ==> avx-gpu

reyoung · 2017-03-16T02:54:03Z

paddle/cuda/Readme.md

+
+```c++
+ if (HAS_SIMD(SIMD_AVX2 | SIMD_FMA4)) {
+      avx2_fm4_stub();


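The xxx_xx_stub naming isn't quite appropriate.

"Stub" seems to refer to a function that simulates some behavior during testing; it appears to be a unit-testing term.

https://zh.wikipedia.org/wiki/%E6%A1%A9_(%E8%AE%A1%E7%AE%97%E6%9C%BA)

Good point.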

reyoung · 2017-03-16T04:40:57Z

paddle/cuda/Readme.md

+`avx2_fm4_stub` and `sse3_stub` could be located in different directory:
+
+```text
+------x84---naive


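The naive implementation should be a sibling of the x86 and arm implementations, because when ARM and x86 implement an operation, they will call the naive code to process the tail (leftover) elements.

For example, the directory structure could be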

---simd
   `---naive
   `---X86
         `--- AVX
   `---ARM

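Also, I cautiously suspect that we should simply use a mature, well-packaged SIMD library for this. Removing Paddle's internal SIMD optimizations and depending on another open-source project might be more convenient.

For example, Vc

A follow-up: Vc isn't really usable, because that library does not switch implementations at runtime.

So let's change the directory structure instead.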
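
The point about tail handling can be illustrated with a small sketch, assuming GCC or Clang on x86. An AVX kernel processes full blocks of eight floats and hands the leftover elements to the naive routine, which is why naive would sit beside x86 and ARM instead of beneath one of them. All function and namespace names are hypothetical, and the per-function `target("avx")` attribute stands in for compiling a separate directory with `-mavx`.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define SKETCH_HAS_X86_AVX 1
#else
#define SKETCH_HAS_X86_AVX 0
#endif

namespace naive {
// Pure C++ reference; SIMD kernels also call it for leftover elements.
inline void VecAdd(const float* a, const float* b, float* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}
}  // namespace naive

#if SKETCH_HAS_X86_AVX
namespace x86 {
// Only this function is generated with AVX enabled.
__attribute__((target("avx")))
inline void VecAddAvx(const float* a, const float* b, float* out, std::size_t n) {
  const std::size_t kLanes = 8;  // floats per 256-bit register
  const std::size_t blocked = n - n % kLanes;
  for (std::size_t i = 0; i < blocked; i += kLanes) {
    const __m256 va = _mm256_loadu_ps(a + i);
    const __m256 vb = _mm256_loadu_ps(b + i);
    _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
  }
  // Tail: fewer than kLanes elements remain, so defer to the naive code.
  naive::VecAdd(a + blocked, b + blocked, out + blocked, n - blocked);
}
}  // namespace x86
#endif

// Public entry point: picks the AVX kernel only if the running CPU has AVX.
inline void VecAdd(const float* a, const float* b, float* out, std::size_t n) {
  assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));
#if SKETCH_HAS_X86_AVX
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx")) {
    x86::VecAddAvx(a, b, out, n);
    return;
  }
#endif
  naive::VecAdd(a, b, out, n);
}
```

With 11 elements, for instance, the AVX path handles one block of eight and the naive routine handles the remaining three.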

hedaoyuan · 2017-03-13T08:57:22Z

paddle/cuda/Readme.md

@@ -0,0 +1,79 @@
+## Runtime Check SIMD for x86 architecture


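The title of this design doc could be changed. Runtime check is only one piece of the work; the goal of the whole effort (or design) should be to release a single Paddle that supports all kinds of CPU environments.
As I understand it, there are at least three pieces of work here:
a. Runtime check
b. Code changes and directory restructuring (for example, putting the naive/sse/avx implementations into different directories as described below; issue #1116 also has related content)
c. Finally, build-related work; for example, how to build a paddle that contains the naive implementation as well as the sse and avx implementations (besides the build itself, this also involves code changes).

Although the implementation section later covers some of the related technical details, I suggest organizing the design doc around the aspects the whole effort includes. That way, separate issues can be created from this design doc (for example, how to adjust the code under cuda), and detailed discussion can move to those issues (some details can be merged back into this design doc after they have been settled in the issues).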

hedaoyuan · 2017-03-17T07:15:56Z

paddle/cuda/Readme.md

+
+### How to implement it?
+
+Since the current `cuda` directory includes heterogeneous source code, we want to refactor `cuda` directory as follows:


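My understanding is that the directory structure here is an example? A few questions:

Does kernels replace the cuda directory, or is it a new directory?

Many of the file names here don't match the files currently in paddle; for example, which file does activation.cc refer to?

My understanding is that it is a replacement. We can discuss the directory layout in an issue afterwards.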

hedaoyuan · 2017-03-17T07:21:32Z

paddle/cuda/Readme.md

+Here, each directory uses the different compile options (`-mavx` or `-msse`) to generate the corresponding binaries. Then, at
+runtime, `if(HAS_SIMD(__flags)` can select the supported branch (intrinsics) to execute.
+
+The method could fix the releases and deployment problems.


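Not all code needs both a naive version and an avx version. Some code only needs a single naive version; compiling it with and without -mavx produces two binaries, each executed in the corresponding CPU environment.

Yes, that is my understanding too.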
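
A minimal sketch of this "write once, compile twice" idea, assuming GCC or Clang on x86. The thread proposes doing it with build flags (compiling the same source with and without `-mavx`); to keep the example in a single file, the sketch substitutes the per-function `target("avx")` attribute, which is not the mechanism described in the design doc. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define SKETCH_MULTI_TARGET 1
#else
#define SKETCH_MULTI_TARGET 0
#endif

// The only hand-written implementation: plain C++, no intrinsics.
// Forced inlining lets each wrapper below generate its own copy of the loop.
#if SKETCH_MULTI_TARGET
__attribute__((always_inline))
#endif
inline void ScaleBody(const float* in, float* out, std::size_t n, float k) {
  for (std::size_t i = 0; i < n; ++i) out[i] = in[i] * k;
}

// Copy 1: generated with the baseline instruction set.
inline void ScaleBaseline(const float* in, float* out, std::size_t n, float k) {
  ScaleBody(in, out, n, k);
}

#if SKETCH_MULTI_TARGET
// Copy 2: same source, but the compiler may use AVX when vectorizing it.
__attribute__((target("avx")))
inline void ScaleAvx(const float* in, float* out, std::size_t n, float k) {
  ScaleBody(in, out, n, k);
}
#endif

// Runtime selection between the two generated copies.
inline void Scale(const float* in, float* out, std::size_t n, float k) {
  assert(n == 0 || (in != nullptr && out != nullptr));
#if SKETCH_MULTI_TARGET
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx")) {
    ScaleAvx(in, out, n, k);
    return;
  }
#endif
  ScaleBaseline(in, out, n, k);
}
```

Whether the AVX copy is actually faster depends on the compiler's auto-vectorization at the chosen optimization level; the sketch only shows that one source can yield both variants.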

reyoung

LGTM, pending another look from the other reviewers.

hedaoyuan · 2017-03-20T06:57:23Z

paddle/cuda/Readme.md

+
+3. [Pending] Modify CMake files.
+
+        Different simd intrinsics will be inside the different directories. we need to modified CMake files to support this solution. Each directory uses the different compile options (`-mavx` or `-msse`) to generate the corresponding binaries. Then, at runtime, using SIMD flags `HAS_AVX`, `HAS_SSE` automatically detect and select the supported branch (intrinsics) to execute.


Whether this logic needs to be done this way should be revisited; for a simpler approach, see #1634 (comment).

hedaoyuan · 2017-03-20T07:13:55Z

paddle/cuda/Readme.md

+
+2. [Pending] Adjust `cuda` Directory.
+
+        Since the current `cuda` directory includes heterogeneous source code (cpu and gpu), we want to refactor `cuda` directory. For simplicity, different simd intrinsics will be inside the different directories. we need to


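The code in the cuda directory does need adjusting, but "different simd intrinsics will be inside the different directories" would add several sse/avx directories, which doesn't feel ideal; each directory might hold only a few files. Also, I think keeping code with the same functionality together is more important than keeping code for the same instruction set together.

Some open-source libraries wrap the SIMD intrinsics in a thin layer [fftw] and build the higher-level functionality on top of that wrapper. After all, most functionality implemented with intrinsics differs only in the instructions used, and the commonly used instructions are just load, store, add, mul, and so on.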
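
One way to picture such a wrapper layer is the sketch below. It is not FFTW's actual code or anything from Paddle: the type `VecF` and the helpers `Load`, `Store`, `Add`, `Mul`, and `Broadcast` are hypothetical names. The SSE backend is chosen at compile time when `__SSE__` is defined, with a one-lane scalar backend otherwise, and the kernel `Axpy` is written once against the wrapper.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__)
#include <xmmintrin.h>
// SSE backend: four floats per register.
struct VecF {
  static constexpr std::size_t kLanes = 4;
  __m128 v;
};
inline VecF Load(const float* p) { VecF r; r.v = _mm_loadu_ps(p); return r; }
inline void Store(float* p, VecF x) { _mm_storeu_ps(p, x.v); }
inline VecF Add(VecF a, VecF b) { VecF r; r.v = _mm_add_ps(a.v, b.v); return r; }
inline VecF Mul(VecF a, VecF b) { VecF r; r.v = _mm_mul_ps(a.v, b.v); return r; }
inline VecF Broadcast(float s) { VecF r; r.v = _mm_set1_ps(s); return r; }
#else
// Scalar backend: same interface, one float per "register".
struct VecF {
  static constexpr std::size_t kLanes = 1;
  float v;
};
inline VecF Load(const float* p) { VecF r; r.v = *p; return r; }
inline void Store(float* p, VecF x) { *p = x.v; }
inline VecF Add(VecF a, VecF b) { VecF r; r.v = a.v + b.v; return r; }
inline VecF Mul(VecF a, VecF b) { VecF r; r.v = a.v * b.v; return r; }
inline VecF Broadcast(float s) { VecF r; r.v = s; return r; }
#endif

// y = alpha * x + y, written only against the wrapper operations.
inline void Axpy(float alpha, const float* x, float* y, std::size_t n) {
  assert(n == 0 || (x != nullptr && y != nullptr));
  const VecF va = Broadcast(alpha);
  std::size_t i = 0;
  for (; i + VecF::kLanes <= n; i += VecF::kLanes) {
    Store(y + i, Add(Mul(va, Load(x + i)), Load(y + i)));
  }
  // Leftover elements that do not fill a whole register.
  for (; i < n; ++i) y[i] = alpha * x[i] + y[i];
}
```

Supporting another instruction set would then mean adding one more backend for the five helpers, while `Axpy` and similar kernels stay in a single file grouped by functionality.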

hedaoyuan · 2017-03-20T07:23:46Z

paddle/cuda/Readme.md

+
+### Conclusion
+
+The method could fix the releases and deployment problems.


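So the purpose of this design is to address the runtime confusion caused by inconsistent environments between release and deployment. "Single release for PaddlePaddle CPU Image" is one solution; #1634 (comment) is another. The two approaches still need to be compared.
Also, I'd suggest considering whether a Single release for PaddlePaddle CPU Image is necessary at all. And if AVX2/AVX512 code is introduced later, can the current design support it ("3. [Pending] Modify CMake files" only mentions -mavx/-msse)?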

luotao1 · 2017-12-08T06:50:27Z

@gangliao @Xreki @hedaoyuan @reyoung
How about this pull request: updated or closed? Since there is an extra branch in Paddle repo:

gangliao added 3 commits March 13, 2017 14:57

Runtime check SIMD desgin docs

e17e86b

Update docs

187fbb5

fix word errors

4e58a86

gangliao requested review from hedaoyuan, reyoung, livc and Xreki March 13, 2017 07:02

gangliao added 2 commits March 13, 2017 15:04

remove a word

4d975c9

remove some syntaxes

265a414

Xreki reviewed Mar 13, 2017


gangliao mentioned this pull request Mar 16, 2017

Improve the design doc of Docker build #1627

Merged

reyoung reviewed Mar 16, 2017


hedaoyuan reviewed Mar 17, 2017


refine avx docs

6ed54df

gangliao mentioned this pull request Mar 17, 2017

Refactor CUDA Directory #1634

Closed

reyoung approved these changes Mar 20, 2017


hedaoyuan requested changes Mar 20, 2017


gangliao changed the title ~~Runtime check SIMD design docs~~ Single release for PaddlePaddle CPU Image Mar 20, 2017

gangliao mentioned this pull request Mar 21, 2017

Add simd check and set SSE3 as default compilation #1666

Merged

Xreki mentioned this pull request Mar 24, 2017

Add cross-compiling support for arm architecture. #1698

Merged

livc removed their request for review December 8, 2017 06:54

luotao1 closed this Dec 19, 2017

luotao1 deleted the avx_docs branch December 19, 2017 07:27
