-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Single release for PaddlePaddle CPU Image #1607
Conversation
paddle/cuda/Readme.md
Outdated
|
||
### Background | ||
|
||
Currently, PaddlePaddle supports AVX and SSE3 intrinsics (extensions to the x86 instruction set architecture). When using CMake to compile PaddlePaddle source code, it will check and detect the host which SIMD instruction is supported, then automatically set the legal one. Developer or user also could manually set CMake option `WITH_AVX=ON/OFF` before PaddlePaddle compilation. That's good for local usage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里我想强调一下host
和target
。目前我们常用的local usage
方式,host
就是target
,所以没有什么区别。如果host
和target
不同,check and detect the host which SIMD instruction is supported
其实没有意义。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
正是如此。
paddle/cuda/Readme.md
Outdated
} else if (HAS_SIMD(SIMD_SSE3)) { | ||
sse3_stub(); | ||
} | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里接口上可以简单一些吧,比如可以定义一个SIMD
的宏(或者其他方式),然后调用出简化为SIMD(stub())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
其实CpuID.h里面也是有宏的, 可以直接用, 函数调用和宏都支持
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/**
* @brief Check SIMD flags at runtime.
*
* 1. Check all SIMD flags at runtime:
*
* @code{.cpp}
* if (HAS_AVX && HAS_AVX2) {
* avx2_stub();
* }
* @endcod
*
* 2. Check one SIMD flag at runtime:
*
* @code{.cpp}
* if (HAS_SSE41 || HAS_SSE42) {
* sse4_stub();
* }
* @endcode
*/
paddle/cuda/Readme.md
Outdated
| |---sse3 -- sse3_stub() | ||
| | ||
arm--- ... | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naive
指的是纯c++的实现吧?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
恩, 顺手写了,当然可以直接放在x86目录下,不需要naive目录
paddle/cuda/Readme.md
Outdated
| | |- gru.cc | ||
| |- ... | ||
|- gpu -- ... | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里我有两点不确定:
- gpu的实现以后是不是都会移到
function
里面去 - 这个目录还需要保持
include
和src
两个目录吗
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- gpu这一块可以先放着不动,先增加一个cpu目录开展工作,并不会冲突。
- 觉得还是需要include目录来存放接口,清晰规范一些,方便调用。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naive目录直接去掉,相应代码直接放x86目录很合理。
paddle/cuda/Readme.md
Outdated
PaddlePaddle will crash and throw out `illegal instruction is used`. This problem will appear | ||
frequently on cluster environment, like Kubernetes. **It must be addressed before PaddlePaddle on Cloud** | ||
|
||
2. Once new version is ready to deliver, we have to release more products to users, for example, `no-avx-cpu`, `avx-cpu`, `no-avx-gpu`, `avx=gpu`. Users do not need to care about details. It sucks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
avx=gpu
==> avx-gpu
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
paddle/cuda/Readme.md
Outdated
|
||
```c++ | ||
if (HAS_SIMD(SIMD_AVX2 | SIMD_FMA4)) { | ||
avx2_fm4_stub(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xxx_xx_stub
不太合适。
stub
似乎指测试时候模拟某种行为的函数,好像是用在单元测试里的术语。
https://zh.wikipedia.org/wiki/%E6%A1%A9_(%E8%AE%A1%E7%AE%97%E6%9C%BA)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
有道理。
paddle/cuda/Readme.md
Outdated
`avx2_fm4_stub` and `sse3_stub` could be located in different directory: | ||
|
||
```text | ||
------x84---naive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naive
的实现应该是和x86, arm并列的一种实现。因为ARM和X86实现某一个操作的时候,应该会调用naive的代码进行尾数处理。
譬如,目录结构可以是
---simd
`---naive
`---X86
`--- AVX
`---ARM
另外,谨慎怀疑我们应该直接使用一些封装好的,成熟的SIMD库做这个事情。直接把Paddle内部的SIMD优化删掉,依赖其他开源项目也许更方便。
例如 Vc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
补充回复下, Vc 不太能用。因为那个库里面没有做运行时的实现切换。
所以还是把目录结构改一下。
paddle/cuda/Readme.md
Outdated
@@ -0,0 +1,79 @@ | |||
## Runtime Check SIMD for x86 architecture |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个design doc的标题可以修改一下,Runtime Check是其中一个工作;整个工作(或design)的目标应该是,只发布一个Paddle支持各种CPU环境。
我的理解这里至少包括三块工作,
a. Runtime Check
b. 代码修改和目录结构调整 (比如下面讲到的把naive/sse/avx的一些实现放到不同的目录里;另外,在issue #1116 里面也有相关的内容)
c. 最后是编译相关的工作;比如,如何编译一个即包含naive实现,也包含sse、avx实现的paddle(这里涉及编译以外,也涉及代码修改相关工作)。
虽然,后面的实现上也都写到了一些相关的技术细节,但建议design doc还是从整个工作包含哪些方面去讲,这样基于这个design doc可以创建一个个相关issue,比如cuda下面的那些代码怎么调整,把细节的讨论放到issue里面去(一些细节问题可以在issue里面讨论完之后再merge回这个design doc)。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
赞
paddle/cuda/Readme.md
Outdated
|
||
### How to implement it? | ||
|
||
Since the current `cuda` directory includes heterogeneous source code, we want to refactor `cuda` directory as follows: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
我理解这里目录结构是一个example?有几个问题:
- kernels是替换cuda目录?还是新增一个目录?
- 这里面好些文件名与当前paddle里面的文件对应不上,比如
activation.cc
是指的哪一个?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
我的理解应该是替换,之后再issue里面讨论目录问题
paddle/cuda/Readme.md
Outdated
Here, each directory uses the different compile options (`-mavx` or `-msse`) to generate the corresponding binaries. Then, at | ||
runtime, `if(HAS_SIMD(__flags)` can select the supported branch (intrinsics) to execute. | ||
|
||
The method could fix the releases and deployment problems. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
并不是所有代码都需要写naive和avx两份,有的只需要写naive一份,编译的时候通过是否加-mavx生成两份,对应到不同的CPU环境下执行。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
是的,我的理解也是这样。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 如果其他同学再看的话。
|
||
3. [Pending] Modify CMake files. | ||
|
||
Different simd intrinsics will be inside the different directories. we need to modified CMake files to support this solution. Each directory uses the different compile options (`-mavx` or `-msse`) to generate the corresponding binaries. Then, at runtime, using SIMD flags `HAS_AVX`, `HAS_SSE` automatically detect and select the supported branch (intrinsics) to execute. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这部分逻辑需要这么做再看一下,简单的做法#1634 (comment) 。
|
||
2. [Pending] Adjust `cuda` Directory. | ||
|
||
Since the current `cuda` directory includes heterogeneous source code (cpu and gpu), we want to refactor `cuda` directory. For simplicity, different simd intrinsics will be inside the different directories. we need to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cuda目录里面的代码需要调整,但是different simd intrinsics will be inside the different directories
会怎加一些sse/avx目录,这样感觉并不是很好,每个目录里面可能没有几个文件;另外,我觉得相同功能的代码放在一起比相同指令集的代码放在一起更重要。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
有一些开源库的做法是,把一些simd intrinsic做一层封装[fftw],上层的功能都是基于这层封装开发的,毕竟大部分用intrinsic实现的功能都只是指令的不一样,而常用的指令也就是load、store、add、mul
等
|
||
### Conclusion | ||
|
||
The method could fix the releases and deployment problems. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
所以,releases and deployment 时的环境不一致,从而带来的运行时的一些困惑是这个design的目的。而Single release for PaddlePaddle CPU Image
是其中一种解决方法;#1634 (comment) 是另外一种解决方法。需要再比较一下这两种方法。
另外,我的建议是,是否有必要去做Single release for PaddlePaddle CPU Image
;如果后续引入一些AVX2/AVX512的代码,当前的设计是否能够支持(3. [Pending] Modify CMake files
里面只提到-mavx/-msse)?
@gangliao @Xreki @hedaoyuan @reyoung |
Please give me some
ack
.... 😄