-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HACKATHON 6th] Support compiling PaddlePaddle with Clang. #62565
[HACKATHON 6th] Support compiling PaddlePaddle with Clang. #62565
Conversation
…n-deleted" to avoid warning in GLOO
…ructor (three/five/zero rule)
你的PR提交成功,感谢你对开源项目的贡献! |
@@ -870,7 +870,7 @@ struct MergeAverage<phi::CPUContext, T> { | |||
input_height, | |||
input->height(), | |||
phi::errors::InvalidArgument("All input should have same height.")); | |||
row_num += input->rows().size(); | |||
// row_num += input->rows().size(); | |||
merged_row_set.insert(input->rows().begin(), input->rows().end()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
同理,删除这些注释
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -20,7 +20,7 @@ limitations under the License. */ | |||
|
|||
namespace phi { | |||
|
|||
class MetaConfig; | |||
struct MetaConfig; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?这里用struct有问题吗?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
见上面的第 5 条:
5. struct 和 class 不匹配:fix struct and class tag mismatch
我们知道,
struct
和class
唯一的区别是默认的访问权限不同,struct
默认为public
,class
为private
。在几乎所有情况下,我们可以混用这两个名字。而 Clang 编译器的 Warning 为:“class was previously declared as a struct; this is valid, but may result in linker errors under the Microsoft C++ ABI”,意思是,由于 C++ 的 mangling 机制,MSVC 可能会区别struct
和class
,导致链接时可能出现异常。为了统一起见,均修改为class
。
一般来说,混用 class 和 struct 几乎不会有影响,但编译器会警告,从而报错
@@ -3168,7 +3168,7 @@ void Pool2DInferMeta(const MetaTensor& x, | |||
(data_format == "NHWC" || data_format == "NDHWC"); | |||
if (!config.is_runtime && kernel_size.FromTensor()) { | |||
auto x_dims = x.dims(); | |||
std::vector<int64_t> output_shape = std::move(common::vectorize(x_dims)); | |||
std::vector<int64_t> output_shape = common::vectorize(x_dims); | |||
// set dims of HW -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
common::vectorize(x_dims)这样的右值都不在需使用std::move来转换吗?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
通过编译器的 NRVO 优化,在 common::vectorize()
函数中的 result
变量会直接在 output_shape = ...
这里实例化。
void StartRecord(); | ||
void EndRecord(); | ||
void StartRecord() override; | ||
void EndRecord() override; | ||
void ClearRecord() override; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
覆盖基类中的虚函数,不加overide为啥gcc没有监控到啊,按道理gcc也能监控到啊
current_timepoint - start_time_); | ||
// auto current_timepoint = std::chrono::steady_clock::now(); | ||
// auto time_elapsed = std::chrono::duration_cast<std::chrono::milliseconds>( | ||
// current_timepoint - start_time_); | ||
auto global_ranks = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
同理删除注释,有很多这种注释的都删了吧
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
#ifdef PADDLE_WITH_CUSTOM_DEVICE | ||
else if (phi::CustomContext::classof(&dev_ctx)) { // NOLINT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
如果不加宏判断会报错吗?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
不会报错,这个 commit 主要是添加宏判断来避免不必要的头文件引入
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted
cmake/flags.cmake
Outdated
-Wno-deprecated-copy-with-user-provided-copy # For three/five/zeros rule, clang | ||
-Wno-inconsistent-missing-override # For lots of warnings when not using override for virtual functions, clang | ||
-Wno-bitwise-instead-of-logical # Warning in "unsupported/Eigen/CXX11/Tensor" | ||
-Wno-defaulted-function-deleted # Warning in GLOO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no-defaulted-function-deleted建议加载gloo.cmake,不要放在flags.cmake里面,同理,如果有一些第三方库报了很多warning,可以加在第三方库对应的.cmake文件里,flags.cmake不建议放这些,之前删了很多这里的flags
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -156,8 +156,22 @@ if(NOT WIN32) | |||
-Wno-error=ignored-attributes # Warnings in Eigen, gcc 6.3 | |||
-Wno-error=int-in-bool-context # Warning in Eigen gcc 7.2 | |||
-Wimplicit-fallthrough=0 # Warning in tinyformat.h | |||
-Wno-overloaded-virtual # For some inconsistent virtual function signature, clang | |||
-Wno-deprecated-copy # Same above | |||
-Wno-unused-const-variable | |||
${fsanitize}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
上边加的几个编译器选项gcc和clang都有这个问题吗?是不是也加一些判断比较好?如果是clang编译器再加?然后如果是第三方库的问题,放在第三方库的.CMAKE里
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great work
@@ -260,7 +260,7 @@ void AllReduceOpHandle::AllReduceFunc( | |||
|
|||
size_t size = | |||
numel * SizeOfType(framework::TransToProtoVarType(trg.dtype())); | |||
RunAndRecordEvent(p, [&trg, var, p, size] { | |||
RunAndRecordEvent(p, [&trg, var, size] { | |||
auto dst_ptr = var->GetMutable<phi::DenseTensor>()->data(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里为啥要这应改啊
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for const_cast
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for PADDLE_API
PR types
Bug fixes
PR changes
Others
Description
本次 PR 包含一系列 commits 来修复用 Clang/LLVM 来编译 Paddle 的一些问题,使得目前可以通过 Clang 完整编译 Paddle。
由于 Paddle 开启了
-Wall -Werror
编译选项,将所有的 Warning 视为错误,而 Clang 的编译检查较为严格,相比 GCC 会有更多的 Warning 需要解决。本次 PR 修复的问题包含多种,同一类的问题都在相同的 commit 中,每一种问题的具体描述如下:
变量类型不一致:fix variable type inconsistent
编译器可以隐式转化绝大部分基本类型的变量,其中一些可能产生错误的转化,则会提示 Warning。如将
unsigned int
转为signed int
,int
转为float
等。使用未初始化的变量:fix uninitialized but used variable
有时候,变量的初始化是通过传递指针或地址交给一个函数来赋值,然而这种情况编译器不能正常判断,所以报错。不过,为了良好的编程习惯,还是建议将变量手动初始化。
子类声明无效的默认构造函数:add flags "-Wno-defaulted-function-deleted" to avoid warning in GLOO
例如在第三方依赖 GLOO 中的 https://github.com/ziyoujiyi/gloo/blob/8b6b61dfa0dca02b226a01262bfcf0484382048f/gloo/common/error.h#L25
由于父类
std::runtime_error
没有默认的构造函数,所以Exception() = default;
实际上是无效的,所以编译器会报错。由于 GLOO 是第三方依赖,无法直接修改其实现,所以在flags.cmake
添加了-Wno-defaulted-function-deleted
Copy elision 编译优化:fix copy elision
现代编译器会默认启用 RVO 和 NRVO,在 C++17 标准之后,RVO 被要求必须实现,NRVO 则鼓励实现。现代编译器均会实现这两种优化。具体内容可搜索关键字了解,篇幅有限,简而言之,如果一个函数中实例化了一个对象,然后将其作为返回值,编译器会在调用该函数的地方直接实例化对象,而不是在函数中实例化,避免对象的拷贝。而
std::move
会阻止这种行为,可能造成额外的程序开销。struct 和 class 不匹配:fix struct and class tag mismatch
我们知道,
struct
和class
唯一的区别是默认的访问权限不同,struct
默认为public
,class
为private
。在几乎所有情况下,我们可以混用这两个名字。而 Clang 编译器的 Warning 为:“class was previously declared as a struct; this is valid, but may result in linker errors under the Microsoft C++ ABI”,意思是,由于 C++ 的 mangling 机制,MSVC 可能会区别struct
和class
,导致链接时可能出现异常。为了统一起见,均修改为class
。缺失拷贝构造函数或拷贝赋值构造函数:add copy or copy assignment constructor (three/five/zero rule)
C++ 中有个约定叫 three/five/zero rule,意思是:当定义一个 Class 时,要么不明确声明任何构造函数,由编译器默认生成;要么声明普通构造函数、拷贝构造函数和拷贝赋值构造函数这三个;要么再加上移动构造函数和移动赋值构造函数。如果一个类定义了拷贝构造函数,则也应该定义普通构造函数和拷贝赋值构造函数,避免在代码中产生不一致的行为。
重载右加运算符:fix operator+ infinite recursion
删除无用的 lambda 捕获:remove unused lambda captures
运算符短路判断:fix overlapping comparisons
子类 final 关键字: add final keyword for final subclass
如果一个子类中一些成员函数标记为
final
,则该子类也应标记为final
注释函数中未使用的局部变量:comment out unused local variable in functions
在 Paddle 的许多函数中存在一些声明、赋值,但并未使用的局部变量,可能是历史遗留原因。本次 PR 先将其全部注释。
不兼容的 C ABI 暴露:fix C-compatible declaration for C linkage
子类 override 关键字:add override keyword
添加宏判断:add macro check for custom device
调整 rocksdb 编译选项:fix rocksdb compatibility with clang
调整了编译选项,避免 Clang 将不能识别的选项报错
使用 C++17 参数展开语法:use C++17 pack arguments expansion
添加额外的编译选项来抑制错误:suppress several warning
Paddle 以及第三方依赖中存在大量以下问题,包括:
定义但未使用的成员私有变量
定义但未使用的常量
重载虚函数时没有添加 override 关键字(大部分在
paddle/fluid/distributed
模块下)重载虚函数时函数签名不一致(同上)
这些问题部分来自于历史遗留,部分源于模块还在实现,其最终的修改还需对应模块的研发人员来确定,目前先通过在
flags.cmake
中添加警告抑制选项来完成编译。后续的开发,可以通过移除某类警告的抑制选项,然后由相关人员根据编译器报错进行排查处理。
本次 PR 的本地测试环境有限,不能覆盖所有编译场景与配置,还需后续继续完善,以及对应的 Clang 编译 CI 支持。
备注:
--compiler-bindir=<path/to/clang>
来指定。本次 PR 没有指定,也即依旧使用 GCC 编译 Host 程序。后续可以改进。