Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HACKATHON 6th] Support compiling PaddlePaddle with Clang. #62565

Merged
merged 29 commits into from
Mar 15, 2024

Conversation

silverling
Copy link
Contributor

@silverling silverling commented Mar 8, 2024

PR types

Bug fixes

PR changes

Others

Description

本次 PR 包含一系列 commits 来修复用 Clang/LLVM 来编译 Paddle 的一些问题,使得目前可以通过 Clang 完整编译 Paddle。

由于 Paddle 开启了 -Wall -Werror 编译选项,将所有的 Warning 视为错误,而 Clang 的编译检查较为严格,相比 GCC 会有更多的 Warning 需要解决。

本次 PR 修复的问题包含多种,同一类的问题都在相同的 commit 中,每一种问题的具体描述如下:

  1. 变量类型不一致fix variable type inconsistent

    编译器可以隐式转化绝大部分基本类型的变量,其中一些可能产生错误的转化,则会提示 Warning。如将 unsigned int 转为 signed intint 转为 float 等。

  2. 使用未初始化的变量fix uninitialized but used variable

    有时候,变量的初始化是通过传递指针或地址交给一个函数来赋值,然而这种情况编译器不能正常判断,所以报错。不过,为了良好的编程习惯,还是建议将变量手动初始化。

  3. 子类声明无效的默认构造函数add flags "-Wno-defaulted-function-deleted" to avoid warning in GLOO

    例如在第三方依赖 GLOO 中的 https://github.com/ziyoujiyi/gloo/blob/8b6b61dfa0dca02b226a01262bfcf0484382048f/gloo/common/error.h#L25

// A base class for all gloo runtime errors
struct Exception : public std::runtime_error {
  Exception() = default;
  explicit Exception(const std::string& msg) : std::runtime_error(msg) {}
};

由于父类 std::runtime_error 没有默认的构造函数,所以 Exception() = default; 实际上是无效的,所以编译器会报错。由于 GLOO 是第三方依赖,无法直接修改其实现,所以在 flags.cmake 添加了 -Wno-defaulted-function-deleted

  1. Copy elision 编译优化fix copy elision
    现代编译器会默认启用 RVO 和 NRVO,在 C++17 标准之后,RVO 被要求必须实现,NRVO 则鼓励实现。现代编译器均会实现这两种优化。具体内容可搜索关键字了解,篇幅有限,简而言之,如果一个函数中实例化了一个对象,然后将其作为返回值,编译器会在调用该函数的地方直接实例化对象,而不是在函数中实例化,避免对象的拷贝。而 std::move 会阻止这种行为,可能造成额外的程序开销。

  2. struct 和 class 不匹配fix struct and class tag mismatch
    我们知道,structclass 唯一的区别是默认的访问权限不同,struct 默认为 publicclassprivate。在几乎所有情况下,我们可以混用这两个名字。而 Clang 编译器的 Warning 为:“class was previously declared as a struct; this is valid, but may result in linker errors under the Microsoft C++ ABI”,意思是,由于 C++ 的 mangling 机制,MSVC 可能会区别 structclass,导致链接时可能出现异常。为了统一起见,均修改为 class

  3. 缺失拷贝构造函数或拷贝赋值构造函数add copy or copy assignment constructor (three/five/zero rule)
    C++ 中有个约定叫 three/five/zero rule,意思是:当定义一个 Class 时,要么不明确声明任何构造函数,由编译器默认生成;要么声明普通构造函数、拷贝构造函数和拷贝赋值构造函数这三个;要么再加上移动构造函数和移动赋值构造函数。如果一个类定义了拷贝构造函数,则也应该定义普通构造函数和拷贝赋值构造函数,避免在代码中产生不一致的行为。

  4. 重载右加运算符fix operator+ infinite recursion

  5. 删除无用的 lambda 捕获remove unused lambda captures

  6. 运算符短路判断fix overlapping comparisons

  7. 子类 final 关键字add final keyword for final subclass
    如果一个子类中一些成员函数标记为 final,则该子类也应标记为 final

  8. 注释函数中未使用的局部变量comment out unused local variable in functions
    在 Paddle 的许多函数中存在一些声明、赋值,但并未使用的局部变量,可能是历史遗留原因。本次 PR 先将其全部注释。

  9. 不兼容的 C ABI 暴露fix C-compatible declaration for C linkage

  10. 子类 override 关键字add override keyword

  11. 添加宏判断add macro check for custom device

  12. 调整 rocksdb 编译选项fix rocksdb compatibility with clang
    调整了编译选项,避免 Clang 将不能识别的选项报错

  13. 使用 C++17 参数展开语法use C++17 pack arguments expansion

  14. 添加额外的编译选项来抑制错误suppress several warning
    Paddle 以及第三方依赖中存在大量以下问题,包括:

  • 定义但未使用的成员私有变量

  • 定义但未使用的常量

  • 重载虚函数时没有添加 override 关键字(大部分在 paddle/fluid/distributed 模块下)

  • 重载虚函数时函数签名不一致(同上)

    这些问题部分来自于历史遗留,部分源于模块还在实现,其最终的修改还需对应模块的研发人员来确定,目前先通过在 flags.cmake 中添加警告抑制选项来完成编译。

后续的开发,可以通过移除某类警告的抑制选项,然后由相关人员根据编译器报错进行排查处理。

本次 PR 的本地测试环境有限,不能覆盖所有编译场景与配置,还需后续继续完善,以及对应的 Clang 编译 CI 支持。

备注:

  • 在编译 CUDA 程序时,nvcc 会调用系统默认编译器来编译其中的 C++ Host 程序,可能会绕过 Clang,可以添加编译选项 --compiler-bindir=<path/to/clang> 来指定。本次 PR 没有指定,也即依旧使用 GCC 编译 Host 程序。后续可以改进。

Copy link

paddle-bot bot commented Mar 8, 2024

你的PR提交成功,感谢你对开源项目的贡献!
请关注后续CI自动化测试结果,详情请参考Paddle-CI手册
Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Mar 8, 2024
paddle/phi/kernels/cpu/send_uv_kernel.cc Show resolved Hide resolved
@@ -870,7 +870,7 @@ struct MergeAverage<phi::CPUContext, T> {
input_height,
input->height(),
phi::errors::InvalidArgument("All input should have same height."));
row_num += input->rows().size();
// row_num += input->rows().size();
merged_row_set.insert(input->rows().begin(), input->rows().end());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

同理,删除这些注释

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -20,7 +20,7 @@ limitations under the License. */

namespace phi {

class MetaConfig;
struct MetaConfig;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?这里用struct有问题吗?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

见上面的第 5 条:

5. struct 和 class 不匹配fix struct and class tag mismatch

我们知道,structclass 唯一的区别是默认的访问权限不同,struct 默认为 publicclassprivate。在几乎所有情况下,我们可以混用这两个名字。而 Clang 编译器的 Warning 为:“class was previously declared as a struct; this is valid, but may result in linker errors under the Microsoft C++ ABI”,意思是,由于 C++ 的 mangling 机制,MSVC 可能会区别 structclass,导致链接时可能出现异常。为了统一起见,均修改为 class

一般来说,混用 class 和 struct 几乎不会有影响,但编译器会警告,从而报错

@@ -3168,7 +3168,7 @@ void Pool2DInferMeta(const MetaTensor& x,
(data_format == "NHWC" || data_format == "NDHWC");
if (!config.is_runtime && kernel_size.FromTensor()) {
auto x_dims = x.dims();
std::vector<int64_t> output_shape = std::move(common::vectorize(x_dims));
std::vector<int64_t> output_shape = common::vectorize(x_dims);
// set dims of HW -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

common::vectorize(x_dims)这样的右值都不在需使用std::move来转换吗?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

通过编译器的 NRVO 优化,在 common::vectorize() 函数中的 result 变量会直接在 output_shape = ... 这里实例化。

void StartRecord();
void EndRecord();
void StartRecord() override;
void EndRecord() override;
void ClearRecord() override;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

覆盖基类中的虚函数,不加overide为啥gcc没有监控到啊,按道理gcc也能监控到啊

current_timepoint - start_time_);
// auto current_timepoint = std::chrono::steady_clock::now();
// auto time_elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
// current_timepoint - start_time_);
auto global_ranks =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

同理删除注释,有很多这种注释的都删了吧

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

#ifdef PADDLE_WITH_CUSTOM_DEVICE
else if (phi::CustomContext::classof(&dev_ctx)) { // NOLINT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如果不加宏判断会报错吗?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

不会报错,这个 commit 主要是添加宏判断来避免不必要的头文件引入

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted

-Wno-deprecated-copy-with-user-provided-copy # For three/five/zeros rule, clang
-Wno-inconsistent-missing-override # For lots of warnings when not using override for virtual functions, clang
-Wno-bitwise-instead-of-logical # Warning in "unsupported/Eigen/CXX11/Tensor"
-Wno-defaulted-function-deleted # Warning in GLOO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no-defaulted-function-deleted建议加载gloo.cmake,不要放在flags.cmake里面,同理,如果有一些第三方库报了很多warning,可以加在第三方库对应的.cmake文件里,flags.cmake不建议放这些,之前删了很多这里的flags

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -156,8 +156,22 @@ if(NOT WIN32)
-Wno-error=ignored-attributes # Warnings in Eigen, gcc 6.3
-Wno-error=int-in-bool-context # Warning in Eigen gcc 7.2
-Wimplicit-fallthrough=0 # Warning in tinyformat.h
-Wno-overloaded-virtual # For some inconsistent virtual function signature, clang
-Wno-deprecated-copy # Same above
-Wno-unused-const-variable
${fsanitize})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

上边加的几个编译器选项gcc和clang都有这个问题吗?是不是也加一些判断比较好?如果是clang编译器再加?然后如果是第三方库的问题,放在第三方库的.CMAKE里

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@risemeup1 risemeup1 changed the title [HACKATHON 6th][CMake Optimization] Fix clang compatibility [HACKATHON 6th] Support compiling PaddlePaddle with Clang. Mar 8, 2024
Copy link
Contributor

@risemeup1 risemeup1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great work

@@ -260,7 +260,7 @@ void AllReduceOpHandle::AllReduceFunc(

size_t size =
numel * SizeOfType(framework::TransToProtoVarType(trg.dtype()));
RunAndRecordEvent(p, [&trg, var, p, size] {
RunAndRecordEvent(p, [&trg, var, size] {
auto dst_ptr = var->GetMutable<phi::DenseTensor>()->data();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里为啥要这应改啊

Copy link
Contributor

@zhiqiu zhiqiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for const_cast

Copy link
Contributor

@jiahy0825 jiahy0825 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for PADDLE_API

@risemeup1 risemeup1 merged commit b73c59a into PaddlePaddle:develop Mar 15, 2024
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contributor External developers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants