
Add tf32 support for A100 tensor core acceleration for cuBLAS #28732

Merged
merged 48 commits into PaddlePaddle:develop from the tf32 branch on Dec 15, 2020

Conversation

@AshburnLee (Contributor) commented Nov 18, 2020

PR types

New features

PR changes

Others

Describe

Feature: add a TF32 cuBLAS switch. It is on by default; when the user sets it to false, TF32 is turned off.

Usage

def test_dygraph_without_out(self):
    if core.is_compiled_with_cuda():
        place = fluid.CUDAPlace(0)
        core.set_cublas_switch(0)          # turn off
        with fluid.dygraph.guard(place):
            input_array1 = np.random.rand(4, 12, 64, 88).astype("float32")
            input_array2 = np.random.rand(4, 12, 88, 512).astype("float32")
            data1 = paddle.to_tensor(input_array1)
            data2 = paddle.to_tensor(input_array2)
            out = paddle.matmul(data1, data2)
            expected_result = np.matmul(input_array1, input_array2)
        self.assertTrue(np.allclose(expected_result, out.numpy(), 1e-03))

    else:
        pass

Effect

For multiplying two [10240, 10240] matrices:
execution time with the switch off: 0.113 s
execution time with the switch on: 0.017 s
With the switch on, performance is 6.6x that of the switch off.
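For reference, what the switch selects under the hood is the cuBLAS math mode. Below is a minimal standalone timing sketch against raw cuBLAS (CUDA 11+), not the exact benchmark used for the numbers above; it compares CUBLAS_DEFAULT_MATH with CUBLAS_TF32_TENSOR_OP_MATH on the same [10240, 10240] SGEMM. Error checking is omitted and the inputs are uninitialized device memory, since only timing is of interest here.

#include <cstdio>
#include <cublas_v2.h>
#include <cuda_runtime.h>

// Times one n x n SGEMM under the given cuBLAS math mode.
float TimeSgemm(cublasMath_t mode, int n, const float* dA, const float* dB,
                float* dC) {
  cublasHandle_t handle;
  cublasCreate(&handle);
  cublasSetMathMode(handle, mode);
  const float alpha = 1.0f, beta = 0.0f;
  // Warm-up call so one-time setup cost is not measured.
  cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n, &alpha, dA, n, dB, n,
              &beta, dC, n);
  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);
  cudaEventRecord(start);
  cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n, &alpha, dA, n, dB, n,
              &beta, dC, n);
  cudaEventRecord(stop);
  cudaEventSynchronize(stop);
  float ms = 0.0f;
  cudaEventElapsedTime(&ms, start, stop);
  cudaEventDestroy(start);
  cudaEventDestroy(stop);
  cublasDestroy(handle);
  return ms;
}

int main() {
  const int n = 10240;
  float *dA, *dB, *dC;
  cudaMalloc(&dA, sizeof(float) * n * n);
  cudaMalloc(&dB, sizeof(float) * n * n);
  cudaMalloc(&dC, sizeof(float) * n * n);
  printf("FP32 default math: %.3f ms\n",
         TimeSgemm(CUBLAS_DEFAULT_MATH, n, dA, dB, dC));
  printf("TF32 tensor ops:   %.3f ms\n",
         TimeSgemm(CUBLAS_TF32_TENSOR_OP_MATH, n, dA, dB, dC));
  cudaFree(dA);
  cudaFree(dB);
  cudaFree(dC);
  return 0;
}

Note that TF32 only engages tensor cores on Ampere GPUs (e.g. A100); on older hardware both modes run at the same speed.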

Fixing some unit tests

Unit tests that failed due to precision:
[screenshot 2020-11-26 18:55:58]

After turning off the cuBLAS and cuDNN TF32 switches (the cuDNN switch is in a separate PR):
[screenshot 2020-11-26 18:47:00]

@paddle-bot-old commented:

Thanks for your contribution!
Please wait for the result of CI first. See the Paddle CI Manual for details.


void tf32_switch_on_off(bool active) { allow_tf32_cublas = active; }

bool get_tf32_switch() { return allow_tf32_cublas; }
Contributor:

Wouldn't it be better to put this inside the CUDA device context?

Contributor Author:

Sure. This will be considered together when the cuDNN switch is added, and the variable and function names will be renamed accordingly at the same time.

@@ -0,0 +1,78 @@
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Contributor:

The copyright year should be 2020.

Contributor Author:

Done.

@@ -0,0 +1,54 @@
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Contributor:

The copyright year should be 2020.

Contributor Author:

Done.

@AshburnLee (Contributor Author):

The function-name and variable-name issues will be addressed in the cuDNN PR.

@AshburnLee changed the title from "Add tf32 support for A100 tensor core acceleration" to "Add tf32 support for A100 tensor core acceleration for cuBLAS" on Nov 25, 2020
"""
get the state of tf32 switch.

Args:
Contributor:

For both documentation changes, you can follow the style of other APIs.

Contributor Author:

The API has been removed for now.

wangchaochaohu previously approved these changes Nov 30, 2020

@wangchaochaohu (Contributor) left a comment:

LGTM

@@ -57,6 +57,10 @@ struct GpuDevice;
namespace paddle {
namespace platform {

#ifdef PADDLE_WITH_CUDA
static bool allow_tf32_cublas{true};
Contributor:

Why isn't this variable a member of device_context?

Contributor Author:

As a global variable, the Python side can use a temporary CUDADeviceContext to flip the switch and change the global's value. Because the change is global, any other CUDADeviceContext object that calls AllowTF32Cublas() sees the current value of allow_tf32_cublas in real time, so the switch can be turned on or off at any moment.
Keeping it global keeps that logic clear.

Contributor:

If a variable is set and read through a class's member functions, it should be defined as a member of that class. If it is a global variable, it should be set and read through global functions. Don't mix the two in such an odd way.

Also, even as a class member there are better ways to expose it to the Python side; take a look at how PyTorch does it.

Contributor Author:

Changed: the functions are now global. If the switch were a class member of CUDADeviceContext, the CUDADeviceContext instance holding the switch could not turn TF32 on or off in the instance where the matmul actually runs, so the switch would have no effect.

That is why the switch lives at global scope, as sketched below.
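Put together, the shape this resolution implies is roughly the following. This is a sketch assembled from the names visible in this PR's diff (allow_tf32_cublas, SetAllowTF32Cublas, AllowTF32Cublas), not necessarily the exact final code:

// device_context.h (sketch): the flag and both accessors live at namespace
// scope, so every CUDADeviceContext observes the same value in real time.
namespace paddle {
namespace platform {

static bool allow_tf32_cublas{true};  // TF32 is on by default

// Global setter/getter pair for a global variable, as the reviewer suggested.
inline void SetAllowTF32Cublas(bool active) { allow_tf32_cublas = active; }
inline bool AllowTF32Cublas() { return allow_tf32_cublas; }

}  // namespace platform
}  // namespace paddle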

@@ -361,6 +361,12 @@ CUDADeviceContext::~CUDADeviceContext() {
#endif
}

void CUDADeviceContext::SetTF32Cublas(bool active) {
Contributor:

This function should be named SetAllowTF32Cublas.

Contributor Author:

Done.

input_array1 = np.random.rand(4, 12, 64, 88).astype("float32")
input_array2 = np.random.rand(4, 12, 88, 512).astype("float32")
data1 = fluid.dygraph.to_variable(input_array1)
data2 = fluid.dygraph.to_variable(input_array2)
Contributor:

In the unit tests, please also use the 2.0 API consistently: fluid.dygraph.to_variable -> paddle.to_tensor.

Contributor Author:

Done.

@PaddlePaddle PaddlePaddle locked and limited conversation to collaborators Dec 2, 2020
@PaddlePaddle PaddlePaddle unlocked this conversation Dec 2, 2020
#if CUDA_VERSION >= 11000
if (AllowTF32Cublas()) {
cublas_handle_.reset(
new CublasHandleHolder(RawStream(), CUBLAS_TF32_TENSOR_OP_MATH));
Contributor:

So a new CublasHandleHolder is created every time CublasCall is invoked?

Contributor Author:

Yes. One is created when the initialization function InitCuBlasContext() is called; the handle that cublas_handle_ points to then corresponds to CUBLAS_DEFAULT_MATH. Here, when the if condition holds, cublas_handle_ has to point to a new handle corresponding to CUBLAS_TF32_TENSOR_OP_MATH.

If this if check were moved into InitCuBlasContext(), the situation would be: InitCuBlasContext() runs only when a new CUDADeviceContext object is constructed, so the switch state would be fixed at construction time, and without creating another CUDADeviceContext the switch could not be toggled mid-run. See the code snippet:
[screenshot 2020-12-07 18:36:32]

That is why the if check is placed in CublasCall: within a single CUDADeviceContext object, the switch can be toggled at any time.

If we did not create a new CublasHandleHolder on every CublasCall, we would need two different handles (see the sketch below).
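The two-handle alternative mentioned in the last sentence could look roughly like this. The cublas_tf32_handle_ member name and the Call(callback) interface of CublasHandleHolder are assumptions for the sake of the sketch; both handles would be created once in InitCuBlasContext(), so nothing is allocated on the hot path:

// Sketch: select between two cached handles instead of resetting
// cublas_handle_ inside CublasCall.
template <typename Callback>
inline void CUDADeviceContext::CublasCall(Callback&& callback) const {
#if CUDA_VERSION >= 11000
  if (AllowTF32Cublas()) {
    // Created once with CUBLAS_TF32_TENSOR_OP_MATH.
    cublas_tf32_handle_->Call(std::forward<Callback>(callback));
    return;
  }
#endif
  // Created once with CUBLAS_DEFAULT_MATH.
  cublas_handle_->Call(std::forward<Callback>(callback));
}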

@wangchaochaohu (Contributor) left a comment:

LGTM

@wangchaochaohu wangchaochaohu merged commit efea540 into PaddlePaddle:develop Dec 15, 2020
@AshburnLee AshburnLee deleted the tf32 branch December 15, 2020 12:46
lanxianghit pushed a commit that referenced this pull request Jan 20, 2021
Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732) (#30612)

* Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732)

* Fixed an error

* Fixed an error