Add tf32 support for A100 tensor core acceleration for cuBLAS #28732
Conversation
Update forked PaddlePaddle
Update my fork
update from PaddlePaddle
Update forked paddle repo
Update USERNAME/paddle
update Paddle USERNAME repo
update username repo
Thanks for your contribution!
void tf32_switch_on_off(bool active) { allow_tf32_cublas = active; }

bool get_tf32_switch() { return allow_tf32_cublas; }
Wouldn't it be better to put this inside the cuda device context?
Sure. We will consider changing this together when the cudnn switch is added, and the variable and function names will be adjusted accordingly.
python/paddle/fluid/tf32_switch.py
Outdated
@@ -0,0 +1,78 @@
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
2020
Done.
@@ -0,0 +1,54 @@
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
2020
Done.
Regarding the function and variable names, they will be changed in the cudnn PR.
python/paddle/fluid/tf32_switch.py
Outdated
"""
get the state of tf32 switch.

Args:
For both of these docstrings, please refer to how other APIs document theirs.
The API has been removed for now.
Fixed unit test when no CUDA device is available
LGTM
@@ -57,6 +57,10 @@ struct GpuDevice;
namespace paddle {
namespace platform {

#ifdef PADDLE_WITH_CUDA
static bool allow_tf32_cublas{true};
Why isn't this variable a member of device_context?
With it as a global variable, the Python side can use a temporary CUDADeviceContext to flip the switch and change the value of this global. Because the value changes globally, any other CUDADeviceContext object that calls AllowTF32Cublas() will see the current value of allow_tf32_cublas, so TF32 can be turned on or off at any time.
Keeping it global keeps this logic straightforward.
If you use member functions of a class to set and get a variable, then that variable should be defined as a member of the class. If it is a global variable, it should be set and get through global functions; don't mix the two in such an odd way.
Besides, even if it were a class member, there are better ways to expose it to the Python side; take a look at how pytorch does it.
Fixed: the functions are now written as global functions. If the switch were a member of CUDADeviceContext, the CUDADeviceContext instance holding the switch could not enable or disable TF32 for the instance in which the matmul actually runs, so the switch would have no effect. That is why the switch-related code lives at global scope.
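For reference, a minimal standalone sketch of the global-flag pattern described above, with free set/get functions as the review suggested (names mirror the diff; the exact file placement and linkage are assumptions):

// device_context.cc-style sketch: one global switch plus free helpers to
// set and read it.
namespace paddle {
namespace platform {

// On by default. Because it is a single global, any CUDADeviceContext that
// later calls AllowTF32Cublas() observes the current value, so the switch
// can be flipped at any point during a run.
static bool allow_tf32_cublas{true};

void SetAllowTF32Cublas(bool active) { allow_tf32_cublas = active; }
bool AllowTF32Cublas() { return allow_tf32_cublas; }

}  // namespace platform
}  // namespace paddle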
@@ -361,6 +361,12 @@ CUDADeviceContext::~CUDADeviceContext() {
#endif
}

void CUDADeviceContext::SetTF32Cublas(bool active) {
This function should be named SetAllowTF32Cublas.
Done
input_array1 = np.random.rand(4, 12, 64, 88).astype("float32")
input_array2 = np.random.rand(4, 12, 88, 512).astype("float32")
data1 = fluid.dygraph.to_variable(input_array1)
data2 = fluid.dygraph.to_variable(input_array2)
In the unit tests it is also recommended to consistently use the 2.0 API: fluid.dygraph.to_variable -> paddle.to_tensor
Done
#if CUDA_VERSION >= 11000
    if (AllowTF32Cublas()) {
      cublas_handle_.reset(
          new CublasHandleHolder(RawStream(), CUBLAS_TF32_TENSOR_OP_MATH));
So every time CublasCall is invoked, a new CublasHandleHolder is created?
Yes. When the initialization function InitCuBlasContext() is called, one is created, and the handle that cublas_handle_ points to uses CUBLAS_DEFAULT_MATH. Here, when the if condition holds, cublas_handle_ needs to point to a new handle that uses CUBLAS_TF32_TENSOR_OP_MATH.

If this if check were moved into InitCuBlasContext(), the situation would become: InitCuBlasContext() is only called when a new CUDADeviceContext object is constructed, so the state of the switch would be frozen at the moment that CUDADeviceContext is created; unless a new CUDADeviceContext were constructed later, toggling the switch would have no effect.

So the if check was placed inside CublasCall instead, which lets the switch be toggled at any time within a single CUDADeviceContext object.

If we did not create a new CublasHandleHolder on every CublasCall, we would need two different handles.
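To make that trade-off concrete, here is a rough, self-contained analogue of the per-call check (the PR itself resets cublas_handle_ to a new CublasHandleHolder, as the diff above shows; this sketch swaps in cublasSetMathMode so it can stand alone, and the wrapper name is hypothetical):

#include <cublas_v2.h>
#include <cuda.h>
#include <utility>

// Global switch consulted on every call, as in the PR.
static bool g_allow_tf32_cublas = true;

// Hypothetical CublasCall-style wrapper: the math mode is chosen per call,
// so flipping the switch takes effect without rebuilding the device context.
template <typename Callback>
void CublasCallSketch(cublasHandle_t handle, Callback&& callback) {
#if CUDA_VERSION >= 11000
  cublasSetMathMode(handle, g_allow_tf32_cublas ? CUBLAS_TF32_TENSOR_OP_MATH
                                                : CUBLAS_DEFAULT_MATH);
#endif
  std::forward<Callback>(callback)(handle);
}
// The alternative raised above: create two handles up front (one with
// CUBLAS_DEFAULT_MATH, one with CUBLAS_TF32_TENSOR_OP_MATH) and pick
// between them here, instead of reconfiguring a handle on every call.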
LGTM
PR types
New features
PR changes
Others
Describe
Feature: support a TF32 switch for cuBLAS. The switch is on by default; when the user sets it to false, TF32 is turned off.
Usage:
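The usage snippet from the original description was not captured here. As a hypothetical sketch, the switch described above could be flipped from C++ like this (function names follow the review discussion; the header path and namespace are assumptions):

#include "paddle/fluid/platform/device_context.h"  // assumed header for the switch helpers

void RunWithoutTF32() {
  // TF32 for cuBLAS is on by default in this PR; turn it off when full
  // float32 precision is needed ...
  paddle::platform::SetAllowTF32Cublas(false);

  // ... run matmul / GEMM workloads here ...

  // ... and turn it back on afterwards to recover the tensor-core speedup.
  paddle::platform::SetAllowTF32Cublas(true);
}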
Effect:
For multiplying two [10240, 10240] matrices:
execution time with the switch off: 0.113 s
execution time with the switch on: 0.017 s
with the switch on, performance is about 6.6x that of the switch off.
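As a rough illustration of how such a comparison could be reproduced outside Paddle, here is a hypothetical CUDA/cuBLAS micro-benchmark timing one FP32 GEMM of two 10240 x 10240 matrices with TF32 off and then on (matrix contents are left uninitialized since only the timing matters; error checking omitted for brevity):

#include <cublas_v2.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include <cstdio>

// Times a single FP32 GEMM using whatever math mode `handle` is set to.
static float TimeGemmMs(cublasHandle_t handle, int n, const float* a,
                        const float* b, float* c) {
  const float alpha = 1.0f, beta = 0.0f;
  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);
  // Warm-up call so one-time setup cost is not measured.
  cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n, &alpha, a, n, b, n,
              &beta, c, n);
  cudaEventRecord(start);
  cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n, &alpha, a, n, b, n,
              &beta, c, n);
  cudaEventRecord(stop);
  cudaEventSynchronize(stop);
  float ms = 0.0f;
  cudaEventElapsedTime(&ms, start, stop);
  cudaEventDestroy(start);
  cudaEventDestroy(stop);
  return ms;
}

int main() {
  const int n = 10240;
  float *a, *b, *c;
  cudaMalloc(&a, sizeof(float) * n * n);  // inputs left uninitialized:
  cudaMalloc(&b, sizeof(float) * n * n);  // only the timing is of interest
  cudaMalloc(&c, sizeof(float) * n * n);

  cublasHandle_t handle;
  cublasCreate(&handle);

#if CUDA_VERSION >= 11000
  cublasSetMathMode(handle, CUBLAS_DEFAULT_MATH);  // switch off
  printf("TF32 off: %.3f ms\n", TimeGemmMs(handle, n, a, b, c));

  cublasSetMathMode(handle, CUBLAS_TF32_TENSOR_OP_MATH);  // switch on
  printf("TF32 on : %.3f ms\n", TimeGemmMs(handle, n, a, b, c));
#endif

  cublasDestroy(handle);
  cudaFree(a);
  cudaFree(b);
  cudaFree(c);
  return 0;
}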
Effect on the fixed unit tests:
Unit tests that previously failed because of precision:
After turning off the cuBLAS and cuDNN TF32 switches (the cuDNN switch is in a separate PR):