[TOPI][CUDA] Enable vectorization on fp16 type #4867

wpan11nv · 2020-02-11T23:06:03Z

This makes better use of the memory bandwidth.
Note that not all cases are vectorized for the fp16 datatype. For
instance, when the size is not a multiple of 1024, the inner loop
may be an expression that cannot be vectorized. In this case, a
small inner loop is still beneficial for latency hiding.
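
To make the divisibility caveat concrete, here is a minimal plain-Python model of splitting a flat elementwise loop into fixed-size tiles. It is not TVM's schedule code. The function name `plan_inner_loop` and the `vector_width` default of 4 are hypothetical, and `tile=1024` simply reuses the figure quoted above; how that figure breaks down into threads and vector lanes is an assumption.

```python
def plan_inner_loop(num_elements, tile=1024, vector_width=4):
    """Model how a flat loop over `num_elements` splits into tiles.

    tile:         elements covered per step (1024 is the figure from the
                  description; its internal breakdown is assumed here).
    vector_width: hypothetical number of fp16 lanes per vector access.
    """
    if tile % vector_width != 0:
        raise ValueError("tile must be a multiple of vector_width")

    full_tiles, tail = divmod(num_elements, tile)
    return {
        "full_tiles": full_tiles,
        "tail_elements": tail,
        # In this model the inner loop has one constant extent only when
        # there is no ragged tail; otherwise it needs a bounds expression.
        "uniform_inner_loop": tail == 0,
    }


if __name__ == "__main__":
    for shape in [(128, 128), (10, 128)]:
        size = shape[0] * shape[1]
        print(shape, plan_inner_loop(size))
```

Under these assumptions, a 128 x 128 tensor (16384 elements) divides evenly into 1024-element tiles, while a 10 x 128 tensor (1280 elements) leaves a 256-element tail.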

Signed-off-by: Wei Pan [email protected]

Thanks for contributing to TVM! Please refer to guideline https://docs.tvm.ai/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from Reviewers by @ them in the pull request thread.

tqchen · 2020-02-12T00:48:52Z

Please request reviews from reviewers

wpan11nv · 2020-02-12T18:39:45Z

Kindly ping. Thanks!

tqchen · 2020-02-12T20:10:10Z

@vinx13 @Laurawly please help to review the PR

vinx13 · 2020-02-12T20:47:00Z

topi/tests/python/test_topi_tensor.py

+        if not tvm.runtime.enabled(device):
+            print("Skip because %s is not enabled" % device)
+            return
+        with tvm.target.create(device):


Please add a check for whether fp16 is supported here, and also in verify_relu:
https://github.com/apache/incubator-tvm/blob/aaf62e47e64d592be770e915a7aa59d41eddb729/topi/tests/python/test_topi_transform.py#L57-L59
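
A rough sketch of what such a guard could look like, written so that it runs without TVM installed. The helper names `supports_fp16` and `skip_test`, and the extra `compute_version` argument, are hypothetical stand-ins rather than the code in this PR. TVM has its own helper for this in `tvm.contrib.nvcc`. The threshold used here (compute capability 5.3, or 6.0 and above) is an assumption based on NVIDIA's documented fp16 arithmetic support.

```python
def supports_fp16(compute_version):
    """Return True if a CUDA compute capability string such as "6.1"
    is assumed to support fp16 arithmetic (5.3, or any 6.x and newer)."""
    major_str, minor_str = compute_version.split(".")
    major, minor = int(major_str), int(minor_str)
    return (major == 5 and minor == 3) or major >= 6


def skip_test(dtype, device, compute_version):
    """Return True when a float16 test should be skipped on this device."""
    if dtype == "float16" and device == "cuda" and not supports_fp16(compute_version):
        print("Skip because %s does not have fp16 support" % device)
        return True
    return False


if __name__ == "__main__":
    # Hypothetical device list, for illustration only.
    for version in ["5.2", "5.3", "7.0"]:
        print(version, skip_test("float16", "cuda", version))
```

A test body would call the guard first and return early when it reports True, so float32 runs are unaffected.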

Laurawly · 2020-02-12T21:14:45Z

topi/tests/python/test_topi_relu.py

@@ -87,12 +87,12 @@ def _prelu_numpy(x, W):
    tvm.testing.assert_allclose(b.asnumpy(), out_np, rtol=1e-5)

 def test_relu():
-    verify_relu(10, 128)
+    verify_relu(128, 128, "float32")


Can you keep a test case as before where m and n have different values?

Laurawly · 2020-02-12T21:16:25Z

topi/tests/python/test_topi_tensor.py

+        check_device(device)
+
+def test_vectorization():
+    verify_vectorization(128, 128, "float16")


wpan11nv · 2020-02-13T01:05:05Z

Thanks all for the suggestions! Tests updated.

vinx13 · 2020-02-13T16:49:33Z

topi/tests/python/test_topi_relu.py

 from common import get_all_backend

-def verify_relu(m, n):
-    A = tvm.placeholder((m, n), name='A')
+def skipTest(dtype, device):


nit: we prefer skip_test style naming

wpan11nv · Feb 13, 2020

Fixed. Thanks!

tqchen · 2020-02-14T04:24:35Z

Thanks @vinx13 @wpan11nv !

tqchen assigned vinx13 Feb 12, 2020

tqchen added the status: need review label Feb 12, 2020

vinx13 requested changes Feb 12, 2020

Laurawly reviewed Feb 12, 2020

wpan11nv force-pushed the topi_fp16 branch from 454ef07 to 9298258 Compare February 13, 2020 01:02

wpan11nv force-pushed the topi_fp16 branch from 9298258 to 4707461 Compare February 13, 2020 01:09

vinx13 reviewed Feb 13, 2020

wpan11nv force-pushed the topi_fp16 branch from 4707461 to 8df9773 Compare February 13, 2020 17:37

vinx13 approved these changes Feb 13, 2020

tqchen merged commit 7013fc9 into apache:master Feb 14, 2020

tqchen added status: accepted and removed status: need review labels Feb 14, 2020

wpan11nv deleted the topi_fp16 branch February 14, 2020 17:21

ZihengJiang mentioned this pull request Sep 17, 2020

TVM v0.7 Release Note Candidate #6486

Closed
