Skip to content

Commit

Permalink
[TIR] Support tensorization using ldmatrix + MMA (#11355)
Browse files Browse the repository at this point in the history
* [TIR] Support tensorization using ldmatrix + MMA

commit 3218fac
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 18 14:04:56 2022 +0900

    some clean up

commit 7a235b6
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 18 13:55:11 2022 +0900

    parameterize over storage scope in mma store intrin

commit 827ea4c
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 18 13:37:38 2022 +0900

    properly handle floordiv/mod in codegen

commit 42d4c6f
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 18 09:53:57 2022 +0900

    update tuned factors for fp16

commit 328d0aa
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 18 08:43:30 2022 +0900

    all tests working

commit 5e086cf
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 18 07:48:43 2022 +0900

    add doc for mma_fill and mma_store intrin

commit 4f945c4
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 18 06:39:01 2022 +0900

    remove tests

commit df7708f
Author: Masahiro Masuda <[email protected]>
Date:   Tue May 17 19:52:14 2022 +0900

    unified test

commit 754c83e
Author: Masahiro Masuda <[email protected]>
Date:   Tue May 17 19:36:24 2022 +0900

    clean up LowerWarpmemory

commit 178c3dc
Author: Masahiro Masuda <[email protected]>
Date:   Tue May 17 19:15:04 2022 +0900

    Use IndexMap

commit 07fb589
Author: Masahiro Masuda <[email protected]>
Date:   Tue May 17 17:51:44 2022 +0900

    remove 16x8x8 test

commit 2b05b5a
Author: Masahiro Masuda <[email protected]>
Date:   Tue May 17 17:31:35 2022 +0900

    generate mma fill/store

commit bf23fc5
Author: Masahiro Masuda <[email protected]>
Date:   Tue May 17 12:23:30 2022 +0900

    mma intrin generation with meta programming

commit 5afb5f0
Author: Masahiro Masuda <[email protected]>
Date:   Tue May 17 05:26:14 2022 +0900

    ldmatrix intrin generation with meta programming

commit fb62abb
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 20:30:49 2022 +0900

    minor

commit 5a80adc
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 19:55:57 2022 +0900

    revert some change

commit e599a55
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 19:54:18 2022 +0900

    remove obsolete files

commit 4b13b85
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 19:51:21 2022 +0900

    wip

commit 848de63
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 19:44:29 2022 +0900

    wip

commit b35bff9
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 19:31:18 2022 +0900

    update parse error msg

commit ad9b053
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 19:26:51 2022 +0900

    fix for avoiding Buffer.vload(...) case

commit 54c6864
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 18:59:55 2022 +0900

    wip

commit 078060f
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 18:57:34 2022 +0900

    wip

commit 576f841
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 18:52:15 2022 +0900

    wip

commit 12a376a
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 17:54:58 2022 +0900

    Squashed commit of the following:

    commit 48eef49
    Author: Masahiro Masuda <[email protected]>
    Date:   Mon May 16 17:40:48 2022 +0900

        more comment

    commit 8f67fc8
    Author: Masahiro Masuda <[email protected]>
    Date:   Mon May 16 17:11:27 2022 +0900

        update test

    commit ad85036
    Author: Masahiro Masuda <[email protected]>
    Date:   Mon May 16 16:54:01 2022 +0900

        add test

    commit 4a5dc3f
    Author: Masahiro Masuda <[email protected]>
    Date:   Mon May 16 16:40:47 2022 +0900

        [TVMScript] Support function call to help construct AST

commit 76c1bcf
Author: Masahiro Masuda <[email protected]>
Date:   Mon May 16 16:30:07 2022 +0900

    simplify iterator in layout transform

commit 9362803
Author: Masahiro Masuda <[email protected]>
Date:   Sat May 14 11:31:39 2022 +0900

    remove obsolet files

commit 2e119b4
Author: Masahiro Masuda <[email protected]>
Date:   Sat May 14 10:43:59 2022 +0900

    calculate mma store dst index using inverse affine map

commit 9489434
Author: Masahiro Masuda <[email protected]>
Date:   Sat May 14 10:01:12 2022 +0900

    simplify store

commit 1adcb77
Author: Masahiro Masuda <[email protected]>
Date:   Sat May 14 09:43:40 2022 +0900

    simplified fill

commit 7b13c73
Author: Masahiro Masuda <[email protected]>
Date:   Sat May 14 09:22:17 2022 +0900

    simplify intrin desc using index map function

commit bcf212d
Author: Masahiro Masuda <[email protected]>
Date:   Sat May 14 07:16:42 2022 +0900

    seems to work

commit dd8ccf9
Author: Masahiro Masuda <[email protected]>
Date:   Sat May 14 07:11:57 2022 +0900

    poking with the parser

commit 596582c
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 20:04:59 2022 +0900

    16x8x32 4k trans working

commit 273f89a
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 19:52:13 2022 +0900

    add 16x8x16 fp16 trans

commit 8e2066c
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 19:32:37 2022 +0900

    16x8x16 4k trans working

commit c2d0744
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 19:25:52 2022 +0900

    16x8x16 trans working

commit c2e314c
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 16:19:32 2022 +0900

    tuned int8 4k, 91 TOPS

commit 94d9d96
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 15:59:33 2022 +0900

    int8 4k tune working

commit 3ca8ca0
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 08:43:57 2022 +0900

    mma 16x8x32 int8 working with ldmatrix b workaround

commit 54f1cb7
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 18:23:27 2022 +0900

    wip

commit 9d2844d
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 16:38:53 2022 +0900

    test tensorize without layout transform

commit 86ee6da
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 15:15:34 2022 +0900

    int8 4k tensorize works

commit 39f9e32
Author: Masahiro Masuda <[email protected]>
Date:   Fri May 13 12:44:39 2022 +0900

    begin int8 4k tune

commit 6fa91e5
Author: Masahiro Masuda <[email protected]>
Date:   Thu May 12 18:53:20 2022 +0900

    try fix ldmatrix b for int8

commit 7a962cd
Author: Masahiro Masuda <[email protected]>
Date:   Thu May 12 18:28:34 2022 +0900

    fixed warp_coeff

commit a0afb56
Author: Masahiro Masuda <[email protected]>
Date:   Thu May 12 12:20:01 2022 +0900

    wip

commit f70ccd0
Author: Masahiro Masuda <[email protected]>
Date:   Thu May 12 12:09:57 2022 +0900

    int8 tensorize working

commit 20321fa
Author: Masahiro Masuda <[email protected]>
Date:   Thu May 12 07:06:22 2022 +0900

    starting 16x8x32 int8

commit 441fd19
Author: Masahiro Masuda <[email protected]>
Date:   Thu May 12 05:50:46 2022 +0900

    adding fp16 accum case

commit c9d40b6
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 17:04:29 2022 +0900

    clean up

commit 5b2d486
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 16:38:19 2022 +0900

    16x8x16 4k tune working

commit c3cb170
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 16:20:27 2022 +0900

    tensoriz fixed

commit 68039b0
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 15:55:25 2022 +0900

    begin 16x8x16 4k tune

commit ced5d8d
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 15:50:11 2022 +0900

    16x8x16 worked

commit 3d2c90d
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 15:47:26 2022 +0900

    fix

commit 403050b
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 15:45:10 2022 +0900

    add 16x8x16 test

commit 18e8d73
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 06:50:32 2022 +0900

    fixed mma store codegen for 16x8x16

commit ec81250
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 04:25:25 2022 +0900

    add 16x8x16 mma store codegen

commit e08df2a
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 03:47:47 2022 +0900

    tensorized C_warp init

commit ae06789
Author: Masahiro Masuda <[email protected]>
Date:   Wed May 11 03:06:06 2022 +0900

    mma store codegen working

commit deb4d66
Author: Masahiro Masuda <[email protected]>
Date:   Tue May 10 19:22:57 2022 +0900

    update lower warp memory

commit 71fe5fe
Author: Masahiro Masuda <[email protected]>
Date:   Tue May 10 09:01:42 2022 +0900

    tensorizing mma store

commit e80a1f1
Author: Masahiro Masuda <[email protected]>
Date:   Thu Apr 28 19:54:08 2022 +0900

    clean up

commit a9640f4
Author: Masahiro Masuda <[email protected]>
Date:   Thu Apr 28 19:40:55 2022 +0900

    add tunable 4k test, 36 TFLOPS

commit b9f7eae
Author: Masahiro Masuda <[email protected]>
Date:   Thu Apr 28 18:01:08 2022 +0900

    fixed bug in LowerWarpMemory index splitting for ldmatrix

commit 00df308
Author: Masahiro Masuda <[email protected]>
Date:   Wed Apr 27 07:58:17 2022 +0900

    fixed missing reverse_compute_at

commit 93f9fe7
Author: Masahiro Masuda <[email protected]>
Date:   Wed Apr 27 06:55:12 2022 +0900

    add 4k test

commit 3689ef7
Author: Masahiro Masuda <[email protected]>
Date:   Wed Apr 27 06:54:09 2022 +0900

    temp disable high dim base indices check in tensorize

commit 0c859c4
Author: Masahiro Masuda <[email protected]>
Date:   Tue Apr 26 19:18:23 2022 +0900

    clean up

commit f6aadbf
Author: Masahiro Masuda <[email protected]>
Date:   Tue Apr 26 19:13:09 2022 +0900

    Add 16x8x8 MMA + LDMatrix test

commit 4cf6b20
Author: Masahiro Masuda <[email protected]>
Date:   Tue Apr 26 18:04:17 2022 +0900

    testing 16x8x8 ldmatrix tensoriation

* set measure_perf to False

* add requires_gpu decorator in tests, always test build on non-ampere

* skip cuda compile on old gpu
  • Loading branch information
masahi authored May 20, 2022
1 parent febae40 commit 0274d8e
Show file tree
Hide file tree
Showing 7 changed files with 1,042 additions and 4 deletions.
27 changes: 27 additions & 0 deletions include/tvm/tir/builtin.h
Original file line number Diff line number Diff line change
Expand Up @@ -651,6 +651,33 @@ TVM_DLL const Op& ptx_cp_async();
TVM_DLL const Op& ptx_commit_group();
TVM_DLL const Op& ptx_wait_group();

/*!
* \brief tvm intrinsic for storing the result of PTX MMA into a destination pointer.
* For example, if each thread in a warp of size 32 has 4 elements from the result of
* m16xn8xk16 MMA in its registers, this intrinsic can be used to store the result in a
* 16x8 region in shared or global memory.
*
* There is no real PTX instruction that does that, but we want to hide details of
* complex index manipulation behind this intrinsic to simplify TIR lowering passes (e.g.
* LowerWarpMemory).
*
* void mma_store(IntImm m, IntImm n, Var dst_ptr, Var src_ptr, Expr src_offset, Var dst_stride);
*/
TVM_DLL const Op& mma_store();

/*!
* \brief tvm intrinsic for zero-initalizing an MMA accumulation registor.
* For example, if each thread in a warp of size 32 has 8 elements from the A matrix in
* m16xn8xk16 MMA in its registers, this intrinsic can be used to zero-initialize its
* 4 accumulation registers.
*
* There is no real PTX instruction that does that, but we introduce this intrinsic for the
* same reason as mma_store above.
*
* void mma_fill(IntImm local_size, Var local_ptr, Expr offset);
*/
TVM_DLL const Op& mma_fill();

// TODO(tvm-team) replace the usage of the vector operations by Shuffle.
/*!
* \brief Get the high level half of the vector
Expand Down
1 change: 1 addition & 0 deletions python/tvm/tir/tensor_intrin/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,3 +20,4 @@
from .arm_cpu import *
from .dot_product_common import *
from .rocm import *
from .cuda import *
Loading

0 comments on commit 0274d8e

Please sign in to comment.