[hexagon][testing] add TVMScript elemwise-add #11490

cconvey · 2022-05-27T15:30:38Z

Recreate the existing TE-based elementwise-add benchmark
in TVMScript.

The TE-based benchmark has several features not (yet)
present in the new TVMScript one:
- support for mem_scope='global.vtcm'
- scheduling

Also includes a small refactoring of how these benchmarks use the
BenchmarkTable class.

cconvey · 2022-05-27T15:33:00Z

Here's an example of the tabular output now produced:

$ cat /tmp/tmpo97ou02e/benchmark-results.csv | column -s $'\t' -t -n
basic_kernel       dtype  sched_type  mem_scope    num_vectors_per_tensor  row_status  timings_min_usecs  timings_max_usecs  timings_median_usecs  timings_mean_usecs  timings_stddev_usecs  host_files_dir                                                                                                          comments
ewise-tvmscript-1  int8               global       1                       SUCCESS     0.500              0.500              0.500                 0.500               0.000                 /tmp/tmpo97ou02e/basic_kernel:ewise-tvmscript-1-dtype:int8-mem_scope:global-num_vectors_per_tensor:1
ewise-tvmscript-1  int8               global       16                      SUCCESS     0.700              0.700              0.700                 0.700               0.000                 /tmp/tmpo97ou02e/basic_kernel:ewise-tvmscript-1-dtype:int8-mem_scope:global-num_vectors_per_tensor:16
ewise-tvmscript-1  int8               global       64                      SUCCESS     0.700              0.700              0.700                 0.700               0.000                 /tmp/tmpo97ou02e/basic_kernel:ewise-tvmscript-1-dtype:int8-mem_scope:global-num_vectors_per_tensor:64
ewise-tvmscript-1  int8               global       512                     SUCCESS     2.900              2.900              2.900                 2.900               0.000                 /tmp/tmpo97ou02e/basic_kernel:ewise-tvmscript-1-dtype:int8-mem_scope:global-num_vectors_per_tensor:512
ewise-tvmscript-1  int8               global       2048                    SUCCESS     8.300              8.300              8.300                 8.300               0.000                 /tmp/tmpo97ou02e/basic_kernel:ewise-tvmscript-1-dtype:int8-mem_scope:global-num_vectors_per_tensor:2048
elemwise-add-te    int8   1           global       1                       SUCCESS     0.700              0.700              0.700                 0.700               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:1-mem_scope:global-num_vectors_per_tensor:1
elemwise-add-te    int8   1           global       16                      SUCCESS     0.800              0.800              0.800                 0.800               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:1-mem_scope:global-num_vectors_per_tensor:16
elemwise-add-te    int8   1           global       64                      SUCCESS     0.900              0.900              0.900                 0.900               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:1-mem_scope:global-num_vectors_per_tensor:64
elemwise-add-te    int8   1           global       512                     SUCCESS     2.600              2.600              2.600                 2.600               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:1-mem_scope:global-num_vectors_per_tensor:512
elemwise-add-te    int8   1           global       2048                    SUCCESS     16.600             16.600             16.600                16.600              0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:1-mem_scope:global-num_vectors_per_tensor:2048
elemwise-add-te    int8   1           global.vtcm  1                       SUCCESS     0.600              0.600              0.600                 0.600               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:1-mem_scope:global.vtcm-num_vectors_per_tensor:1
elemwise-add-te    int8   1           global.vtcm  16                      SUCCESS     0.700              0.700              0.700                 0.700               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:1-mem_scope:global.vtcm-num_vectors_per_tensor:16
elemwise-add-te    int8   1           global.vtcm  64                      SUCCESS     1.400              1.400              1.400                 1.400               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:1-mem_scope:global.vtcm-num_vectors_per_tensor:64
elemwise-add-te    int8   1           global.vtcm  512                     SUCCESS     6.400              6.400              6.400                 6.400               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:1-mem_scope:global.vtcm-num_vectors_per_tensor:512
elemwise-add-te    int8   1           global.vtcm  2048                    SKIP                                                                                                                                                                                                                                      Expect to exceed VTCM budget.
elemwise-add-te    int8   2           global       1                       SUCCESS     0.500              0.500              0.500                 0.500               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:2-mem_scope:global-num_vectors_per_tensor:1
elemwise-add-te    int8   2           global       16                      SUCCESS     0.600              0.600              0.600                 0.600               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:2-mem_scope:global-num_vectors_per_tensor:16
elemwise-add-te    int8   2           global       64                      SUCCESS     0.700              0.700              0.700                 0.700               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:2-mem_scope:global-num_vectors_per_tensor:64
elemwise-add-te    int8   2           global       512                     SUCCESS     2.800              2.800              2.800                 2.800               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:2-mem_scope:global-num_vectors_per_tensor:512
elemwise-add-te    int8   2           global       2048                    SUCCESS     13.500             13.500             13.500                13.500              0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:2-mem_scope:global-num_vectors_per_tensor:2048
elemwise-add-te    int8   2           global.vtcm  1                       SUCCESS     0.700              0.700              0.700                 0.700               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:2-mem_scope:global.vtcm-num_vectors_per_tensor:1
elemwise-add-te    int8   2           global.vtcm  16                      SUCCESS     0.800              0.800              0.800                 0.800               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:2-mem_scope:global.vtcm-num_vectors_per_tensor:16
elemwise-add-te    int8   2           global.vtcm  64                      SUCCESS     1.100              1.100              1.100                 1.100               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:2-mem_scope:global.vtcm-num_vectors_per_tensor:64
elemwise-add-te    int8   2           global.vtcm  512                     SUCCESS     5.300              5.300              5.300                 5.300               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:2-mem_scope:global.vtcm-num_vectors_per_tensor:512
elemwise-add-te    int8   2           global.vtcm  2048                    SKIP                                                                                                                                                                                                                                      Expect to exceed VTCM budget.
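
As a rough illustration of how results like these could be post-processed, here is a minimal sketch that loads a few of the rows above and collects the median timings of the successful runs. It assumes the results file is tab-separated (inferred from the `column -s $'\t'` invocation) and uses only a subset of the columns; the helper names `load_rows` and `successful_medians` are hypothetical and not part of the PR.

```python
import csv
import io

# Three rows copied from the table above, re-encoded as tab-separated text.
# Only a subset of the columns is kept to keep the sketch short.
SAMPLE_TSV = (
    "basic_kernel\tdtype\tsched_type\tmem_scope\tnum_vectors_per_tensor"
    "\trow_status\ttimings_median_usecs\tcomments\n"
    "ewise-tvmscript-1\tint8\t\tglobal\t2048\tSUCCESS\t8.300\t\n"
    "elemwise-add-te\tint8\t1\tglobal\t2048\tSUCCESS\t16.600\t\n"
    "elemwise-add-te\tint8\t1\tglobal.vtcm\t2048\tSKIP\t\tExpect to exceed VTCM budget.\n"
)


def load_rows(tsv_text):
    """Parse tab-separated benchmark output into a list of dicts."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def successful_medians(rows):
    """Map (kernel, sched_type, mem_scope, num_vectors) to the median time in usecs.

    Rows whose status is not SUCCESS (e.g. SKIP) have no timings and are ignored.
    """
    medians = {}
    for row in rows:
        if row["row_status"] != "SUCCESS":
            continue
        key = (
            row["basic_kernel"],
            row["sched_type"],
            row["mem_scope"],
            int(row["num_vectors_per_tensor"]),
        )
        medians[key] = float(row["timings_median_usecs"])
    return medians


rows = load_rows(SAMPLE_TSV)
medians = successful_medians(rows)
for key, usecs in sorted(medians.items()):
    print(key, usecs)
```

Keying on the full configuration tuple keeps the TE and TVMScript rows for the same tensor size side by side, which makes a comparison between them straightforward.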

cconvey · 2022-05-27T15:39:56Z

Note: This PR starts to factor some benchmarking logic out of the individual benchmark functions, but it's very much a WIP. I'd like to save a more thorough refactoring for if/when it's truly necessary.

cconvey · 2022-05-27T15:54:49Z

FYI: I just force-pushed a fix, since probably nobody has had a chance to review the PR yet.

csullivan · 2022-05-27T15:57:41Z

tests/python/contrib/test_hexagon/benchmark_elemwise_add.py

+    # -----------------------------------------------------------------------------------------------
+
+    # Hexagon v69 allows more dtypes, but we're sticking with v68 for now.
+    for dtype in [


nit:
These loops can be hoisted out into tvm.testing.parameters, e.g.

    class TestEWBenchmarks:
        dtype = tvm.testing.parameter("int8")
        num_vectors_per_tensor = tvm.testing.parameter(1, 16, 64, 512)
        ...

        @tvm.testing.requires_hexagon
        def test_elemwise_add_tvmscript(self, hexagon_session: HexagonLauncherRPC, dtype, num_vectors_per_tensor, ...):

In doing so, each testing option becomes a separate pytest. The benchmarking utilities could be defined in the class scope to capture the state across multiple tests.

cconvey · 2022-05-27

It's a good idea. If possible, I'd like to tackle that in a future PR, once we have more benchmark configurations written.
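
To make the effect of the suggested hoisting concrete, here is a minimal plain-Python sketch of the underlying idea: nested loops over benchmark options are equivalent to iterating over the cartesian product of the option lists, and each combination can then be treated as its own test case. This does not use `tvm.testing` or pytest; the helper `expand_configs` is hypothetical, and the option values are the ones from the snippet above.

```python
import itertools

# Option values taken from the review suggestion above.
DTYPES = ["int8"]
NUM_VECTORS_PER_TENSOR = [1, 16, 64, 512]


def expand_configs(**axes):
    """Return one dict per combination of the given option lists.

    Equivalent to nesting one `for` loop per keyword argument, in order.
    """
    names = list(axes)
    return [
        dict(zip(names, combo))
        for combo in itertools.product(*axes.values())
    ]


configs = expand_configs(
    dtype=DTYPES,
    num_vectors_per_tensor=NUM_VECTORS_PER_TENSOR,
)

for cfg in configs:
    # In a parameterized test suite, each of these would be a separate test.
    print(cfg)
```

With one test per combination, a failure or skip in one configuration no longer hides the results of the others, which is the main benefit the reviewer points to.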

mehrdadh · 2022-05-27T16:34:17Z

Looks like there are multiple benchmark files. I suggest moving them under the tests/python/contrib/test_hexagon/benchmark/ directory. WDYT?

junrushao · 2022-05-27T17:57:00Z

dude, it's called TVMScript :-)

masahi · 2022-05-27T21:17:32Z

~~Do these benchmarks run during CI?~~ At least, I don't think we need to benchmark exhaustively on both TVMScript and TE. I'm concerned about the CI time as we add more ops (I saw Hexagon jobs being the bottleneck in my PRs).

UPDATE: Doesn't seem so.

cconvey · 2022-06-01T19:50:39Z

@csullivan : Just a heads-up, I just pushed a major reworking of the PR.

cconvey · 2022-06-01T20:07:38Z

Sample output from the current revision of the PR:

$ cat /tmp/tmp2l3burqv/benchmark-results.csv | column -s $'\t' -t -n 
basic_kernel  dtype    sched_type  mem_scope    num_vectors_per_tensor  row_status  timings_min_usecs  timings_max_usecs  timings_median_usecs  timings_mean_usecs  timings_stddev_usecs  host_files_dir_path                                                                        comments
ewise-add     int8                 global                               SUCCESS     0.600              0.600              0.600                 0.600               0.000                 /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:int8-shape:1_128-mem_scope:global            
ewise-add     int8                 global                               SUCCESS     0.700              0.700              0.700                 0.700               0.000                 /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:int8-shape:16_128-mem_scope:global           
ewise-add     int8                 global                               SUCCESS     0.800              0.800              0.800                 0.800               0.000                 /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:int8-shape:64_128-mem_scope:global           
ewise-add     int8                 global                               SUCCESS     2.400              2.400              2.400                 2.400               0.000                 /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:int8-shape:512_128-mem_scope:global          
ewise-add     int8                 global                               SUCCESS     23.000             23.000             23.000                23.000              0.000                 /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:int8-shape:2048_128-mem_scope:global         
ewise-add     int8                 global.vtcm                          SKIP                                                                                                              /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:int8-shape:1_128-mem_scope:global.vtcm       Unsupported configuration: This benchmark kernel does not yet support VTCM buffers.
ewise-add     int8                 global.vtcm                          SKIP                                                                                                              /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:int8-shape:16_128-mem_scope:global.vtcm      Unsupported configuration: This benchmark kernel does not yet support VTCM buffers.
ewise-add     int8                 global.vtcm                          SKIP                                                                                                              /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:int8-shape:64_128-mem_scope:global.vtcm      Unsupported configuration: This benchmark kernel does not yet support VTCM buffers.
ewise-add     int8                 global.vtcm                          SKIP                                                                                                              /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:int8-shape:512_128-mem_scope:global.vtcm     Unsupported configuration: This benchmark kernel does not yet support VTCM buffers.
ewise-add     int8                 global.vtcm                          SKIP                                                                                                              /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:int8-shape:2048_128-mem_scope:global.vtcm    Unsupported configuration: This benchmark kernel does not yet support VTCM buffers.
ewise-add     float16              global                               SUCCESS     0.600              0.600              0.600                 0.600               0.000                 /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:float16-shape:1_64-mem_scope:global          
ewise-add     float16              global                               SUCCESS     0.600              0.600              0.600                 0.600               0.000                 /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:float16-shape:16_64-mem_scope:global         
ewise-add     float16              global                               SUCCESS     0.900              0.900              0.900                 0.900               0.000                 /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:float16-shape:64_64-mem_scope:global         
ewise-add     float16              global                               SUCCESS     3.300              3.300              3.300                 3.300               0.000                 /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:float16-shape:512_64-mem_scope:global        
ewise-add     float16              global                               SUCCESS     22.600             22.600             22.600                22.600              0.000                 /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:float16-shape:2048_64-mem_scope:global       
ewise-add     float16              global.vtcm                          SKIP                                                                                                              /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:float16-shape:1_64-mem_scope:global.vtcm     Unsupported configuration: This benchmark kernel does not yet support VTCM buffers.
ewise-add     float16              global.vtcm                          SKIP                                                                                                              /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:float16-shape:16_64-mem_scope:global.vtcm    Unsupported configuration: This benchmark kernel does not yet support VTCM buffers.
ewise-add     float16              global.vtcm                          SKIP                                                                                                              /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:float16-shape:64_64-mem_scope:global.vtcm    Unsupported configuration: This benchmark kernel does not yet support VTCM buffers.
ewise-add     float16              global.vtcm                          SKIP                                                                                                              /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:float16-shape:512_64-mem_scope:global.vtcm   Unsupported configuration: This benchmark kernel does not yet support VTCM buffers.
ewise-add     float16              global.vtcm                          SKIP                                                                                                              /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:float16-shape:2048_64-mem_scope:global.vtcm  Unsupported configuration: This benchmark kernel does not yet support VTCM buffers.
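
The directory names and SKIP rows in this output follow a regular pattern, which the sketch below tries to reproduce. It is an illustration only, not the PR's implementation: the function names are hypothetical, the 128-byte vector size is an assumption inferred from the shapes in the table (128 int8 elements or 64 float16 elements per row), and the `PENDING` status is a placeholder invented for configurations that would actually be run.

```python
# Assumption: one vector holds 128 bytes, matching the shapes shown above.
VECTOR_BYTES = 128
DTYPE_BYTES = {"int8": 1, "float16": 2}

VTCM_SKIP_COMMENT = (
    "Unsupported configuration: "
    "This benchmark kernel does not yet support VTCM buffers."
)


def tensor_shape(dtype, num_vectors):
    """Shape of a tensor made of `num_vectors` full vectors of `dtype`."""
    return [num_vectors, VECTOR_BYTES // DTYPE_BYTES[dtype]]


def config_dir_name(basic_kernel, dtype, shape, mem_scope):
    """Build a per-configuration directory name in the style seen above."""
    shape_str = "_".join(str(dim) for dim in shape)
    return (
        f"basic_kernel:{basic_kernel}-dtype:{dtype}"
        f"-shape:{shape_str}-mem_scope:{mem_scope}"
    )


def plan_row(basic_kernel, dtype, num_vectors, mem_scope):
    """Describe one benchmark configuration, marking unsupported ones as SKIP."""
    shape = tensor_shape(dtype, num_vectors)
    row = {
        "basic_kernel": basic_kernel,
        "dtype": dtype,
        "mem_scope": mem_scope,
        "row_status": "PENDING",  # hypothetical placeholder: would be benchmarked
        "host_files_dir_name": config_dir_name(basic_kernel, dtype, shape, mem_scope),
        "comments": "",
    }
    if mem_scope == "global.vtcm":
        row["row_status"] = "SKIP"
        row["comments"] = VTCM_SKIP_COMMENT
    return row


example = plan_row("ewise-add", "float16", 2048, "global.vtcm")
print(example["row_status"], example["host_files_dir_name"])
```

Recording skipped configurations as rows with a comment, rather than omitting them, keeps the table complete and makes it clear which combinations were never measured.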

cconvey · 2022-06-01T20:13:17Z

tests/python/contrib/test_hexagon/benchmark_elemwise_add.py

+_HEXAGON_TARGET = tvm.target.hexagon("v69", link_params=True)
+
+_SUPER_TARGET = tvm.target.Target(_HEXAGON_TARGET, host=_HEXAGON_TARGET)


I'm looking for terminology suggestions here.

Replace TE-based elementwise-add benchmark with a TVMScript-based one. Update Hexagon target architecture from v68 to v69. As a result, the benchmark now requires a version of Hexagon SDK newer than 4.4.0.1. Version 4.5.0.3 is known to work.

csullivan

LGTM, thanks @cconvey!

cconvey force-pushed the ewise-add-tir branch from 1c6f62c to aa463b0 Compare May 27, 2022 15:53

github-actions bot requested a review from mehrdadh May 27, 2022 15:54

csullivan reviewed May 27, 2022

cconvey changed the title ~~[hexagon][testing] add TIRScript elemwise-add~~ WIP [hexagon][testing] add TIRScript elemwise-add May 27, 2022

cconvey changed the title ~~WIP [hexagon][testing] add TIRScript elemwise-add~~ WIP [hexagon][testing] add TVMScript elemwise-add May 31, 2022

cconvey force-pushed the ewise-add-tir branch from 01184d7 to 48e1df6 Compare June 1, 2022 19:48

cconvey requested a review from csullivan June 1, 2022 19:50

cconvey force-pushed the ewise-add-tir branch from 48e1df6 to a34e700 Compare June 2, 2022 13:50

cconvey force-pushed the ewise-add-tir branch from a34e700 to ed667ff Compare June 2, 2022 14:39

cconvey changed the title ~~WIP [hexagon][testing] add TVMScript elemwise-add~~ [hexagon][testing] add TVMScript elemwise-add Jun 2, 2022

csullivan approved these changes Jun 3, 2022

csullivan merged commit 2ae2088 into apache:main Jun 3, 2022

driazati mentioned this pull request Jul 14, 2022

TVM v0.9.0.rc0 Release Candidate Notes #12102

Closed
