[hexagon][testing] add TVMScript elemwise-add #11490

Merged
merged 1 commit into from
Jun 3, 2022

Conversation

@cconvey cconvey commented May 27, 2022

  • Recreate the existing TE-based elementwise-add benchmark
    in TVMScript.

    The TE-based benchmark has several features not (yet)
    present in the new TVMScript one:

    • support for mem_scope='global.vtcm'
    • scheduling
  • Small refactoring of how these benchmarks use the
    BenchmarkTable class.

cc @mehrdadh
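
For readers unfamiliar with the results-table idea, here is a minimal sketch of a table that records one row per benchmark configuration and emits tab-separated text. `MiniBenchmarkTable` and its method names are hypothetical and are not the `BenchmarkTable` class from this PR; only the column names are taken from the output posted in this thread.

```python
import csv
import io
import statistics

# Result columns, named after the output shown in this thread.
RESULT_COLUMNS = [
    "row_status",
    "timings_min_usecs",
    "timings_max_usecs",
    "timings_median_usecs",
    "timings_mean_usecs",
    "timings_stddev_usecs",
    "comments",
]


class MiniBenchmarkTable:
    """Hypothetical stand-in for a results table; NOT TVM's BenchmarkTable."""

    def __init__(self):
        self.rows = []

    def record_success(self, config, timings_usecs):
        # Summarize the raw timings into the statistics columns.
        row = dict(config)
        row.update(
            row_status="SUCCESS",
            timings_min_usecs=f"{min(timings_usecs):.3f}",
            timings_max_usecs=f"{max(timings_usecs):.3f}",
            timings_median_usecs=f"{statistics.median(timings_usecs):.3f}",
            timings_mean_usecs=f"{statistics.mean(timings_usecs):.3f}",
            # Population stddev, so a single measurement yields 0.000.
            timings_stddev_usecs=f"{statistics.pstdev(timings_usecs):.3f}",
        )
        self.rows.append(row)

    def record_skip(self, config, comment):
        # Skipped configurations keep their row, with empty timing cells.
        self.rows.append(dict(config, row_status="SKIP", comments=comment))

    def to_tsv(self, config_columns):
        out = io.StringIO()
        writer = csv.DictWriter(
            out,
            fieldnames=list(config_columns) + RESULT_COLUMNS,
            delimiter="\t",
            restval="",
            lineterminator="\n",
        )
        writer.writeheader()
        writer.writerows(self.rows)
        return out.getvalue()
```

Keeping skipped configurations as rows, rather than dropping them, means the table always lists the full set of configurations that were attempted.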

cconvey commented May 27, 2022

Here's an example of the tabular output now produced:

$ cat /tmp/tmpo97ou02e/benchmark-results.csv | column -s $'\t' -t -n
basic_kernel       dtype  sched_type  mem_scope    num_vectors_per_tensor  row_status  timings_min_usecs  timings_max_usecs  timings_median_usecs  timings_mean_usecs  timings_stddev_usecs  host_files_dir                                                                                                          comments
ewise-tvmscript-1  int8               global       1                       SUCCESS     0.500              0.500              0.500                 0.500               0.000                 /tmp/tmpo97ou02e/basic_kernel:ewise-tvmscript-1-dtype:int8-mem_scope:global-num_vectors_per_tensor:1
ewise-tvmscript-1  int8               global       16                      SUCCESS     0.700              0.700              0.700                 0.700               0.000                 /tmp/tmpo97ou02e/basic_kernel:ewise-tvmscript-1-dtype:int8-mem_scope:global-num_vectors_per_tensor:16
ewise-tvmscript-1  int8               global       64                      SUCCESS     0.700              0.700              0.700                 0.700               0.000                 /tmp/tmpo97ou02e/basic_kernel:ewise-tvmscript-1-dtype:int8-mem_scope:global-num_vectors_per_tensor:64
ewise-tvmscript-1  int8               global       512                     SUCCESS     2.900              2.900              2.900                 2.900               0.000                 /tmp/tmpo97ou02e/basic_kernel:ewise-tvmscript-1-dtype:int8-mem_scope:global-num_vectors_per_tensor:512
ewise-tvmscript-1  int8               global       2048                    SUCCESS     8.300              8.300              8.300                 8.300               0.000                 /tmp/tmpo97ou02e/basic_kernel:ewise-tvmscript-1-dtype:int8-mem_scope:global-num_vectors_per_tensor:2048
elemwise-add-te    int8   1           global       1                       SUCCESS     0.700              0.700              0.700                 0.700               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:1-mem_scope:global-num_vectors_per_tensor:1
elemwise-add-te    int8   1           global       16                      SUCCESS     0.800              0.800              0.800                 0.800               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:1-mem_scope:global-num_vectors_per_tensor:16
elemwise-add-te    int8   1           global       64                      SUCCESS     0.900              0.900              0.900                 0.900               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:1-mem_scope:global-num_vectors_per_tensor:64
elemwise-add-te    int8   1           global       512                     SUCCESS     2.600              2.600              2.600                 2.600               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:1-mem_scope:global-num_vectors_per_tensor:512
elemwise-add-te    int8   1           global       2048                    SUCCESS     16.600             16.600             16.600                16.600              0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:1-mem_scope:global-num_vectors_per_tensor:2048
elemwise-add-te    int8   1           global.vtcm  1                       SUCCESS     0.600              0.600              0.600                 0.600               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:1-mem_scope:global.vtcm-num_vectors_per_tensor:1
elemwise-add-te    int8   1           global.vtcm  16                      SUCCESS     0.700              0.700              0.700                 0.700               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:1-mem_scope:global.vtcm-num_vectors_per_tensor:16
elemwise-add-te    int8   1           global.vtcm  64                      SUCCESS     1.400              1.400              1.400                 1.400               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:1-mem_scope:global.vtcm-num_vectors_per_tensor:64
elemwise-add-te    int8   1           global.vtcm  512                     SUCCESS     6.400              6.400              6.400                 6.400               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:1-mem_scope:global.vtcm-num_vectors_per_tensor:512
elemwise-add-te    int8   1           global.vtcm  2048                    SKIP                                                                                                                                                                                                                                      Expect to exceed VTCM budget.
elemwise-add-te    int8   2           global       1                       SUCCESS     0.500              0.500              0.500                 0.500               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:2-mem_scope:global-num_vectors_per_tensor:1
elemwise-add-te    int8   2           global       16                      SUCCESS     0.600              0.600              0.600                 0.600               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:2-mem_scope:global-num_vectors_per_tensor:16
elemwise-add-te    int8   2           global       64                      SUCCESS     0.700              0.700              0.700                 0.700               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:2-mem_scope:global-num_vectors_per_tensor:64
elemwise-add-te    int8   2           global       512                     SUCCESS     2.800              2.800              2.800                 2.800               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:2-mem_scope:global-num_vectors_per_tensor:512
elemwise-add-te    int8   2           global       2048                    SUCCESS     13.500             13.500             13.500                13.500              0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:2-mem_scope:global-num_vectors_per_tensor:2048
elemwise-add-te    int8   2           global.vtcm  1                       SUCCESS     0.700              0.700              0.700                 0.700               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:2-mem_scope:global.vtcm-num_vectors_per_tensor:1
elemwise-add-te    int8   2           global.vtcm  16                      SUCCESS     0.800              0.800              0.800                 0.800               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:2-mem_scope:global.vtcm-num_vectors_per_tensor:16
elemwise-add-te    int8   2           global.vtcm  64                      SUCCESS     1.100              1.100              1.100                 1.100               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:2-mem_scope:global.vtcm-num_vectors_per_tensor:64
elemwise-add-te    int8   2           global.vtcm  512                     SUCCESS     5.300              5.300              5.300                 5.300               0.000                 /tmp/tmpo97ou02e/basic_kernel:elemwise-add-te-dtype:int8-sched_type:2-mem_scope:global.vtcm-num_vectors_per_tensor:512
elemwise-add-te    int8   2           global.vtcm  2048                    SKIP                                                                                                                                                                                                                                      Expect to exceed VTCM budget.
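
The results file is tab-separated, so it can be loaded with Python's standard `csv` module to compare kernels directly. The sketch below is illustrative only: the helper names are hypothetical, and the sample embeds a few values copied from the table above (with most columns omitted) instead of reading the real file.

```python
import csv
import io

# A few rows copied from the output above; most columns are omitted.
HEADER = ["basic_kernel", "dtype", "sched_type", "mem_scope",
          "num_vectors_per_tensor", "row_status", "timings_median_usecs", "comments"]
SAMPLE_ROWS = [
    ["ewise-tvmscript-1", "int8", "", "global", "2048", "SUCCESS", "8.300", ""],
    ["elemwise-add-te", "int8", "1", "global", "2048", "SUCCESS", "16.600", ""],
    ["elemwise-add-te", "int8", "2", "global", "2048", "SUCCESS", "13.500", ""],
    ["elemwise-add-te", "int8", "1", "global.vtcm", "2048", "SKIP", "",
     "Expect to exceed VTCM budget."],
]
SAMPLE_TSV = "\n".join("\t".join(row) for row in [HEADER] + SAMPLE_ROWS)


def load_rows(tsv_text):
    """Parse tab-separated benchmark results into a list of dicts."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def successful_medians(rows, mem_scope, num_vectors):
    """Map (basic_kernel, sched_type) to median time for one configuration."""
    medians = {}
    for row in rows:
        if row["row_status"] != "SUCCESS":
            continue  # SKIP rows have no timings
        if row["mem_scope"] != mem_scope:
            continue
        if row["num_vectors_per_tensor"] != str(num_vectors):
            continue
        key = (row["basic_kernel"], row["sched_type"])
        medians[key] = float(row["timings_median_usecs"])
    return medians


medians = successful_medians(load_rows(SAMPLE_TSV), "global", 2048)
```

For the real file, the same functions would be applied to the text read from `benchmark-results.csv`.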

cconvey commented May 27, 2022

Note: This PR starts to factor some benchmarking logic out of the individual benchmark functions, but it's very much a WIP. I'd like to save a more thorough refactoring for if/when it's truly necessary.

@github-actions github-actions bot requested a review from mehrdadh May 27, 2022 15:54
cconvey commented May 27, 2022

FYI, I just force-pushed a fix, since probably nobody has had a chance to review the PR yet.

# -----------------------------------------------------------------------------------------------

# Hexagon v69 allows more dtypes, but we're sticking with v68 for now.
for dtype in [
Contributor

nit:
These loops can be hoisted out into tvm.testing.parameters, e.g.

class TestEWBenchmarks:
    dtype = tvm.testing.parameter("int8")
    num_vectors_per_tensor = tvm.testing.parameter(1, 16, 64, 512)
    ...
    @tvm.testing.requires_hexagon
    def test_elemwise_add_tvmscript(self, hexagon_session: HexagonLauncherRPC, dtype, num_vectors_per_tensor, ...):

In doing so, each testing option becomes a separate pytest. The benchmarking utilities could be defined in the class scope to capture the state across multiple tests.
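
To illustrate what "each testing option becomes a separate pytest" amounts to, here is a minimal standard-library sketch of the cross-product that parametrization would generate in place of the nested loops. It deliberately does not use TVM or pytest; `expand_parameters` is a hypothetical helper, and the `mem_scope` axis is assumed here only for illustration.

```python
import itertools


def expand_parameters(**axes):
    """Yield one dict per combination of the given parameter values."""
    names = list(axes)
    for values in itertools.product(*(axes[name] for name in names)):
        yield dict(zip(names, values))


# Each generated dict corresponds to one independent test case.
cases = list(
    expand_parameters(
        dtype=["int8"],
        mem_scope=["global", "global.vtcm"],
        num_vectors_per_tensor=[1, 16, 64, 512],
    )
)
```

With one case per combination, a failure or skip in one configuration is reported on its own and does not hide the results of the others.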

Contributor Author

It's a good idea. If possible I'd like to tackle that in a future PR, once we have more benchmark configurations written.

@mehrdadh
Member

It looks like there are multiple benchmark files. I suggest moving them under the tests/python/contrib/test_hexagon/benchmark/ directory. What do you think?

@cconvey cconvey changed the title from "[hexagon][testing] add TIRScript elemwise-add" to "WIP [hexagon][testing] add TIRScript elemwise-add" May 27, 2022
@junrushao
Member

dude, it's called TVMScript :-)

masahi commented May 27, 2022

Do these benchmarks run during CI? At least, I don't think we need to benchmark exhaustively on both TVMScript and TE. I'm concerned about the CI time as we add more ops (I saw Hexagon jobs being the bottleneck in my PRs).

UPDATE: Doesn't seem so.

@cconvey cconvey changed the title from "WIP [hexagon][testing] add TIRScript elemwise-add" to "WIP [hexagon][testing] add TVMScript elemwise-add" May 31, 2022
cconvey commented Jun 1, 2022

@csullivan : Just a heads-up, I just pushed a major reworking of the PR.

@cconvey cconvey requested a review from csullivan June 1, 2022 19:50
cconvey commented Jun 1, 2022

Sample output from the current revision of the PR:

$ cat /tmp/tmp2l3burqv/benchmark-results.csv | column -s $'\t' -t -n 
basic_kernel  dtype    sched_type  mem_scope    num_vectors_per_tensor  row_status  timings_min_usecs  timings_max_usecs  timings_median_usecs  timings_mean_usecs  timings_stddev_usecs  host_files_dir_path                                                                        comments
ewise-add     int8                 global                               SUCCESS     0.600              0.600              0.600                 0.600               0.000                 /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:int8-shape:1_128-mem_scope:global            
ewise-add     int8                 global                               SUCCESS     0.700              0.700              0.700                 0.700               0.000                 /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:int8-shape:16_128-mem_scope:global           
ewise-add     int8                 global                               SUCCESS     0.800              0.800              0.800                 0.800               0.000                 /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:int8-shape:64_128-mem_scope:global           
ewise-add     int8                 global                               SUCCESS     2.400              2.400              2.400                 2.400               0.000                 /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:int8-shape:512_128-mem_scope:global          
ewise-add     int8                 global                               SUCCESS     23.000             23.000             23.000                23.000              0.000                 /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:int8-shape:2048_128-mem_scope:global         
ewise-add     int8                 global.vtcm                          SKIP                                                                                                              /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:int8-shape:1_128-mem_scope:global.vtcm       Unsupported configuration: This benchmark kernel does not yet support VTCM buffers.
ewise-add     int8                 global.vtcm                          SKIP                                                                                                              /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:int8-shape:16_128-mem_scope:global.vtcm      Unsupported configuration: This benchmark kernel does not yet support VTCM buffers.
ewise-add     int8                 global.vtcm                          SKIP                                                                                                              /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:int8-shape:64_128-mem_scope:global.vtcm      Unsupported configuration: This benchmark kernel does not yet support VTCM buffers.
ewise-add     int8                 global.vtcm                          SKIP                                                                                                              /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:int8-shape:512_128-mem_scope:global.vtcm     Unsupported configuration: This benchmark kernel does not yet support VTCM buffers.
ewise-add     int8                 global.vtcm                          SKIP                                                                                                              /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:int8-shape:2048_128-mem_scope:global.vtcm    Unsupported configuration: This benchmark kernel does not yet support VTCM buffers.
ewise-add     float16              global                               SUCCESS     0.600              0.600              0.600                 0.600               0.000                 /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:float16-shape:1_64-mem_scope:global          
ewise-add     float16              global                               SUCCESS     0.600              0.600              0.600                 0.600               0.000                 /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:float16-shape:16_64-mem_scope:global         
ewise-add     float16              global                               SUCCESS     0.900              0.900              0.900                 0.900               0.000                 /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:float16-shape:64_64-mem_scope:global         
ewise-add     float16              global                               SUCCESS     3.300              3.300              3.300                 3.300               0.000                 /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:float16-shape:512_64-mem_scope:global        
ewise-add     float16              global                               SUCCESS     22.600             22.600             22.600                22.600              0.000                 /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:float16-shape:2048_64-mem_scope:global       
ewise-add     float16              global.vtcm                          SKIP                                                                                                              /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:float16-shape:1_64-mem_scope:global.vtcm     Unsupported configuration: This benchmark kernel does not yet support VTCM buffers.
ewise-add     float16              global.vtcm                          SKIP                                                                                                              /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:float16-shape:16_64-mem_scope:global.vtcm    Unsupported configuration: This benchmark kernel does not yet support VTCM buffers.
ewise-add     float16              global.vtcm                          SKIP                                                                                                              /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:float16-shape:64_64-mem_scope:global.vtcm    Unsupported configuration: This benchmark kernel does not yet support VTCM buffers.
ewise-add     float16              global.vtcm                          SKIP                                                                                                              /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:float16-shape:512_64-mem_scope:global.vtcm   Unsupported configuration: This benchmark kernel does not yet support VTCM buffers.
ewise-add     float16              global.vtcm                          SKIP                                                                                                              /tmp/tmp2l3burqv/basic_kernel:ewise-add-dtype:float16-shape:2048_64-mem_scope:global.vtcm  Unsupported configuration: This benchmark kernel does not yet support VTCM buffers.
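
The `shape:` component of the directory names above follows a simple pattern: the inner dimension is 128 for int8 and 64 for float16, which is consistent with one 128-byte HVX vector per row. The sketch below reproduces that pattern; the 128-byte figure is inferred from the output, not quoted from the PR's code, and both function names are hypothetical.

```python
# Assumption: each row of a tensor fills one 128-byte HVX vector.
HVX_VECTOR_BYTES = 128

# Element sizes for the two dtypes that appear in the output above.
DTYPE_BYTES = {"int8": 1, "float16": 2}


def tensor_shape(dtype, num_vectors):
    """Return (rows, elements_per_row) for a tensor of whole vectors."""
    return (num_vectors, HVX_VECTOR_BYTES // DTYPE_BYTES[dtype])


def config_dir_name(basic_kernel, dtype, shape, mem_scope):
    """Build a per-configuration directory name in the 'key:value-...' style shown above."""
    shape_str = "_".join(str(dim) for dim in shape)
    return (
        f"basic_kernel:{basic_kernel}-dtype:{dtype}"
        f"-shape:{shape_str}-mem_scope:{mem_scope}"
    )


name = config_dir_name("ewise-add", "float16", tensor_shape("float16", 2048), "global.vtcm")
```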

Comment on lines +36 to +38
_HEXAGON_TARGET = tvm.target.hexagon("v69", link_params=True)

_SUPER_TARGET = tvm.target.Target(_HEXAGON_TARGET, host=_HEXAGON_TARGET)
Contributor Author

I'm looking for terminology suggestions here.

Replace TE-based elementwise-add benchmark with
a TVMScript-based one.

Update Hexagon target architecture from v68 to v69.
As a result, the benchmark now requires a version of
Hexagon SDK newer than 4.4.0.1.  Version 4.5.0.3 is
known to work.
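
A benchmark script could guard against an SDK that is too old with a simple version comparison. This is only a sketch: the helper names are hypothetical, how the SDK version string is obtained is left out, and the commit message states only that the SDK must be newer than 4.4.0.1 and that 4.5.0.3 is known to work, so passing this check does not guarantee that a given version works.

```python
def parse_version(text):
    """Turn a dotted version string such as '4.5.0.3' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def sdk_is_newer_than(sdk_version, too_old="4.4.0.1"):
    """Return True if sdk_version is strictly newer than the too-old version."""
    return parse_version(sdk_version) > parse_version(too_old)
```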
@cconvey cconvey changed the title from "WIP [hexagon][testing] add TVMScript elemwise-add" to "[hexagon][testing] add TVMScript elemwise-add" Jun 2, 2022
@csullivan csullivan left a comment

LGTM, thanks @cconvey!

@csullivan csullivan merged commit 2ae2088 into apache:main Jun 3, 2022