
Matmul nvfuser auto splitk benchmarks are too slow #1389

Closed
xwang233 opened this issue Nov 28, 2023 · 1 comment · Fixed by #1545

Comments

@xwang233
Collaborator

xwang233 commented Nov 28, 2023

... and they have caused many CI timeouts.

Command to reproduce:

/opt/pytorch/nvfuser/bin/nvfuser_bench --benchmark_repetitions=1 --benchmark_min_time=0.00000001 '--benchmark_filter=^NvFuserScheduler_Matmul/nvfuser_a.*' | ts -s

Note that every benchmark runs only a single iteration, yet consecutive results in the timestamped output are still roughly 20 seconds apart.
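To make the overhead concrete, here is a small Python sketch that computes the wall-clock gap between consecutive lines of the ts -s output and compares it with the manual_time each benchmark reports. The five rows are copied from the A100 results in this issue; the helper names (parse_ts, wall_gaps) are made up for illustration. It assumes, as Google Benchmark's console output does, that a result line is printed once its benchmark finishes, so the gap before a line approximates the total wall time spent on that benchmark.

```python
from datetime import timedelta

# (benchmark shape, ts -s timestamp, reported manual_time in microseconds),
# copied from consecutive M:128/warps:4 rows of the A100 output in this issue.
ROWS = [
    ("M:128/N:3456/stages:3", "00:03:04", 2760),
    ("M:128/N:4608/stages:3", "00:03:25", 3336),
    ("M:128/N:6912/stages:3", "00:03:45", 4297),
    ("M:128/N:13824/stages:3", "00:03:46", 6740),
    ("M:128/N:128/stages:4", "00:04:06", 3216),
]


def parse_ts(stamp: str) -> timedelta:
    """Turn an 'HH:MM:SS' prefix written by `ts -s` into a timedelta."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def wall_gaps(rows):
    """Pair each row after the first with the seconds elapsed since the previous row.

    Returns (label, wall-clock gap in s, reported kernel time in s) tuples.
    Assumes a line is printed when its benchmark finishes, so the gap
    approximates the wall time spent on that benchmark.
    """
    out = []
    for (_, prev_ts, _), (label, ts, kernel_us) in zip(rows, rows[1:]):
        gap_s = (parse_ts(ts) - parse_ts(prev_ts)).total_seconds()
        out.append((label, gap_s, kernel_us / 1e6))
    return out


for label, gap_s, kernel_s in wall_gaps(ROWS):
    print(f"{label:24s} wall {gap_s:4.0f} s  kernel {kernel_s * 1e3:5.2f} ms")
```

With these rows the gaps are 21, 20, 1 and 20 seconds against kernel times of a few milliseconds. The 1 second gap belongs to the N:13824 row, the only launch with grid.x = 1, and the other N:13824 rows in the log likewise finish within about a second of their predecessor. That hints the extra time is tied to the split-K configurations rather than to kernel execution.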

Some example results on an A100:

00:00:22 --------------------------------------------------------------------------------------------------------------------------------------------------
00:00:22 Benchmark                                                                                                        Time             CPU   Iterations
00:00:22 --------------------------------------------------------------------------------------------------------------------------------------------------
00:00:22 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:128/N:128/K:65536/warps:4/stages:3/manual_time               3237 us         3463 us            1 /Launch_Parameters[block(2/2/32)/grid(108/1/1)/49664]
00:00:42 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:128/N:256/K:65536/warps:4/stages:3/manual_time               2432 us         2643 us            1 /Launch_Parameters[block(2/2/32)/grid(54/2/1)/49664]
00:01:02 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:128/N:384/K:65536/warps:4/stages:3/manual_time               1926 us         2134 us            1 /Launch_Parameters[block(2/2/32)/grid(36/3/1)/49664]
00:01:23 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:128/N:512/K:65536/warps:4/stages:3/manual_time               1974 us         2183 us            1 /Launch_Parameters[block(2/2/32)/grid(27/4/1)/49664]
00:01:43 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:128/N:768/K:65536/warps:4/stages:3/manual_time               1918 us         2130 us            1 /Launch_Parameters[block(2/2/32)/grid(18/6/1)/49664]
00:02:03 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:128/N:1152/K:65536/warps:4/stages:3/manual_time              1818 us         2024 us            1 /Launch_Parameters[block(2/2/32)/grid(12/9/1)/49664]
00:02:23 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:128/N:1536/K:65536/warps:4/stages:3/manual_time              1959 us         2173 us            1 /Launch_Parameters[block(2/2/32)/grid(9/12/1)/49664]
00:02:44 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:128/N:2304/K:65536/warps:4/stages:3/manual_time              2435 us         2641 us            1 /Launch_Parameters[block(2/2/32)/grid(6/18/1)/49664]
00:03:04 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:128/N:3456/K:65536/warps:4/stages:3/manual_time              2760 us         2969 us            1 /Launch_Parameters[block(2/2/32)/grid(4/27/1)/49664]
00:03:25 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:128/N:4608/K:65536/warps:4/stages:3/manual_time              3336 us         3544 us            1 /Launch_Parameters[block(2/2/32)/grid(3/36/1)/49664]
00:03:45 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:128/N:6912/K:65536/warps:4/stages:3/manual_time              4297 us         4502 us            1 /Launch_Parameters[block(2/2/32)/grid(2/54/1)/49664]
00:03:46 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:128/N:13824/K:65536/warps:4/stages:3/manual_time             6740 us         6930 us            1 /Launch_Parameters[block(2/2/32)/grid(1/108/1)/49152]
00:04:06 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:128/N:128/K:65536/warps:4/stages:4/manual_time               3216 us         3423 us            1 /Launch_Parameters[block(2/2/32)/grid(108/1/1)/66048]
00:04:26 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:128/N:256/K:65536/warps:4/stages:4/manual_time               2426 us         2635 us            1 /Launch_Parameters[block(2/2/32)/grid(54/2/1)/66048]
00:04:47 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:128/N:384/K:65536/warps:4/stages:4/manual_time               1906 us         2110 us            1 /Launch_Parameters[block(2/2/32)/grid(36/3/1)/66048]
00:05:07 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:128/N:512/K:65536/warps:4/stages:4/manual_time               1957 us         2164 us            1 /Launch_Parameters[block(2/2/32)/grid(27/4/1)/66048]
00:05:27 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:128/N:768/K:65536/warps:4/stages:4/manual_time               1878 us         2088 us            1 /Launch_Parameters[block(2/2/32)/grid(18/6/1)/66048]
00:05:47 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:128/N:1152/K:65536/warps:4/stages:4/manual_time              1772 us         1981 us            1 /Launch_Parameters[block(2/2/32)/grid(12/9/1)/66048]
00:06:07 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:128/N:1536/K:65536/warps:4/stages:4/manual_time              1900 us         2113 us            1 /Launch_Parameters[block(2/2/32)/grid(9/12/1)/66048]
00:06:28 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:128/N:2304/K:65536/warps:4/stages:4/manual_time              2335 us         2540 us            1 /Launch_Parameters[block(2/2/32)/grid(6/18/1)/66048]
00:06:48 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:128/N:3456/K:65536/warps:4/stages:4/manual_time              2620 us         2829 us            1 /Launch_Parameters[block(2/2/32)/grid(4/27/1)/66048]
00:07:09 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:128/N:4608/K:65536/warps:4/stages:4/manual_time              3171 us         3378 us            1 /Launch_Parameters[block(2/2/32)/grid(3/36/1)/66048]
00:07:29 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:128/N:6912/K:65536/warps:4/stages:4/manual_time              4055 us         4267 us            1 /Launch_Parameters[block(2/2/32)/grid(2/54/1)/66048]
00:07:30 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:128/N:13824/K:65536/warps:4/stages:4/manual_time             7277 us         7466 us            1 /Launch_Parameters[block(2/2/32)/grid(1/108/1)/65536]
00:07:50 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:256/N:128/K:65536/warps:8/stages:3/manual_time               3984 us         4197 us            1 /Launch_Parameters[block(4/2/32)/grid(108/1/1)/74752]
00:08:10 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:256/N:256/K:65536/warps:8/stages:3/manual_time               2926 us         3132 us            1 /Launch_Parameters[block(4/2/32)/grid(54/2/1)/74752]
00:08:31 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:256/N:384/K:65536/warps:8/stages:3/manual_time               2400 us         2608 us            1 /Launch_Parameters[block(4/2/32)/grid(36/3/1)/74752]
00:08:51 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:256/N:512/K:65536/warps:8/stages:3/manual_time               2449 us         2658 us            1 /Launch_Parameters[block(4/2/32)/grid(27/4/1)/74752]
00:09:11 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:256/N:768/K:65536/warps:8/stages:3/manual_time               2507 us         2718 us            1 /Launch_Parameters[block(4/2/32)/grid(18/6/1)/74752]
00:09:32 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:256/N:1152/K:65536/warps:8/stages:3/manual_time              2644 us         2852 us            1 /Launch_Parameters[block(4/2/32)/grid(12/9/1)/74752]
00:09:52 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:256/N:1536/K:65536/warps:8/stages:3/manual_time              2987 us         3200 us            1 /Launch_Parameters[block(4/2/32)/grid(9/12/1)/74752]
00:10:12 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:256/N:2304/K:65536/warps:8/stages:3/manual_time              3982 us         4193 us            1 /Launch_Parameters[block(4/2/32)/grid(6/18/1)/74752]
00:10:33 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:256/N:3456/K:65536/warps:8/stages:3/manual_time              4937 us         5143 us            1 /Launch_Parameters[block(4/2/32)/grid(4/27/1)/74752]
00:10:53 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:256/N:4608/K:65536/warps:8/stages:3/manual_time              6226 us         6442 us            1 /Launch_Parameters[block(4/2/32)/grid(3/36/1)/74752]
00:11:14 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:256/N:6912/K:65536/warps:8/stages:3/manual_time              8677 us         8894 us            1 /Launch_Parameters[block(4/2/32)/grid(2/54/1)/74752]
00:11:14 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:256/N:13824/K:65536/warps:8/stages:3/manual_time            11054 us        11242 us            1 /Launch_Parameters[block(4/2/32)/grid(1/108/1)/73728]
00:11:35 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:256/N:128/K:65536/warps:8/stages:4/manual_time               3978 us         4182 us            1 /Launch_Parameters[block(4/2/32)/grid(108/1/1)/99328]
00:11:55 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:256/N:256/K:65536/warps:8/stages:4/manual_time               2932 us         3127 us            1 /Launch_Parameters[block(4/2/32)/grid(54/2/1)/99328]
00:12:16 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:256/N:384/K:65536/warps:8/stages:4/manual_time               2396 us         2604 us            1 /Launch_Parameters[block(4/2/32)/grid(36/3/1)/99328]
00:12:36 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:256/N:512/K:65536/warps:8/stages:4/manual_time               2458 us         2672 us            1 /Launch_Parameters[block(4/2/32)/grid(27/4/1)/99328]
00:12:56 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:256/N:768/K:65536/warps:8/stages:4/manual_time               2506 us         2711 us            1 /Launch_Parameters[block(4/2/32)/grid(18/6/1)/99328]
00:13:17 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:256/N:1152/K:65536/warps:8/stages:4/manual_time              2640 us         2854 us            1 /Launch_Parameters[block(4/2/32)/grid(12/9/1)/99328]
00:13:37 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:256/N:1536/K:65536/warps:8/stages:4/manual_time              2955 us         3165 us            1 /Launch_Parameters[block(4/2/32)/grid(9/12/1)/99328]
00:13:58 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:256/N:2304/K:65536/warps:8/stages:4/manual_time              3916 us         4129 us            1 /Launch_Parameters[block(4/2/32)/grid(6/18/1)/99328]
00:14:18 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:256/N:3456/K:65536/warps:8/stages:4/manual_time              5047 us         5256 us            1 /Launch_Parameters[block(4/2/32)/grid(4/27/1)/99328]
00:14:38 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:256/N:4608/K:65536/warps:8/stages:4/manual_time              6245 us         6451 us            1 /Launch_Parameters[block(4/2/32)/grid(3/36/1)/99328]
00:14:59 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:256/N:6912/K:65536/warps:8/stages:4/manual_time              8799 us         9017 us            1 /Launch_Parameters[block(4/2/32)/grid(2/54/1)/99328]
00:15:00 NvFuserScheduler_Matmul/nvfuser_auto_splitk_TT/M:256/N:13824/K:65536/warps:8/stages:4/manual_time            14965 us        15158 us            1 /Launch_Parameters[block(4/2/32)/grid(1/108/1)/98304]
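Every launch configuration above satisfies gridDim.x * gridDim.y = 108, the number of SMs on an A100, and gridDim.y grows as N / 128. One reading of this, which is an assumption and not something stated in this issue, is that gridDim.y counts 128-wide output tiles along N and gridDim.x is the split-K factor, chosen so the whole grid fills each SM once. The hypothetical helper below reproduces that pattern for the M:128, warps:4 rows; it is a sketch of the observed numbers, not nvFuser's actual heuristic. The M:256, warps:8 rows show identical grids, which fits the same rule if one CTA tile covers all 256 rows (also an assumption).

```python
A100_NUM_SMS = 108  # streaming multiprocessors on an A100

# (N, gridDim.x, gridDim.y) read from the M:128/warps:4/stages:3 rows above.
OBSERVED = [
    (128, 108, 1), (256, 54, 2), (384, 36, 3), (512, 27, 4),
    (768, 18, 6), (1152, 12, 9), (1536, 9, 12), (2304, 6, 18),
    (3456, 4, 27), (4608, 3, 36), (6912, 2, 54), (13824, 1, 108),
]


def hypothetical_splitk_grid(n: int, tile_n: int = 128, num_sms: int = A100_NUM_SMS):
    """Return (split_k, output_tiles) under an assumed fill-every-SM rule.

    Hypothetical: assumes one CTA tile spans all of M, output tiles are
    tile_n wide along N, and split_k is the largest value with
    split_k * output_tiles <= num_sms, falling back to 1.
    """
    output_tiles = -(-n // tile_n)  # ceiling division
    split_k = max(1, num_sms // output_tiles)
    return split_k, output_tiles


for n, grid_x, grid_y in OBSERVED:
    predicted = hypothetical_splitk_grid(n)
    status = "match" if predicted == (grid_x, grid_y) else "MISMATCH"
    print(f"N={n:5d} observed grid({grid_x}/{grid_y}/1) predicted {predicted} {status}")
```

Under this reading, the N:13824 rows are the only configurations that do not split K at all.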

Probably related: #991, #1316

cc @zasdfgbnm

@naoyam
Collaborator

naoyam commented Nov 28, 2023

@jacobhinkle

jacobhinkle added a commit that referenced this issue Dec 19, 2023
Note that these should be re-enabled during or after #1510, since that
should fix the slow compilation speeds.

Fixes #1389.