[Benchmark] Fix GPU benchmark #386

Merged: 6 commits, merged May 18, 2023.


@vmoens (Contributor) commented May 18, 2023

No description provided.

@facebook-github-bot added the CLA Signed label May 18, 2023
github-actions bot commented May 18, 2023

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 47. Improved: $\large\color{#35bf28}1$. Worsened: $\large\color{#d91a1a}1$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_common_ops | 1.2503ms | 1.2189ms | 820.4166 Ops/s | 814.4906 Ops/s | $\color{#35bf28}+0.73\%$ |
| test_creation | 4.4521μs | 4.2131μs | 237.3556 KOps/s | 235.9696 KOps/s | $\color{#35bf28}+0.59\%$ |
| test_creation_empty | 16.7522μs | 15.9980μs | 62.5079 KOps/s | 60.6001 KOps/s | $\color{#35bf28}+3.15\%$ |
| test_creation_nested_1 | 30.2674μs | 29.1668μs | 34.2855 KOps/s | 33.9479 KOps/s | $\color{#35bf28}+0.99\%$ |
| test_creation_nested_2 | 29.7674μs | 29.0069μs | 34.4746 KOps/s | 33.9148 KOps/s | $\color{#35bf28}+1.65\%$ |
| test_clone | 28.7124μs | 26.6301μs | 37.5516 KOps/s | 37.6257 KOps/s | $\color{#d91a1a}-0.20\%$ |
| test_getitem[int] | 35.9529μs | 32.9372μs | 30.3608 KOps/s | 30.9120 KOps/s | $\color{#d91a1a}-1.78\%$ |
| test_getitem[slice_int] | 71.1245μs | 66.7213μs | 14.9877 KOps/s | 14.8163 KOps/s | $\color{#35bf28}+1.16\%$ |
| test_getitem[range] | 71.4600μs | 69.8234μs | 14.3218 KOps/s | 14.3239 KOps/s | $\color{#d91a1a}-0.01\%$ |
| test_getitem[tuple] | 65.8509μs | 62.3835μs | 16.0299 KOps/s | 16.1172 KOps/s | $\color{#d91a1a}-0.54\%$ |
| test_getitem[list] | 62.1116μs | 61.3578μs | 16.2978 KOps/s | 16.2365 KOps/s | $\color{#35bf28}+0.38\%$ |
| test_setitem_dim[int] | 76.8010μs | 45.0438μs | 22.2006 KOps/s | 22.0569 KOps/s | $\color{#35bf28}+0.65\%$ |
| test_setitem_dim[slice_int] | 0.1201ms | 80.5670μs | 12.4120 KOps/s | 12.1437 KOps/s | $\color{#35bf28}+2.21\%$ |
| test_setitem_dim[range] | 0.1361ms | 77.8661μs | 12.8426 KOps/s | 12.8794 KOps/s | $\color{#d91a1a}-0.29\%$ |
| test_setitem_dim[tuple] | 0.1767ms | 73.4513μs | 13.6145 KOps/s | 13.4568 KOps/s | $\color{#35bf28}+1.17\%$ |
| test_setitem | 40.0867μs | 38.6362μs | 25.8825 KOps/s | 25.9811 KOps/s | $\color{#d91a1a}-0.38\%$ |
| test_set | 38.9716μs | 37.4837μs | 26.6783 KOps/s | 26.6821 KOps/s | $\color{#d91a1a}-0.01\%$ |
| test_set_shared | 0.1791ms | 0.1763ms | 5.6735 KOps/s | 5.7000 KOps/s | $\color{#d91a1a}-0.47\%$ |
| test_update | 48.3478μs | 47.3787μs | 21.1065 KOps/s | 20.8845 KOps/s | $\color{#35bf28}+1.06\%$ |
| test_update_nested | 69.6671μs | 67.3231μs | 14.8537 KOps/s | 14.7222 KOps/s | $\color{#35bf28}+0.89\%$ |
| test_set_nested | 48.6898μs | 47.4263μs | 21.0853 KOps/s | 21.1141 KOps/s | $\color{#d91a1a}-0.14\%$ |
| test_set_nested_new | 67.1911μs | 65.9336μs | 15.1668 KOps/s | 15.1975 KOps/s | $\color{#d91a1a}-0.20\%$ |
| test_select | 0.1050ms | 0.1032ms | 9.6866 KOps/s | 9.5879 KOps/s | $\color{#35bf28}+1.03\%$ |
| test_creation[device0] | 1.2986ms | 0.4974ms | 2.0105 KOps/s | 2.0330 KOps/s | $\color{#d91a1a}-1.11\%$ |
| test_creation_from_tensor | 0.5813ms | 0.4653ms | 2.1490 KOps/s | 1.8832 KOps/s | $\textbf{\color{#35bf28}+14.11\%}$ |
| test_add_one[memmap_tensor0] | 37.9736μs | 30.5349μs | 32.7494 KOps/s | 32.8098 KOps/s | $\color{#d91a1a}-0.18\%$ |
| test_contiguous[memmap_tensor0] | 8.7532μs | 8.2149μs | 121.7307 KOps/s | 124.2277 KOps/s | $\color{#d91a1a}-2.01\%$ |
| test_stack[memmap_tensor0] | 0.1772ms | 43.0607μs | 23.2230 KOps/s | 23.9288 KOps/s | $\color{#d91a1a}-2.95\%$ |
| test_reshape_pytree | 38.3856μs | 35.4480μs | 28.2104 KOps/s | 27.8865 KOps/s | $\color{#35bf28}+1.16\%$ |
| test_reshape_td | 50.7698μs | 48.7519μs | 20.5120 KOps/s | 20.4975 KOps/s | $\color{#35bf28}+0.07\%$ |
| test_view_pytree | 34.1546μs | 32.8233μs | 30.4662 KOps/s | 30.0866 KOps/s | $\color{#35bf28}+1.26\%$ |
| test_view_td | 9.6352μs | 9.0758μs | 110.1831 KOps/s | 112.1925 KOps/s | $\color{#d91a1a}-1.79\%$ |
| test_unbind_pytree | 37.9536μs | 36.7563μs | 27.2062 KOps/s | 27.0192 KOps/s | $\color{#35bf28}+0.69\%$ |
| test_unbind_td | 0.1867ms | 0.1848ms | 5.4106 KOps/s | 5.3398 KOps/s | $\color{#35bf28}+1.33\%$ |
| test_split_pytree | 43.7767μs | 41.7395μs | 23.9581 KOps/s | 24.1833 KOps/s | $\color{#d91a1a}-0.93\%$ |
| test_split_td | 0.1182ms | 0.1147ms | 8.7203 KOps/s | 8.7664 KOps/s | $\color{#d91a1a}-0.53\%$ |
| test_add_pytree | 47.2278μs | 45.3378μs | 22.0567 KOps/s | 22.0406 KOps/s | $\color{#35bf28}+0.07\%$ |
| test_add_td | 78.6043μs | 76.4401μs | 13.0821 KOps/s | 12.9257 KOps/s | $\color{#35bf28}+1.21\%$ |
| test_distributed | 73.5010μs | 73.5010μs | 13.6053 KOps/s | 13.8311 KOps/s | $\color{#d91a1a}-1.63\%$ |
| test_tdmodule | 87.4010μs | 28.0538μs | 35.6458 KOps/s | 35.6721 KOps/s | $\color{#d91a1a}-0.07\%$ |
| test_tdmodule_dispatch | 60.7421ms | 66.3092μs | 15.0809 KOps/s | 16.3723 KOps/s | $\textbf{\color{#d91a1a}-7.89\%}$ |
| test_tdseq | 0.1887ms | 38.3915μs | 26.0474 KOps/s | 26.0454 KOps/s | $+0.01\%$ |
| test_tdseq_dispatch | 0.1206ms | 70.3281μs | 14.2191 KOps/s | 14.0452 KOps/s | $\color{#35bf28}+1.24\%$ |
| test_instantiation_functorch | 1.7216ms | 1.5811ms | 632.4843 Ops/s | 639.3993 Ops/s | $\color{#d91a1a}-1.08\%$ |
| test_instantiation_td | 7.8393ms | 1.2629ms | 791.8403 Ops/s | 830.9754 Ops/s | $\color{#d91a1a}-4.71\%$ |
| test_exec_functorch | 0.1860ms | 0.1810ms | 5.5249 KOps/s | 5.5310 KOps/s | $\color{#d91a1a}-0.11\%$ |
| test_exec_td | 0.3335ms | 0.3306ms | 3.0244 KOps/s | 3.0394 KOps/s | $\color{#d91a1a}-0.49\%$ |
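
Judging from the numbers, the Ops column appears to be the reciprocal of Mean, and Change appears to be the relative difference between Ops and Ops on Repo HEAD. The sketch below illustrates that reading. It is an assumption inferred from the table, not taken from the benchmark bot's source, and the helper names `ops_per_second` and `percent_change` are hypothetical. Because Mean is printed rounded, the recomputed Ops only approximately matches the table.

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean per-call time (assumed: Ops = 1 / Mean)."""
    return 1.0 / mean_seconds


def percent_change(ops: float, ops_head: float) -> float:
    """Relative throughput change versus the repo HEAD run, in percent (assumed formula)."""
    return (ops - ops_head) / ops_head * 100.0


# Values copied from the CPU table above.
print(round(ops_per_second(1.2189e-3), 1))           # test_common_ops: 820.4
print(round(percent_change(820.4166, 814.4906), 2))  # test_common_ops: 0.73
print(round(percent_change(2.1490, 1.8832), 2))      # test_creation_from_tensor: 14.11
print(round(percent_change(15.0809, 16.3723), 2))    # test_tdmodule_dispatch: -7.89
```

The units cancel in `percent_change`, so KOps/s and Ops/s rows can be compared the same way.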

@vmoens vmoens merged commit da83c3b into main May 18, 2023
@vmoens vmoens deleted the fix_gpu_bench branch May 18, 2023 09:26
github-actions bot commented

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 47. Improved: $\large\color{#35bf28}2$. Worsened: $\large\color{#d91a1a}1$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_common_ops | 1.1097ms | 1.0707ms | 933.9646 Ops/s | 925.5185 Ops/s | $\color{#35bf28}+0.91\%$ |
| test_creation | 3.3590μs | 3.1660μs | 315.8553 KOps/s | 313.5322 KOps/s | $\color{#35bf28}+0.74\%$ |
| test_creation_empty | 13.7972μs | 13.2702μs | 75.3569 KOps/s | 74.7068 KOps/s | $\color{#35bf28}+0.87\%$ |
| test_creation_nested_1 | 23.6043μs | 22.5789μs | 44.2891 KOps/s | 45.4928 KOps/s | $\color{#d91a1a}-2.65\%$ |
| test_creation_nested_2 | 25.9054μs | 23.5424μs | 42.4765 KOps/s | 42.3503 KOps/s | $\color{#35bf28}+0.30\%$ |
| test_clone | 22.7513μs | 21.4895μs | 46.5343 KOps/s | 45.7436 KOps/s | $\color{#35bf28}+1.73\%$ |
| test_getitem[int] | 26.7899μs | 25.7370μs | 38.8545 KOps/s | 39.2289 KOps/s | $\color{#d91a1a}-0.95\%$ |
| test_getitem[slice_int] | 57.1179μs | 53.1039μs | 18.8310 KOps/s | 19.1551 KOps/s | $\color{#d91a1a}-1.69\%$ |
| test_getitem[range] | 65.0189μs | 59.7029μs | 16.7496 KOps/s | 17.0419 KOps/s | $\color{#d91a1a}-1.72\%$ |
| test_getitem[tuple] | 53.3438μs | 49.4914μs | 20.2055 KOps/s | 20.6274 KOps/s | $\color{#d91a1a}-2.05\%$ |
| test_getitem[list] | 59.0771μs | 53.8204μs | 18.5803 KOps/s | 19.2811 KOps/s | $\color{#d91a1a}-3.63\%$ |
| test_setitem_dim[int] | 68.8010μs | 38.2819μs | 26.1220 KOps/s | 26.5803 KOps/s | $\color{#d91a1a}-1.72\%$ |
| test_setitem_dim[slice_int] | 0.1436ms | 68.4015μs | 14.6196 KOps/s | 14.7240 KOps/s | $\color{#d91a1a}-0.71\%$ |
| test_setitem_dim[range] | 0.1424ms | 69.6866μs | 14.3500 KOps/s | 14.6778 KOps/s | $\color{#d91a1a}-2.23\%$ |
| test_setitem_dim[tuple] | 95.8010μs | 61.2950μs | 16.3145 KOps/s | 16.4174 KOps/s | $\color{#d91a1a}-0.63\%$ |
| test_setitem | 31.8324μs | 29.9289μs | 33.4125 KOps/s | 33.0491 KOps/s | $\color{#35bf28}+1.10\%$ |
| test_set | 30.9124μs | 29.2409μs | 34.1987 KOps/s | 33.8203 KOps/s | $\color{#35bf28}+1.12\%$ |
| test_set_shared | 0.1713ms | 0.1666ms | 6.0027 KOps/s | 5.9589 KOps/s | $\color{#35bf28}+0.73\%$ |
| test_update | 39.4416μs | 37.3758μs | 26.7552 KOps/s | 26.5816 KOps/s | $\color{#35bf28}+0.65\%$ |
| test_update_nested | 54.5058μs | 53.4388μs | 18.7130 KOps/s | 18.5926 KOps/s | $\color{#35bf28}+0.65\%$ |
| test_set_nested | 38.4466μs | 36.8375μs | 27.1463 KOps/s | 26.8862 KOps/s | $\color{#35bf28}+0.97\%$ |
| test_set_nested_new | 52.7938μs | 51.2965μs | 19.4945 KOps/s | 19.1044 KOps/s | $\color{#35bf28}+2.04\%$ |
| test_select | 83.8602μs | 81.5532μs | 12.2619 KOps/s | 12.3248 KOps/s | $\color{#d91a1a}-0.51\%$ |
| test_creation[device0] | 1.2133ms | 0.4971ms | 2.0117 KOps/s | 2.0000 KOps/s | $\color{#35bf28}+0.58\%$ |
| test_creation_from_tensor | 0.5867ms | 0.4685ms | 2.1346 KOps/s | 2.1217 KOps/s | $\color{#35bf28}+0.61\%$ |
| test_add_one[memmap_tensor0] | 48.1687μs | 29.4526μs | 33.9529 KOps/s | 33.8414 KOps/s | $\color{#35bf28}+0.33\%$ |
| test_contiguous[memmap_tensor0] | 8.3231μs | 7.8694μs | 127.0749 KOps/s | 120.6105 KOps/s | $\textbf{\color{#35bf28}+5.36\%}$ |
| test_stack[memmap_tensor0] | 0.1907ms | 44.5407μs | 22.4514 KOps/s | 23.2880 KOps/s | $\color{#d91a1a}-3.59\%$ |
| test_reshape_pytree | 31.1724μs | 28.4134μs | 35.1946 KOps/s | 34.6509 KOps/s | $\color{#35bf28}+1.57\%$ |
| test_reshape_td | 41.9076μs | 39.4706μs | 25.3353 KOps/s | 25.5044 KOps/s | $\color{#d91a1a}-0.66\%$ |
| test_view_pytree | 27.2894μs | 26.0390μs | 38.4040 KOps/s | 38.1078 KOps/s | $\color{#35bf28}+0.78\%$ |
| test_view_td | 7.9931μs | 6.9969μs | 142.9208 KOps/s | 142.8573 KOps/s | $\color{#35bf28}+0.04\%$ |
| test_unbind_pytree | 32.1215μs | 30.5991μs | 32.6807 KOps/s | 33.3162 KOps/s | $\color{#d91a1a}-1.91\%$ |
| test_unbind_td | 0.1537ms | 0.1509ms | 6.6281 KOps/s | 7.0089 KOps/s | $\textbf{\color{#d91a1a}-5.43\%}$ |
| test_split_pytree | 36.0085μs | 34.0331μs | 29.3831 KOps/s | 29.4610 KOps/s | $\color{#d91a1a}-0.26\%$ |
| test_split_td | 97.4744μs | 94.6659μs | 10.5635 KOps/s | 10.8312 KOps/s | $\color{#d91a1a}-2.47\%$ |
| test_add_pytree | 40.0206μs | 37.7422μs | 26.4956 KOps/s | 26.5014 KOps/s | $\color{#d91a1a}-0.02\%$ |
| test_add_td | 64.4949μs | 61.9605μs | 16.1393 KOps/s | 16.4946 KOps/s | $\color{#d91a1a}-2.15\%$ |
| test_distributed | 73.1010μs | 73.1010μs | 13.6797 KOps/s | 12.0770 KOps/s | $\textbf{\color{#35bf28}+13.27\%}$ |
| test_tdmodule | 46.9010μs | 24.2864μs | 41.1753 KOps/s | 42.2632 KOps/s | $\color{#d91a1a}-2.57\%$ |
| test_tdmodule_dispatch | 0.2217ms | 53.5443μs | 18.6761 KOps/s | 19.1907 KOps/s | $\color{#d91a1a}-2.68\%$ |
| test_tdseq | 98.6010μs | 32.4940μs | 30.7749 KOps/s | 31.5865 KOps/s | $\color{#d91a1a}-2.57\%$ |
| test_tdseq_dispatch | 0.1135ms | 63.1244μs | 15.8417 KOps/s | 15.9736 KOps/s | $\color{#d91a1a}-0.83\%$ |
| test_instantiation_functorch | 1.3406ms | 1.2724ms | 785.8951 Ops/s | 781.9152 Ops/s | $\color{#35bf28}+0.51\%$ |
| test_instantiation_td | 1.1561ms | 0.9940ms | 1.0061 KOps/s | 1.0198 KOps/s | $\color{#d91a1a}-1.34\%$ |
| test_exec_functorch | 0.1874ms | 0.1579ms | 6.3343 KOps/s | 6.2909 KOps/s | $\color{#35bf28}+0.69\%$ |
| test_exec_td | 0.2786ms | 0.2723ms | 3.6729 KOps/s | 3.6314 KOps/s | $\color{#35bf28}+1.14\%$ |
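
The "Improved" and "Worsened" counts in the summary line only include the bolded rows, which suggests that the bot flags a change only when it exceeds some threshold. The cutoff is not stated anywhere on this page. A 5% threshold is merely consistent with both tables (for example, -4.71% in the CPU run is not flagged, while -5.43% in the GPU run is), so treat `THRESHOLD_PCT` as an assumption. The sketch below recomputes the GPU summary from the Change column under that assumption. `summarize` is a hypothetical helper, not part of the benchmark tooling.

```python
from typing import Iterable, Tuple

# ASSUMPTION: significance threshold in percent. The bot's real cutoff is not
# stated; 5% is only consistent with the flagged rows in both tables.
THRESHOLD_PCT = 5.0

# "Change" column of the GPU table above, in row order (47 benchmarks).
GPU_CHANGES = [
    0.91, 0.74, 0.87, -2.65, 0.30, 1.73, -0.95, -1.69, -1.72, -2.05,
    -3.63, -1.72, -0.71, -2.23, -0.63, 1.10, 1.12, 0.73, 0.65, 0.65,
    0.97, 2.04, -0.51, 0.58, 0.61, 0.33, 5.36, -3.59, 1.57, -0.66,
    0.78, 0.04, -1.91, -5.43, -0.26, -2.47, -0.02, -2.15, 13.27, -2.57,
    -2.68, -2.57, -0.83, 0.51, -1.34, 0.69, 1.14,
]


def summarize(changes: Iterable[float], threshold: float = THRESHOLD_PCT) -> Tuple[int, int]:
    """Count changes above +threshold (improved) and below -threshold (worsened)."""
    values = list(changes)
    improved = sum(1 for c in values if c > threshold)
    worsened = sum(1 for c in values if c < -threshold)
    return improved, worsened


improved, worsened = summarize(GPU_CHANGES)
# Prints: Total Benchmarks: 47. Improved: 2. Worsened: 1.
print(f"Total Benchmarks: {len(GPU_CHANGES)}. Improved: {improved}. Worsened: {worsened}.")
```

Any threshold strictly between 4.71% and 5.36% reproduces both summary lines, so the data alone cannot pin down the exact value.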

Labels: Benchmarks, CLA Signed