[CI] Fix benchmark on gpu #560

Merged
vmoens merged 8 commits into main from fix-gpu-bench on Nov 20, 2023

@vmoens (Contributor) commented Nov 20, 2023

No description provided.

@facebook-github-bot added the CLA Signed label Nov 20, 2023
@vmoens added the CI label Nov 20, 2023
@vmoens marked this pull request as ready for review November 20, 2023 15:07
github-actions bot commented Nov 20, 2023

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 105. Improved: $\large\color{#35bf28}7$. Worsened: $\large\color{#d91a1a}7$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 32.8110μs 15.0174μs 66.5896 KOps/s 67.6372 KOps/s $\color{#d91a1a}-1.55\%$
test_plain_set_stack_nested 0.2666ms 0.1389ms 7.2015 KOps/s 7.1892 KOps/s $\color{#35bf28}+0.17\%$
test_plain_set_nested_inplace 44.0530μs 18.1322μs 55.1506 KOps/s 56.4284 KOps/s $\color{#d91a1a}-2.26\%$
test_plain_set_stack_nested_inplace 0.2174ms 0.1715ms 5.8316 KOps/s 5.9171 KOps/s $\color{#d91a1a}-1.44\%$
test_items 24.5160μs 2.5501μs 392.1389 KOps/s 407.9049 KOps/s $\color{#d91a1a}-3.87\%$
test_items_nested 0.8962ms 0.2737ms 3.6538 KOps/s 3.7161 KOps/s $\color{#d91a1a}-1.68\%$
test_items_nested_locked 1.3444ms 0.2868ms 3.4873 KOps/s 3.7543 KOps/s $\textbf{\color{#d91a1a}-7.11\%}$
test_items_nested_leaf 0.5263ms 0.1685ms 5.9350 KOps/s 6.0886 KOps/s $\color{#d91a1a}-2.52\%$
test_items_stack_nested 1.4919ms 1.3734ms 728.1098 Ops/s 729.5118 Ops/s $\color{#d91a1a}-0.19\%$
test_items_stack_nested_leaf 1.3394ms 1.2428ms 804.6587 Ops/s 799.6878 Ops/s $\color{#35bf28}+0.62\%$
test_items_stack_nested_locked 1.8183ms 0.7459ms 1.3407 KOps/s 1.3335 KOps/s $\color{#35bf28}+0.54\%$
test_keys 28.0220μs 3.8352μs 260.7409 KOps/s 259.6660 KOps/s $\color{#35bf28}+0.41\%$
test_keys_nested 0.5208ms 0.1368ms 7.3076 KOps/s 6.8070 KOps/s $\textbf{\color{#35bf28}+7.35\%}$
test_keys_nested_locked 0.2361ms 0.1370ms 7.3018 KOps/s 7.2956 KOps/s $\color{#35bf28}+0.09\%$
test_keys_nested_leaf 0.5872ms 0.1348ms 7.4209 KOps/s 7.3736 KOps/s $\color{#35bf28}+0.64\%$
test_keys_stack_nested 3.0926ms 1.2718ms 786.2568 Ops/s 790.5633 Ops/s $\color{#d91a1a}-0.54\%$
test_keys_stack_nested_leaf 1.5245ms 1.2673ms 789.0876 Ops/s 788.9020 Ops/s $\color{#35bf28}+0.02\%$
test_keys_stack_nested_locked 0.7326ms 0.6194ms 1.6145 KOps/s 1.5757 KOps/s $\color{#35bf28}+2.46\%$
test_values 6.7125μs 1.1593μs 862.5927 KOps/s 867.8786 KOps/s $\color{#d91a1a}-0.61\%$
test_values_nested 0.1361ms 47.9432μs 20.8580 KOps/s 21.5702 KOps/s $\color{#d91a1a}-3.30\%$
test_values_nested_locked 91.0600μs 48.4163μs 20.6542 KOps/s 20.3825 KOps/s $\color{#35bf28}+1.33\%$
test_values_nested_leaf 2.4463ms 43.0675μs 23.2194 KOps/s 24.0572 KOps/s $\color{#d91a1a}-3.48\%$
test_values_stack_nested 1.2635ms 1.0961ms 912.3259 Ops/s 915.8886 Ops/s $\color{#d91a1a}-0.39\%$
test_values_stack_nested_leaf 1.3428ms 1.0947ms 913.4774 Ops/s 920.9305 Ops/s $\color{#d91a1a}-0.81\%$
test_values_stack_nested_locked 0.8541ms 0.4901ms 2.0405 KOps/s 2.0166 KOps/s $\color{#35bf28}+1.18\%$
test_membership 13.4450μs 1.3328μs 750.3026 KOps/s 739.9266 KOps/s $\color{#35bf28}+1.40\%$
test_membership_nested 19.6170μs 2.8144μs 355.3147 KOps/s 362.0207 KOps/s $\color{#d91a1a}-1.85\%$
test_membership_nested_leaf 17.5730μs 2.8470μs 351.2500 KOps/s 358.2648 KOps/s $\color{#d91a1a}-1.96\%$
test_membership_stacked_nested 35.6060μs 11.5916μs 86.2693 KOps/s 88.6321 KOps/s $\color{#d91a1a}-2.67\%$
test_membership_stacked_nested_leaf 36.2870μs 11.4610μs 87.2528 KOps/s 87.9574 KOps/s $\color{#d91a1a}-0.80\%$
test_membership_nested_last 22.3620μs 5.8782μs 170.1201 KOps/s 150.8290 KOps/s $\textbf{\color{#35bf28}+12.79\%}$
test_membership_nested_leaf_last 21.0290μs 5.8709μs 170.3310 KOps/s 173.2386 KOps/s $\color{#d91a1a}-1.68\%$
test_membership_stacked_nested_last 0.2519ms 0.1782ms 5.6110 KOps/s 5.6937 KOps/s $\color{#d91a1a}-1.45\%$
test_membership_stacked_nested_leaf_last 44.2820μs 13.3764μs 74.7588 KOps/s 75.3404 KOps/s $\color{#d91a1a}-0.77\%$
test_nested_getleaf 29.8860μs 11.9594μs 83.6162 KOps/s 84.1485 KOps/s $\color{#d91a1a}-0.63\%$
test_nested_get 31.0880μs 11.2533μs 88.8625 KOps/s 89.5569 KOps/s $\color{#d91a1a}-0.78\%$
test_stacked_getleaf 3.4390ms 0.5668ms 1.7644 KOps/s 1.7771 KOps/s $\color{#d91a1a}-0.71\%$
test_stacked_get 0.6164ms 0.5376ms 1.8602 KOps/s 1.8660 KOps/s $\color{#d91a1a}-0.31\%$
test_nested_getitemleaf 38.5510μs 11.7643μs 85.0029 KOps/s 81.7771 KOps/s $\color{#35bf28}+3.94\%$
test_nested_getitem 32.8710μs 11.2684μs 88.7438 KOps/s 87.0752 KOps/s $\color{#35bf28}+1.92\%$
test_stacked_getitemleaf 0.7182ms 0.5639ms 1.7735 KOps/s 1.7817 KOps/s $\color{#d91a1a}-0.46\%$
test_stacked_getitem 0.6847ms 0.5354ms 1.8677 KOps/s 1.8639 KOps/s $\color{#35bf28}+0.20\%$
test_lock_nested 50.2652ms 0.9497ms 1.0529 KOps/s 1.1293 KOps/s $\textbf{\color{#d91a1a}-6.76\%}$
test_lock_stack_nested 68.1261ms 12.1231ms 82.4871 Ops/s 76.0472 Ops/s $\textbf{\color{#35bf28}+8.47\%}$
test_unlock_nested 46.9563ms 0.9394ms 1.0645 KOps/s 1.0689 KOps/s $\color{#d91a1a}-0.40\%$
test_unlock_stack_nested 58.9615ms 12.4471ms 80.3397 Ops/s 73.3793 Ops/s $\textbf{\color{#35bf28}+9.49\%}$
test_flatten_speed 0.7517ms 0.6620ms 1.5106 KOps/s 1.5325 KOps/s $\color{#d91a1a}-1.43\%$
test_unflatten_speed 1.7602ms 1.1526ms 867.6162 Ops/s 879.6400 Ops/s $\color{#d91a1a}-1.37\%$
test_common_ops 0.7766ms 0.6301ms 1.5871 KOps/s 1.5271 KOps/s $\color{#35bf28}+3.93\%$
test_creation 25.7080μs 2.0901μs 478.4375 KOps/s 451.6151 KOps/s $\textbf{\color{#35bf28}+5.94\%}$
test_creation_empty 23.9950μs 7.0561μs 141.7208 KOps/s 138.3305 KOps/s $\color{#35bf28}+2.45\%$
test_creation_nested_1 34.4940μs 11.0209μs 90.7365 KOps/s 89.1912 KOps/s $\color{#35bf28}+1.73\%$
test_creation_nested_2 55.2430μs 13.4992μs 74.0787 KOps/s 74.6328 KOps/s $\color{#d91a1a}-0.74\%$
test_clone 64.3600μs 10.6614μs 93.7959 KOps/s 93.8949 KOps/s $\color{#d91a1a}-0.11\%$
test_getitem[int] 31.3990μs 13.0654μs 76.5381 KOps/s 76.0302 KOps/s $\color{#35bf28}+0.67\%$
test_getitem[slice_int] 0.3794ms 32.9490μs 30.3500 KOps/s 32.1224 KOps/s $\textbf{\color{#d91a1a}-5.52\%}$
test_getitem[range] 0.1709ms 54.1174μs 18.4783 KOps/s 18.7081 KOps/s $\color{#d91a1a}-1.23\%$
test_getitem[tuple] 65.8720μs 23.6103μs 42.3545 KOps/s 41.0725 KOps/s $\color{#35bf28}+3.12\%$
test_getitem[list] 0.1769ms 48.6429μs 20.5580 KOps/s 18.7149 KOps/s $\textbf{\color{#35bf28}+9.85\%}$
test_setitem_dim[int] 52.4380μs 25.6440μs 38.9954 KOps/s 38.0116 KOps/s $\color{#35bf28}+2.59\%$
test_setitem_dim[slice_int] 90.6290μs 49.6643μs 20.1352 KOps/s 19.9082 KOps/s $\color{#35bf28}+1.14\%$
test_setitem_dim[range] 0.1027ms 70.2550μs 14.2339 KOps/s 14.1430 KOps/s $\color{#35bf28}+0.64\%$
test_setitem_dim[tuple] 80.9810μs 39.0954μs 25.5785 KOps/s 25.5216 KOps/s $\color{#35bf28}+0.22\%$
test_setitem 65.5920μs 14.9509μs 66.8856 KOps/s 67.1441 KOps/s $\color{#d91a1a}-0.38\%$
test_set 62.9570μs 14.2338μs 70.2553 KOps/s 70.4042 KOps/s $\color{#d91a1a}-0.21\%$
test_set_shared 3.3062ms 0.1576ms 6.3470 KOps/s 6.3694 KOps/s $\color{#d91a1a}-0.35\%$
test_update 0.1205ms 18.8311μs 53.1037 KOps/s 52.8918 KOps/s $\color{#35bf28}+0.40\%$
test_update_nested 73.2260μs 27.1920μs 36.7755 KOps/s 36.2440 KOps/s $\color{#35bf28}+1.47\%$
test_set_nested 85.6700μs 16.2125μs 61.6809 KOps/s 62.6507 KOps/s $\color{#d91a1a}-1.55\%$
test_set_nested_new 87.1520μs 22.3787μs 44.6854 KOps/s 43.2272 KOps/s $\color{#35bf28}+3.37\%$
test_select 94.5460μs 45.6572μs 21.9023 KOps/s 21.2024 KOps/s $\color{#35bf28}+3.30\%$
test_unbind_speed 0.7873ms 0.2888ms 3.4622 KOps/s 3.5481 KOps/s $\color{#d91a1a}-2.42\%$
test_unbind_speed_stack0 54.8914ms 4.2757ms 233.8787 Ops/s 221.7600 Ops/s $\textbf{\color{#35bf28}+5.46\%}$
test_unbind_speed_stack1 1.9947μs 0.5964μs 1.6768 MOps/s 1.6065 MOps/s $\color{#35bf28}+4.38\%$
test_creation[device0] 3.4215ms 0.3029ms 3.3012 KOps/s 3.3388 KOps/s $\color{#d91a1a}-1.13\%$
test_creation_from_tensor 0.7272ms 0.3324ms 3.0080 KOps/s 2.9701 KOps/s $\color{#35bf28}+1.27\%$
test_add_one[memmap_tensor0] 0.1274ms 26.5851μs 37.6151 KOps/s 38.7605 KOps/s $\color{#d91a1a}-2.96\%$
test_contiguous[memmap_tensor0] 23.1530μs 5.7710μs 173.2792 KOps/s 173.4526 KOps/s $\color{#d91a1a}-0.10\%$
test_stack[memmap_tensor0] 51.4560μs 19.6720μs 50.8337 KOps/s 51.7013 KOps/s $\color{#d91a1a}-1.68\%$
test_memmaptd_index 0.2390ms 0.1815ms 5.5102 KOps/s 5.3761 KOps/s $\color{#35bf28}+2.49\%$
test_memmaptd_index_astensor 0.2947ms 0.2425ms 4.1242 KOps/s 4.0411 KOps/s $\color{#35bf28}+2.06\%$
test_memmaptd_index_op 0.8274ms 0.4968ms 2.0129 KOps/s 2.1646 KOps/s $\textbf{\color{#d91a1a}-7.01\%}$
test_reshape_pytree 68.9180μs 22.7993μs 43.8611 KOps/s 43.5543 KOps/s $\color{#35bf28}+0.70\%$
test_reshape_td 55.6240μs 20.7115μs 48.2823 KOps/s 48.4528 KOps/s $\color{#d91a1a}-0.35\%$
test_view_pytree 56.4550μs 22.6816μs 44.0885 KOps/s 43.4113 KOps/s $\color{#35bf28}+1.56\%$
test_view_td 16.5510μs 4.0878μs 244.6302 KOps/s 246.1416 KOps/s $\color{#d91a1a}-0.61\%$
test_unbind_pytree 68.4680μs 26.2128μs 38.1493 KOps/s 37.6576 KOps/s $\color{#35bf28}+1.31\%$
test_unbind_td 89.2370μs 39.4235μs 25.3656 KOps/s 25.0111 KOps/s $\color{#35bf28}+1.42\%$
test_split_pytree 56.8960μs 25.9174μs 38.5842 KOps/s 37.7629 KOps/s $\color{#35bf28}+2.17\%$
test_split_td 0.1585ms 74.4740μs 13.4275 KOps/s 13.2572 KOps/s $\color{#35bf28}+1.28\%$
test_add_pytree 71.1320μs 32.0725μs 31.1793 KOps/s 31.0985 KOps/s $\color{#35bf28}+0.26\%$
test_add_td 0.1240ms 42.2238μs 23.6833 KOps/s 23.6382 KOps/s $\color{#35bf28}+0.19\%$
test_distributed 17.6330μs 5.8927μs 169.7007 KOps/s 167.8953 KOps/s $\color{#35bf28}+1.08\%$
test_tdmodule 1.0494ms 21.9455μs 45.5674 KOps/s 49.2205 KOps/s $\textbf{\color{#d91a1a}-7.42\%}$
test_tdmodule_dispatch 0.1923ms 37.8644μs 26.4100 KOps/s 27.3217 KOps/s $\color{#d91a1a}-3.34\%$
test_tdseq 0.1180ms 23.1662μs 43.1663 KOps/s 43.7531 KOps/s $\color{#d91a1a}-1.34\%$
test_tdseq_dispatch 0.1322ms 40.4695μs 24.7100 KOps/s 24.5219 KOps/s $\color{#35bf28}+0.77\%$
test_instantiation_functorch 2.2132ms 1.3241ms 755.2473 Ops/s 777.3539 Ops/s $\color{#d91a1a}-2.84\%$
test_instantiation_td 71.4649ms 1.1474ms 871.5006 Ops/s 974.1470 Ops/s $\textbf{\color{#d91a1a}-10.54\%}$
test_exec_functorch 0.3823ms 0.1502ms 6.6570 KOps/s 6.7436 KOps/s $\color{#d91a1a}-1.28\%$
test_exec_td 0.2310ms 0.1493ms 6.6977 KOps/s 7.1063 KOps/s $\textbf{\color{#d91a1a}-5.75\%}$
test_vmap_mlp_speed[True-True] 1.1895ms 0.8553ms 1.1692 KOps/s 1.2024 KOps/s $\color{#d91a1a}-2.77\%$
test_vmap_mlp_speed[True-False] 0.6074ms 0.4690ms 2.1322 KOps/s 2.1745 KOps/s $\color{#d91a1a}-1.95\%$
test_vmap_mlp_speed[False-True] 1.0908ms 0.7452ms 1.3419 KOps/s 1.3735 KOps/s $\color{#d91a1a}-2.30\%$
test_vmap_mlp_speed[False-False] 0.5439ms 0.3900ms 2.5638 KOps/s 2.5208 KOps/s $\color{#35bf28}+1.71\%$

github-actions bot commented Nov 20, 2023

$\color{#35bf28}\textsf{\Large✔\kern{0.2cm}\normalsize OK}$ Result of GPU Benchmark Tests

Total Benchmarks: 115. Improved: $\large\color{#35bf28}8$. Worsened: $\large\color{#d91a1a}0$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 92.4110μs 12.1926μs 82.0170 KOps/s 81.5906 KOps/s $\color{#35bf28}+0.52\%$
test_plain_set_stack_nested 0.1279ms 0.1150ms 8.6971 KOps/s 8.6189 KOps/s $\color{#35bf28}+0.91\%$
test_plain_set_nested_inplace 31.3700μs 14.4738μs 69.0903 KOps/s 68.6399 KOps/s $\color{#35bf28}+0.66\%$
test_plain_set_stack_nested_inplace 0.2091ms 0.1409ms 7.0986 KOps/s 7.0937 KOps/s $\color{#35bf28}+0.07\%$
test_items 20.3410μs 4.7345μs 211.2146 KOps/s 210.1243 KOps/s $\color{#35bf28}+0.52\%$
test_items_nested 0.3898ms 0.3367ms 2.9703 KOps/s 2.9746 KOps/s $\color{#d91a1a}-0.14\%$
test_items_nested_locked 0.3786ms 0.3370ms 2.9677 KOps/s 2.9677 KOps/s $-0.00\%$
test_items_nested_leaf 0.2469ms 0.1992ms 5.0201 KOps/s 5.0349 KOps/s $\color{#d91a1a}-0.29\%$
test_items_stack_nested 1.4571ms 1.4167ms 705.8879 Ops/s 705.3260 Ops/s $\color{#35bf28}+0.08\%$
test_items_stack_nested_leaf 1.3258ms 1.2488ms 800.7984 Ops/s 797.0642 Ops/s $\color{#35bf28}+0.47\%$
test_items_stack_nested_locked 0.8452ms 0.7923ms 1.2621 KOps/s 1.2604 KOps/s $\color{#35bf28}+0.14\%$
test_keys 26.3600μs 4.6179μs 216.5477 KOps/s 202.2872 KOps/s $\textbf{\color{#35bf28}+7.05\%}$
test_keys_nested 0.5408ms 90.1231μs 11.0959 KOps/s 10.4007 KOps/s $\textbf{\color{#35bf28}+6.68\%}$
test_keys_nested_locked 0.1053ms 89.7879μs 11.1374 KOps/s 10.9854 KOps/s $\color{#35bf28}+1.38\%$
test_keys_nested_leaf 0.1603ms 81.5775μs 12.2583 KOps/s 12.0583 KOps/s $\color{#35bf28}+1.66\%$
test_keys_stack_nested 1.2893ms 1.2085ms 827.4975 Ops/s 830.9497 Ops/s $\color{#d91a1a}-0.42\%$
test_keys_stack_nested_leaf 1.3297ms 1.2116ms 825.3555 Ops/s 839.2417 Ops/s $\color{#d91a1a}-1.65\%$
test_keys_stack_nested_locked 0.6331ms 0.5852ms 1.7088 KOps/s 1.7294 KOps/s $\color{#d91a1a}-1.19\%$
test_values 9.9303μs 1.8838μs 530.8503 KOps/s 521.7376 KOps/s $\color{#35bf28}+1.75\%$
test_values_nested 65.0010μs 43.4554μs 23.0121 KOps/s 23.0069 KOps/s $\color{#35bf28}+0.02\%$
test_values_nested_locked 58.2410μs 43.5325μs 22.9713 KOps/s 22.9603 KOps/s $\color{#35bf28}+0.05\%$
test_values_nested_leaf 52.0410μs 37.7098μs 26.5183 KOps/s 26.5824 KOps/s $\color{#d91a1a}-0.24\%$
test_values_stack_nested 1.1235ms 1.0724ms 932.5194 Ops/s 930.0000 Ops/s $\color{#35bf28}+0.27\%$
test_values_stack_nested_leaf 1.2352ms 1.0669ms 937.2617 Ops/s 938.8818 Ops/s $\color{#d91a1a}-0.17\%$
test_values_stack_nested_locked 0.5761ms 0.4784ms 2.0905 KOps/s 2.1031 KOps/s $\color{#d91a1a}-0.60\%$
test_membership 3.5540μs 0.9315μs 1.0736 MOps/s 1.0686 MOps/s $\color{#35bf28}+0.47\%$
test_membership_nested 13.9600μs 2.1943μs 455.7247 KOps/s 448.8045 KOps/s $\color{#35bf28}+1.54\%$
test_membership_nested_leaf 17.1600μs 2.1905μs 456.5150 KOps/s 449.2791 KOps/s $\color{#35bf28}+1.61\%$
test_membership_stacked_nested 33.8800μs 10.8392μs 92.2577 KOps/s 92.1503 KOps/s $\color{#35bf28}+0.12\%$
test_membership_stacked_nested_leaf 29.1020μs 10.7255μs 93.2357 KOps/s 93.0755 KOps/s $\color{#35bf28}+0.17\%$
test_membership_nested_last 28.0600μs 4.5002μs 222.2124 KOps/s 218.4311 KOps/s $\color{#35bf28}+1.73\%$
test_membership_nested_leaf_last 23.0800μs 4.5662μs 218.9991 KOps/s 218.9421 KOps/s $\color{#35bf28}+0.03\%$
test_membership_stacked_nested_last 0.1845ms 0.1439ms 6.9470 KOps/s 7.0447 KOps/s $\color{#d91a1a}-1.39\%$
test_membership_stacked_nested_leaf_last 27.8100μs 12.5917μs 79.4172 KOps/s 79.2715 KOps/s $\color{#35bf28}+0.18\%$
test_nested_getleaf 22.4110μs 9.3837μs 106.5683 KOps/s 106.4895 KOps/s $\color{#35bf28}+0.07\%$
test_nested_get 26.1700μs 8.8224μs 113.3477 KOps/s 112.6540 KOps/s $\color{#35bf28}+0.62\%$
test_stacked_getleaf 0.5808ms 0.5299ms 1.8871 KOps/s 1.9144 KOps/s $\color{#d91a1a}-1.43\%$
test_stacked_get 0.5328ms 0.5024ms 1.9903 KOps/s 2.0218 KOps/s $\color{#d91a1a}-1.56\%$
test_nested_getitemleaf 23.2900μs 9.4181μs 106.1787 KOps/s 105.9320 KOps/s $\color{#35bf28}+0.23\%$
test_nested_getitem 21.4100μs 8.8945μs 112.4287 KOps/s 111.6064 KOps/s $\color{#35bf28}+0.74\%$
test_stacked_getitemleaf 0.5832ms 0.5189ms 1.9272 KOps/s 1.9254 KOps/s $\color{#35bf28}+0.09\%$
test_stacked_getitem 0.5453ms 0.4912ms 2.0357 KOps/s 2.0378 KOps/s $\color{#d91a1a}-0.11\%$
test_lock_nested 46.7588ms 0.9297ms 1.0757 KOps/s 1.1272 KOps/s $\color{#d91a1a}-4.58\%$
test_lock_stack_nested 57.0526ms 11.9637ms 83.5862 Ops/s 82.2695 Ops/s $\color{#35bf28}+1.60\%$
test_unlock_nested 45.6521ms 0.9540ms 1.0483 KOps/s 1.0332 KOps/s $\color{#35bf28}+1.46\%$
test_unlock_stack_nested 57.8770ms 12.7480ms 78.4439 Ops/s 76.7368 Ops/s $\color{#35bf28}+2.22\%$
test_flatten_speed 0.6116ms 0.5464ms 1.8301 KOps/s 1.8347 KOps/s $\color{#d91a1a}-0.25\%$
test_unflatten_speed 1.0393ms 0.9794ms 1.0210 KOps/s 1.0141 KOps/s $\color{#35bf28}+0.69\%$
test_common_ops 0.6340ms 0.5600ms 1.7856 KOps/s 1.7882 KOps/s $\color{#d91a1a}-0.15\%$
test_creation 27.1710μs 1.7602μs 568.1028 KOps/s 575.2659 KOps/s $\color{#d91a1a}-1.25\%$
test_creation_empty 19.7600μs 5.9800μs 167.2237 KOps/s 168.7143 KOps/s $\color{#d91a1a}-0.88\%$
test_creation_nested_1 22.7000μs 8.8453μs 113.0539 KOps/s 115.9023 KOps/s $\color{#d91a1a}-2.46\%$
test_creation_nested_2 29.6200μs 10.6196μs 94.1654 KOps/s 95.8110 KOps/s $\color{#d91a1a}-1.72\%$
test_clone 70.2010μs 11.5695μs 86.4341 KOps/s 87.7803 KOps/s $\color{#d91a1a}-1.53\%$
test_getitem[int] 27.0000μs 12.3105μs 81.2313 KOps/s 80.3624 KOps/s $\color{#35bf28}+1.08\%$
test_getitem[slice_int] 53.7300μs 27.7116μs 36.0860 KOps/s 35.9572 KOps/s $\color{#35bf28}+0.36\%$
test_getitem[range] 78.0020μs 46.8662μs 21.3373 KOps/s 21.2215 KOps/s $\color{#35bf28}+0.55\%$
test_getitem[tuple] 57.2810μs 23.5850μs 42.3998 KOps/s 42.7968 KOps/s $\color{#d91a1a}-0.93\%$
test_getitem[list] 0.1768ms 44.0559μs 22.6984 KOps/s 22.4660 KOps/s $\color{#35bf28}+1.03\%$
test_setitem_dim[int] 40.2010μs 24.7997μs 40.3231 KOps/s 37.0647 KOps/s $\textbf{\color{#35bf28}+8.79\%}$
test_setitem_dim[slice_int] 60.7910μs 43.8856μs 22.7865 KOps/s 21.7280 KOps/s $\color{#35bf28}+4.87\%$
test_setitem_dim[range] 79.3720μs 61.2701μs 16.3212 KOps/s 15.8088 KOps/s $\color{#35bf28}+3.24\%$
test_setitem_dim[tuple] 59.4210μs 37.9338μs 26.3617 KOps/s 25.0789 KOps/s $\textbf{\color{#35bf28}+5.11\%}$
test_setitem 77.4310μs 14.6363μs 68.3231 KOps/s 66.9711 KOps/s $\color{#35bf28}+2.02\%$
test_set 58.6610μs 13.9550μs 71.6591 KOps/s 69.5657 KOps/s $\color{#35bf28}+3.01\%$
test_set_shared 0.1862ms 0.1130ms 8.8505 KOps/s 8.9332 KOps/s $\color{#d91a1a}-0.93\%$
test_update 77.4910μs 17.5000μs 57.1428 KOps/s 57.3233 KOps/s $\color{#d91a1a}-0.31\%$
test_update_nested 83.3620μs 24.4276μs 40.9372 KOps/s 40.5192 KOps/s $\color{#35bf28}+1.03\%$
test_set_nested 70.9910μs 15.4373μs 64.7783 KOps/s 64.7432 KOps/s $\color{#35bf28}+0.05\%$
test_set_nested_new 79.1520μs 20.5873μs 48.5736 KOps/s 47.5503 KOps/s $\color{#35bf28}+2.15\%$
test_select 93.5310μs 42.3815μs 23.5952 KOps/s 23.4565 KOps/s $\color{#35bf28}+0.59\%$
test_to 75.1610μs 53.9321μs 18.5418 KOps/s 18.7033 KOps/s $\color{#d91a1a}-0.86\%$
test_to_nonblocking 53.4010μs 34.3596μs 29.1040 KOps/s 27.7752 KOps/s $\color{#35bf28}+4.78\%$
test_unbind_speed 0.3863ms 0.2832ms 3.5307 KOps/s 3.5080 KOps/s $\color{#35bf28}+0.64\%$
test_unbind_speed_stack0 51.7947ms 3.7138ms 269.2655 Ops/s 254.0564 Ops/s $\textbf{\color{#35bf28}+5.99\%}$
test_unbind_speed_stack1 1.3010μs 0.4933μs 2.0273 MOps/s 2.0390 MOps/s $\color{#d91a1a}-0.57\%$
test_creation[device0] 0.7475ms 0.3088ms 3.2378 KOps/s 3.2278 KOps/s $\color{#35bf28}+0.31\%$
test_creation[device1] 0.6625ms 0.3115ms 3.2104 KOps/s 3.1750 KOps/s $\color{#35bf28}+1.11\%$
test_creation_from_tensor 0.6618ms 0.3373ms 2.9644 KOps/s 2.8993 KOps/s $\color{#35bf28}+2.25\%$
test_add_one[memmap_tensor0] 88.1920μs 24.6183μs 40.6202 KOps/s 38.4322 KOps/s $\textbf{\color{#35bf28}+5.69\%}$
test_add_one[memmap_tensor1] 0.2031ms 74.6339μs 13.3987 KOps/s 13.1869 KOps/s $\color{#35bf28}+1.61\%$
test_contiguous[memmap_tensor0] 26.0700μs 5.8537μs 170.8336 KOps/s 167.2464 KOps/s $\color{#35bf28}+2.14\%$
test_contiguous[memmap_tensor1] 46.7910μs 21.5969μs 46.3029 KOps/s 45.7398 KOps/s $\color{#35bf28}+1.23\%$
test_stack[memmap_tensor0] 47.1310μs 19.3575μs 51.6597 KOps/s 48.6033 KOps/s $\textbf{\color{#35bf28}+6.29\%}$
test_stack[memmap_tensor1] 0.1543ms 74.3075μs 13.4576 KOps/s 12.7174 KOps/s $\textbf{\color{#35bf28}+5.82\%}$
test_memmaptd_index 0.2657ms 0.2155ms 4.6397 KOps/s 4.4889 KOps/s $\color{#35bf28}+3.36\%$
test_memmaptd_index_astensor 0.3294ms 0.2764ms 3.6182 KOps/s 3.5861 KOps/s $\color{#35bf28}+0.90\%$
test_memmaptd_index_op 0.5821ms 0.5287ms 1.8915 KOps/s 1.8745 KOps/s $\color{#35bf28}+0.91\%$
test_reshape_pytree 46.0210μs 21.1115μs 47.3676 KOps/s 46.0152 KOps/s $\color{#35bf28}+2.94\%$
test_reshape_td 43.3010μs 21.1460μs 47.2903 KOps/s 47.7587 KOps/s $\color{#d91a1a}-0.98\%$
test_view_pytree 46.0310μs 20.5103μs 48.7561 KOps/s 48.2243 KOps/s $\color{#35bf28}+1.10\%$
test_view_td 15.3800μs 3.3233μs 300.9022 KOps/s 300.6594 KOps/s $\color{#35bf28}+0.08\%$
test_unbind_pytree 42.6510μs 25.7700μs 38.8048 KOps/s 37.6719 KOps/s $\color{#35bf28}+3.01\%$
test_unbind_td 58.6710μs 40.4776μs 24.7050 KOps/s 24.1791 KOps/s $\color{#35bf28}+2.17\%$
test_split_pytree 44.4810μs 23.8097μs 41.9997 KOps/s 41.9980 KOps/s $+0.00\%$
test_split_td 90.4620μs 71.3908μs 14.0074 KOps/s 13.9989 KOps/s $\color{#35bf28}+0.06\%$
test_add_pytree 50.0010μs 32.9705μs 30.3301 KOps/s 30.6131 KOps/s $\color{#d91a1a}-0.92\%$
test_add_td 67.3210μs 42.6716μs 23.4348 KOps/s 23.1827 KOps/s $\color{#35bf28}+1.09\%$
test_distributed 18.9900μs 5.5043μs 181.6777 KOps/s 179.2840 KOps/s $\color{#35bf28}+1.34\%$
test_tdmodule 30.9700μs 16.2118μs 61.6833 KOps/s 60.0380 KOps/s $\color{#35bf28}+2.74\%$
test_tdmodule_dispatch 0.1227ms 30.4425μs 32.8488 KOps/s 32.3439 KOps/s $\color{#35bf28}+1.56\%$
test_tdseq 34.7010μs 19.4107μs 51.5180 KOps/s 51.1183 KOps/s $\color{#35bf28}+0.78\%$
test_tdseq_dispatch 49.9810μs 33.9604μs 29.4461 KOps/s 29.3242 KOps/s $\color{#35bf28}+0.42\%$
test_instantiation_functorch 1.7282ms 1.6788ms 595.6772 Ops/s 587.4243 Ops/s $\color{#35bf28}+1.40\%$
test_instantiation_td 1.7949ms 1.2052ms 829.7099 Ops/s 834.2530 Ops/s $\color{#d91a1a}-0.54\%$
test_exec_functorch 0.2000ms 0.1589ms 6.2952 KOps/s 6.3283 KOps/s $\color{#d91a1a}-0.52\%$
test_exec_td 0.1798ms 0.1498ms 6.6735 KOps/s 6.7070 KOps/s $\color{#d91a1a}-0.50\%$
test_vmap_mlp_speed[True-True] 1.1075ms 1.0508ms 951.6783 Ops/s 953.1853 Ops/s $\color{#d91a1a}-0.16\%$
test_vmap_mlp_speed[True-False] 0.6671ms 0.6215ms 1.6089 KOps/s 1.6038 KOps/s $\color{#35bf28}+0.32\%$
test_vmap_mlp_speed[False-True] 1.0009ms 0.9636ms 1.0377 KOps/s 1.0374 KOps/s $\color{#35bf28}+0.03\%$
test_vmap_mlp_speed[False-False] 0.5971ms 0.5577ms 1.7930 KOps/s 1.7193 KOps/s $\color{#35bf28}+4.29\%$
test_vmap_transformer_speed[True-True] 12.5471ms 12.4505ms 80.3178 Ops/s 79.6500 Ops/s $\color{#35bf28}+0.84\%$
test_vmap_transformer_speed[True-False] 8.4848ms 8.3449ms 119.8333 Ops/s 118.5951 Ops/s $\color{#35bf28}+1.04\%$
test_vmap_transformer_speed[False-True] 12.4473ms 12.3517ms 80.9606 Ops/s 80.7120 Ops/s $\color{#35bf28}+0.31\%$
test_vmap_transformer_speed[False-False] 8.4649ms 8.2917ms 120.6025 Ops/s 119.9105 Ops/s $\color{#35bf28}+0.58\%$
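
For readers checking the tables by hand: the Change column appears to be the relative difference between the Ops column and Ops on Repo HEAD. The sketch below reproduces a few CPU rows under that assumption. The function names and the 5% cutoff are hypothetical and not taken from the CI workflow. The cutoff is inferred from which rows are rendered in bold and from the Improved/Worsened totals.

```python
def percent_change(ops: float, ops_head: float) -> float:
    """Relative change in throughput of the PR versus repo HEAD, in percent."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change: float, threshold: float = 5.0) -> str:
    """Label a row; the 5% threshold is an assumption inferred from the tables."""
    if change >= threshold:
        return "improved"
    if change <= -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) copied from the CPU table; units match within a row.
rows = {
    "test_plain_set_nested": (66.5896, 67.6372),
    "test_membership_nested_last": (170.1201, 150.8290),
    "test_instantiation_td": (871.5006, 974.1470),
}

for name, (ops, ops_head) in rows.items():
    change = percent_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Higher Ops means faster, so a positive change is an improvement. Rows inside the assumed ±5% band are coloured in the tables but do not seem to be counted in the Improved/Worsened totals.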

@vmoens merged commit 2fec80e into main Nov 20, 2023
25 of 28 checks passed
@vmoens deleted the fix-gpu-bench branch November 20, 2023 15:41