[BugFix, CI] Fix GPU benchmarks #611

Merged
merged 15 commits into from
Jan 5, 2024

@vmoens (Contributor) commented Jan 5, 2024

No description provided.

@facebook-github-bot added the CLA Signed label Jan 5, 2024
@vmoens added the bug, CI, and Benchmarks labels Jan 5, 2024
@vmoens marked this pull request as ready for review January 5, 2024 11:00

github-actions bot commented Jan 5, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 120. Improved: $\large\color{#35bf28}16$. Worsened: $\large\color{#d91a1a}11$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 33.1620μs | 17.4007μs | 57.4690 KOps/s | 59.1184 KOps/s | $\color{#d91a1a}-2.79\%$ |
| test_plain_set_stack_nested | 0.2043ms | 0.1435ms | 6.9700 KOps/s | 6.9449 KOps/s | $\color{#35bf28}+0.36\%$ |
| test_plain_set_nested_inplace | 61.4540μs | 19.5420μs | 51.1720 KOps/s | 51.8037 KOps/s | $\color{#d91a1a}-1.22\%$ |
| test_plain_set_stack_nested_inplace | 0.3318ms | 0.1780ms | 5.6182 KOps/s | 5.6081 KOps/s | $\color{#35bf28}+0.18\%$ |
| test_items | 18.8950μs | 2.4880μs | 401.9256 KOps/s | 408.1212 KOps/s | $\color{#d91a1a}-1.52\%$ |
| test_items_nested | 0.4379ms | 0.2682ms | 3.7281 KOps/s | 3.6571 KOps/s | $\color{#35bf28}+1.94\%$ |
| test_items_nested_locked | 0.4506ms | 0.2695ms | 3.7110 KOps/s | 3.6771 KOps/s | $\color{#35bf28}+0.92\%$ |
| test_items_nested_leaf | 0.5457ms | 0.1673ms | 5.9767 KOps/s | 5.9699 KOps/s | $\color{#35bf28}+0.11\%$ |
| test_items_stack_nested | 1.4700ms | 1.3231ms | 755.8266 Ops/s | 747.9916 Ops/s | $\color{#35bf28}+1.05\%$ |
| test_items_stack_nested_leaf | 1.9284ms | 1.2087ms | 827.3298 Ops/s | 836.1607 Ops/s | $\color{#d91a1a}-1.06\%$ |
| test_items_stack_nested_locked | 1.0088ms | 0.7734ms | 1.2930 KOps/s | 1.2977 KOps/s | $\color{#d91a1a}-0.36\%$ |
| test_keys | 37.2300μs | 4.1909μs | 238.6136 KOps/s | 239.8858 KOps/s | $\color{#d91a1a}-0.53\%$ |
| test_keys_nested | 55.5088ms | 0.1593ms | 6.2762 KOps/s | 6.4632 KOps/s | $\color{#d91a1a}-2.89\%$ |
| test_keys_nested_locked | 0.3189ms | 0.1460ms | 6.8472 KOps/s | 6.4919 KOps/s | $\textbf{\color{#35bf28}+5.47\%}$ |
| test_keys_nested_leaf | 0.2135ms | 0.1294ms | 7.7307 KOps/s | 7.5377 KOps/s | $\color{#35bf28}+2.56\%$ |
| test_keys_stack_nested | 1.5008ms | 1.2720ms | 786.1468 Ops/s | 772.6035 Ops/s | $\color{#35bf28}+1.75\%$ |
| test_keys_stack_nested_leaf | 1.6517ms | 1.2620ms | 792.4217 Ops/s | 777.8889 Ops/s | $\color{#35bf28}+1.87\%$ |
| test_keys_stack_nested_locked | 0.8915ms | 0.6944ms | 1.4401 KOps/s | 1.4133 KOps/s | $\color{#35bf28}+1.89\%$ |
| test_values | 6.7766μs | 1.1392μs | 877.8436 KOps/s | 851.5000 KOps/s | $\color{#35bf28}+3.09\%$ |
| test_values_nested | 92.8930μs | 53.8541μs | 18.5687 KOps/s | 18.3720 KOps/s | $\color{#35bf28}+1.07\%$ |
| test_values_nested_locked | 0.1051ms | 54.2026μs | 18.4493 KOps/s | 18.3374 KOps/s | $\color{#35bf28}+0.61\%$ |
| test_values_nested_leaf | 98.9240μs | 48.7197μs | 20.5256 KOps/s | 20.7840 KOps/s | $\color{#d91a1a}-1.24\%$ |
| test_values_stack_nested | 1.2811ms | 1.0546ms | 948.2380 Ops/s | 936.5131 Ops/s | $\color{#35bf28}+1.25\%$ |
| test_values_stack_nested_leaf | 1.2237ms | 1.0348ms | 966.4084 Ops/s | 948.7646 Ops/s | $\color{#35bf28}+1.86\%$ |
| test_values_stack_nested_locked | 0.7484ms | 0.5172ms | 1.9334 KOps/s | 1.9054 KOps/s | $\color{#35bf28}+1.47\%$ |
| test_membership | 15.2690μs | 1.4040μs | 712.2559 KOps/s | 699.6134 KOps/s | $\color{#35bf28}+1.81\%$ |
| test_membership_nested | 45.8350μs | 2.8800μs | 347.2187 KOps/s | 322.5696 KOps/s | $\textbf{\color{#35bf28}+7.64\%}$ |
| test_membership_nested_leaf | 19.7460μs | 2.9073μs | 343.9562 KOps/s | 315.2099 KOps/s | $\textbf{\color{#35bf28}+9.12\%}$ |
| test_membership_stacked_nested | 57.0460μs | 11.8387μs | 84.4687 KOps/s | 82.8571 KOps/s | $\color{#35bf28}+1.95\%$ |
| test_membership_stacked_nested_leaf | 35.7370μs | 11.8455μs | 84.4200 KOps/s | 82.3453 KOps/s | $\color{#35bf28}+2.52\%$ |
| test_membership_nested_last | 44.3920μs | 6.0849μs | 164.3422 KOps/s | 157.1372 KOps/s | $\color{#35bf28}+4.59\%$ |
| test_membership_nested_leaf_last | 43.8420μs | 6.1198μs | 163.4039 KOps/s | 157.2650 KOps/s | $\color{#35bf28}+3.90\%$ |
| test_membership_stacked_nested_last | 0.2726ms | 0.1689ms | 5.9213 KOps/s | 5.8885 KOps/s | $\color{#35bf28}+0.56\%$ |
| test_membership_stacked_nested_leaf_last | 0.1787ms | 14.5510μs | 68.7236 KOps/s | 70.8581 KOps/s | $\color{#d91a1a}-3.01\%$ |
| test_nested_getleaf | 43.8110μs | 10.8067μs | 92.5351 KOps/s | 92.9297 KOps/s | $\color{#d91a1a}-0.42\%$ |
| test_nested_get | 45.7450μs | 10.1839μs | 98.1938 KOps/s | 98.3789 KOps/s | $\color{#d91a1a}-0.19\%$ |
| test_stacked_getleaf | 0.7038ms | 0.4672ms | 2.1405 KOps/s | 2.1036 KOps/s | $\color{#35bf28}+1.75\%$ |
| test_stacked_get | 0.5850ms | 0.4390ms | 2.2779 KOps/s | 2.2483 KOps/s | $\color{#35bf28}+1.31\%$ |
| test_nested_getitemleaf | 27.5410μs | 10.8743μs | 91.9598 KOps/s | 91.8678 KOps/s | $\color{#35bf28}+0.10\%$ |
| test_nested_getitem | 48.8110μs | 10.2185μs | 97.8619 KOps/s | 97.3062 KOps/s | $\color{#35bf28}+0.57\%$ |
| test_stacked_getitemleaf | 0.6449ms | 0.4702ms | 2.1267 KOps/s | 2.0745 KOps/s | $\color{#35bf28}+2.51\%$ |
| test_stacked_getitem | 0.6509ms | 0.4407ms | 2.2693 KOps/s | 2.2290 KOps/s | $\color{#35bf28}+1.81\%$ |
| test_lock_nested | 1.3721ms | 0.4106ms | 2.4357 KOps/s | 2.2759 KOps/s | $\textbf{\color{#35bf28}+7.02\%}$ |
| test_lock_stack_nested | 85.7205ms | 6.7450ms | 148.2590 Ops/s | 128.3678 Ops/s | $\textbf{\color{#35bf28}+15.50\%}$ |
| test_unlock_nested | 75.2142ms | 0.4909ms | 2.0371 KOps/s | 2.1818 KOps/s | $\textbf{\color{#d91a1a}-6.63\%}$ |
| test_unlock_stack_nested | 82.5595ms | 6.3325ms | 157.9165 Ops/s | 137.2645 Ops/s | $\textbf{\color{#35bf28}+15.05\%}$ |
| test_flatten_speed | 0.6315ms | 0.3636ms | 2.7505 KOps/s | 2.6929 KOps/s | $\color{#35bf28}+2.14\%$ |
| test_unflatten_speed | 0.7021ms | 0.4522ms | 2.2116 KOps/s | 2.2209 KOps/s | $\color{#d91a1a}-0.42\%$ |
| test_common_ops | 2.9020ms | 0.7025ms | 1.4234 KOps/s | 1.3930 KOps/s | $\color{#35bf28}+2.19\%$ |
| test_creation | 19.2260μs | 1.9934μs | 501.6487 KOps/s | 488.8750 KOps/s | $\color{#35bf28}+2.61\%$ |
| test_creation_empty | 31.1180μs | 11.1392μs | 89.7734 KOps/s | 102.1292 KOps/s | $\textbf{\color{#d91a1a}-12.10\%}$ |
| test_creation_nested_1 | 97.9920μs | 14.0809μs | 71.0182 KOps/s | 78.3109 KOps/s | $\textbf{\color{#d91a1a}-9.31\%}$ |
| test_creation_nested_2 | 47.7690μs | 19.1138μs | 52.3182 KOps/s | 54.7666 KOps/s | $\color{#d91a1a}-4.47\%$ |
| test_clone | 0.1348ms | 12.1890μs | 82.0413 KOps/s | 79.1777 KOps/s | $\color{#35bf28}+3.62\%$ |
| test_getitem[int] | 42.4990μs | 11.6985μs | 85.4808 KOps/s | 81.3152 KOps/s | $\textbf{\color{#35bf28}+5.12\%}$ |
| test_getitem[slice_int] | 60.6930μs | 22.7272μs | 44.0002 KOps/s | 40.6790 KOps/s | $\textbf{\color{#35bf28}+8.16\%}$ |
| test_getitem[range] | 83.9870μs | 40.6970μs | 24.5718 KOps/s | 22.2015 KOps/s | $\textbf{\color{#35bf28}+10.68\%}$ |
| test_getitem[tuple] | 53.0590μs | 18.7999μs | 53.1919 KOps/s | 50.3760 KOps/s | $\textbf{\color{#35bf28}+5.59\%}$ |
| test_getitem[list] | 0.1011ms | 35.9526μs | 27.8144 KOps/s | 24.9452 KOps/s | $\textbf{\color{#35bf28}+11.50\%}$ |
| test_setitem_dim[int] | 67.7660μs | 31.1082μs | 32.1458 KOps/s | 31.6479 KOps/s | $\color{#35bf28}+1.57\%$ |
| test_setitem_dim[slice_int] | 0.1015ms | 56.8023μs | 17.6049 KOps/s | 17.3871 KOps/s | $\color{#35bf28}+1.25\%$ |
| test_setitem_dim[range] | 0.1137ms | 74.6982μs | 13.3872 KOps/s | 11.7668 KOps/s | $\textbf{\color{#35bf28}+13.77\%}$ |
| test_setitem_dim[tuple] | 85.4290μs | 46.0773μs | 21.7027 KOps/s | 21.0369 KOps/s | $\color{#35bf28}+3.16\%$ |
| test_setitem | 0.1523ms | 19.1389μs | 52.2495 KOps/s | 53.9652 KOps/s | $\color{#d91a1a}-3.18\%$ |
| test_set | 0.1142ms | 18.3929μs | 54.3687 KOps/s | 55.4491 KOps/s | $\color{#d91a1a}-1.95\%$ |
| test_set_shared | 3.2345ms | 0.1370ms | 7.3019 KOps/s | 6.8422 KOps/s | $\textbf{\color{#35bf28}+6.72\%}$ |
| test_update | 0.1446ms | 21.7032μs | 46.0761 KOps/s | 47.0377 KOps/s | $\color{#d91a1a}-2.04\%$ |
| test_update_nested | 0.1841ms | 28.8695μs | 34.6387 KOps/s | 35.2382 KOps/s | $\color{#d91a1a}-1.70\%$ |
| test_set_nested | 0.1340ms | 20.3465μs | 49.1486 KOps/s | 50.5157 KOps/s | $\color{#d91a1a}-2.71\%$ |
| test_set_nested_new | 0.2011ms | 24.9036μs | 40.1548 KOps/s | 42.1297 KOps/s | $\color{#d91a1a}-4.69\%$ |
| test_select | 0.2098ms | 49.0579μs | 20.3841 KOps/s | 20.4928 KOps/s | $\color{#d91a1a}-0.53\%$ |
| test_unbind_speed | 0.6291ms | 0.3361ms | 2.9756 KOps/s | 2.9002 KOps/s | $\color{#35bf28}+2.60\%$ |
| test_unbind_speed_stack0 | 78.9038ms | 4.4025ms | 227.1428 Ops/s | 227.6393 Ops/s | $\color{#d91a1a}-0.22\%$ |
| test_unbind_speed_stack1 | 2.5963μs | 0.6486μs | 1.5417 MOps/s | 1.4873 MOps/s | $\color{#35bf28}+3.66\%$ |
| test_split | 70.5956ms | 1.6798ms | 595.3247 Ops/s | 590.1273 Ops/s | $\color{#35bf28}+0.88\%$ |
| test_chunk | 69.0545ms | 1.6434ms | 608.4763 Ops/s | 606.8120 Ops/s | $\color{#35bf28}+0.27\%$ |
| test_creation[device0] | 0.6034ms | 0.2999ms | 3.3349 KOps/s | 3.3359 KOps/s | $\color{#d91a1a}-0.03\%$ |
| test_creation_from_tensor | 4.0362ms | 0.3412ms | 2.9309 KOps/s | 3.0125 KOps/s | $\color{#d91a1a}-2.71\%$ |
| test_add_one[memmap_tensor0] | 0.4437ms | 26.8323μs | 37.2686 KOps/s | 40.1310 KOps/s | $\textbf{\color{#d91a1a}-7.13\%}$ |
| test_contiguous[memmap_tensor0] | 28.0720μs | 5.9548μs | 167.9315 KOps/s | 173.8288 KOps/s | $\color{#d91a1a}-3.39\%$ |
| test_stack[memmap_tensor0] | 55.2730μs | 20.4794μs | 48.8295 KOps/s | 51.7539 KOps/s | $\textbf{\color{#d91a1a}-5.65\%}$ |
| test_memmaptd_index | 0.3149ms | 0.1995ms | 5.0136 KOps/s | 5.0154 KOps/s | $\color{#d91a1a}-0.04\%$ |
| test_memmaptd_index_astensor | 0.6127ms | 0.2583ms | 3.8716 KOps/s | 3.8430 KOps/s | $\color{#35bf28}+0.75\%$ |
| test_memmaptd_index_op | 0.8338ms | 0.5508ms | 1.8154 KOps/s | 1.8966 KOps/s | $\color{#d91a1a}-4.28\%$ |
| test_serialize_model | 0.1773s | 0.1122s | 8.9142 Ops/s | 9.0243 Ops/s | $\color{#d91a1a}-1.22\%$ |
| test_serialize_model_pickle | 0.4468s | 0.3743s | 2.6717 Ops/s | 2.5670 Ops/s | $\color{#35bf28}+4.08\%$ |
| test_serialize_weights | 0.1669s | 0.1058s | 9.4511 Ops/s | 9.1583 Ops/s | $\color{#35bf28}+3.20\%$ |
| test_serialize_weights_returnearly | 0.1917s | 0.1301s | 7.6881 Ops/s | 8.2327 Ops/s | $\textbf{\color{#d91a1a}-6.61\%}$ |
| test_serialize_weights_pickle | 1.0429s | 0.6191s | 1.6153 Ops/s | 2.4452 Ops/s | $\textbf{\color{#d91a1a}-33.94\%}$ |
| test_serialize_weights_filesystem | 0.1588s | 97.8462ms | 10.2201 Ops/s | 10.6509 Ops/s | $\color{#d91a1a}-4.04\%$ |
| test_serialize_model_filesystem | 97.7528ms | 92.7128ms | 10.7860 Ops/s | 9.6574 Ops/s | $\textbf{\color{#35bf28}+11.69\%}$ |
| test_reshape_pytree | 72.4170μs | 22.7098μs | 44.0339 KOps/s | 42.2356 KOps/s | $\color{#35bf28}+4.26\%$ |
| test_reshape_td | 76.6320μs | 29.7785μs | 33.5812 KOps/s | 33.4378 KOps/s | $\color{#35bf28}+0.43\%$ |
| test_view_pytree | 81.0080μs | 22.8504μs | 43.7629 KOps/s | 42.9622 KOps/s | $\color{#35bf28}+1.86\%$ |
| test_view_td | 25.4770μs | 4.8589μs | 205.8079 KOps/s | 199.7121 KOps/s | $\color{#35bf28}+3.05\%$ |
| test_unbind_pytree | 63.8090μs | 26.1668μs | 38.2164 KOps/s | 35.9609 KOps/s | $\textbf{\color{#35bf28}+6.27\%}$ |
| test_unbind_td | 0.1260ms | 53.8304μs | 18.5769 KOps/s | 17.3659 KOps/s | $\textbf{\color{#35bf28}+6.97\%}$ |
| test_split_pytree | 82.7340μs | 26.1600μs | 38.2263 KOps/s | 36.9322 KOps/s | $\color{#35bf28}+3.50\%$ |
| test_split_td | 0.5785ms | 42.3182μs | 23.6305 KOps/s | 22.8579 KOps/s | $\color{#35bf28}+3.38\%$ |
| test_add_pytree | 75.2690μs | 32.1385μs | 31.1153 KOps/s | 30.6513 KOps/s | $\color{#35bf28}+1.51\%$ |
| test_add_td | 0.1004ms | 49.3366μs | 20.2689 KOps/s | 21.4296 KOps/s | $\textbf{\color{#d91a1a}-5.42\%}$ |
| test_distributed | 27.1600μs | 5.9601μs | 167.7814 KOps/s | 163.4396 KOps/s | $\color{#35bf28}+2.66\%$ |
| test_tdmodule | 0.3486ms | 23.2029μs | 43.0980 KOps/s | 45.4925 KOps/s | $\textbf{\color{#d91a1a}-5.26\%}$ |
| test_tdmodule_dispatch | 0.1875ms | 43.0371μs | 23.2358 KOps/s | 24.4844 KOps/s | $\textbf{\color{#d91a1a}-5.10\%}$ |
| test_tdseq | 0.1287ms | 26.4491μs | 37.8085 KOps/s | 38.9412 KOps/s | $\color{#d91a1a}-2.91\%$ |
| test_tdseq_dispatch | 0.1366ms | 46.6884μs | 21.4186 KOps/s | 22.0511 KOps/s | $\color{#d91a1a}-2.87\%$ |
| test_instantiation_functorch | 1.9472ms | 1.2980ms | 770.3986 Ops/s | 764.9018 Ops/s | $\color{#35bf28}+0.72\%$ |
| test_instantiation_td | 1.6311ms | 1.0057ms | 994.3012 Ops/s | 991.2855 Ops/s | $\color{#35bf28}+0.30\%$ |
| test_exec_functorch | 0.2887ms | 0.1557ms | 6.4235 KOps/s | 6.2931 KOps/s | $\color{#35bf28}+2.07\%$ |
| test_exec_functional_call | 0.3714ms | 0.1464ms | 6.8304 KOps/s | 6.8423 KOps/s | $\color{#d91a1a}-0.17\%$ |
| test_exec_td | 0.2803ms | 0.1413ms | 7.0749 KOps/s | 6.9198 KOps/s | $\color{#35bf28}+2.24\%$ |
| test_exec_td_decorator | 0.8991ms | 0.1758ms | 5.6872 KOps/s | 5.6753 KOps/s | $\color{#35bf28}+0.21\%$ |
| test_vmap_mlp_speed[True-True] | 1.4874ms | 0.8804ms | 1.1358 KOps/s | 1.1069 KOps/s | $\color{#35bf28}+2.61\%$ |
| test_vmap_mlp_speed[True-False] | 0.7210ms | 0.4730ms | 2.1141 KOps/s | 2.1380 KOps/s | $\color{#d91a1a}-1.12\%$ |
| test_vmap_mlp_speed[False-True] | 0.9439ms | 0.7564ms | 1.3220 KOps/s | 1.2836 KOps/s | $\color{#35bf28}+2.99\%$ |
| test_vmap_mlp_speed[False-False] | 0.6310ms | 0.3819ms | 2.6185 KOps/s | 2.5822 KOps/s | $\color{#35bf28}+1.41\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 3.2742ms | 1.7902ms | 558.6117 Ops/s | 566.5992 Ops/s | $\color{#d91a1a}-1.41\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 1.0537ms | 0.5294ms | 1.8890 KOps/s | 1.8835 KOps/s | $\color{#35bf28}+0.29\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 2.2263ms | 1.5150ms | 660.0778 Ops/s | 674.4944 Ops/s | $\color{#d91a1a}-2.14\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 83.6564ms | 0.4405ms | 2.2699 KOps/s | 2.5156 KOps/s | $\textbf{\color{#d91a1a}-9.77\%}$ |
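
The Change column in the CPU results is consistent with a relative difference between the Ops value measured on this PR and the Ops value on Repo HEAD. The sketch below reproduces a few rows under that reading. The 5% cutoff is an assumption inferred from which rows are shown in bold (the 16 bold green and 11 bold red rows match the Improved and Worsened counts in the summary); the function names are hypothetical and this is not the CI workflow's actual script.

```python
# Sketch of how the Change column could be derived (inferred, not the CI's code).

THRESHOLD_PERCENT = 5.0  # assumed cutoff, inferred from the bolded rows


def percent_change(ops_pr: float, ops_head: float) -> float:
    """Relative change of the PR throughput versus Repo HEAD, in percent."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change: float, threshold: float = THRESHOLD_PERCENT) -> str:
    """Label a change as improved, worsened, or within the noise band."""
    if change > threshold:
        return "improved"
    if change < -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) pairs copied from the table; both use the same unit per row.
rows = {
    "test_plain_set_nested": (57.4690, 59.1184),
    "test_keys_nested_locked": (6.8472, 6.4919),
    "test_serialize_weights_pickle": (1.6153, 2.4452),
}

for name, (ops_pr, ops_head) in rows.items():
    change = percent_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Because throughput is compared rather than latency, a positive change means the PR is faster than Repo HEAD. Single-run differences of a few percent on shared CI runners are likely noise, which is presumably why only larger changes are highlighted.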


github-actions bot commented Jan 5, 2024

Result of GPU Benchmark Tests

| Name | Max | Mean | Ops |
| --- | --- | --- | --- |
| test_plain_set_nested | 29.4600μs | 13.2348μs | 75.5584 KOps/s |
| test_plain_set_stack_nested | 0.1332ms | 0.1171ms | 8.5420 KOps/s |
| test_plain_set_nested_inplace | 30.1600μs | 14.4820μs | 69.0510 KOps/s |
| test_plain_set_stack_nested_inplace | 0.1942ms | 0.1444ms | 6.9228 KOps/s |
| test_items | 21.8100μs | 4.7150μs | 212.0898 KOps/s |
| test_items_nested | 0.3710ms | 0.3384ms | 2.9548 KOps/s |
| test_items_nested_locked | 0.4510ms | 0.3418ms | 2.9261 KOps/s |
| test_items_nested_leaf | 0.9678ms | 0.2022ms | 4.9467 KOps/s |
| test_items_stack_nested | 1.3730ms | 1.3200ms | 757.5513 Ops/s |
| test_items_stack_nested_leaf | 1.2506ms | 1.1511ms | 868.7250 Ops/s |
| test_items_stack_nested_locked | 0.8713ms | 0.8212ms | 1.2178 KOps/s |
| test_keys | 19.2800μs | 4.6431μs | 215.3752 KOps/s |
| test_keys_nested | 1.9731ms | 95.2435μs | 10.4994 KOps/s |
| test_keys_nested_locked | 0.1194ms | 94.3764μs | 10.5959 KOps/s |
| test_keys_nested_leaf | 0.1936ms | 78.6101μs | 12.7210 KOps/s |
| test_keys_stack_nested | 1.2274ms | 1.1586ms | 863.1235 Ops/s |
| test_keys_stack_nested_leaf | 1.3530ms | 1.1403ms | 876.9580 Ops/s |
| test_keys_stack_nested_locked | 0.7183ms | 0.6448ms | 1.5508 KOps/s |
| test_values | 9.1133μs | 1.8985μs | 526.7364 KOps/s |
| test_values_nested | 68.0310μs | 45.5360μs | 21.9607 KOps/s |
| test_values_nested_locked | 63.0610μs | 47.4557μs | 21.0723 KOps/s |
| test_values_nested_leaf | 62.7200μs | 39.6052μs | 25.2492 KOps/s |
| test_values_stack_nested | 1.0121ms | 0.9700ms | 1.0310 KOps/s |
| test_values_stack_nested_leaf | 1.0169ms | 0.9572ms | 1.0447 KOps/s |
| test_values_stack_nested_locked | 0.5649ms | 0.5049ms | 1.9806 KOps/s |
| test_membership | 14.9700μs | 1.0748μs | 930.3921 KOps/s |
| test_membership_nested | 15.8500μs | 2.2949μs | 435.7410 KOps/s |
| test_membership_nested_leaf | 12.4155μs | 2.2023μs | 454.0652 KOps/s |
| test_membership_stacked_nested | 29.8500μs | 11.0251μs | 90.7025 KOps/s |
| test_membership_stacked_nested_leaf | 31.8010μs | 11.0480μs | 90.5137 KOps/s |
| test_membership_nested_last | 27.3200μs | 4.7423μs | 210.8684 KOps/s |
| test_membership_nested_leaf_last | 20.4500μs | 4.7445μs | 210.7684 KOps/s |
| test_membership_stacked_nested_last | 0.1629ms | 0.1369ms | 7.3035 KOps/s |
| test_membership_stacked_nested_leaf_last | 34.6100μs | 12.8964μs | 77.5410 KOps/s |
| test_nested_getleaf | 23.7300μs | 8.5204μs | 117.3653 KOps/s |
| test_nested_get | 30.0900μs | 8.0228μs | 124.6443 KOps/s |
| test_stacked_getleaf | 0.4315ms | 0.3907ms | 2.5595 KOps/s |
| test_stacked_get | 0.4564ms | 0.3624ms | 2.7597 KOps/s |
| test_nested_getitemleaf | 25.4100μs | 8.5705μs | 116.6788 KOps/s |
| test_nested_getitem | 22.2600μs | 8.0898μs | 123.6122 KOps/s |
| test_stacked_getitemleaf | 0.4480ms | 0.3929ms | 2.5455 KOps/s |
| test_stacked_getitem | 0.3984ms | 0.3622ms | 2.7611 KOps/s |
| test_lock_nested | 8.1174ms | 0.4266ms | 2.3444 KOps/s |
| test_lock_stack_nested | 0.1075s | 6.8342ms | 146.3225 Ops/s |
| test_unlock_nested | 0.8115ms | 0.4142ms | 2.4140 KOps/s |
| test_unlock_stack_nested | 0.1012s | 7.1586ms | 139.6927 Ops/s |
| test_flatten_speed | 82.3558ms | 0.2886ms | 3.4648 KOps/s |
| test_unflatten_speed | 0.3748ms | 0.3575ms | 2.7971 KOps/s |
| test_common_ops | 1.0671ms | 0.5970ms | 1.6749 KOps/s |
| test_creation | 32.9310μs | 1.5935μs | 627.5298 KOps/s |
| test_creation_empty | 21.3800μs | 7.3310μs | 136.4072 KOps/s |
| test_creation_nested_1 | 28.6900μs | 9.2185μs | 108.4779 KOps/s |
| test_creation_nested_2 | 33.7410μs | 13.9271μs | 71.8023 KOps/s |
| test_clone | 0.1358ms | 13.2226μs | 75.6279 KOps/s |
| test_getitem[int] | 38.4510μs | 11.0003μs | 90.9067 KOps/s |
| test_getitem[slice_int] | 43.4210μs | 21.6433μs | 46.2036 KOps/s |
| test_getitem[range] | 68.2810μs | 38.8481μs | 25.7413 KOps/s |
| test_getitem[tuple] | 45.6810μs | 18.8492μs | 53.0527 KOps/s |
| test_getitem[list] | 0.3602ms | 34.9496μs | 28.6127 KOps/s |
| test_setitem_dim[int] | 40.5700μs | 25.3027μs | 39.5215 KOps/s |
| test_setitem_dim[slice_int] | 82.2610μs | 46.4252μs | 21.5400 KOps/s |
| test_setitem_dim[range] | 87.6810μs | 63.2165μs | 15.8187 KOps/s |
| test_setitem_dim[tuple] | 67.2610μs | 41.1281μs | 24.3143 KOps/s |
| test_setitem | 0.1360ms | 17.3633μs | 57.5929 KOps/s |
| test_set | 0.1318ms | 16.8902μs | 59.2059 KOps/s |
| test_set_shared | 2.7020ms | 0.1047ms | 9.5524 KOps/s |
| test_update | 0.1316ms | 18.8236μs | 53.1249 KOps/s |
| test_update_nested | 0.1183ms | 24.8721μs | 40.2057 KOps/s |
| test_set_nested | 0.1072ms | 17.7402μs | 56.3692 KOps/s |
| test_set_nested_new | 0.1074ms | 21.8247μs | 45.8196 KOps/s |
| test_select | 78.4610μs | 42.9427μs | 23.2868 KOps/s |
| test_to | 75.1310μs | 54.4268μs | 18.3733 KOps/s |
| test_to_nonblocking | 65.2010μs | 34.3985μs | 29.0710 KOps/s |
| test_unbind_speed | 0.3655ms | 0.3336ms | 2.9974 KOps/s |
| test_unbind_speed_stack0 | 81.8353ms | 3.7952ms | 263.4873 Ops/s |
| test_unbind_speed_stack1 | 3.1171μs | 0.5423μs | 1.8439 MOps/s |
| test_split | 78.7775ms | 1.7436ms | 573.5385 Ops/s |
| test_chunk | 1.6532ms | 1.5692ms | 637.2533 Ops/s |
| test_creation[device0] | 0.3761ms | 0.3080ms | 3.2463 KOps/s |
| test_creation[device1] | 0.4101ms | 0.3127ms | 3.1977 KOps/s |
| test_creation_from_tensor | 0.6274ms | 0.3377ms | 2.9616 KOps/s |
| test_add_one[memmap_tensor0] | 75.4410μs | 24.9110μs | 40.1430 KOps/s |
| test_add_one[memmap_tensor1] | 0.2276ms | 70.8505μs | 14.1142 KOps/s |
| test_contiguous[memmap_tensor0] | 25.2200μs | 5.9025μs | 169.4193 KOps/s |
| test_contiguous[memmap_tensor1] | 56.1510μs | 21.9970μs | 45.4608 KOps/s |
| test_stack[memmap_tensor0] | 43.4500μs | 19.4589μs | 51.3905 KOps/s |
| test_stack[memmap_tensor1] | 0.1081ms | 71.8150μs | 13.9247 KOps/s |
| test_memmaptd_index | 0.2782ms | 0.2320ms | 4.3105 KOps/s |
| test_memmaptd_index_astensor | 0.3494ms | 0.2879ms | 3.4732 KOps/s |
| test_memmaptd_index_op | 0.6529ms | 0.5786ms | 1.7284 KOps/s |
| test_serialize_model | 0.1791s | 0.1011s | 9.8920 Ops/s |
| test_serialize_model_pickle | 1.3488s | 1.2364s | 0.8088 Ops/s |
| test_serialize_weights | 0.1758s | 97.2846ms | 10.2791 Ops/s |
| test_serialize_weights_returnearly | 0.2889s | 69.0289ms | 14.4867 Ops/s |
| test_serialize_weights_pickle | 1.3544s | 1.2369s | 0.8085 Ops/s |
| test_reshape_pytree | 0.1645ms | 24.3846μs | 41.0094 KOps/s |
| test_reshape_td | 0.1595ms | 29.4939μs | 33.9053 KOps/s |
| test_view_pytree | 51.5910μs | 23.5197μs | 42.5176 KOps/s |
| test_view_td | 19.4100μs | 4.1489μs | 241.0286 KOps/s |
| test_unbind_pytree | 49.2300μs | 28.9653μs | 34.5241 KOps/s |
| test_unbind_td | 68.5110μs | 53.0792μs | 18.8398 KOps/s |
| test_split_pytree | 50.3500μs | 28.2648μs | 35.3797 KOps/s |
| test_split_td | 73.9010μs | 40.5842μs | 24.6401 KOps/s |
| test_add_pytree | 84.1810μs | 36.1373μs | 27.6722 KOps/s |
| test_add_td | 70.3510μs | 45.9516μs | 21.7620 KOps/s |
| test_distributed | 19.5210μs | 5.6102μs | 178.2460 KOps/s |
| test_tdmodule | 84.5703ms | 0.1017ms | 9.8326 KOps/s |
| test_tdmodule_dispatch | 0.1971ms | 33.2206μs | 30.1018 KOps/s |
| test_tdseq | 35.9610μs | 20.3274μs | 49.1946 KOps/s |
| test_tdseq_dispatch | 72.4310μs | 36.2109μs | 27.6160 KOps/s |
| test_instantiation_functorch | 1.8969ms | 1.6572ms | 603.4440 Ops/s |
| test_instantiation_td | 1.6981ms | 1.1634ms | 859.5383 Ops/s |
| test_exec_functorch | 0.2124ms | 0.1593ms | 6.2785 KOps/s |
| test_exec_functional_call | 0.2048ms | 0.1592ms | 6.2816 KOps/s |
| test_exec_td | 0.1897ms | 0.1507ms | 6.6351 KOps/s |
| test_exec_td_decorator | 0.9679ms | 0.1868ms | 5.3543 KOps/s |
| test_vmap_mlp_speed[True-True] | 1.2368ms | 1.1187ms | 893.8669 Ops/s |
| test_vmap_mlp_speed[True-False] | 0.7043ms | 0.6626ms | 1.5093 KOps/s |
| test_vmap_mlp_speed[False-True] | 1.0800ms | 1.0253ms | 975.3337 Ops/s |
| test_vmap_mlp_speed[False-False] | 0.7483ms | 0.5950ms | 1.6808 KOps/s |
| test_vmap_mlp_speed_decorator[True-True] | 2.8128ms | 2.0754ms | 481.8294 Ops/s |
| test_vmap_mlp_speed_decorator[True-False] | 1.0738ms | 0.7091ms | 1.4102 KOps/s |
| test_vmap_mlp_speed_decorator[False-True] | 2.2992ms | 1.8408ms | 543.2389 Ops/s |
| test_vmap_mlp_speed_decorator[False-False] | 0.9197ms | 0.6142ms | 1.6283 KOps/s |
| test_vmap_transformer_speed[True-True] | 12.5950ms | 12.4154ms | 80.5450 Ops/s |
| test_vmap_transformer_speed[True-False] | 8.4832ms | 8.2384ms | 121.3829 Ops/s |
| test_vmap_transformer_speed[False-True] | 12.7062ms | 12.3182ms | 81.1804 Ops/s |
| test_vmap_transformer_speed[False-False] | 8.4051ms | 8.1445ms | 122.7824 Ops/s |
| test_vmap_transformer_speed_decorator[True-True] | 64.8753ms | 63.7497ms | 15.6863 Ops/s |
| test_vmap_transformer_speed_decorator[True-False] | 21.4107ms | 19.8152ms | 50.4662 Ops/s |
| test_vmap_transformer_speed_decorator[False-True] | 59.0935ms | 58.0979ms | 17.2123 Ops/s |
| test_vmap_transformer_speed_decorator[False-False] | 21.0560ms | 19.3791ms | 51.6019 Ops/s |
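
In both result tables the Ops column appears to be the reciprocal of the Mean column, scaled to Ops/s, KOps/s, or MOps/s. The sketch below checks that reading against a few GPU rows. The helper names are hypothetical, and small mismatches in the last digit are expected because the printed Mean is already rounded.

```python
# Sketch: recover operations per second from a printed mean duration.
# Assumes Ops = 1 / Mean, which matches the rows checked here.

# Seconds per unit; both the Greek mu and the micro sign are accepted.
UNIT_SECONDS = {"\u03bcs": 1e-6, "\u00b5s": 1e-6, "ms": 1e-3, "s": 1.0}


def parse_duration(text: str) -> float:
    """Convert a string such as '13.2348μs' or '0.1011s' to seconds."""
    # Try two-character suffixes before the bare 's' so 'ms' is not misread.
    for unit in ("\u03bcs", "\u00b5s", "ms", "s"):
        if text.endswith(unit):
            return float(text[: -len(unit)]) * UNIT_SECONDS[unit]
    raise ValueError(f"unrecognized duration: {text!r}")


def ops_per_second(mean_text: str) -> float:
    """Throughput implied by a mean duration, in operations per second."""
    return 1.0 / parse_duration(mean_text)


# Mean values copied from the GPU table.
for name, mean in [
    ("test_plain_set_nested", "13.2348\u03bcs"),
    ("test_serialize_weights", "97.2846ms"),
    ("test_serialize_model_pickle", "1.2364s"),
]:
    print(f"{name}: {ops_per_second(mean):.4f} Ops/s")
```

The GPU comment has no Ops on Repo HEAD or Change column, so these rows can only be read as absolute throughput, not as a comparison against the base branch.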

@vmoens vmoens merged commit 3d15f49 into main Jan 5, 2024
46 of 47 checks passed
@vmoens vmoens deleted the fix-gpu-bench branch January 5, 2024 15:12