
[Refactor] Update nn inline_inbuilt check #1029

Merged on Oct 8, 2024 (2 commits).


vmoens (Contributor) commented Oct 4, 2024

Stack from ghstack (oldest at bottom):

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Oct 4, 2024
ghstack-source-id: 86c8a6dacd50387f76fd0a5b9ec9fd643b6d057f
Pull Request resolved: #1029
facebook-github-bot added the CLA Signed label Oct 4, 2024 (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed).

github-actions bot commented Oct 4, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 216. Improved: 36. Worsened: 10.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 64.0890μs 23.9661μs 41.7256 KOps/s 40.3761 KOps/s $\color{#35bf28}+3.34\%$
test_plain_set_stack_nested 73.7470μs 24.6015μs 40.6480 KOps/s 40.8406 KOps/s $\color{#d91a1a}-0.47\%$
test_plain_set_nested_inplace 77.2440μs 26.4248μs 37.8432 KOps/s 37.3285 KOps/s $\color{#35bf28}+1.38\%$
test_plain_set_stack_nested_inplace 81.2710μs 26.4190μs 37.8516 KOps/s 37.2078 KOps/s $\color{#35bf28}+1.73\%$
test_items 0.1354ms 4.2297μs 236.4251 KOps/s 238.7453 KOps/s $\color{#d91a1a}-0.97\%$
test_items_nested 0.5751ms 0.3822ms 2.6162 KOps/s 2.6025 KOps/s $\color{#35bf28}+0.53\%$
test_items_nested_locked 0.5694ms 0.3790ms 2.6383 KOps/s 2.5933 KOps/s $\color{#35bf28}+1.74\%$
test_items_nested_leaf 0.1598ms 79.3146μs 12.6080 KOps/s 12.5147 KOps/s $\color{#35bf28}+0.75\%$
test_items_stack_nested 0.5390ms 0.3851ms 2.5964 KOps/s 2.5693 KOps/s $\color{#35bf28}+1.06\%$
test_items_stack_nested_leaf 0.1761ms 82.4420μs 12.1297 KOps/s 12.2046 KOps/s $\color{#d91a1a}-0.61\%$
test_items_stack_nested_locked 0.5896ms 0.3824ms 2.6150 KOps/s 2.5636 KOps/s $\color{#35bf28}+2.01\%$
test_keys 20.5390μs 3.4953μs 286.1013 KOps/s 287.6261 KOps/s $\color{#d91a1a}-0.53\%$
test_keys_nested 0.2521ms 0.1342ms 7.4533 KOps/s 7.3909 KOps/s $\color{#35bf28}+0.84\%$
test_keys_nested_locked 1.6490ms 0.1397ms 7.1603 KOps/s 7.1198 KOps/s $\color{#35bf28}+0.57\%$
test_keys_nested_leaf 0.2389ms 0.1172ms 8.5293 KOps/s 8.4294 KOps/s $\color{#35bf28}+1.18\%$
test_keys_stack_nested 0.2369ms 0.1339ms 7.4660 KOps/s 7.1847 KOps/s $\color{#35bf28}+3.92\%$
test_keys_stack_nested_leaf 0.2041ms 0.1166ms 8.5730 KOps/s 8.4934 KOps/s $\color{#35bf28}+0.94\%$
test_keys_stack_nested_locked 0.2102ms 0.1392ms 7.1827 KOps/s 7.1148 KOps/s $\color{#35bf28}+0.95\%$
test_values 7.0372μs 1.0538μs 948.9887 KOps/s 956.7748 KOps/s $\color{#d91a1a}-0.81\%$
test_values_nested 0.1607ms 91.4858μs 10.9307 KOps/s 10.4355 KOps/s $\color{#35bf28}+4.74\%$
test_values_nested_locked 0.1697ms 92.3886μs 10.8238 KOps/s 10.4335 KOps/s $\color{#35bf28}+3.74\%$
test_values_nested_leaf 0.1576ms 78.9885μs 12.6601 KOps/s 12.4123 KOps/s $\color{#35bf28}+2.00\%$
test_values_stack_nested 0.1677ms 93.6781μs 10.6748 KOps/s 10.5921 KOps/s $\color{#35bf28}+0.78\%$
test_values_stack_nested_leaf 0.1484ms 78.7848μs 12.6928 KOps/s 12.3894 KOps/s $\color{#35bf28}+2.45\%$
test_values_stack_nested_locked 0.1739ms 94.1023μs 10.6267 KOps/s 10.4923 KOps/s $\color{#35bf28}+1.28\%$
test_membership 5.8080μs 0.7497μs 1.3339 MOps/s 1.3105 MOps/s $\color{#35bf28}+1.79\%$
test_membership_nested 36.9790μs 2.7681μs 361.2581 KOps/s 357.1578 KOps/s $\color{#35bf28}+1.15\%$
test_membership_nested_leaf 19.9670μs 2.8113μs 355.7023 KOps/s 354.8912 KOps/s $\color{#35bf28}+0.23\%$
test_membership_stacked_nested 42.1080μs 2.7424μs 364.6423 KOps/s 366.8093 KOps/s $\color{#d91a1a}-0.59\%$
test_membership_stacked_nested_leaf 17.3120μs 2.8084μs 356.0708 KOps/s 358.8642 KOps/s $\color{#d91a1a}-0.78\%$
test_membership_nested_last 28.1830μs 4.1773μs 239.3869 KOps/s 237.5539 KOps/s $\color{#35bf28}+0.77\%$
test_membership_nested_leaf_last 48.8330μs 4.1660μs 240.0393 KOps/s 237.4130 KOps/s $\color{#35bf28}+1.11\%$
test_membership_stacked_nested_last 28.0020μs 4.1825μs 239.0910 KOps/s 185.6407 KOps/s $\textbf{\color{#35bf28}+28.79\%}$
test_membership_stacked_nested_leaf_last 22.2620μs 4.1400μs 241.5430 KOps/s 184.0590 KOps/s $\textbf{\color{#35bf28}+31.23\%}$
test_nested_getleaf 33.2120μs 10.7899μs 92.6794 KOps/s 90.8161 KOps/s $\color{#35bf28}+2.05\%$
test_nested_get 35.2260μs 10.2591μs 97.4746 KOps/s 96.2255 KOps/s $\color{#35bf28}+1.30\%$
test_stacked_getleaf 34.5350μs 10.7765μs 92.7943 KOps/s 92.7374 KOps/s $\color{#35bf28}+0.06\%$
test_stacked_get 32.7110μs 10.3686μs 96.4453 KOps/s 97.6266 KOps/s $\color{#d91a1a}-1.21\%$
test_nested_getitemleaf 36.6180μs 11.1818μs 89.4310 KOps/s 88.1563 KOps/s $\color{#35bf28}+1.45\%$
test_nested_getitem 32.6510μs 10.5160μs 95.0934 KOps/s 96.3982 KOps/s $\color{#d91a1a}-1.35\%$
test_stacked_getitemleaf 47.1070μs 11.0728μs 90.3113 KOps/s 88.5544 KOps/s $\color{#35bf28}+1.98\%$
test_stacked_getitem 28.2420μs 10.5025μs 95.2150 KOps/s 94.5041 KOps/s $\color{#35bf28}+0.75\%$
test_lock_nested 84.7239ms 0.5904ms 1.6937 KOps/s 1.9593 KOps/s $\textbf{\color{#d91a1a}-13.56\%}$
test_lock_stack_nested 0.7325ms 0.4687ms 2.1337 KOps/s 2.0730 KOps/s $\color{#35bf28}+2.93\%$
test_unlock_nested 86.6992ms 0.5091ms 1.9644 KOps/s 2.3078 KOps/s $\textbf{\color{#d91a1a}-14.88\%}$
test_unlock_stack_nested 0.5943ms 0.3842ms 2.6029 KOps/s 2.4853 KOps/s $\color{#35bf28}+4.74\%$
test_flatten_speed 0.2195ms 0.1007ms 9.9281 KOps/s 10.0569 KOps/s $\color{#d91a1a}-1.28\%$
test_unflatten_speed 2.0164ms 0.5277ms 1.8951 KOps/s 1.9564 KOps/s $\color{#d91a1a}-3.13\%$
test_common_ops 2.1111ms 1.1379ms 878.8005 Ops/s 846.8160 Ops/s $\color{#35bf28}+3.78\%$
test_creation 36.4280μs 2.0915μs 478.1262 KOps/s 478.1387 KOps/s $-0.00\%$
test_creation_empty 43.3610μs 18.6163μs 53.7165 KOps/s 50.0477 KOps/s $\textbf{\color{#35bf28}+7.33\%}$
test_creation_nested_1 54.3910μs 21.6710μs 46.1447 KOps/s 41.3428 KOps/s $\textbf{\color{#35bf28}+11.61\%}$
test_creation_nested_2 69.6400μs 25.6862μs 38.9313 KOps/s 36.3043 KOps/s $\textbf{\color{#35bf28}+7.24\%}$
test_clone 66.9250μs 16.7563μs 59.6790 KOps/s 57.7686 KOps/s $\color{#35bf28}+3.31\%$
test_getitem[int] 0.9382ms 16.5254μs 60.5131 KOps/s 58.3927 KOps/s $\color{#35bf28}+3.63\%$
test_getitem[slice_int] 0.1475ms 31.5774μs 31.6682 KOps/s 31.7434 KOps/s $\color{#d91a1a}-0.24\%$
test_getitem[range] 0.2197ms 56.5663μs 17.6784 KOps/s 17.1964 KOps/s $\color{#35bf28}+2.80\%$
test_getitem[tuple] 0.1304ms 24.7855μs 40.3462 KOps/s 37.8213 KOps/s $\textbf{\color{#35bf28}+6.68\%}$
test_getitem[list] 0.2013ms 51.9310μs 19.2563 KOps/s 18.8820 KOps/s $\color{#35bf28}+1.98\%$
test_setitem_dim[int] 63.4390μs 32.9364μs 30.3616 KOps/s 30.5474 KOps/s $\color{#d91a1a}-0.61\%$
test_setitem_dim[slice_int] 0.1053ms 61.7111μs 16.2046 KOps/s 16.0739 KOps/s $\color{#35bf28}+0.81\%$
test_setitem_dim[range] 0.1370ms 85.2504μs 11.7301 KOps/s 11.5804 KOps/s $\color{#35bf28}+1.29\%$
test_setitem_dim[tuple] 80.2400μs 49.2941μs 20.2864 KOps/s 20.0254 KOps/s $\color{#35bf28}+1.30\%$
test_setitem 0.1042ms 29.4194μs 33.9912 KOps/s 32.4766 KOps/s $\color{#35bf28}+4.66\%$
test_set 0.3065ms 29.0613μs 34.4100 KOps/s 32.5231 KOps/s $\textbf{\color{#35bf28}+5.80\%}$
test_set_shared 3.2905ms 0.2184ms 4.5791 KOps/s 4.4839 KOps/s $\color{#35bf28}+2.12\%$
test_update 0.8359ms 37.0764μs 26.9714 KOps/s 25.8934 KOps/s $\color{#35bf28}+4.16\%$
test_update_nested 0.1293ms 48.4159μs 20.6544 KOps/s 20.1594 KOps/s $\color{#35bf28}+2.46\%$
test_update__nested 0.1330ms 43.9742μs 22.7406 KOps/s 21.6187 KOps/s $\textbf{\color{#35bf28}+5.19\%}$
test_set_nested 85.6000μs 31.4443μs 31.8023 KOps/s 30.1097 KOps/s $\textbf{\color{#35bf28}+5.62\%}$
test_set_nested_new 0.1109ms 36.8005μs 27.1735 KOps/s 25.7628 KOps/s $\textbf{\color{#35bf28}+5.48\%}$
test_select 0.2092ms 54.0201μs 18.5116 KOps/s 17.7929 KOps/s $\color{#35bf28}+4.04\%$
test_select_nested 0.1418ms 60.6544μs 16.4868 KOps/s 16.7796 KOps/s $\color{#d91a1a}-1.74\%$
test_exclude_nested 0.1455ms 75.1523μs 13.3063 KOps/s 13.5030 KOps/s $\color{#d91a1a}-1.46\%$
test_empty[True] 0.7295ms 0.3499ms 2.8577 KOps/s 2.8488 KOps/s $\color{#35bf28}+0.31\%$
test_empty[False] 9.8110μs 1.2329μs 811.1221 KOps/s 824.3601 KOps/s $\color{#d91a1a}-1.61\%$
test_unbind_speed 1.4136ms 0.3379ms 2.9594 KOps/s 3.2658 KOps/s $\textbf{\color{#d91a1a}-9.38\%}$
test_unbind_speed_stack0 0.5989ms 0.2937ms 3.4045 KOps/s 3.3223 KOps/s $\color{#35bf28}+2.47\%$
test_unbind_speed_stack1 0.1104s 0.8363ms 1.1958 KOps/s 1.3301 KOps/s $\textbf{\color{#d91a1a}-10.10\%}$
test_split 3.2342ms 2.0093ms 497.6841 Ops/s 449.9728 Ops/s $\textbf{\color{#35bf28}+10.60\%}$
test_chunk 89.8989ms 2.1997ms 454.6139 Ops/s 446.4661 Ops/s $\color{#35bf28}+1.82\%$
test_creation[device0] 0.2629ms 0.1174ms 8.5212 KOps/s 8.4460 KOps/s $\color{#35bf28}+0.89\%$
test_creation_from_tensor 3.8417ms 0.1177ms 8.4984 KOps/s 8.4137 KOps/s $\color{#35bf28}+1.01\%$
test_add_one[memmap_tensor0] 0.1582ms 7.4960μs 133.4041 KOps/s 132.8800 KOps/s $\color{#35bf28}+0.39\%$
test_contiguous[memmap_tensor0] 30.3460μs 1.9300μs 518.1456 KOps/s 465.3993 KOps/s $\textbf{\color{#35bf28}+11.33\%}$
test_stack[memmap_tensor0] 35.0760μs 5.6633μs 176.5753 KOps/s 150.0957 KOps/s $\textbf{\color{#35bf28}+17.64\%}$
test_memmaptd_index 1.6005ms 0.4138ms 2.4167 KOps/s 2.3914 KOps/s $\color{#35bf28}+1.06\%$
test_memmaptd_index_astensor 0.9947ms 0.5073ms 1.9712 KOps/s 1.9239 KOps/s $\color{#35bf28}+2.46\%$
test_memmaptd_index_op 90.3606ms 1.1525ms 867.6830 Ops/s 889.1527 Ops/s $\color{#d91a1a}-2.41\%$
test_serialize_model 0.1218s 0.1166s 8.5735 Ops/s 8.3267 Ops/s $\color{#35bf28}+2.96\%$
test_serialize_model_pickle 0.4489s 0.3887s 2.5724 Ops/s 2.5057 Ops/s $\color{#35bf28}+2.66\%$
test_serialize_weights 0.1254s 0.1161s 8.6108 Ops/s 8.3722 Ops/s $\color{#35bf28}+2.85\%$
test_serialize_weights_returnearly 0.2459s 0.1725s 5.7959 Ops/s 5.4556 Ops/s $\textbf{\color{#35bf28}+6.24\%}$
test_serialize_weights_pickle 0.5484s 0.4450s 2.2470 Ops/s 2.5153 Ops/s $\textbf{\color{#d91a1a}-10.67\%}$
test_serialize_weights_filesystem 0.1492s 0.1433s 6.9773 Ops/s 6.8180 Ops/s $\color{#35bf28}+2.34\%$
test_serialize_model_filesystem 0.1650s 0.1459s 6.8560 Ops/s 6.4744 Ops/s $\textbf{\color{#35bf28}+5.89\%}$
test_reshape_pytree 0.1029ms 38.0168μs 26.3042 KOps/s 24.9960 KOps/s $\textbf{\color{#35bf28}+5.23\%}$
test_reshape_td 0.1006ms 44.8181μs 22.3124 KOps/s 21.0375 KOps/s $\textbf{\color{#35bf28}+6.06\%}$
test_view_pytree 84.2860μs 37.9034μs 26.3828 KOps/s 25.1871 KOps/s $\color{#35bf28}+4.75\%$
test_view_td 0.1210ms 50.2218μs 19.9117 KOps/s 19.1661 KOps/s $\color{#35bf28}+3.89\%$
test_unbind_pytree 72.0140μs 35.6204μs 28.0738 KOps/s 26.6773 KOps/s $\textbf{\color{#35bf28}+5.24\%}$
test_unbind_td 0.3047ms 45.7753μs 21.8458 KOps/s 21.5765 KOps/s $\color{#35bf28}+1.25\%$
test_split_pytree 76.4520μs 37.6788μs 26.5401 KOps/s 25.2278 KOps/s $\textbf{\color{#35bf28}+5.20\%}$
test_split_td 0.4320ms 58.7542μs 17.0201 KOps/s 17.0215 KOps/s $-0.01\%$
test_add_pytree 0.1597ms 44.6452μs 22.3988 KOps/s 20.9342 KOps/s $\textbf{\color{#35bf28}+7.00\%}$
test_add_td 0.1690ms 85.1300μs 11.7467 KOps/s 10.9216 KOps/s $\textbf{\color{#35bf28}+7.56\%}$
test_compile_add_one_nested[tensordict-compile] 0.1122ms 57.6173μs 17.3559 KOps/s 16.7746 KOps/s $\color{#35bf28}+3.47\%$
test_compile_add_one_nested[tensordict-eager] 0.3398ms 0.1976ms 5.0619 KOps/s 5.0930 KOps/s $\color{#d91a1a}-0.61\%$
test_compile_add_one_nested[pytree-compile] 0.1305ms 55.5317μs 18.0077 KOps/s 17.0056 KOps/s $\textbf{\color{#35bf28}+5.89\%}$
test_compile_add_one_nested[pytree-eager] 0.2897ms 0.1386ms 7.2132 KOps/s 6.8934 KOps/s $\color{#35bf28}+4.64\%$
test_compile_copy_nested[tensordict-compile] 58.8400μs 23.5511μs 42.4608 KOps/s 40.8461 KOps/s $\color{#35bf28}+3.95\%$
test_compile_copy_nested[tensordict-eager] 0.1508ms 74.6404μs 13.3976 KOps/s 13.3393 KOps/s $\color{#35bf28}+0.44\%$
test_compile_copy_nested[pytree-compile] 0.1478ms 75.7931μs 13.1938 KOps/s 13.0894 KOps/s $\color{#35bf28}+0.80\%$
test_compile_copy_nested[pytree-eager] 0.1345ms 69.5641μs 14.3752 KOps/s 14.3488 KOps/s $\color{#35bf28}+0.18\%$
test_compile_add_one_flat[tensordict-compile] 0.3489ms 0.1800ms 5.5560 KOps/s 5.3803 KOps/s $\color{#35bf28}+3.27\%$
test_compile_add_one_flat[tensordict-eager] 0.4300ms 0.2376ms 4.2080 KOps/s 4.1440 KOps/s $\color{#35bf28}+1.54\%$
test_compile_add_one_flat[tensorclass-compile] 0.1233ms 48.6206μs 20.5674 KOps/s 20.7572 KOps/s $\color{#d91a1a}-0.91\%$
test_compile_add_one_flat[tensorclass-eager] 0.4347ms 78.1734μs 12.7921 KOps/s 12.5643 KOps/s $\color{#35bf28}+1.81\%$
test_compile_add_one_flat[pytree-compile] 0.3849ms 0.1738ms 5.7549 KOps/s 5.6772 KOps/s $\color{#35bf28}+1.37\%$
test_compile_add_one_flat[pytree-eager] 0.5749ms 0.2892ms 3.4575 KOps/s 3.3694 KOps/s $\color{#35bf28}+2.61\%$
test_compile_add_self_flat[tensordict-eager] 0.4253ms 0.2728ms 3.6651 KOps/s 3.6260 KOps/s $\color{#35bf28}+1.08\%$
test_compile_add_self_flat[tensordict-compile] 0.4071ms 0.1850ms 5.4062 KOps/s 5.4962 KOps/s $\color{#d91a1a}-1.64\%$
test_compile_add_self_flat[tensorclass-eager] 0.1747ms 74.1267μs 13.4904 KOps/s 13.4113 KOps/s $\color{#35bf28}+0.59\%$
test_compile_add_self_flat[tensorclass-compile] 0.1618ms 48.6236μs 20.5662 KOps/s 20.1605 KOps/s $\color{#35bf28}+2.01\%$
test_compile_add_self_flat[pytree-eager] 0.4216ms 0.2316ms 4.3177 KOps/s 4.1964 KOps/s $\color{#35bf28}+2.89\%$
test_compile_add_self_flat[pytree-compile] 0.3186ms 0.1733ms 5.7692 KOps/s 5.5874 KOps/s $\color{#35bf28}+3.25\%$
test_compile_copy_flat[tensordict-compile] 0.1999ms 0.1093ms 9.1529 KOps/s 8.9306 KOps/s $\color{#35bf28}+2.49\%$
test_compile_copy_flat[tensordict-eager] 0.1357ms 78.7564μs 12.6974 KOps/s 12.6309 KOps/s $\color{#35bf28}+0.53\%$
test_compile_copy_flat[pytree-compile] 0.1999ms 77.3830μs 12.9227 KOps/s 12.3939 KOps/s $\color{#35bf28}+4.27\%$
test_compile_copy_flat[pytree-eager] 0.1303ms 69.2095μs 14.4489 KOps/s 13.9759 KOps/s $\color{#35bf28}+3.38\%$
test_compile_assign_and_add[tensordict-compile] 0.2735ms 0.1906ms 5.2469 KOps/s 5.0593 KOps/s $\color{#35bf28}+3.71\%$
test_compile_assign_and_add[tensordict-eager] 1.9582ms 1.7168ms 582.4645 Ops/s 553.2107 Ops/s $\textbf{\color{#35bf28}+5.29\%}$
test_compile_assign_and_add[pytree-compile] 0.2647ms 0.1868ms 5.3526 KOps/s 5.1567 KOps/s $\color{#35bf28}+3.80\%$
test_compile_assign_and_add[pytree-eager] 1.1949ms 1.0837ms 922.7596 Ops/s 882.5285 Ops/s $\color{#35bf28}+4.56\%$
test_compile_assign_and_add_stack[compile] 0.7385ms 0.4175ms 2.3950 KOps/s 2.3853 KOps/s $\color{#35bf28}+0.41\%$
test_compile_assign_and_add_stack[eager] 4.4990ms 3.9828ms 251.0822 Ops/s 233.9889 Ops/s $\textbf{\color{#35bf28}+7.31\%}$
test_compile_indexing[tensor-tensordict-compile] 91.4110μs 34.3573μs 29.1059 KOps/s 27.5405 KOps/s $\textbf{\color{#35bf28}+5.68\%}$
test_compile_indexing[tensor-tensordict-eager] 0.8402ms 48.8481μs 20.4716 KOps/s 20.3399 KOps/s $\color{#35bf28}+0.65\%$
test_compile_indexing[tensor-tensorclass-compile] 72.2540μs 29.4927μs 33.9067 KOps/s 32.9239 KOps/s $\color{#35bf28}+2.98\%$
test_compile_indexing[tensor-tensorclass-eager] 91.7810μs 28.3711μs 35.2471 KOps/s 34.1182 KOps/s $\color{#35bf28}+3.31\%$
test_compile_indexing[tensor-pytree-compile] 97.7220μs 29.3376μs 34.0860 KOps/s 32.2247 KOps/s $\textbf{\color{#35bf28}+5.78\%}$
test_compile_indexing[tensor-pytree-eager] 95.9590μs 29.3369μs 34.0867 KOps/s 34.2779 KOps/s $\color{#d91a1a}-0.56\%$
test_compile_indexing[slice-tensordict-compile] 0.1314ms 73.0659μs 13.6863 KOps/s 13.3190 KOps/s $\color{#35bf28}+2.76\%$
test_compile_indexing[slice-tensordict-eager] 0.5662ms 28.1992μs 35.4620 KOps/s 35.7222 KOps/s $\color{#d91a1a}-0.73\%$
test_compile_indexing[slice-tensorclass-compile] 0.1407ms 66.9021μs 14.9472 KOps/s 14.1491 KOps/s $\textbf{\color{#35bf28}+5.64\%}$
test_compile_indexing[slice-tensorclass-eager] 0.2201ms 23.1681μs 43.1628 KOps/s 42.2880 KOps/s $\color{#35bf28}+2.07\%$
test_compile_indexing[slice-pytree-compile] 0.1881ms 67.2834μs 14.8625 KOps/s 14.3009 KOps/s $\color{#35bf28}+3.93\%$
test_compile_indexing[slice-pytree-eager] 0.1123ms 23.1584μs 43.1810 KOps/s 42.6605 KOps/s $\color{#35bf28}+1.22\%$
test_compile_indexing[int-tensordict-compile] 0.1485ms 72.6371μs 13.7671 KOps/s 13.2564 KOps/s $\color{#35bf28}+3.85\%$
test_compile_indexing[int-tensordict-eager] 0.7744ms 27.8788μs 35.8696 KOps/s 35.8777 KOps/s $\color{#d91a1a}-0.02\%$
test_compile_indexing[int-tensorclass-compile] 0.1360ms 66.6136μs 15.0119 KOps/s 14.4074 KOps/s $\color{#35bf28}+4.20\%$
test_compile_indexing[int-tensorclass-eager] 62.1050μs 23.0207μs 43.4391 KOps/s 42.6022 KOps/s $\color{#35bf28}+1.96\%$
test_compile_indexing[int-pytree-compile] 0.1323ms 66.6007μs 15.0149 KOps/s 14.3659 KOps/s $\color{#35bf28}+4.52\%$
test_compile_indexing[int-pytree-eager] 78.4160μs 22.9420μs 43.5882 KOps/s 42.4524 KOps/s $\color{#35bf28}+2.68\%$
test_mod_add[eager] 87.6930μs 25.2302μs 39.6350 KOps/s 37.7281 KOps/s $\textbf{\color{#35bf28}+5.05\%}$
test_mod_add[compile] 0.1067ms 36.9729μs 27.0468 KOps/s 25.8018 KOps/s $\color{#35bf28}+4.83\%$
test_mod_add[compile-overhead] 83.8960μs 37.8328μs 26.4321 KOps/s 24.4321 KOps/s $\textbf{\color{#35bf28}+8.19\%}$
test_mod_wrap[eager] 0.4428ms 0.2089ms 4.7864 KOps/s 4.6731 KOps/s $\color{#35bf28}+2.42\%$
test_mod_wrap[compile] 0.4514ms 0.2337ms 4.2785 KOps/s 4.2124 KOps/s $\color{#35bf28}+1.57\%$
test_mod_wrap[compile-overhead] 0.3788ms 0.2297ms 4.3529 KOps/s 4.2489 KOps/s $\color{#35bf28}+2.45\%$
test_mod_wrap_and_backward[eager] 12.4185ms 10.5751ms 94.5619 Ops/s 92.4932 Ops/s $\color{#35bf28}+2.24\%$
test_mod_wrap_and_backward[compile] 11.9743ms 10.7013ms 93.4463 Ops/s 87.1146 Ops/s $\textbf{\color{#35bf28}+7.27\%}$
test_mod_wrap_and_backward[compile-overhead] 12.2038ms 10.6834ms 93.6033 Ops/s 86.3893 Ops/s $\textbf{\color{#35bf28}+8.35\%}$
test_seq_add[eager] 0.1666ms 91.0391μs 10.9843 KOps/s 10.7831 KOps/s $\color{#35bf28}+1.87\%$
test_seq_add[compile] 0.1342ms 63.5729μs 15.7300 KOps/s 14.9877 KOps/s $\color{#35bf28}+4.95\%$
test_seq_add[compile-overhead] 0.1291ms 62.4616μs 16.0098 KOps/s 15.2277 KOps/s $\textbf{\color{#35bf28}+5.14\%}$
test_seq_wrap[eager] 0.6591ms 0.3824ms 2.6151 KOps/s 2.5640 KOps/s $\color{#35bf28}+2.00\%$
test_seq_wrap[compile] 0.5007ms 0.2681ms 3.7303 KOps/s 3.6496 KOps/s $\color{#35bf28}+2.21\%$
test_seq_wrap[compile-overhead] 0.5164ms 0.2665ms 3.7528 KOps/s 3.4635 KOps/s $\textbf{\color{#35bf28}+8.35\%}$
test_func_call_runtime[False-eager] 0.7732ms 0.5091ms 1.9642 KOps/s 1.8603 KOps/s $\textbf{\color{#35bf28}+5.59\%}$
test_func_call_runtime[False-compile] 0.7447ms 0.4972ms 2.0112 KOps/s 1.9555 KOps/s $\color{#35bf28}+2.85\%$
test_func_call_runtime[False-compile-overhead] 1.0914ms 0.5016ms 1.9935 KOps/s 1.9817 KOps/s $\color{#35bf28}+0.59\%$
test_func_call_runtime[True-eager] 1.2038ms 0.7301ms 1.3697 KOps/s 1.3187 KOps/s $\color{#35bf28}+3.87\%$
test_func_call_runtime[True-compile] 0.6756ms 0.5114ms 1.9554 KOps/s 1.9543 KOps/s $\color{#35bf28}+0.06\%$
test_func_call_runtime[True-compile-overhead] 0.5962ms 0.5130ms 1.9492 KOps/s 1.9603 KOps/s $\color{#d91a1a}-0.57\%$
test_func_call_cm_runtime[False-eager] 0.6270ms 0.5240ms 1.9082 KOps/s 1.8816 KOps/s $\color{#35bf28}+1.42\%$
test_func_call_cm_runtime[False-compile] 0.9662ms 0.5003ms 1.9990 KOps/s 1.9904 KOps/s $\color{#35bf28}+0.43\%$
test_func_call_cm_runtime[False-compile-overhead] 0.5764ms 0.4969ms 2.0123 KOps/s 1.8786 KOps/s $\textbf{\color{#35bf28}+7.12\%}$
test_func_call_cm_runtime[True-eager] 1.2083ms 0.8732ms 1.1452 KOps/s 1.1160 KOps/s $\color{#35bf28}+2.62\%$
test_func_call_cm_runtime[True-compile] 1.0264ms 0.7283ms 1.3731 KOps/s 1.3318 KOps/s $\color{#35bf28}+3.11\%$
test_func_call_cm_runtime[True-compile-overhead] 0.8243ms 0.7291ms 1.3716 KOps/s 1.3321 KOps/s $\color{#35bf28}+2.96\%$
test_vmap_func_call_cm_runtime[eager] 2.5895ms 1.9023ms 525.6888 Ops/s 523.2335 Ops/s $\color{#35bf28}+0.47\%$
test_vmap_func_call_cm_runtime[compile] 2.5312ms 1.9587ms 510.5526 Ops/s 508.8822 Ops/s $\color{#35bf28}+0.33\%$
test_vmap_func_call_cm_runtime[compile-overhead] 2.5837ms 1.9396ms 515.5668 Ops/s 509.7985 Ops/s $\color{#35bf28}+1.13\%$
test_distributed 0.4947ms 0.1278ms 7.8246 KOps/s 7.8451 KOps/s $\color{#d91a1a}-0.26\%$
test_tdmodule 0.2842ms 23.7915μs 42.0319 KOps/s 52.9710 KOps/s $\textbf{\color{#d91a1a}-20.65\%}$
test_tdmodule_dispatch 60.6830μs 36.9570μs 27.0584 KOps/s 26.7151 KOps/s $\color{#35bf28}+1.29\%$
test_tdseq 36.5680μs 21.6241μs 46.2447 KOps/s 46.6775 KOps/s $\color{#d91a1a}-0.93\%$
test_tdseq_dispatch 73.5970μs 42.5673μs 23.4922 KOps/s 23.4616 KOps/s $\color{#35bf28}+0.13\%$
test_instantiation_functorch 1.8050ms 1.5725ms 635.9184 Ops/s 630.8680 Ops/s $\color{#35bf28}+0.80\%$
test_exec_functorch 0.4128ms 0.1858ms 5.3810 KOps/s 5.3545 KOps/s $\color{#35bf28}+0.49\%$
test_exec_functional_call 0.2728ms 0.1737ms 5.7556 KOps/s 5.6733 KOps/s $\color{#35bf28}+1.45\%$
test_exec_td_decorator 0.5114ms 0.2328ms 4.2956 KOps/s 4.2058 KOps/s $\color{#35bf28}+2.14\%$
test_vmap_mlp_speed_decorator[True-True] 1.0147ms 0.6537ms 1.5297 KOps/s 1.5414 KOps/s $\color{#d91a1a}-0.76\%$
test_vmap_mlp_speed_decorator[True-False] 1.1178ms 0.6517ms 1.5344 KOps/s 1.5192 KOps/s $\color{#35bf28}+1.00\%$
test_vmap_mlp_speed_decorator[False-True] 1.2588ms 0.5340ms 1.8728 KOps/s 1.8160 KOps/s $\color{#35bf28}+3.13\%$
test_vmap_mlp_speed_decorator[False-False] 0.7187ms 0.5258ms 1.9018 KOps/s 1.8629 KOps/s $\color{#35bf28}+2.09\%$
test_to_module_speed[True] 2.0204ms 1.4247ms 701.9266 Ops/s 689.3084 Ops/s $\color{#35bf28}+1.83\%$
test_to_module_speed[False] 1.6355ms 1.3803ms 724.4924 Ops/s 712.2248 Ops/s $\color{#35bf28}+1.72\%$
test_tc_init 86.2220μs 46.2579μs 21.6180 KOps/s 21.1119 KOps/s $\color{#35bf28}+2.40\%$
test_tc_init_nested 0.1412ms 92.3927μs 10.8234 KOps/s 10.3748 KOps/s $\color{#35bf28}+4.32\%$
test_tc_first_layer_tensor 42.2180μs 1.5099μs 662.3011 KOps/s 654.3074 KOps/s $\color{#35bf28}+1.22\%$
test_tc_first_layer_nontensor 15.7590μs 4.7053μs 212.5261 KOps/s 211.9877 KOps/s $\color{#35bf28}+0.25\%$
test_tc_second_layer_tensor 25.0770μs 2.8662μs 348.8888 KOps/s 352.4934 KOps/s $\color{#d91a1a}-1.02\%$
test_tc_second_layer_nontensor 26.1190μs 6.0727μs 164.6724 KOps/s 163.3894 KOps/s $\color{#35bf28}+0.79\%$
test_unbind 7.7108ms 7.4688ms 133.8897 Ops/s 76.3632 Ops/s $\textbf{\color{#35bf28}+75.33\%}$
test_full_like 20.5934ms 11.7054ms 85.4307 Ops/s 133.2378 Ops/s $\textbf{\color{#d91a1a}-35.88\%}$
test_zeros_like 14.5310ms 7.3173ms 136.6618 Ops/s 353.9653 Ops/s $\textbf{\color{#d91a1a}-61.39\%}$
test_ones_like 15.9134ms 7.4007ms 135.1233 Ops/s 301.3796 Ops/s $\textbf{\color{#d91a1a}-55.17\%}$
test_clone 14.5969ms 9.0662ms 110.2992 Ops/s 192.8710 Ops/s $\textbf{\color{#d91a1a}-42.81\%}$
test_squeeze 54.1110μs 12.2571μs 81.5856 KOps/s 81.4313 KOps/s $\color{#35bf28}+0.19\%$
test_unsqueeze 0.2009ms 93.5313μs 10.6916 KOps/s 10.7040 KOps/s $\color{#d91a1a}-0.12\%$
test_split 0.4403ms 0.1906ms 5.2462 KOps/s 5.0596 KOps/s $\color{#35bf28}+3.69\%$
test_permute 0.4817ms 0.2178ms 4.5907 KOps/s 4.4733 KOps/s $\color{#35bf28}+2.63\%$
test_stack 29.8145ms 25.3905ms 39.3849 Ops/s 40.4738 Ops/s $\color{#d91a1a}-2.69\%$
test_cat 32.3179ms 25.4359ms 39.3146 Ops/s 40.5863 Ops/s $\color{#d91a1a}-3.13\%$
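The report does not define its derived columns, but every row checked is consistent with Ops = 1 / Mean and Change = (Ops / Ops on Repo HEAD - 1) × 100 (for example, 41.7256 / 40.3761 gives the +3.34% on test_plain_set_nested). The rows highlighted with \textbf are exactly the 36 improved and 10 worsened benchmarks in the summary line; all of them have |Change| of at least 5.05%, and every unhighlighted row stays at or below 4.95%, so a 5% cut-off appears to be in use. That threshold is inferred from the data, not documented by the bot. A minimal Python sketch under those assumptions (all helper names are hypothetical):

```python
# Sketch: recompute the "Ops" and "Change" columns of the benchmark table and
# classify a row. The formulas and the 5% threshold are inferred from the
# table itself; they are assumptions, not the benchmark bot's documented logic.

# Seconds per unit; both the micro sign (U+00B5) and Greek mu (U+03BC) are accepted.
TIME_UNITS = {"\u00b5s": 1e-6, "\u03bcs": 1e-6, "ms": 1e-3, "s": 1.0}


def parse_time(text: str) -> float:
    """Convert a latency string such as '23.9661μs' or '0.1104s' to seconds."""
    for unit, scale in TIME_UNITS.items():  # multi-char suffixes are checked before "s"
        if text.endswith(unit):
            return float(text[: -len(unit)]) * scale
    raise ValueError(f"unrecognised time unit in {text!r}")


def ops_per_second(mean: str) -> float:
    """Throughput implied by the mean latency (the 'Ops' column)."""
    return 1.0 / parse_time(mean)


def percent_change(ops: float, ops_head: float) -> float:
    """Relative throughput change versus Repo HEAD, in percent."""
    return (ops / ops_head - 1.0) * 100.0


def classify(change: float, threshold: float = 5.0) -> str:
    """Hypothetical rule behind the Improved/Worsened tallies (|change| >= 5%)."""
    if change >= threshold:
        return "improved"
    if change <= -threshold:
        return "worsened"
    return "unchanged"


# test_plain_set_nested: Mean 23.9661μs, Ops 41.7256 KOps/s, HEAD 40.3761 KOps/s
print(round(ops_per_second("23.9661\u03bcs") / 1e3, 4))  # 41.7256 (KOps/s)
print(round(percent_change(41.7256, 40.3761), 2))         # 3.34
# test_zeros_like: Ops 136.6618 Ops/s, HEAD 353.9653 Ops/s
print(classify(percent_change(136.6618, 353.9653)))       # worsened
```

Under this rule, small shifts such as the +3.34% above count as noise, while the -61.39% on test_zeros_like is one of the 10 flagged regressions.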


github-actions bot commented Oct 4, 2024

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 218. Improved: 13. Worsened: 8.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.4651ms 16.0175μs 62.4317 KOps/s 61.1340 KOps/s $\color{#35bf28}+2.12\%$
test_plain_set_stack_nested 35.1510μs 16.1743μs 61.8265 KOps/s 60.6548 KOps/s $\color{#35bf28}+1.93\%$
test_plain_set_nested_inplace 40.9900μs 17.3547μs 57.6212 KOps/s 57.1083 KOps/s $\color{#35bf28}+0.90\%$
test_plain_set_stack_nested_inplace 47.0610μs 17.3988μs 57.4752 KOps/s 57.7637 KOps/s $\color{#d91a1a}-0.50\%$
test_items 25.5310μs 2.8586μs 349.8193 KOps/s 344.1908 KOps/s $\color{#35bf28}+1.64\%$
test_items_nested 0.4337ms 0.3393ms 2.9475 KOps/s 2.9659 KOps/s $\color{#d91a1a}-0.62\%$
test_items_nested_locked 0.3823ms 0.3397ms 2.9441 KOps/s 2.9438 KOps/s $+0.01\%$
test_items_nested_leaf 94.2510μs 62.1990μs 16.0774 KOps/s 16.0599 KOps/s $\color{#35bf28}+0.11\%$
test_items_stack_nested 0.5323ms 0.3400ms 2.9409 KOps/s 2.9290 KOps/s $\color{#35bf28}+0.41\%$
test_items_stack_nested_leaf 99.0120μs 64.3025μs 15.5515 KOps/s 15.6135 KOps/s $\color{#d91a1a}-0.40\%$
test_items_stack_nested_locked 0.3778ms 0.3426ms 2.9192 KOps/s 2.8856 KOps/s $\color{#35bf28}+1.17\%$
test_keys 30.7300μs 3.4216μs 292.2626 KOps/s 292.7639 KOps/s $\color{#d91a1a}-0.17\%$
test_keys_nested 95.3020μs 69.1842μs 14.4542 KOps/s 14.2743 KOps/s $\color{#35bf28}+1.26\%$
test_keys_nested_locked 2.1457ms 76.3915μs 13.0905 KOps/s 13.0672 KOps/s $\color{#35bf28}+0.18\%$
test_keys_nested_leaf 92.0610μs 60.6503μs 16.4880 KOps/s 16.2137 KOps/s $\color{#35bf28}+1.69\%$
test_keys_stack_nested 0.1160ms 71.1627μs 14.0523 KOps/s 14.0010 KOps/s $\color{#35bf28}+0.37\%$
test_keys_stack_nested_leaf 91.7120μs 62.4053μs 16.0243 KOps/s 15.9083 KOps/s $\color{#35bf28}+0.73\%$
test_keys_stack_nested_locked 0.1025ms 75.9725μs 13.1627 KOps/s 12.9873 KOps/s $\color{#35bf28}+1.35\%$
test_values 5.2500μs 0.8399μs 1.1906 MOps/s 1.1463 MOps/s $\color{#35bf28}+3.86\%$
test_values_nested 78.4710μs 48.2796μs 20.7127 KOps/s 20.4650 KOps/s $\color{#35bf28}+1.21\%$
test_values_nested_locked 0.1785ms 49.5703μs 20.1734 KOps/s 20.0132 KOps/s $\color{#35bf28}+0.80\%$
test_values_nested_leaf 72.5910μs 42.3864μs 23.5925 KOps/s 23.5299 KOps/s $\color{#35bf28}+0.27\%$
test_values_stack_nested 89.2720μs 49.2802μs 20.2921 KOps/s 20.0262 KOps/s $\color{#35bf28}+1.33\%$
test_values_stack_nested_leaf 73.7710μs 43.2941μs 23.0978 KOps/s 22.8597 KOps/s $\color{#35bf28}+1.04\%$
test_values_stack_nested_locked 0.1200ms 51.5432μs 19.4012 KOps/s 19.4273 KOps/s $\color{#d91a1a}-0.13\%$
test_membership 1.9161μs 0.4993μs 2.0028 MOps/s 2.0045 MOps/s $\color{#d91a1a}-0.09\%$
test_membership_nested 14.9000μs 1.8386μs 543.8962 KOps/s 552.1392 KOps/s $\color{#d91a1a}-1.49\%$
test_membership_nested_leaf 10.9233μs 1.8247μs 548.0455 KOps/s 561.1668 KOps/s $\color{#d91a1a}-2.34\%$
test_membership_stacked_nested 49.0510μs 1.8798μs 531.9577 KOps/s 540.8289 KOps/s $\color{#d91a1a}-1.64\%$
test_membership_stacked_nested_leaf 30.2610μs 1.8856μs 530.3333 KOps/s 529.3785 KOps/s $\color{#35bf28}+0.18\%$
test_membership_nested_last 0.7403ms 2.9111μs 343.5151 KOps/s 346.5344 KOps/s $\color{#d91a1a}-0.87\%$
test_membership_nested_leaf_last 50.2710μs 2.9451μs 339.5423 KOps/s 336.3437 KOps/s $\color{#35bf28}+0.95\%$
test_membership_stacked_nested_last 48.6610μs 3.4476μs 290.0540 KOps/s 339.7419 KOps/s $\textbf{\color{#d91a1a}-14.63\%}$
test_membership_stacked_nested_leaf_last 42.4910μs 3.4422μs 290.5083 KOps/s 339.7646 KOps/s $\textbf{\color{#d91a1a}-14.50\%}$
test_nested_getleaf 38.2010μs 5.9968μs 166.7554 KOps/s 165.9710 KOps/s $\color{#35bf28}+0.47\%$
test_nested_get 34.3910μs 5.7103μs 175.1212 KOps/s 176.4739 KOps/s $\color{#d91a1a}-0.77\%$
test_stacked_getleaf 36.4510μs 5.9934μs 166.8493 KOps/s 167.4392 KOps/s $\color{#d91a1a}-0.35\%$
test_stacked_get 55.5410μs 5.6890μs 175.7779 KOps/s 179.7282 KOps/s $\color{#d91a1a}-2.20\%$
test_nested_getitemleaf 50.9110μs 6.1026μs 163.8650 KOps/s 164.5363 KOps/s $\color{#d91a1a}-0.41\%$
test_nested_getitem 31.1600μs 5.6953μs 175.5828 KOps/s 176.2546 KOps/s $\color{#d91a1a}-0.38\%$
test_stacked_getitemleaf 38.6300μs 6.0966μs 164.0255 KOps/s 166.3484 KOps/s $\color{#d91a1a}-1.40\%$
test_stacked_getitem 34.6900μs 5.6958μs 175.5665 KOps/s 177.1762 KOps/s $\color{#d91a1a}-0.91\%$
test_lock_nested 2.7275ms 0.4220ms 2.3697 KOps/s 2.3914 KOps/s $\color{#d91a1a}-0.90\%$
test_lock_stack_nested 0.5777ms 0.3849ms 2.5982 KOps/s 2.5941 KOps/s $\color{#35bf28}+0.16\%$
test_unlock_nested 0.7457ms 0.3529ms 2.8335 KOps/s 2.7913 KOps/s $\color{#35bf28}+1.51\%$
test_unlock_stack_nested 0.4447ms 0.3225ms 3.1011 KOps/s 3.0663 KOps/s $\color{#35bf28}+1.13\%$
test_flatten_speed 0.2006ms 75.9088μs 13.1737 KOps/s 13.1477 KOps/s $\color{#35bf28}+0.20\%$
test_unflatten_speed 0.3805ms 0.3181ms 3.1432 KOps/s 3.1821 KOps/s $\color{#d91a1a}-1.22\%$
test_common_ops 1.9099ms 1.1932ms 838.0774 Ops/s 819.7017 Ops/s $\color{#35bf28}+2.24\%$
test_creation 27.0500μs 1.4437μs 692.6726 KOps/s 694.5520 KOps/s $\color{#d91a1a}-0.27\%$
test_creation_empty 47.6610μs 14.2092μs 70.3767 KOps/s 68.4076 KOps/s $\color{#35bf28}+2.88\%$
test_creation_nested_1 54.7710μs 15.8734μs 62.9985 KOps/s 61.0120 KOps/s $\color{#35bf28}+3.26\%$
test_creation_nested_2 54.6410μs 18.4775μs 54.1199 KOps/s 53.2510 KOps/s $\color{#35bf28}+1.63\%$
test_clone 0.1677ms 27.4142μs 36.4775 KOps/s 35.2810 KOps/s $\color{#35bf28}+3.39\%$
test_getitem[int] 1.5362ms 15.8704μs 63.0105 KOps/s 65.3548 KOps/s $\color{#d91a1a}-3.59\%$
test_getitem[slice_int] 0.1314ms 27.3205μs 36.6025 KOps/s 38.4363 KOps/s $\color{#d91a1a}-4.77\%$
test_getitem[range] 0.1479ms 0.1082ms 9.2464 KOps/s 9.2107 KOps/s $\color{#35bf28}+0.39\%$
test_getitem[tuple] 0.1358ms 23.4498μs 42.6443 KOps/s 43.6489 KOps/s $\color{#d91a1a}-2.30\%$
test_getitem[list] 0.2475ms 97.6664μs 10.2389 KOps/s 10.2136 KOps/s $\color{#35bf28}+0.25\%$
test_setitem_dim[int] 0.1800ms 43.6915μs 22.8878 KOps/s 23.1985 KOps/s $\color{#d91a1a}-1.34\%$
test_setitem_dim[slice_int] 0.1127ms 66.7762μs 14.9754 KOps/s 15.3396 KOps/s $\color{#d91a1a}-2.37\%$
test_setitem_dim[range] 0.1730ms 0.1254ms 7.9741 KOps/s 7.9609 KOps/s $\color{#35bf28}+0.17\%$
test_setitem_dim[tuple] 91.1910μs 59.7714μs 16.7304 KOps/s 17.1201 KOps/s $\color{#d91a1a}-2.28\%$
test_setitem 0.1939ms 39.9659μs 25.0213 KOps/s 24.4391 KOps/s $\color{#35bf28}+2.38\%$
test_set 0.1884ms 39.0371μs 25.6167 KOps/s 25.2460 KOps/s $\color{#35bf28}+1.47\%$
test_set_shared 94.7575ms 61.9604μs 16.1393 KOps/s 18.8269 KOps/s $\textbf{\color{#d91a1a}-14.28\%}$
test_update 0.1963ms 47.1200μs 21.2224 KOps/s 20.5427 KOps/s $\color{#35bf28}+3.31\%$
test_update_nested 0.2052ms 55.3277μs 18.0741 KOps/s 17.7319 KOps/s $\color{#35bf28}+1.93\%$
test_update__nested 0.1482ms 60.7228μs 16.4683 KOps/s 15.5127 KOps/s $\textbf{\color{#35bf28}+6.16\%}$
test_set_nested 0.1920ms 41.5181μs 24.0859 KOps/s 23.8710 KOps/s $\color{#35bf28}+0.90\%$
test_set_nested_new 0.1943ms 44.6157μs 22.4136 KOps/s 22.0164 KOps/s $\color{#35bf28}+1.80\%$
test_select 0.2108ms 57.3158μs 17.4472 KOps/s 16.9840 KOps/s $\color{#35bf28}+2.73\%$
test_select_nested 64.9100μs 42.9482μs 23.2839 KOps/s 24.0423 KOps/s $\color{#d91a1a}-3.15\%$
test_exclude_nested 0.1106ms 57.3866μs 17.4257 KOps/s 17.1854 KOps/s $\color{#35bf28}+1.40\%$
test_empty[True] 0.3480ms 0.2538ms 3.9403 KOps/s 3.9257 KOps/s $\color{#35bf28}+0.37\%$
test_empty[False] 0.4702ms 0.7383μs 1.3545 MOps/s 1.3252 MOps/s $\color{#35bf28}+2.21\%$
test_to 47.6310μs 25.0287μs 39.9541 KOps/s 37.8234 KOps/s $\textbf{\color{#35bf28}+5.63\%}$
test_to_nonblocking 60.2510μs 23.9575μs 41.7406 KOps/s 42.5862 KOps/s $\color{#d91a1a}-1.99\%$
test_unbind_speed 1.5355ms 0.2642ms 3.7848 KOps/s 3.7524 KOps/s $\color{#35bf28}+0.86\%$
test_unbind_speed_stack0 0.3702ms 0.2624ms 3.8109 KOps/s 3.7244 KOps/s $\color{#35bf28}+2.32\%$
test_unbind_speed_stack1 97.6042ms 0.6959ms 1.4369 KOps/s 1.5572 KOps/s $\textbf{\color{#d91a1a}-7.73\%}$
test_split 0.1003s 2.1253ms 470.5145 Ops/s 482.8425 Ops/s $\color{#d91a1a}-2.55\%$
test_chunk 0.1015s 2.1396ms 467.3728 Ops/s 481.4057 Ops/s $\color{#d91a1a}-2.91\%$
test_creation[device0] 0.3408ms 0.1244ms 8.0415 KOps/s 7.9292 KOps/s $\color{#35bf28}+1.42\%$
test_creation_from_tensor 0.3424ms 0.1311ms 7.6284 KOps/s 7.8101 KOps/s $\color{#d91a1a}-2.33\%$
test_add_one[memmap_tensor0] 0.1310ms 8.5129μs 117.4683 KOps/s 116.5889 KOps/s $\color{#35bf28}+0.75\%$
test_contiguous[memmap_tensor0] 31.1200μs 2.0590μs 485.6624 KOps/s 468.8325 KOps/s $\color{#35bf28}+3.59\%$
test_stack[memmap_tensor0] 36.4200μs 6.3710μs 156.9617 KOps/s 153.3419 KOps/s $\color{#35bf28}+2.36\%$
test_memmaptd_index 1.2295ms 0.4087ms 2.4469 KOps/s 2.4183 KOps/s $\color{#35bf28}+1.19\%$
test_memmaptd_index_astensor 0.7360ms 0.4770ms 2.0964 KOps/s 2.0842 KOps/s $\color{#35bf28}+0.59\%$
test_memmaptd_index_op 1.3637ms 0.9782ms 1.0223 KOps/s 1.0249 KOps/s $\color{#d91a1a}-0.25\%$
test_serialize_model 0.1318s 0.1303s 7.6773 Ops/s 7.6575 Ops/s $\color{#35bf28}+0.26\%$
test_serialize_model_pickle 1.3518s 1.2128s 0.8245 Ops/s 0.8241 Ops/s $\color{#35bf28}+0.06\%$
test_serialize_weights 0.1304s 0.1300s 7.6915 Ops/s 6.9669 Ops/s $\textbf{\color{#35bf28}+10.40\%}$
test_serialize_weights_returnearly 0.2357s 55.8417ms 17.9078 Ops/s 17.6846 Ops/s $\color{#35bf28}+1.26\%$
test_serialize_weights_pickle 1.3522s 1.2142s 0.8236 Ops/s 0.8246 Ops/s $\color{#d91a1a}-0.12\%$
test_reshape_pytree 66.9610μs 33.8416μs 29.5494 KOps/s 29.6018 KOps/s $\color{#d91a1a}-0.18\%$
test_reshape_td 71.2910μs 38.2308μs 26.1570 KOps/s 25.5033 KOps/s $\color{#35bf28}+2.56\%$
test_view_pytree 0.1275ms 33.4760μs 29.8721 KOps/s 30.1517 KOps/s $\color{#d91a1a}-0.93\%$
test_view_td 78.4610μs 44.4581μs 22.4931 KOps/s 23.3986 KOps/s $\color{#d91a1a}-3.87\%$
test_unbind_pytree 0.1702ms 33.0335μs 30.2723 KOps/s 30.6705 KOps/s $\color{#d91a1a}-1.30\%$
test_unbind_td 0.7813ms 40.7140μs 24.5616 KOps/s 24.8852 KOps/s $\color{#d91a1a}-1.30\%$
test_split_pytree 0.1218ms 42.5929μs 23.4781 KOps/s 23.4412 KOps/s $\color{#35bf28}+0.16\%$
test_split_td 0.1797ms 51.9119μs 19.2634 KOps/s 19.4737 KOps/s $\color{#d91a1a}-1.08\%$
test_add_pytree 0.2323ms 55.6553μs 17.9678 KOps/s 18.8980 KOps/s $\color{#d91a1a}-4.92\%$
test_add_td 0.2694ms 85.4505μs 11.7027 KOps/s 11.3933 KOps/s $\color{#35bf28}+2.72\%$
test_compile_add_one_nested[tensordict-compile] 0.3092ms 0.1591ms 6.2837 KOps/s 6.3645 KOps/s $\color{#d91a1a}-1.27\%$
test_compile_add_one_nested[tensordict-eager] 0.3027ms 0.1557ms 6.4205 KOps/s 6.2272 KOps/s $\color{#35bf28}+3.10\%$
test_compile_add_one_nested[pytree-compile] 0.2909ms 0.1392ms 7.1819 KOps/s 7.1332 KOps/s $\color{#35bf28}+0.68\%$
test_compile_add_one_nested[pytree-eager] 0.3230ms 0.1797ms 5.5654 KOps/s 5.6550 KOps/s $\color{#d91a1a}-1.58\%$
test_compile_copy_nested[tensordict-compile] 0.1573ms 20.0154μs 49.9615 KOps/s 48.1259 KOps/s $\color{#35bf28}+3.81\%$
test_compile_copy_nested[tensordict-eager] 88.0610μs 47.6839μs 20.9714 KOps/s 21.2174 KOps/s $\color{#d91a1a}-1.16\%$
test_compile_copy_nested[pytree-compile] 0.2556ms 63.9410μs 15.6394 KOps/s 16.1012 KOps/s $\color{#d91a1a}-2.87\%$
test_compile_copy_nested[pytree-eager] 0.1459ms 49.3807μs 20.2508 KOps/s 20.4020 KOps/s $\color{#d91a1a}-0.74\%$
test_compile_add_one_flat[tensordict-compile] 0.4575ms 0.3088ms 3.2387 KOps/s 3.2107 KOps/s $\color{#35bf28}+0.87\%$
test_compile_add_one_flat[tensordict-eager] 0.3257ms 0.2310ms 4.3291 KOps/s 4.1796 KOps/s $\color{#35bf28}+3.57\%$
test_compile_add_one_flat[tensorclass-compile] 0.2476ms 0.1228ms 8.1459 KOps/s 8.1369 KOps/s $\color{#35bf28}+0.11\%$
test_compile_add_one_flat[tensorclass-eager] 0.2112ms 63.0335μs 15.8646 KOps/s 15.4340 KOps/s $\color{#35bf28}+2.79\%$
test_compile_add_one_flat[pytree-compile] 0.4629ms 0.3080ms 3.2471 KOps/s 3.2320 KOps/s $\color{#35bf28}+0.47\%$
test_compile_add_one_flat[pytree-eager] 0.8102ms 0.6147ms 1.6269 KOps/s 1.6779 KOps/s $\color{#d91a1a}-3.04\%$
test_compile_add_self_flat[tensordict-eager] 0.4291ms 0.2851ms 3.5075 KOps/s 3.4813 KOps/s $\color{#35bf28}+0.75\%$
test_compile_add_self_flat[tensordict-compile] 0.5227ms 0.3227ms 3.0988 KOps/s 3.2014 KOps/s $\color{#d91a1a}-3.20\%$
test_compile_add_self_flat[tensorclass-eager] 0.2551ms 76.5139μs 13.0695 KOps/s 13.2271 KOps/s $\color{#d91a1a}-1.19\%$
test_compile_add_self_flat[tensorclass-compile] 0.2723ms 0.1310ms 7.6342 KOps/s 7.8977 KOps/s $\color{#d91a1a}-3.34\%$
test_compile_add_self_flat[pytree-eager] 0.7073ms 0.5194ms 1.9253 KOps/s 1.9982 KOps/s $\color{#d91a1a}-3.65\%$
test_compile_add_self_flat[pytree-compile] 0.4509ms 0.3132ms 3.1930 KOps/s 3.2422 KOps/s $\color{#d91a1a}-1.52\%$
test_compile_copy_flat[tensordict-compile] 0.1476ms 18.0794μs 55.3116 KOps/s 54.9604 KOps/s $\color{#35bf28}+0.64\%$
test_compile_copy_flat[tensordict-eager] 0.1370ms 37.9037μs 26.3827 KOps/s 25.7580 KOps/s $\color{#35bf28}+2.43\%$
test_compile_copy_flat[pytree-compile] 0.1788ms 68.6372μs 14.5694 KOps/s 14.6590 KOps/s $\color{#d91a1a}-0.61\%$
test_compile_copy_flat[pytree-eager] 0.1440ms 50.9162μs 19.6401 KOps/s 19.7956 KOps/s $\color{#d91a1a}-0.79\%$
test_compile_assign_and_add[tensordict-compile] 2.3938ms 0.8305ms 1.2042 KOps/s 1.1260 KOps/s $\textbf{\color{#35bf28}+6.95\%}$
test_compile_assign_and_add[tensordict-eager] 3.4080ms 3.1671ms 315.7433 Ops/s 305.2655 Ops/s $\color{#35bf28}+3.43\%$
test_compile_assign_and_add[pytree-compile] 2.2425ms 0.7948ms 1.2582 KOps/s 1.1653 KOps/s $\textbf{\color{#35bf28}+7.97\%}$
test_compile_assign_and_add[pytree-eager] 3.4133ms 3.1347ms 319.0100 Ops/s 327.4789 Ops/s $\color{#d91a1a}-2.59\%$
test_compile_indexing[tensor-tensordict-compile] 0.2592ms 0.1105ms 9.0457 KOps/s 9.3177 KOps/s $\color{#d91a1a}-2.92\%$
test_compile_indexing[tensor-tensordict-eager] 0.2083ms 61.3271μs 16.3060 KOps/s 16.6586 KOps/s $\color{#d91a1a}-2.12\%$
test_compile_indexing[tensor-tensorclass-compile] 0.2735ms 0.1048ms 9.5452 KOps/s 9.8848 KOps/s $\color{#d91a1a}-3.44\%$
test_compile_indexing[tensor-tensorclass-eager] 0.2299ms 42.5957μs 23.4766 KOps/s 24.2241 KOps/s $\color{#d91a1a}-3.09\%$
test_compile_indexing[tensor-pytree-compile] 0.2809ms 0.1036ms 9.6516 KOps/s 9.8020 KOps/s $\color{#d91a1a}-1.53\%$
test_compile_indexing[tensor-pytree-eager] 0.2127ms 42.9054μs 23.3071 KOps/s 24.5340 KOps/s $\textbf{\color{#d91a1a}-5.00\%}$
test_compile_indexing[slice-tensordict-compile] 0.2696ms 0.1358ms 7.3626 KOps/s 7.5655 KOps/s $\color{#d91a1a}-2.68\%$
test_compile_indexing[slice-tensordict-eager] 0.2076ms 23.0023μs 43.4739 KOps/s 42.8211 KOps/s $\color{#35bf28}+1.52\%$
test_compile_indexing[slice-tensorclass-compile] 0.2368ms 0.1315ms 7.6056 KOps/s 7.9693 KOps/s $\color{#d91a1a}-4.56\%$
test_compile_indexing[slice-tensorclass-eager] 0.1918ms 19.5640μs 51.1143 KOps/s 51.6139 KOps/s $\color{#d91a1a}-0.97\%$
test_compile_indexing[slice-pytree-compile] 0.2740ms 0.1272ms 7.8646 KOps/s 7.9248 KOps/s $\color{#d91a1a}-0.76\%$
test_compile_indexing[slice-pytree-eager] 0.1700ms 19.8318μs 50.4240 KOps/s 51.9054 KOps/s $\color{#d91a1a}-2.85\%$
test_compile_indexing[int-tensordict-compile] 0.2647ms 0.1339ms 7.4696 KOps/s 7.5328 KOps/s $\color{#d91a1a}-0.84\%$
test_compile_indexing[int-tensordict-eager] 0.5062ms 24.4477μs 40.9037 KOps/s 42.9370 KOps/s $\color{#d91a1a}-4.74\%$
test_compile_indexing[int-tensorclass-compile] 0.2945ms 0.1277ms 7.8292 KOps/s 7.9037 KOps/s $\color{#d91a1a}-0.94\%$
test_compile_indexing[int-tensorclass-eager] 62.1710μs 19.9359μs 50.1607 KOps/s 51.7495 KOps/s $\color{#d91a1a}-3.07\%$
test_compile_indexing[int-pytree-compile] 0.2693ms 0.1293ms 7.7345 KOps/s 7.9269 KOps/s $\color{#d91a1a}-2.43\%$
test_compile_indexing[int-pytree-eager] 48.4110μs 19.6502μs 50.8900 KOps/s 51.8060 KOps/s $\color{#d91a1a}-1.77\%$
test_mod_add[eager] 0.2155ms 29.9449μs 33.3947 KOps/s 33.7344 KOps/s $\color{#d91a1a}-1.01\%$
test_mod_add[compile] 0.2122ms 66.9858μs 14.9285 KOps/s 14.0748 KOps/s $\textbf{\color{#35bf28}+6.07\%}$
test_mod_add[compile-overhead] 0.2750ms 0.1339ms 7.4668 KOps/s 6.2950 KOps/s $\textbf{\color{#35bf28}+18.61\%}$
test_mod_wrap[eager] 0.4361ms 0.2366ms 4.2263 KOps/s 4.0120 KOps/s $\textbf{\color{#35bf28}+5.34\%}$
test_mod_wrap[compile] 1.3683ms 0.2852ms 3.5064 KOps/s 3.3274 KOps/s $\textbf{\color{#35bf28}+5.38\%}$
test_mod_wrap[compile-overhead] 7.6884ms 4.0758ms 245.3513 Ops/s 249.3888 Ops/s $\color{#d91a1a}-1.62\%$
test_mod_wrap_and_backward[eager] 1.5110ms 1.3197ms 757.7320 Ops/s 699.8025 Ops/s $\textbf{\color{#35bf28}+8.28\%}$
test_mod_wrap_and_backward[compile] 1.5315ms 1.2864ms 777.3465 Ops/s 698.4912 Ops/s $\textbf{\color{#35bf28}+11.29\%}$
test_mod_wrap_and_backward[compile-overhead] 1.3309ms 0.8943ms 1.1182 KOps/s 935.3359 Ops/s $\textbf{\color{#35bf28}+19.55\%}$
test_seq_add[eager] 0.2475ms 93.6599μs 10.6769 KOps/s 10.2619 KOps/s $\color{#35bf28}+4.04\%$
test_seq_add[compile] 0.2533ms 79.5275μs 12.5743 KOps/s 12.5875 KOps/s $\color{#d91a1a}-0.11\%$
test_seq_add[compile-overhead] 0.2590ms 0.1124ms 8.8974 KOps/s 8.8932 KOps/s $\color{#35bf28}+0.05\%$
test_seq_wrap[eager] 0.5427ms 0.3907ms 2.5598 KOps/s 2.6222 KOps/s $\color{#d91a1a}-2.38\%$
test_seq_wrap[compile] 0.4625ms 0.3054ms 3.2746 KOps/s 3.2494 KOps/s $\color{#35bf28}+0.78\%$
test_seq_wrap[compile-overhead] 0.3628ms 0.2172ms 4.6040 KOps/s 4.6354 KOps/s $\color{#d91a1a}-0.68\%$
test_func_call_runtime[False-eager] 0.9439ms 0.7615ms 1.3132 KOps/s 1.3908 KOps/s $\textbf{\color{#d91a1a}-5.58\%}$
test_func_call_runtime[False-compile] 0.9439ms 0.7650ms 1.3071 KOps/s 1.2876 KOps/s $\color{#35bf28}+1.51\%$
test_func_call_runtime[False-compile-overhead] 0.4955ms 0.3501ms 2.8567 KOps/s 2.8073 KOps/s $\color{#35bf28}+1.76\%$
test_func_call_runtime[True-eager] 1.0800ms 0.8803ms 1.1360 KOps/s 1.0840 KOps/s $\color{#35bf28}+4.80\%$
test_func_call_runtime[True-compile] 0.9795ms 0.7886ms 1.2681 KOps/s 1.2538 KOps/s $\color{#35bf28}+1.13\%$
test_func_call_runtime[True-compile-overhead] 0.5091ms 0.3716ms 2.6908 KOps/s 2.6391 KOps/s $\color{#35bf28}+1.96\%$
test_func_call_cm_runtime[False-eager] 0.8780ms 0.7156ms 1.3973 KOps/s 1.3064 KOps/s $\textbf{\color{#35bf28}+6.96\%}$
test_func_call_cm_runtime[False-compile] 0.9497ms 0.7663ms 1.3049 KOps/s 1.2785 KOps/s $\color{#35bf28}+2.07\%$
test_func_call_cm_runtime[False-compile-overhead] 0.4848ms 0.3519ms 2.8417 KOps/s 2.7930 KOps/s $\color{#35bf28}+1.74\%$
test_func_call_cm_runtime[True-eager] 1.2187ms 0.9928ms 1.0072 KOps/s 962.1117 Ops/s $\color{#35bf28}+4.69\%$
test_func_call_cm_runtime[True-compile] 0.9582ms 0.8074ms 1.2386 KOps/s 1.1890 KOps/s $\color{#35bf28}+4.17\%$
test_func_call_cm_runtime[True-compile-overhead] 0.6089ms 0.3967ms 2.5210 KOps/s 2.4704 KOps/s $\color{#35bf28}+2.05\%$
test_vmap_func_call_cm_runtime[eager] 2.5344ms 2.0814ms 480.4474 Ops/s 467.7112 Ops/s $\color{#35bf28}+2.72\%$
test_vmap_func_call_cm_runtime[compile] 1.0269ms 0.8279ms 1.2079 KOps/s 1.1618 KOps/s $\color{#35bf28}+3.97\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.5368ms 0.4004ms 2.4978 KOps/s 2.4922 KOps/s $\color{#35bf28}+0.22\%$
test_distributed 5.1631ms 0.2240ms 4.4635 KOps/s 8.8117 KOps/s $\textbf{\color{#d91a1a}-49.35\%}$
test_tdmodule 0.1838ms 14.2997μs 69.9316 KOps/s 67.2915 KOps/s $\color{#35bf28}+3.92\%$
test_tdmodule_dispatch 47.8810μs 27.1065μs 36.8915 KOps/s 36.6041 KOps/s $\color{#35bf28}+0.79\%$
test_tdseq 35.5010μs 15.0847μs 66.2924 KOps/s 66.9264 KOps/s $\color{#d91a1a}-0.95\%$
test_tdseq_dispatch 0.1553ms 30.3672μs 32.9302 KOps/s 33.2084 KOps/s $\color{#d91a1a}-0.84\%$
test_instantiation_functorch 2.2430ms 1.8542ms 539.3125 Ops/s 556.8718 Ops/s $\color{#d91a1a}-3.15\%$
test_exec_functorch 0.3420ms 0.2083ms 4.8008 KOps/s 4.9443 KOps/s $\color{#d91a1a}-2.90\%$
test_exec_functional_call 0.6058ms 0.2115ms 4.7282 KOps/s 4.9112 KOps/s $\color{#d91a1a}-3.73\%$
test_exec_td_decorator 0.6501ms 0.2657ms 3.7632 KOps/s 3.8859 KOps/s $\color{#d91a1a}-3.16\%$
test_vmap_mlp_speed_decorator[True-True] 1.0820ms 0.6890ms 1.4513 KOps/s 1.5028 KOps/s $\color{#d91a1a}-3.43\%$
test_vmap_mlp_speed_decorator[True-False] 1.1148ms 0.6899ms 1.4494 KOps/s 1.5032 KOps/s $\color{#d91a1a}-3.58\%$
test_vmap_mlp_speed_decorator[False-True] 1.0279ms 0.6104ms 1.6383 KOps/s 1.7048 KOps/s $\color{#d91a1a}-3.90\%$
test_vmap_mlp_speed_decorator[False-False] 1.0088ms 0.6109ms 1.6368 KOps/s 1.7041 KOps/s $\color{#d91a1a}-3.95\%$
test_vmap_transformer_speed_decorator[True-True] 20.3087ms 19.9815ms 50.0464 Ops/s 51.9532 Ops/s $\color{#d91a1a}-3.67\%$
test_vmap_transformer_speed_decorator[True-False] 20.3744ms 19.9635ms 50.0915 Ops/s 51.1774 Ops/s $\color{#d91a1a}-2.12\%$
test_vmap_transformer_speed_decorator[False-True] 20.4420ms 19.8273ms 50.4356 Ops/s 50.8844 Ops/s $\color{#d91a1a}-0.88\%$
test_vmap_transformer_speed_decorator[False-False] 20.1504ms 19.8320ms 50.4234 Ops/s 50.5182 Ops/s $\color{#d91a1a}-0.19\%$
test_to_module_speed[True] 1.4234ms 0.9939ms 1.0062 KOps/s 1.0238 KOps/s $\color{#d91a1a}-1.72\%$
test_to_module_speed[False] 1.3984ms 0.9669ms 1.0342 KOps/s 1.0502 KOps/s $\color{#d91a1a}-1.52\%$
test_tc_init 54.7910μs 31.4816μs 31.7646 KOps/s 31.0970 KOps/s $\color{#35bf28}+2.15\%$
test_tc_init_nested 0.4651ms 65.3516μs 15.3019 KOps/s 15.0107 KOps/s $\color{#35bf28}+1.94\%$
test_tc_first_layer_tensor 54.9736μs 0.6690μs 1.4947 MOps/s 1.4885 MOps/s $\color{#35bf28}+0.41\%$
test_tc_first_layer_nontensor 24.3800μs 2.1917μs 456.2684 KOps/s 454.5519 KOps/s $\color{#35bf28}+0.38\%$
test_tc_second_layer_tensor 0.1008ms 1.3446μs 743.7366 KOps/s 741.1637 KOps/s $\color{#35bf28}+0.35\%$
test_tc_second_layer_nontensor 30.3000μs 2.9124μs 343.3651 KOps/s 345.6805 KOps/s $\color{#d91a1a}-0.67\%$
test_unbind 0.1961s 12.2904ms 81.3642 Ops/s 93.3283 Ops/s $\textbf{\color{#d91a1a}-12.82\%}$
test_full_like 0.7763ms 0.5732ms 1.7446 KOps/s 1.7419 KOps/s $\color{#35bf28}+0.15\%$
test_zeros_like 0.3193ms 0.1980ms 5.0502 KOps/s 5.0556 KOps/s $\color{#d91a1a}-0.11\%$
test_ones_like 0.5599ms 0.1980ms 5.0513 KOps/s 5.0513 KOps/s $+0.00\%$
test_clone 1.2068ms 0.4151ms 2.4090 KOps/s 2.4117 KOps/s $\color{#d91a1a}-0.11\%$
test_squeeze 51.1310μs 9.1823μs 108.9050 KOps/s 107.6735 KOps/s $\color{#35bf28}+1.14\%$
test_unsqueeze 0.2696ms 71.1274μs 14.0593 KOps/s 14.1359 KOps/s $\color{#d91a1a}-0.54\%$
test_split 0.2823ms 0.1487ms 6.7231 KOps/s 6.6899 KOps/s $\color{#35bf28}+0.50\%$
test_permute 0.3379ms 0.1706ms 5.8604 KOps/s 5.8035 KOps/s $\color{#35bf28}+0.98\%$
test_stack 1.3801ms 0.8603ms 1.1624 KOps/s 1.1762 KOps/s $\color{#d91a1a}-1.17\%$
test_cat 1.3576ms 1.2315ms 811.9999 Ops/s 811.9906 Ops/s $+0.00\%$
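The Change column appears to equal the relative difference in throughput, (Ops − Ops on Repo HEAD) / Ops on Repo HEAD, after both values are converted to the same unit. This matters where the units differ, e.g. `1.1182 KOps/s` vs `935.3359 Ops/s`. The sketch below is an inference from the numbers in the table, not the benchmark bot's actual code. The helper names (`parse_ops`, `relative_change`, `is_flagged`) and the roughly 5% cut-off for bold rows are hypothetical, based only on which rows are bold here.

```python
# Unit multipliers for the throughput strings seen in the table (assumed set).
_UNITS = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(text: str) -> float:
    """Convert a string such as '1.1182 KOps/s' to operations per second."""
    value, unit = text.split()
    return float(value) * _UNITS[unit]


def relative_change(ops: str, ops_head: str) -> float:
    """Percentage change of this run's throughput relative to the repo HEAD run."""
    new, old = parse_ops(ops), parse_ops(ops_head)
    return (new - old) / old * 100.0


# Hypothetical cut-off: in this report, rows with |change| of about 5% or more are bold.
THRESHOLD_PCT = 5.0


def is_flagged(pct: float) -> bool:
    """Return True if the change would be highlighted under the assumed threshold."""
    return abs(pct) >= THRESHOLD_PCT


rows = {
    "test_distributed": ("4.4635 KOps/s", "8.8117 KOps/s"),
    "test_mod_wrap_and_backward[compile-overhead]": ("1.1182 KOps/s", "935.3359 Ops/s"),
    "test_zeros_like": ("5.0502 KOps/s", "5.0556 KOps/s"),
}

for name, (ops, head) in rows.items():
    pct = relative_change(ops, head)
    marker = " (flagged)" if is_flagged(pct) else ""
    print(f"{name}: {pct:+.2f}%{marker}")
```

For the three sample rows, this prints -49.35%, +19.55% and -0.11%. Those values match the table, and only the first two are flagged.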

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Oct 8, 2024
ghstack-source-id: 9a4bc97d50e182e9dec72feec8d772bb7d46426b
Pull Request resolved: #1029
vmoens added the Refactor label (Refactoring code - not a new feature) Oct 8, 2024
vmoens merged commit 6beb5e1 into gh/vmoens/26/base Oct 8, 2024
50 of 51 checks passed
vmoens deleted the gh/vmoens/26/head branch October 8, 2024 12:20
Labels: CLA Signed, Refactor