[BugFix] Ensure grads are noned when needed #1069

Merged: vmoens merged 1 commit into gh/vmoens/35/base from gh/vmoens/35/head on Nov 1, 2024

vmoens (Contributor) commented Nov 1, 2024

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Nov 1, 2024
ghstack-source-id: 5e9c5a974e5a5c73b033e5b85c3eb70c2f433512
Pull Request resolved: #1069
facebook-github-bot added the CLA Signed label on Nov 1, 2024. (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.)
vmoens merged commit 035ce7d into gh/vmoens/35/base on Nov 1, 2024
26 of 37 checks passed
vmoens deleted the gh/vmoens/35/head branch on November 1, 2024 16:11

github-actions bot commented Nov 1, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 216. Improved: 27. Worsened: 6.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 49.6130μs 21.4264μs 46.6714 KOps/s 46.0430 KOps/s $\color{#35bf28}+1.36\%$
test_plain_set_stack_nested 60.1230μs 21.4382μs 46.6456 KOps/s 45.5783 KOps/s $\color{#35bf28}+2.34\%$
test_plain_set_nested_inplace 61.1640μs 23.5136μs 42.5285 KOps/s 42.2147 KOps/s $\color{#35bf28}+0.74\%$
test_plain_set_stack_nested_inplace 61.3350μs 23.5616μs 42.4420 KOps/s 42.0425 KOps/s $\color{#35bf28}+0.95\%$
test_items 25.1470μs 4.1992μs 238.1395 KOps/s 240.2238 KOps/s $\color{#d91a1a}-0.87\%$
test_items_nested 0.4065ms 0.3353ms 2.9824 KOps/s 2.9166 KOps/s $\color{#35bf28}+2.26\%$
test_items_nested_locked 0.6329ms 0.3410ms 2.9328 KOps/s 2.9302 KOps/s $\color{#35bf28}+0.09\%$
test_items_nested_leaf 0.1300ms 71.0052μs 14.0835 KOps/s 13.7194 KOps/s $\color{#35bf28}+2.65\%$
test_items_stack_nested 0.6525ms 0.3407ms 2.9356 KOps/s 2.8923 KOps/s $\color{#35bf28}+1.49\%$
test_items_stack_nested_leaf 0.1424ms 74.1460μs 13.4869 KOps/s 13.2453 KOps/s $\color{#35bf28}+1.82\%$
test_items_stack_nested_locked 0.6212ms 0.3399ms 2.9423 KOps/s 2.9050 KOps/s $\color{#35bf28}+1.28\%$
test_keys 17.7340μs 4.8409μs 206.5738 KOps/s 278.2824 KOps/s $\textbf{\color{#d91a1a}-25.77\%}$
test_keys_nested 0.2003ms 0.1371ms 7.2965 KOps/s 7.0518 KOps/s $\color{#35bf28}+3.47\%$
test_keys_nested_locked 0.6684ms 0.1413ms 7.0793 KOps/s 6.9443 KOps/s $\color{#35bf28}+1.94\%$
test_keys_nested_leaf 0.2444ms 0.1142ms 8.7550 KOps/s 8.3382 KOps/s $\color{#35bf28}+5.00\%$
test_keys_stack_nested 0.2517ms 0.1351ms 7.4026 KOps/s 7.1741 KOps/s $\color{#35bf28}+3.19\%$
test_keys_stack_nested_leaf 0.2002ms 0.1154ms 8.6639 KOps/s 8.6160 KOps/s $\color{#35bf28}+0.56\%$
test_keys_stack_nested_locked 0.2705ms 0.1403ms 7.1262 KOps/s 6.9075 KOps/s $\color{#35bf28}+3.17\%$
test_values 5.2058μs 1.0354μs 965.7813 KOps/s 942.7813 KOps/s $\color{#35bf28}+2.44\%$
test_values_nested 0.1091ms 55.1086μs 18.1460 KOps/s 17.3217 KOps/s $\color{#35bf28}+4.76\%$
test_values_nested_locked 0.1111ms 54.6999μs 18.2816 KOps/s 17.4438 KOps/s $\color{#35bf28}+4.80\%$
test_values_nested_leaf 0.1072ms 59.5297μs 16.7983 KOps/s 16.3596 KOps/s $\color{#35bf28}+2.68\%$
test_values_stack_nested 0.1100ms 56.2781μs 17.7689 KOps/s 16.9999 KOps/s $\color{#35bf28}+4.52\%$
test_values_stack_nested_leaf 0.1124ms 59.7659μs 16.7319 KOps/s 16.3294 KOps/s $\color{#35bf28}+2.47\%$
test_values_stack_nested_locked 0.1075ms 55.9499μs 17.8731 KOps/s 17.1308 KOps/s $\color{#35bf28}+4.33\%$
test_membership 5.6191μs 0.7299μs 1.3701 MOps/s 1.3123 MOps/s $\color{#35bf28}+4.40\%$
test_membership_nested 33.5130μs 2.7396μs 365.0187 KOps/s 359.7195 KOps/s $\color{#35bf28}+1.47\%$
test_membership_nested_leaf 23.8250μs 2.7362μs 365.4651 KOps/s 355.8225 KOps/s $\color{#35bf28}+2.71\%$
test_membership_stacked_nested 22.1320μs 2.7166μs 368.1096 KOps/s 357.7841 KOps/s $\color{#35bf28}+2.89\%$
test_membership_stacked_nested_leaf 15.4690μs 2.6897μs 371.7944 KOps/s 351.5829 KOps/s $\textbf{\color{#35bf28}+5.75\%}$
test_membership_nested_last 19.9570μs 3.9851μs 250.9333 KOps/s 240.7254 KOps/s $\color{#35bf28}+4.24\%$
test_membership_nested_leaf_last 37.3090μs 4.0172μs 248.9301 KOps/s 238.4439 KOps/s $\color{#35bf28}+4.40\%$
test_membership_stacked_nested_last 27.1110μs 3.9664μs 252.1182 KOps/s 158.9135 KOps/s $\textbf{\color{#35bf28}+58.65\%}$
test_membership_stacked_nested_leaf_last 35.8770μs 4.0215μs 248.6638 KOps/s 158.9444 KOps/s $\textbf{\color{#35bf28}+56.45\%}$
test_nested_getleaf 36.2580μs 10.4859μs 95.3661 KOps/s 95.9031 KOps/s $\color{#d91a1a}-0.56\%$
test_nested_get 28.1820μs 9.9769μs 100.2317 KOps/s 96.9187 KOps/s $\color{#35bf28}+3.42\%$
test_stacked_getleaf 48.9320μs 10.4643μs 95.5628 KOps/s 95.0474 KOps/s $\color{#35bf28}+0.54\%$
test_stacked_get 47.3420μs 10.0308μs 99.6933 KOps/s 99.2650 KOps/s $\color{#35bf28}+0.43\%$
test_nested_getitemleaf 28.5730μs 10.8662μs 92.0287 KOps/s 88.4752 KOps/s $\color{#35bf28}+4.02\%$
test_nested_getitem 33.6730μs 10.0651μs 99.3532 KOps/s 93.2774 KOps/s $\textbf{\color{#35bf28}+6.51\%}$
test_stacked_getitemleaf 33.6930μs 10.9282μs 91.5067 KOps/s 89.8618 KOps/s $\color{#35bf28}+1.83\%$
test_stacked_getitem 33.4330μs 10.2571μs 97.4937 KOps/s 95.5856 KOps/s $\color{#35bf28}+2.00\%$
test_lock_nested 0.9772ms 0.4851ms 2.0612 KOps/s 2.0656 KOps/s $\color{#d91a1a}-0.21\%$
test_lock_stack_nested 0.5613ms 0.4565ms 2.1904 KOps/s 2.2320 KOps/s $\color{#d91a1a}-1.87\%$
test_unlock_nested 0.9912ms 0.4073ms 2.4555 KOps/s 2.4789 KOps/s $\color{#d91a1a}-0.94\%$
test_unlock_stack_nested 0.6383ms 0.3785ms 2.6423 KOps/s 2.7296 KOps/s $\color{#d91a1a}-3.20\%$
test_flatten_speed 0.1653ms 90.9944μs 10.9897 KOps/s 10.8559 KOps/s $\color{#35bf28}+1.23\%$
test_unflatten_speed 0.8835ms 0.4717ms 2.1200 KOps/s 2.0901 KOps/s $\color{#35bf28}+1.43\%$
test_common_ops 2.0218ms 1.1079ms 902.6039 Ops/s 865.2676 Ops/s $\color{#35bf28}+4.31\%$
test_creation 16.7510μs 2.1111μs 473.6904 KOps/s 479.1918 KOps/s $\color{#d91a1a}-1.15\%$
test_creation_empty 50.1330μs 18.0529μs 55.3927 KOps/s 53.2980 KOps/s $\color{#35bf28}+3.93\%$
test_creation_nested_1 1.2426ms 21.2275μs 47.1087 KOps/s 45.2853 KOps/s $\color{#35bf28}+4.03\%$
test_creation_nested_2 72.2050μs 25.5662μs 39.1142 KOps/s 38.2977 KOps/s $\color{#35bf28}+2.13\%$
test_clone 48.4100μs 17.3422μs 57.6627 KOps/s 57.6336 KOps/s $\color{#35bf28}+0.05\%$
test_getitem[int] 0.8518ms 16.4705μs 60.7147 KOps/s 58.3670 KOps/s $\color{#35bf28}+4.02\%$
test_getitem[slice_int] 0.1374ms 30.3036μs 32.9994 KOps/s 31.4157 KOps/s $\textbf{\color{#35bf28}+5.04\%}$
test_getitem[range] 0.1753ms 58.4010μs 17.1230 KOps/s 16.3407 KOps/s $\color{#35bf28}+4.79\%$
test_getitem[tuple] 0.1314ms 24.7842μs 40.3484 KOps/s 39.8414 KOps/s $\color{#35bf28}+1.27\%$
test_getitem[list] 0.1607ms 53.6984μs 18.6225 KOps/s 17.9551 KOps/s $\color{#35bf28}+3.72\%$
test_setitem_dim[int] 65.7530μs 32.4588μs 30.8083 KOps/s 30.2550 KOps/s $\color{#35bf28}+1.83\%$
test_setitem_dim[slice_int] 87.1230μs 60.2406μs 16.6001 KOps/s 15.6774 KOps/s $\textbf{\color{#35bf28}+5.89\%}$
test_setitem_dim[range] 0.1485ms 83.1816μs 12.0219 KOps/s 11.6316 KOps/s $\color{#35bf28}+3.36\%$
test_setitem_dim[tuple] 93.4950μs 47.9060μs 20.8742 KOps/s 19.8161 KOps/s $\textbf{\color{#35bf28}+5.34\%}$
test_setitem 0.1057ms 30.0711μs 33.2545 KOps/s 32.7496 KOps/s $\color{#35bf28}+1.54\%$
test_set 0.1550ms 29.4654μs 33.9381 KOps/s 33.8503 KOps/s $\color{#35bf28}+0.26\%$
test_set_shared 4.6396ms 0.2172ms 4.6030 KOps/s 4.6285 KOps/s $\color{#d91a1a}-0.55\%$
test_update 0.4188ms 37.9177μs 26.3729 KOps/s 26.4824 KOps/s $\color{#d91a1a}-0.41\%$
test_update_nested 0.1988ms 46.4509μs 21.5281 KOps/s 20.3288 KOps/s $\textbf{\color{#35bf28}+5.90\%}$
test_update__nested 0.9772ms 41.5548μs 24.0646 KOps/s 24.0927 KOps/s $\color{#d91a1a}-0.12\%$
test_set_nested 0.1054ms 32.0966μs 31.1560 KOps/s 29.5728 KOps/s $\textbf{\color{#35bf28}+5.35\%}$
test_set_nested_new 0.1009ms 36.7691μs 27.1968 KOps/s 26.3619 KOps/s $\color{#35bf28}+3.17\%$
test_select 0.1233ms 53.9404μs 18.5390 KOps/s 17.2096 KOps/s $\textbf{\color{#35bf28}+7.72\%}$
test_select_nested 0.1180ms 58.1263μs 17.2039 KOps/s 16.7919 KOps/s $\color{#35bf28}+2.45\%$
test_exclude_nested 0.5464ms 77.8463μs 12.8458 KOps/s 13.3917 KOps/s $\color{#d91a1a}-4.08\%$
test_empty[True] 0.4581ms 0.3491ms 2.8646 KOps/s 2.8263 KOps/s $\color{#35bf28}+1.36\%$
test_empty[False] 9.2450μs 1.2207μs 819.2289 KOps/s 749.1616 KOps/s $\textbf{\color{#35bf28}+9.35\%}$
test_unbind_speed 0.3584ms 0.3047ms 3.2822 KOps/s 3.3607 KOps/s $\color{#d91a1a}-2.34\%$
test_unbind_speed_stack0 0.9042ms 0.2977ms 3.3588 KOps/s 3.4455 KOps/s $\color{#d91a1a}-2.52\%$
test_unbind_speed_stack1 98.8955ms 0.8115ms 1.2323 KOps/s 1.3729 KOps/s $\textbf{\color{#d91a1a}-10.24\%}$
test_split 2.4152ms 1.9673ms 508.3194 Ops/s 507.1358 Ops/s $\color{#35bf28}+0.23\%$
test_chunk 0.1055s 2.1924ms 456.1262 Ops/s 420.2582 Ops/s $\textbf{\color{#35bf28}+8.53\%}$
test_creation[device0] 0.5197ms 0.1162ms 8.6035 KOps/s 8.4894 KOps/s $\color{#35bf28}+1.34\%$
test_creation_from_tensor 3.3715ms 0.1190ms 8.4047 KOps/s 8.3343 KOps/s $\color{#35bf28}+0.84\%$
test_add_one[memmap_tensor0] 0.2638ms 7.4802μs 133.6855 KOps/s 137.8046 KOps/s $\color{#d91a1a}-2.99\%$
test_contiguous[memmap_tensor0] 18.6950μs 1.8623μs 536.9591 KOps/s 516.9724 KOps/s $\color{#35bf28}+3.87\%$
test_stack[memmap_tensor0] 81.0520μs 5.3121μs 188.2481 KOps/s 175.4203 KOps/s $\textbf{\color{#35bf28}+7.31\%}$
test_memmaptd_index 1.0931ms 0.3875ms 2.5803 KOps/s 2.5127 KOps/s $\color{#35bf28}+2.69\%$
test_memmaptd_index_astensor 1.1106ms 0.4714ms 2.1214 KOps/s 2.0662 KOps/s $\color{#35bf28}+2.67\%$
test_memmaptd_index_op 1.3613ms 1.0071ms 992.9887 Ops/s 974.0518 Ops/s $\color{#35bf28}+1.94\%$
test_serialize_model 0.1257s 0.1159s 8.6251 Ops/s 8.4671 Ops/s $\color{#35bf28}+1.87\%$
test_serialize_model_pickle 0.4356s 0.3883s 2.5756 Ops/s 2.5038 Ops/s $\color{#35bf28}+2.87\%$
test_serialize_weights 0.1156s 0.1126s 8.8802 Ops/s 7.6255 Ops/s $\textbf{\color{#35bf28}+16.45\%}$
test_serialize_weights_returnearly 0.2670s 0.1743s 5.7388 Ops/s 6.3599 Ops/s $\textbf{\color{#d91a1a}-9.77\%}$
test_serialize_weights_pickle 0.5175s 0.4292s 2.3299 Ops/s 2.5267 Ops/s $\textbf{\color{#d91a1a}-7.79\%}$
test_serialize_weights_filesystem 0.2369s 0.1542s 6.4852 Ops/s 6.9500 Ops/s $\textbf{\color{#d91a1a}-6.69\%}$
test_serialize_model_filesystem 0.1609s 0.1438s 6.9523 Ops/s 6.7945 Ops/s $\color{#35bf28}+2.32\%$
test_reshape_pytree 0.1165ms 39.2502μs 25.4776 KOps/s 24.4193 KOps/s $\color{#35bf28}+4.33\%$
test_reshape_td 0.1175ms 48.0432μs 20.8146 KOps/s 21.3112 KOps/s $\color{#d91a1a}-2.33\%$
test_view_pytree 87.7140μs 38.7729μs 25.7912 KOps/s 24.9740 KOps/s $\color{#35bf28}+3.27\%$
test_view_td 0.1143ms 54.2698μs 18.4265 KOps/s 18.9347 KOps/s $\color{#d91a1a}-2.68\%$
test_unbind_pytree 74.1490μs 35.1557μs 28.4449 KOps/s 27.2432 KOps/s $\color{#35bf28}+4.41\%$
test_unbind_td 0.3073ms 45.0138μs 22.2154 KOps/s 22.2173 KOps/s $-0.01\%$
test_split_pytree 0.1151ms 38.0296μs 26.2953 KOps/s 25.6700 KOps/s $\color{#35bf28}+2.44\%$
test_split_td 0.1919ms 56.9988μs 17.5442 KOps/s 17.1918 KOps/s $\color{#35bf28}+2.05\%$
test_add_pytree 0.1466ms 47.2111μs 21.1815 KOps/s 21.5570 KOps/s $\color{#d91a1a}-1.74\%$
test_add_td 0.1706ms 86.3540μs 11.5802 KOps/s 11.6684 KOps/s $\color{#d91a1a}-0.76\%$
test_compile_add_one_nested[tensordict-compile] 0.1523ms 70.0422μs 14.2771 KOps/s 13.8260 KOps/s $\color{#35bf28}+3.26\%$
test_compile_add_one_nested[tensordict-eager] 0.3548ms 0.1877ms 5.3284 KOps/s 5.3839 KOps/s $\color{#d91a1a}-1.03\%$
test_compile_add_one_nested[pytree-compile] 0.1197ms 53.3437μs 18.7464 KOps/s 17.8109 KOps/s $\textbf{\color{#35bf28}+5.25\%}$
test_compile_add_one_nested[pytree-eager] 0.2276ms 0.1454ms 6.8758 KOps/s 6.7806 KOps/s $\color{#35bf28}+1.40\%$
test_compile_copy_nested[tensordict-compile] 83.9970μs 25.6571μs 38.9756 KOps/s 37.9497 KOps/s $\color{#35bf28}+2.70\%$
test_compile_copy_nested[tensordict-eager] 0.1196ms 70.3331μs 14.2181 KOps/s 14.2758 KOps/s $\color{#d91a1a}-0.40\%$
test_compile_copy_nested[pytree-compile] 0.1367ms 78.0182μs 12.8175 KOps/s 12.4027 KOps/s $\color{#35bf28}+3.34\%$
test_compile_copy_nested[pytree-eager] 0.1230ms 67.5378μs 14.8065 KOps/s 14.5954 KOps/s $\color{#35bf28}+1.45\%$
test_compile_add_one_flat[tensordict-compile] 0.1910ms 0.1130ms 8.8502 KOps/s 8.4252 KOps/s $\textbf{\color{#35bf28}+5.04\%}$
test_compile_add_one_flat[tensordict-eager] 0.7595ms 0.2101ms 4.7597 KOps/s 4.6530 KOps/s $\color{#35bf28}+2.29\%$
test_compile_add_one_flat[tensorclass-compile] 0.1269ms 52.2822μs 19.1270 KOps/s 17.4504 KOps/s $\textbf{\color{#35bf28}+9.61\%}$
test_compile_add_one_flat[tensorclass-eager] 0.4547ms 71.3767μs 14.0102 KOps/s 14.1363 KOps/s $\color{#d91a1a}-0.89\%$
test_compile_add_one_flat[pytree-compile] 0.6306ms 0.1136ms 8.8061 KOps/s 8.6940 KOps/s $\color{#35bf28}+1.29\%$
test_compile_add_one_flat[pytree-eager] 0.6002ms 0.2995ms 3.3391 KOps/s 3.2430 KOps/s $\color{#35bf28}+2.96\%$
test_compile_add_self_flat[tensordict-eager] 0.3872ms 0.2176ms 4.5947 KOps/s 4.5223 KOps/s $\color{#35bf28}+1.60\%$
test_compile_add_self_flat[tensordict-compile] 0.2125ms 0.1148ms 8.7085 KOps/s 8.5921 KOps/s $\color{#35bf28}+1.35\%$
test_compile_add_self_flat[tensorclass-eager] 0.3152ms 63.0258μs 15.8665 KOps/s 15.8816 KOps/s $\color{#d91a1a}-0.09\%$
test_compile_add_self_flat[tensorclass-compile] 0.1233ms 53.8319μs 18.5763 KOps/s 17.8261 KOps/s $\color{#35bf28}+4.21\%$
test_compile_add_self_flat[pytree-eager] 0.6478ms 0.2454ms 4.0749 KOps/s 4.0201 KOps/s $\color{#35bf28}+1.36\%$
test_compile_add_self_flat[pytree-compile] 0.1890ms 0.1118ms 8.9424 KOps/s 8.7276 KOps/s $\color{#35bf28}+2.46\%$
test_compile_copy_flat[tensordict-compile] 57.4870μs 21.6270μs 46.2385 KOps/s 47.2645 KOps/s $\color{#d91a1a}-2.17\%$
test_compile_copy_flat[tensordict-eager] 0.1316ms 60.0472μs 16.6536 KOps/s 16.7233 KOps/s $\color{#d91a1a}-0.42\%$
test_compile_copy_flat[pytree-compile] 0.1632ms 79.4468μs 12.5870 KOps/s 11.3378 KOps/s $\textbf{\color{#35bf28}+11.02\%}$
test_compile_copy_flat[pytree-eager] 0.1270ms 69.6111μs 14.3655 KOps/s 14.1133 KOps/s $\color{#35bf28}+1.79\%$
test_compile_assign_and_add[tensordict-compile] 0.4545ms 0.2191ms 4.5631 KOps/s 4.5311 KOps/s $\color{#35bf28}+0.71\%$
test_compile_assign_and_add[tensordict-eager] 2.8607ms 1.7673ms 565.8406 Ops/s 561.3136 Ops/s $\color{#35bf28}+0.81\%$
test_compile_assign_and_add[pytree-compile] 0.2954ms 0.2110ms 4.7384 KOps/s 4.6418 KOps/s $\color{#35bf28}+2.08\%$
test_compile_assign_and_add[pytree-eager] 1.9785ms 1.1743ms 851.5532 Ops/s 838.4011 Ops/s $\color{#35bf28}+1.57\%$
test_compile_assign_and_add_stack[compile] 0.5726ms 0.4648ms 2.1514 KOps/s 2.1203 KOps/s $\color{#35bf28}+1.47\%$
test_compile_assign_and_add_stack[eager] 4.2133ms 3.9498ms 253.1745 Ops/s 247.6838 Ops/s $\color{#35bf28}+2.22\%$
test_compile_indexing[tensor-tensordict-compile] 0.1092ms 44.4576μs 22.4934 KOps/s 21.9444 KOps/s $\color{#35bf28}+2.50\%$
test_compile_indexing[tensor-tensordict-eager] 0.5112ms 50.5057μs 19.7997 KOps/s 19.4237 KOps/s $\color{#35bf28}+1.94\%$
test_compile_indexing[tensor-tensorclass-compile] 0.1072ms 36.8802μs 27.1148 KOps/s 25.6426 KOps/s $\textbf{\color{#35bf28}+5.74\%}$
test_compile_indexing[tensor-tensorclass-eager] 78.4670μs 28.8343μs 34.6809 KOps/s 33.7988 KOps/s $\color{#35bf28}+2.61\%$
test_compile_indexing[tensor-pytree-compile] 0.1151ms 37.2440μs 26.8500 KOps/s 24.9589 KOps/s $\textbf{\color{#35bf28}+7.58\%}$
test_compile_indexing[tensor-pytree-eager] 0.1011ms 28.8898μs 34.6143 KOps/s 33.4919 KOps/s $\color{#35bf28}+3.35\%$
test_compile_indexing[slice-tensordict-compile] 0.1424ms 77.9583μs 12.8274 KOps/s 12.3422 KOps/s $\color{#35bf28}+3.93\%$
test_compile_indexing[slice-tensordict-eager] 0.5571ms 29.1719μs 34.2795 KOps/s 34.8753 KOps/s $\color{#d91a1a}-1.71\%$
test_compile_indexing[slice-tensorclass-compile] 0.5569ms 73.0235μs 13.6942 KOps/s 13.7989 KOps/s $\color{#d91a1a}-0.76\%$
test_compile_indexing[slice-tensorclass-eager] 80.9520μs 23.4144μs 42.7088 KOps/s 42.2024 KOps/s $\color{#35bf28}+1.20\%$
test_compile_indexing[slice-pytree-compile] 0.1570ms 71.6312μs 13.9604 KOps/s 13.5407 KOps/s $\color{#35bf28}+3.10\%$
test_compile_indexing[slice-pytree-eager] 73.5170μs 23.1745μs 43.1508 KOps/s 42.1678 KOps/s $\color{#35bf28}+2.33\%$
test_compile_indexing[int-tensordict-compile] 0.2259ms 80.6972μs 12.3920 KOps/s 12.2007 KOps/s $\color{#35bf28}+1.57\%$
test_compile_indexing[int-tensordict-eager] 1.1007ms 29.3952μs 34.0192 KOps/s 34.6861 KOps/s $\color{#d91a1a}-1.92\%$
test_compile_indexing[int-tensorclass-compile] 0.1617ms 71.4076μs 14.0041 KOps/s 13.7303 KOps/s $\color{#35bf28}+1.99\%$
test_compile_indexing[int-tensorclass-eager] 68.2280μs 23.2690μs 42.9755 KOps/s 41.8765 KOps/s $\color{#35bf28}+2.62\%$
test_compile_indexing[int-pytree-compile] 0.1598ms 71.0495μs 14.0747 KOps/s 13.7412 KOps/s $\color{#35bf28}+2.43\%$
test_compile_indexing[int-pytree-eager] 87.2130μs 23.5938μs 42.3841 KOps/s 42.2316 KOps/s $\color{#35bf28}+0.36\%$
test_mod_add[eager] 85.0690μs 26.7797μs 37.3418 KOps/s 36.8857 KOps/s $\color{#35bf28}+1.24\%$
test_mod_add[compile] 0.1215ms 44.4078μs 22.5186 KOps/s 21.1640 KOps/s $\textbf{\color{#35bf28}+6.40\%}$
test_mod_add[compile-overhead] 0.1181ms 44.5136μs 22.4650 KOps/s 21.6689 KOps/s $\color{#35bf28}+3.67\%$
test_mod_wrap[eager] 0.4209ms 0.2200ms 4.5459 KOps/s 4.5140 KOps/s $\color{#35bf28}+0.71\%$
test_mod_wrap[compile] 1.6748ms 0.2036ms 4.9110 KOps/s 4.8314 KOps/s $\color{#35bf28}+1.65\%$
test_mod_wrap[compile-overhead] 1.6821ms 0.2021ms 4.9475 KOps/s 4.8427 KOps/s $\color{#35bf28}+2.16\%$
test_mod_wrap_and_backward[eager] 18.7637ms 13.1051ms 76.3064 Ops/s 81.8971 Ops/s $\textbf{\color{#d91a1a}-6.83\%}$
test_mod_wrap_and_backward[compile] 17.6042ms 12.1402ms 82.3713 Ops/s 74.8248 Ops/s $\textbf{\color{#35bf28}+10.09\%}$
test_mod_wrap_and_backward[compile-overhead] 13.1888ms 10.7032ms 93.4299 Ops/s 87.4501 Ops/s $\textbf{\color{#35bf28}+6.84\%}$
test_seq_add[eager] 0.1638ms 90.5444μs 11.0443 KOps/s 10.3746 KOps/s $\textbf{\color{#35bf28}+6.46\%}$
test_seq_add[compile] 0.1182ms 59.3116μs 16.8601 KOps/s 16.6813 KOps/s $\color{#35bf28}+1.07\%$
test_seq_add[compile-overhead] 0.1198ms 58.2017μs 17.1816 KOps/s 17.0095 KOps/s $\color{#35bf28}+1.01\%$
test_seq_wrap[eager] 0.7278ms 0.4017ms 2.4893 KOps/s 2.4767 KOps/s $\color{#35bf28}+0.51\%$
test_seq_wrap[compile] 0.3647ms 0.2268ms 4.4095 KOps/s 4.4148 KOps/s $\color{#d91a1a}-0.12\%$
test_seq_wrap[compile-overhead] 0.4283ms 0.2261ms 4.4230 KOps/s 4.4593 KOps/s $\color{#d91a1a}-0.81\%$
test_func_call_runtime[False-eager] 1.2254ms 0.5549ms 1.8021 KOps/s 1.7828 KOps/s $\color{#35bf28}+1.08\%$
test_func_call_runtime[False-compile] 0.7822ms 0.4247ms 2.3547 KOps/s 2.3310 KOps/s $\color{#35bf28}+1.01\%$
test_func_call_runtime[False-compile-overhead] 0.5228ms 0.4260ms 2.3476 KOps/s 2.2045 KOps/s $\textbf{\color{#35bf28}+6.49\%}$
test_func_call_runtime[True-eager] 0.9246ms 0.7709ms 1.2972 KOps/s 1.3022 KOps/s $\color{#d91a1a}-0.38\%$
test_func_call_runtime[True-compile] 0.6935ms 0.4625ms 2.1623 KOps/s 2.1097 KOps/s $\color{#35bf28}+2.49\%$
test_func_call_runtime[True-compile-overhead] 0.6157ms 0.4670ms 2.1414 KOps/s 2.1302 KOps/s $\color{#35bf28}+0.52\%$
test_func_call_cm_runtime[False-eager] 0.9651ms 0.5566ms 1.7966 KOps/s 1.8013 KOps/s $\color{#d91a1a}-0.26\%$
test_func_call_cm_runtime[False-compile] 0.8147ms 0.4261ms 2.3469 KOps/s 2.3156 KOps/s $\color{#35bf28}+1.35\%$
test_func_call_cm_runtime[False-compile-overhead] 0.8769ms 0.4243ms 2.3570 KOps/s 2.3353 KOps/s $\color{#35bf28}+0.93\%$
test_func_call_cm_runtime[True-eager] 1.3964ms 0.9084ms 1.1008 KOps/s 1.1030 KOps/s $\color{#d91a1a}-0.19\%$
test_func_call_cm_runtime[True-compile] 1.0011ms 0.4892ms 2.0443 KOps/s 2.0132 KOps/s $\color{#35bf28}+1.55\%$
test_func_call_cm_runtime[True-compile-overhead] 1.0177ms 0.4940ms 2.0244 KOps/s 1.9634 KOps/s $\color{#35bf28}+3.11\%$
test_vmap_func_call_cm_runtime[eager] 2.5140ms 1.8629ms 536.8002 Ops/s 525.5669 Ops/s $\color{#35bf28}+2.14\%$
test_vmap_func_call_cm_runtime[compile] 0.8839ms 0.5153ms 1.9405 KOps/s 1.9178 KOps/s $\color{#35bf28}+1.18\%$
test_vmap_func_call_cm_runtime[compile-overhead] 1.6189ms 0.5170ms 1.9342 KOps/s 1.9153 KOps/s $\color{#35bf28}+0.99\%$
test_distributed 0.2825ms 0.1233ms 8.1096 KOps/s 7.8963 KOps/s $\color{#35bf28}+2.70\%$
test_tdmodule 34.2540μs 18.7153μs 53.4321 KOps/s 51.5716 KOps/s $\color{#35bf28}+3.61\%$
test_tdmodule_dispatch 62.4470μs 36.5667μs 27.3473 KOps/s 26.3815 KOps/s $\color{#35bf28}+3.66\%$
test_tdseq 47.6890μs 21.3079μs 46.9309 KOps/s 45.4890 KOps/s $\color{#35bf28}+3.17\%$
test_tdseq_dispatch 67.6170μs 40.8210μs 24.4972 KOps/s 23.1888 KOps/s $\textbf{\color{#35bf28}+5.64\%}$
test_instantiation_functorch 2.4139ms 1.5329ms 652.3370 Ops/s 649.6572 Ops/s $\color{#35bf28}+0.41\%$
test_exec_functorch 0.3363ms 0.1795ms 5.5700 KOps/s 5.4672 KOps/s $\color{#35bf28}+1.88\%$
test_exec_functional_call 0.2873ms 0.1752ms 5.7092 KOps/s 5.5880 KOps/s $\color{#35bf28}+2.17\%$
test_exec_td_decorator 0.5023ms 0.2304ms 4.3397 KOps/s 4.3127 KOps/s $\color{#35bf28}+0.63\%$
test_vmap_mlp_speed_decorator[True-True] 0.8587ms 0.6277ms 1.5930 KOps/s 1.5635 KOps/s $\color{#35bf28}+1.89\%$
test_vmap_mlp_speed_decorator[True-False] 0.9204ms 0.6262ms 1.5969 KOps/s 1.5541 KOps/s $\color{#35bf28}+2.75\%$
test_vmap_mlp_speed_decorator[False-True] 0.6986ms 0.5134ms 1.9478 KOps/s 1.8767 KOps/s $\color{#35bf28}+3.79\%$
test_vmap_mlp_speed_decorator[False-False] 0.7983ms 0.5134ms 1.9477 KOps/s 1.9055 KOps/s $\color{#35bf28}+2.21\%$
test_to_module_speed[True] 1.3820ms 1.2839ms 778.9051 Ops/s 784.6799 Ops/s $\color{#d91a1a}-0.74\%$
test_to_module_speed[False] 1.6255ms 1.2676ms 788.9102 Ops/s 795.8454 Ops/s $\color{#d91a1a}-0.87\%$
test_tc_init 98.3140μs 45.8439μs 21.8132 KOps/s 21.6767 KOps/s $\color{#35bf28}+0.63\%$
test_tc_init_nested 0.1985ms 90.2572μs 11.0794 KOps/s 10.9272 KOps/s $\color{#35bf28}+1.39\%$
test_tc_first_layer_tensor 26.0980μs 1.5335μs 652.1134 KOps/s 657.9800 KOps/s $\color{#d91a1a}-0.89\%$
test_tc_first_layer_nontensor 43.9930μs 4.7536μs 210.3669 KOps/s 201.8592 KOps/s $\color{#35bf28}+4.21\%$
test_tc_second_layer_tensor 18.4550μs 2.8354μs 352.6895 KOps/s 363.7948 KOps/s $\color{#d91a1a}-3.05\%$
test_tc_second_layer_nontensor 24.5770μs 6.0851μs 164.3359 KOps/s 159.4528 KOps/s $\color{#35bf28}+3.06\%$
test_unbind 0.2080s 12.1825ms 82.0846 Ops/s 76.4124 Ops/s $\textbf{\color{#35bf28}+7.42\%}$
test_full_like 7.8346ms 6.7981ms 147.0993 Ops/s 143.9310 Ops/s $\color{#35bf28}+2.20\%$
test_zeros_like 3.0139ms 2.6110ms 382.9892 Ops/s 367.2204 Ops/s $\color{#35bf28}+4.29\%$
test_ones_like 3.3971ms 3.0680ms 325.9440 Ops/s 320.4610 Ops/s $\color{#35bf28}+1.71\%$
test_clone 5.3023ms 4.8360ms 206.7840 Ops/s 204.9076 Ops/s $\color{#35bf28}+0.92\%$
test_squeeze 62.4660μs 12.0414μs 83.0471 KOps/s 86.9250 KOps/s $\color{#d91a1a}-4.46\%$
test_unsqueeze 0.2105ms 88.4146μs 11.3104 KOps/s 11.5021 KOps/s $\color{#d91a1a}-1.67\%$
test_split 0.4854ms 0.1841ms 5.4305 KOps/s 5.2653 KOps/s $\color{#35bf28}+3.14\%$
test_permute 0.3362ms 0.2135ms 4.6842 KOps/s 4.6497 KOps/s $\color{#35bf28}+0.74\%$
test_stack 26.8154ms 24.2457ms 41.2444 Ops/s 39.9090 Ops/s $\color{#35bf28}+3.35\%$
test_cat 29.1559ms 24.0184ms 41.6347 Ops/s 40.1004 Ops/s $\color{#35bf28}+3.83\%$
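
The Change column in the table above appears to be the relative difference between the Ops column (this PR) and the Ops on Repo HEAD column. The sketch below reproduces that arithmetic for a few rows copied from the table. The function names and the 5% threshold are assumptions made for illustration: the threshold is inferred from the fact that every bolded entry exceeds 5% in magnitude and that the bolded rows match the Improved and Worsened counts in the summary line. It is not taken from the benchmark tooling itself.

```python
# Hypothetical helpers; names and threshold are illustrative assumptions,
# not part of the benchmark workflow that produced the table.

ASSUMED_THRESHOLD_PERCENT = 5.0  # inferred from which rows are bold


def relative_change(ops: float, ops_head: float) -> float:
    """Percentage change of the PR throughput relative to repo HEAD.

    Both values must be expressed in the same unit (e.g. both in KOps/s).
    """
    return (ops - ops_head) / ops_head * 100.0


def classify(ops: float, ops_head: float,
             threshold: float = ASSUMED_THRESHOLD_PERCENT) -> str:
    """Label a benchmark as improved, worsened, or unchanged."""
    change = relative_change(ops, ops_head)
    if change > threshold:
        return "improved"
    if change < -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) pairs copied from the CPU table, in KOps/s.
rows = {
    "test_plain_set_nested": (46.6714, 46.0430),
    "test_keys": (206.5738, 278.2824),
    "test_membership_stacked_nested_last": (252.1182, 158.9135),
}

for name, (ops, ops_head) in rows.items():
    change = relative_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(ops, ops_head)})")
```

Since the PR and HEAD figures each come from one benchmark run, small differences of a few percent are probably best read as noise rather than as an effect of the change.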

github-actions bot commented Nov 1, 2024

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 222. Improved: 31. Worsened: 6.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 45.3910μs 14.8827μs 67.1923 KOps/s 63.2061 KOps/s $\textbf{\color{#35bf28}+6.31\%}$
test_plain_set_stack_nested 0.1188ms 15.1261μs 66.1110 KOps/s 62.3097 KOps/s $\textbf{\color{#35bf28}+6.10\%}$
test_plain_set_nested_inplace 60.5710μs 15.8973μs 62.9038 KOps/s 59.4723 KOps/s $\textbf{\color{#35bf28}+5.77\%}$
test_plain_set_stack_nested_inplace 80.2110μs 15.8711μs 63.0075 KOps/s 59.1612 KOps/s $\textbf{\color{#35bf28}+6.50\%}$
test_items 22.9300μs 2.9212μs 342.3247 KOps/s 341.9150 KOps/s $\color{#35bf28}+0.12\%$
test_items_nested 0.4480ms 0.3190ms 3.1349 KOps/s 3.1615 KOps/s $\color{#d91a1a}-0.84\%$
test_items_nested_locked 0.4144ms 0.3217ms 3.1086 KOps/s 3.1491 KOps/s $\color{#d91a1a}-1.29\%$
test_items_nested_leaf 88.9710μs 58.3364μs 17.1420 KOps/s 17.1112 KOps/s $\color{#35bf28}+0.18\%$
test_items_stack_nested 0.3831ms 0.3256ms 3.0717 KOps/s 3.1757 KOps/s $\color{#d91a1a}-3.28\%$
test_items_stack_nested_leaf 88.9020μs 59.3970μs 16.8359 KOps/s 17.0379 KOps/s $\color{#d91a1a}-1.19\%$
test_items_stack_nested_locked 0.4122ms 0.3266ms 3.0617 KOps/s 3.1594 KOps/s $\color{#d91a1a}-3.09\%$
test_keys 0.6847ms 3.4859μs 286.8707 KOps/s 288.5850 KOps/s $\color{#d91a1a}-0.59\%$
test_keys_nested 0.1002ms 71.9301μs 13.9024 KOps/s 13.8355 KOps/s $\color{#35bf28}+0.48\%$
test_keys_nested_locked 2.7462ms 77.5944μs 12.8875 KOps/s 12.8478 KOps/s $\color{#35bf28}+0.31\%$
test_keys_nested_leaf 91.5710μs 63.2571μs 15.8085 KOps/s 15.7421 KOps/s $\color{#35bf28}+0.42\%$
test_keys_stack_nested 0.1201ms 72.6514μs 13.7644 KOps/s 13.6552 KOps/s $\color{#35bf28}+0.80\%$
test_keys_stack_nested_leaf 91.4020μs 63.6645μs 15.7074 KOps/s 15.6833 KOps/s $\color{#35bf28}+0.15\%$
test_keys_stack_nested_locked 0.2483ms 77.4961μs 12.9039 KOps/s 12.8106 KOps/s $\color{#35bf28}+0.73\%$
test_values 32.2523μs 0.8591μs 1.1640 MOps/s 1.1553 MOps/s $\color{#35bf28}+0.75\%$
test_values_nested 57.9910μs 30.9086μs 32.3534 KOps/s 32.2493 KOps/s $\color{#35bf28}+0.32\%$
test_values_nested_locked 65.3410μs 32.5035μs 30.7659 KOps/s 30.5315 KOps/s $\color{#35bf28}+0.77\%$
test_values_nested_leaf 61.9610μs 33.5559μs 29.8010 KOps/s 29.7061 KOps/s $\color{#35bf28}+0.32\%$
test_values_stack_nested 63.8910μs 31.4634μs 31.7830 KOps/s 31.9635 KOps/s $\color{#d91a1a}-0.56\%$
test_values_stack_nested_leaf 55.5210μs 34.0709μs 29.3506 KOps/s 29.5765 KOps/s $\color{#d91a1a}-0.76\%$
test_values_stack_nested_locked 59.9710μs 32.6876μs 30.5926 KOps/s 30.3586 KOps/s $\color{#35bf28}+0.77\%$
test_membership 1.6331μs 0.5053μs 1.9790 MOps/s 1.9640 MOps/s $\color{#35bf28}+0.76\%$
test_membership_nested 15.8855μs 1.9422μs 514.8729 KOps/s 532.5396 KOps/s $\color{#d91a1a}-3.32\%$
test_membership_nested_leaf 13.8700μs 1.9351μs 516.7654 KOps/s 531.4385 KOps/s $\color{#d91a1a}-2.76\%$
test_membership_stacked_nested 28.5510μs 2.0208μs 494.8518 KOps/s 515.1788 KOps/s $\color{#d91a1a}-3.95\%$
test_membership_stacked_nested_leaf 0.1912ms 2.0177μs 495.6232 KOps/s 521.6396 KOps/s $\color{#d91a1a}-4.99\%$
test_membership_nested_last 23.2210μs 2.8269μs 353.7411 KOps/s 355.3344 KOps/s $\color{#d91a1a}-0.45\%$
test_membership_nested_leaf_last 33.1810μs 2.8690μs 348.5521 KOps/s 356.7887 KOps/s $\color{#d91a1a}-2.31\%$
test_membership_stacked_nested_last 0.1997ms 2.8886μs 346.1929 KOps/s 358.2392 KOps/s $\color{#d91a1a}-3.36\%$
test_membership_stacked_nested_leaf_last 28.5100μs 2.8447μs 351.5342 KOps/s 358.8436 KOps/s $\color{#d91a1a}-2.04\%$
test_nested_getleaf 26.2100μs 5.9999μs 166.6688 KOps/s 167.2306 KOps/s $\color{#d91a1a}-0.34\%$
test_nested_get 33.8510μs 5.6845μs 175.9158 KOps/s 176.7904 KOps/s $\color{#d91a1a}-0.49\%$
test_stacked_getleaf 30.8810μs 5.9709μs 167.4781 KOps/s 166.9726 KOps/s $\color{#35bf28}+0.30\%$
test_stacked_get 31.1300μs 5.6780μs 176.1181 KOps/s 176.2536 KOps/s $\color{#d91a1a}-0.08\%$
test_nested_getitemleaf 30.6010μs 6.0791μs 164.4979 KOps/s 165.1749 KOps/s $\color{#d91a1a}-0.41\%$
test_nested_getitem 35.1310μs 5.7838μs 172.8974 KOps/s 174.1798 KOps/s $\color{#d91a1a}-0.74\%$
test_stacked_getitemleaf 26.2310μs 6.1089μs 163.6950 KOps/s 164.6724 KOps/s $\color{#d91a1a}-0.59\%$
test_stacked_getitem 32.2700μs 5.7904μs 172.6984 KOps/s 172.9578 KOps/s $\color{#d91a1a}-0.15\%$
test_lock_nested 5.4375ms 0.4261ms 2.3468 KOps/s 2.3218 KOps/s $\color{#35bf28}+1.07\%$
test_lock_stack_nested 0.5375ms 0.3930ms 2.5447 KOps/s 2.4841 KOps/s $\color{#35bf28}+2.44\%$
test_unlock_nested 0.7615ms 0.3624ms 2.7596 KOps/s 2.6749 KOps/s $\color{#35bf28}+3.17\%$
test_unlock_stack_nested 0.4249ms 0.3327ms 3.0053 KOps/s 2.9243 KOps/s $\color{#35bf28}+2.77\%$
test_flatten_speed 0.1335ms 74.0817μs 13.4986 KOps/s 13.4666 KOps/s $\color{#35bf28}+0.24\%$
test_unflatten_speed 0.3314ms 0.2923ms 3.4208 KOps/s 3.4058 KOps/s $\color{#35bf28}+0.44\%$
test_common_ops 1.5689ms 1.2352ms 809.5987 Ops/s 777.0831 Ops/s $\color{#35bf28}+4.18\%$
test_creation 23.5010μs 1.5893μs 629.2000 KOps/s 628.1802 KOps/s $\color{#35bf28}+0.16\%$
test_creation_empty 56.5310μs 16.2965μs 61.3628 KOps/s 54.2265 KOps/s $\textbf{\color{#35bf28}+13.16\%}$
test_creation_nested_1 53.0710μs 18.0808μs 55.3073 KOps/s 49.6555 KOps/s $\textbf{\color{#35bf28}+11.38\%}$
test_creation_nested_2 47.6110μs 20.6102μs 48.5198 KOps/s 44.0294 KOps/s $\textbf{\color{#35bf28}+10.20\%}$
test_clone 0.1832ms 28.6559μs 34.8968 KOps/s 34.5448 KOps/s $\color{#35bf28}+1.02\%$
test_getitem[int] 1.2449ms 16.5248μs 60.5152 KOps/s 59.7923 KOps/s $\color{#35bf28}+1.21\%$
test_getitem[slice_int] 0.1342ms 29.0648μs 34.4059 KOps/s 34.0083 KOps/s $\color{#35bf28}+1.17\%$
test_getitem[range] 0.2399ms 0.1130ms 8.8500 KOps/s 8.9038 KOps/s $\color{#d91a1a}-0.60\%$
test_getitem[tuple] 0.1507ms 25.3655μs 39.4236 KOps/s 39.9591 KOps/s $\color{#d91a1a}-1.34\%$
test_getitem[list] 0.2710ms 0.1005ms 9.9489 KOps/s 9.9172 KOps/s $\color{#35bf28}+0.32\%$
test_setitem_dim[int] 68.7510μs 43.8002μs 22.8309 KOps/s 22.9551 KOps/s $\color{#d91a1a}-0.54\%$
test_setitem_dim[slice_int] 0.2086ms 66.3746μs 15.0660 KOps/s 15.0765 KOps/s $\color{#d91a1a}-0.07\%$
test_setitem_dim[range] 0.2467ms 0.1273ms 7.8531 KOps/s 7.8574 KOps/s $\color{#d91a1a}-0.06\%$
test_setitem_dim[tuple] 0.1999ms 59.7974μs 16.7231 KOps/s 16.7032 KOps/s $\color{#35bf28}+0.12\%$
test_setitem 0.1932ms 41.8945μs 23.8695 KOps/s 23.5264 KOps/s $\color{#35bf28}+1.46\%$
test_set 0.2196ms 40.3789μs 24.7654 KOps/s 23.7098 KOps/s $\color{#35bf28}+4.45\%$
test_set_shared 0.4014ms 50.2158μs 19.9141 KOps/s 19.6094 KOps/s $\color{#35bf28}+1.55\%$
test_update 0.2261ms 49.1518μs 20.3451 KOps/s 18.9404 KOps/s $\textbf{\color{#35bf28}+7.42\%}$
test_update_nested 0.2339ms 56.8174μs 17.6002 KOps/s 16.8676 KOps/s $\color{#35bf28}+4.34\%$
test_update__nested 0.1991ms 58.4123μs 17.1197 KOps/s 16.2464 KOps/s $\textbf{\color{#35bf28}+5.38\%}$
test_set_nested 0.2290ms 42.5436μs 23.5053 KOps/s 22.0009 KOps/s $\textbf{\color{#35bf28}+6.84\%}$
test_set_nested_new 0.2038ms 46.5038μs 21.5036 KOps/s 20.6735 KOps/s $\color{#35bf28}+4.02\%$
test_select 0.2069ms 60.0347μs 16.6570 KOps/s 16.2701 KOps/s $\color{#35bf28}+2.38\%$
test_select_nested 70.9110μs 42.5966μs 23.4760 KOps/s 22.8503 KOps/s $\color{#35bf28}+2.74\%$
test_exclude_nested 97.0710μs 60.1861μs 16.6151 KOps/s 16.4659 KOps/s $\color{#35bf28}+0.91\%$
test_empty[True] 0.3378ms 0.2614ms 3.8253 KOps/s 3.8942 KOps/s $\color{#d91a1a}-1.77\%$
test_empty[False] 2.9760μs 0.8275μs 1.2084 MOps/s 1.1964 MOps/s $\color{#35bf28}+1.01\%$
test_to 0.1517ms 51.0070μs 19.6052 KOps/s 18.8272 KOps/s $\color{#35bf28}+4.13\%$
test_to_nonblocking 83.2610μs 49.1861μs 20.3309 KOps/s 19.3611 KOps/s $\textbf{\color{#35bf28}+5.01\%}$
test_unbind_speed 1.2034ms 0.2776ms 3.6019 KOps/s 3.4613 KOps/s $\color{#35bf28}+4.06\%$
test_unbind_speed_stack0 0.3764ms 0.2763ms 3.6190 KOps/s 3.5306 KOps/s $\color{#35bf28}+2.50\%$
test_unbind_speed_stack1 97.3206ms 0.7064ms 1.4156 KOps/s 1.3679 KOps/s $\color{#35bf28}+3.49\%$
test_split 0.1002s 2.2768ms 439.2089 Ops/s 433.9057 Ops/s $\color{#35bf28}+1.22\%$
test_chunk 0.1022s 2.2984ms 435.0909 Ops/s 430.5495 Ops/s $\color{#35bf28}+1.05\%$
test_to[False] 6.2445ms 5.9792ms 167.2460 Ops/s 157.6223 Ops/s $\textbf{\color{#35bf28}+6.11\%}$
test_to[True] 4.8230ms 4.4319ms 225.6348 Ops/s 220.7713 Ops/s $\color{#35bf28}+2.20\%$
test_to_njt[False] 0.3596s 0.2754s 3.6306 Ops/s 3.5710 Ops/s $\color{#35bf28}+1.67\%$
test_to_njt[True] 0.3735s 0.2878s 3.4744 Ops/s 3.7192 Ops/s $\textbf{\color{#d91a1a}-6.58\%}$
test_creation[device0] 0.4214ms 0.1289ms 7.7598 KOps/s 7.7932 KOps/s $\color{#d91a1a}-0.43\%$
test_creation_from_tensor 0.4662ms 0.1301ms 7.6875 KOps/s 7.6663 KOps/s $\color{#35bf28}+0.28\%$
test_add_one[memmap_tensor0] 0.1382ms 8.8341μs 113.1978 KOps/s 111.5715 KOps/s $\color{#35bf28}+1.46\%$
test_contiguous[memmap_tensor0] 35.0410μs 2.1896μs 456.7103 KOps/s 440.6131 KOps/s $\color{#35bf28}+3.65\%$
test_stack[memmap_tensor0] 0.1517ms 6.9770μs 143.3284 KOps/s 137.8855 KOps/s $\color{#35bf28}+3.95\%$
test_memmaptd_index 1.0569ms 0.4381ms 2.2828 KOps/s 2.2593 KOps/s $\color{#35bf28}+1.04\%$
test_memmaptd_index_astensor 0.8567ms 0.4933ms 2.0273 KOps/s 1.9877 KOps/s $\color{#35bf28}+1.99\%$
test_memmaptd_index_op 1.4155ms 1.0239ms 976.6815 Ops/s 909.3106 Ops/s $\textbf{\color{#35bf28}+7.41\%}$
test_serialize_model 0.1322s 0.1307s 7.6497 Ops/s 6.8789 Ops/s $\textbf{\color{#35bf28}+11.20\%}$
test_serialize_model_pickle 1.3604s 1.2132s 0.8243 Ops/s 0.8233 Ops/s $\color{#35bf28}+0.12\%$
test_serialize_weights 0.1308s 0.1302s 7.6793 Ops/s 7.6795 Ops/s $-0.00\%$
test_serialize_weights_returnearly 0.2239s 55.8344ms 17.9101 Ops/s 17.8015 Ops/s $\color{#35bf28}+0.61\%$
test_serialize_weights_pickle 1.3457s 1.2128s 0.8245 Ops/s 0.8191 Ops/s $\color{#35bf28}+0.66\%$
test_reshape_pytree 0.1564ms 36.4005μs 27.4721 KOps/s 26.7927 KOps/s $\color{#35bf28}+2.54\%$
test_reshape_td 73.6510μs 41.4370μs 24.1330 KOps/s 23.8291 KOps/s $\color{#35bf28}+1.28\%$
test_view_pytree 0.1703ms 36.4133μs 27.4625 KOps/s 27.0844 KOps/s $\color{#35bf28}+1.40\%$
test_view_td 0.1727ms 46.3442μs 21.5777 KOps/s 21.4751 KOps/s $\color{#35bf28}+0.48\%$
test_unbind_pytree 0.1359ms 35.4885μs 28.1782 KOps/s 27.8468 KOps/s $\color{#35bf28}+1.19\%$
test_unbind_td 0.5523ms 42.6241μs 23.4609 KOps/s 21.9740 KOps/s $\textbf{\color{#35bf28}+6.77\%}$
test_split_pytree 0.5121ms 47.2680μs 21.1560 KOps/s 21.4317 KOps/s $\color{#d91a1a}-1.29\%$
test_split_td 0.2733ms 59.1310μs 16.9116 KOps/s 14.4819 KOps/s $\textbf{\color{#35bf28}+16.78\%}$
test_add_pytree 0.2093ms 59.2161μs 16.8873 KOps/s 16.6475 KOps/s $\color{#35bf28}+1.44\%$
test_add_td 0.2438ms 89.3914μs 11.1868 KOps/s 10.1921 KOps/s $\textbf{\color{#35bf28}+9.76\%}$
test_compile_add_one_nested[tensordict-compile] 0.3098ms 0.1634ms 6.1183 KOps/s 5.9152 KOps/s $\color{#35bf28}+3.43\%$
test_compile_add_one_nested[tensordict-eager] 0.5504ms 0.1527ms 6.5476 KOps/s 6.3960 KOps/s $\color{#35bf28}+2.37\%$
test_compile_add_one_nested[pytree-compile] 0.5701ms 0.1582ms 6.3226 KOps/s 6.1999 KOps/s $\color{#35bf28}+1.98\%$
test_compile_add_one_nested[pytree-eager] 0.5876ms 0.1830ms 5.4638 KOps/s 5.4085 KOps/s $\color{#35bf28}+1.02\%$
test_compile_copy_nested[tensordict-compile] 0.4304ms 21.8075μs 45.8557 KOps/s 45.3592 KOps/s $\color{#35bf28}+1.09\%$
test_compile_copy_nested[tensordict-eager] 0.4710ms 45.6090μs 21.9255 KOps/s 21.9116 KOps/s $\color{#35bf28}+0.06\%$
test_compile_copy_nested[pytree-compile] 0.4669ms 68.5084μs 14.5967 KOps/s 14.7691 KOps/s $\color{#d91a1a}-1.17\%$
test_compile_copy_nested[pytree-eager] 0.4440ms 52.6213μs 19.0037 KOps/s 19.3001 KOps/s $\color{#d91a1a}-1.54\%$
test_compile_add_one_flat[tensordict-compile] 0.4529ms 0.3184ms 3.1411 KOps/s 3.1147 KOps/s $\color{#35bf28}+0.85\%$
test_compile_add_one_flat[tensordict-eager] 0.6074ms 0.2147ms 4.6584 KOps/s 4.6036 KOps/s $\color{#35bf28}+1.19\%$
test_compile_add_one_flat[tensorclass-compile] 0.2836ms 0.1362ms 7.3443 KOps/s 7.2187 KOps/s $\color{#35bf28}+1.74\%$
test_compile_add_one_flat[tensorclass-eager] 0.4667ms 62.9575μs 15.8837 KOps/s 15.7623 KOps/s $\color{#35bf28}+0.77\%$
test_compile_add_one_flat[pytree-compile] 0.4607ms 0.3252ms 3.0749 KOps/s 3.0350 KOps/s $\color{#35bf28}+1.32\%$
test_compile_add_one_flat[pytree-eager] 1.0318ms 0.6500ms 1.5385 KOps/s 1.6019 KOps/s $\color{#d91a1a}-3.96\%$
test_compile_add_self_flat[tensordict-eager] 0.6605ms 0.2553ms 3.9164 KOps/s 3.8592 KOps/s $\color{#35bf28}+1.48\%$
test_compile_add_self_flat[tensordict-compile] 0.5183ms 0.3217ms 3.1086 KOps/s 3.0925 KOps/s $\color{#35bf28}+0.52\%$
test_compile_add_self_flat[tensorclass-eager] 0.4922ms 73.3706μs 13.6294 KOps/s 14.1336 KOps/s $\color{#d91a1a}-3.57\%$
test_compile_add_self_flat[tensorclass-compile] 0.5313ms 0.1362ms 7.3412 KOps/s 7.4573 KOps/s $\color{#d91a1a}-1.56\%$
test_compile_add_self_flat[pytree-eager] 0.9240ms 0.5133ms 1.9482 KOps/s 1.9203 KOps/s $\color{#35bf28}+1.45\%$
test_compile_add_self_flat[pytree-compile] 0.4776ms 0.3262ms 3.0658 KOps/s 3.0281 KOps/s $\color{#35bf28}+1.24\%$
test_compile_copy_flat[tensordict-compile] 0.4206ms 18.3225μs 54.5776 KOps/s 54.3835 KOps/s $\color{#35bf28}+0.36\%$
test_compile_copy_flat[tensordict-eager] 0.4083ms 28.7589μs 34.7718 KOps/s 34.4127 KOps/s $\color{#35bf28}+1.04\%$
test_compile_copy_flat[pytree-compile] 0.2406ms 69.7395μs 14.3391 KOps/s 14.1845 KOps/s $\color{#35bf28}+1.09\%$
test_compile_copy_flat[pytree-eager] 0.4553ms 52.1770μs 19.1655 KOps/s 19.1614 KOps/s $\color{#35bf28}+0.02\%$
test_compile_assign_and_add[tensordict-compile] 2.3991ms 0.8302ms 1.2046 KOps/s 1.0996 KOps/s $\textbf{\color{#35bf28}+9.55\%}$
test_compile_assign_and_add[tensordict-eager] 3.7344ms 3.3650ms 297.1794 Ops/s 309.5542 Ops/s $\color{#d91a1a}-4.00\%$
test_compile_assign_and_add[pytree-compile] 2.4014ms 0.8461ms 1.1819 KOps/s 1.0788 KOps/s $\textbf{\color{#35bf28}+9.56\%}$
test_compile_assign_and_add[pytree-eager] 3.6678ms 3.2047ms 312.0444 Ops/s 306.9339 Ops/s $\color{#35bf28}+1.67\%$
test_compile_indexing[tensor-tensordict-compile] 0.3045ms 0.1244ms 8.0413 KOps/s 8.0439 KOps/s $\color{#d91a1a}-0.03\%$
test_compile_indexing[tensor-tensordict-eager] 0.4594ms 62.2380μs 16.0674 KOps/s 16.2546 KOps/s $\color{#d91a1a}-1.15\%$
test_compile_indexing[tensor-tensorclass-compile] 0.2688ms 0.1235ms 8.0973 KOps/s 8.4906 KOps/s $\color{#d91a1a}-4.63\%$
test_compile_indexing[tensor-tensorclass-eager] 0.2255ms 47.1888μs 21.1915 KOps/s 23.4059 KOps/s $\textbf{\color{#d91a1a}-9.46\%}$
test_compile_indexing[tensor-pytree-compile] 0.2803ms 0.1231ms 8.1213 KOps/s 8.4178 KOps/s $\color{#d91a1a}-3.52\%$
test_compile_indexing[tensor-pytree-eager] 0.4351ms 45.1487μs 22.1490 KOps/s 23.5004 KOps/s $\textbf{\color{#d91a1a}-5.75\%}$
test_compile_indexing[slice-tensordict-compile] 0.5608ms 0.1542ms 6.4850 KOps/s 6.5438 KOps/s $\color{#d91a1a}-0.90\%$
test_compile_indexing[slice-tensordict-eager] 0.4180ms 26.8091μs 37.3007 KOps/s 37.1050 KOps/s $\color{#35bf28}+0.53\%$
test_compile_indexing[slice-tensorclass-compile] 0.2960ms 0.1449ms 6.9009 KOps/s 6.8299 KOps/s $\color{#35bf28}+1.04\%$
test_compile_indexing[slice-tensorclass-eager] 0.4257ms 21.7490μs 45.9792 KOps/s 45.3865 KOps/s $\color{#35bf28}+1.31\%$
test_compile_indexing[slice-pytree-compile] 0.5658ms 0.1458ms 6.8584 KOps/s 6.8147 KOps/s $\color{#35bf28}+0.64\%$
test_compile_indexing[slice-pytree-eager] 0.4097ms 25.1111μs 39.8230 KOps/s 45.0698 KOps/s $\textbf{\color{#d91a1a}-11.64\%}$
test_compile_indexing[int-tensordict-compile] 0.5449ms 0.1527ms 6.5495 KOps/s 6.5138 KOps/s $\color{#35bf28}+0.55\%$
test_compile_indexing[int-tensordict-eager] 0.4630ms 26.6045μs 37.5876 KOps/s 36.7807 KOps/s $\color{#35bf28}+2.19\%$
test_compile_indexing[int-tensorclass-compile] 0.5335ms 0.1461ms 6.8449 KOps/s 6.8245 KOps/s $\color{#35bf28}+0.30\%$
test_compile_indexing[int-tensorclass-eager] 0.4103ms 21.6698μs 46.1471 KOps/s 45.7231 KOps/s $\color{#35bf28}+0.93\%$
test_compile_indexing[int-pytree-compile] 0.5426ms 0.1460ms 6.8507 KOps/s 6.8249 KOps/s $\color{#35bf28}+0.38\%$
test_compile_indexing[int-pytree-eager] 0.1480ms 21.5909μs 46.3158 KOps/s 45.7294 KOps/s $\color{#35bf28}+1.28\%$
test_mod_add[eager] 0.2173ms 33.4338μs 29.9099 KOps/s 29.1580 KOps/s $\color{#35bf28}+2.58\%$
test_mod_add[compile] 0.2397ms 77.8976μs 12.8374 KOps/s 12.8783 KOps/s $\color{#d91a1a}-0.32\%$
test_mod_add[compile-overhead] 0.3102ms 0.1540ms 6.4954 KOps/s 5.6300 KOps/s $\textbf{\color{#35bf28}+15.37\%}$
test_mod_wrap[eager] 0.4464ms 0.2537ms 3.9409 KOps/s 4.1129 KOps/s $\color{#d91a1a}-4.18\%$
test_mod_wrap[compile] 0.4338ms 0.2834ms 3.5286 KOps/s 3.5027 KOps/s $\color{#35bf28}+0.74\%$
test_mod_wrap[compile-overhead] 7.7923ms 4.1204ms 242.6965 Ops/s 243.7769 Ops/s $\color{#d91a1a}-0.44\%$
test_mod_wrap_and_backward[eager] 1.6594ms 1.3365ms 748.2189 Ops/s 705.5210 Ops/s $\textbf{\color{#35bf28}+6.05\%}$
test_mod_wrap_and_backward[compile] 1.5125ms 1.2576ms 795.1582 Ops/s 719.7293 Ops/s $\textbf{\color{#35bf28}+10.48\%}$
test_mod_wrap_and_backward[compile-overhead] 1.3438ms 0.9116ms 1.0970 KOps/s 955.8758 Ops/s $\textbf{\color{#35bf28}+14.76\%}$
test_seq_add[eager] 0.2458ms 97.9066μs 10.2138 KOps/s 9.8975 KOps/s $\color{#35bf28}+3.20\%$
test_seq_add[compile] 0.2380ms 87.1892μs 11.4693 KOps/s 11.3944 KOps/s $\color{#35bf28}+0.66\%$
test_seq_add[compile-overhead] 0.2740ms 0.1257ms 7.9563 KOps/s 7.7745 KOps/s $\color{#35bf28}+2.34\%$
test_seq_wrap[eager] 0.5166ms 0.3756ms 2.6627 KOps/s 2.5319 KOps/s $\textbf{\color{#35bf28}+5.17\%}$
test_seq_wrap[compile] 0.4500ms 0.3008ms 3.3248 KOps/s 3.2818 KOps/s $\color{#35bf28}+1.31\%$
test_seq_wrap[compile-overhead] 0.3734ms 0.2226ms 4.4915 KOps/s 4.4314 KOps/s $\color{#35bf28}+1.36\%$
test_func_call_runtime[False-eager] 0.8984ms 0.7251ms 1.3792 KOps/s 1.2802 KOps/s $\textbf{\color{#35bf28}+7.73\%}$
test_func_call_runtime[False-compile] 0.9765ms 0.7573ms 1.3204 KOps/s 1.2909 KOps/s $\color{#35bf28}+2.29\%$
test_func_call_runtime[False-compile-overhead] 0.5438ms 0.3649ms 2.7403 KOps/s 2.7358 KOps/s $\color{#35bf28}+0.16\%$
test_func_call_runtime[True-eager] 1.0699ms 0.8826ms 1.1330 KOps/s 1.1198 KOps/s $\color{#35bf28}+1.18\%$
test_func_call_runtime[True-compile] 0.9377ms 0.7799ms 1.2823 KOps/s 1.2773 KOps/s $\color{#35bf28}+0.39\%$
test_func_call_runtime[True-compile-overhead] 0.5759ms 0.3825ms 2.6141 KOps/s 2.6055 KOps/s $\color{#35bf28}+0.33\%$
test_func_call_cm_runtime[False-eager] 0.8678ms 0.7230ms 1.3831 KOps/s 1.3188 KOps/s $\color{#35bf28}+4.87\%$
test_func_call_cm_runtime[False-compile] 0.9123ms 0.7590ms 1.3175 KOps/s 1.2954 KOps/s $\color{#35bf28}+1.71\%$
test_func_call_cm_runtime[False-compile-overhead] 0.5045ms 0.3639ms 2.7478 KOps/s 2.7318 KOps/s $\color{#35bf28}+0.58\%$
test_func_call_cm_runtime[True-eager] 1.1897ms 0.9938ms 1.0062 KOps/s 1.0035 KOps/s $\color{#35bf28}+0.27\%$
test_func_call_cm_runtime[True-compile] 0.9863ms 0.8077ms 1.2380 KOps/s 1.2180 KOps/s $\color{#35bf28}+1.64\%$
test_func_call_cm_runtime[True-compile-overhead] 0.5667ms 0.4093ms 2.4434 KOps/s 2.4200 KOps/s $\color{#35bf28}+0.96\%$
test_vmap_func_call_cm_runtime[eager] 2.6249ms 2.0356ms 491.2613 Ops/s 489.1744 Ops/s $\color{#35bf28}+0.43\%$
test_vmap_func_call_cm_runtime[compile] 1.0223ms 0.8263ms 1.2101 KOps/s 1.2035 KOps/s $\color{#35bf28}+0.55\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.5308ms 0.4163ms 2.4019 KOps/s 2.4138 KOps/s $\color{#d91a1a}-0.49\%$
test_distributed 6.4762ms 0.1791ms 5.5831 KOps/s 8.7140 KOps/s $\textbf{\color{#d91a1a}-35.93\%}$
test_tdmodule 45.6110μs 14.8708μs 67.2459 KOps/s 60.1476 KOps/s $\textbf{\color{#35bf28}+11.80\%}$
test_tdmodule_dispatch 62.0510μs 29.6577μs 33.7181 KOps/s 31.3906 KOps/s $\textbf{\color{#35bf28}+7.41\%}$
test_tdseq 35.3800μs 16.3612μs 61.1201 KOps/s 56.8305 KOps/s $\textbf{\color{#35bf28}+7.55\%}$
test_tdseq_dispatch 60.7020μs 32.6996μs 30.5814 KOps/s 28.3619 KOps/s $\textbf{\color{#35bf28}+7.83\%}$
test_instantiation_functorch 2.0698ms 1.8945ms 527.8395 Ops/s 520.0393 Ops/s $\color{#35bf28}+1.50\%$
test_exec_functorch 0.3547ms 0.2093ms 4.7781 KOps/s 4.6817 KOps/s $\color{#35bf28}+2.06\%$
test_exec_functional_call 0.3963ms 0.2077ms 4.8143 KOps/s 4.6846 KOps/s $\color{#35bf28}+2.77\%$
test_exec_td_decorator 0.4463ms 0.2543ms 3.9326 KOps/s 3.8467 KOps/s $\color{#35bf28}+2.23\%$
test_vmap_mlp_speed_decorator[True-True] 0.8523ms 0.6628ms 1.5086 KOps/s 1.5021 KOps/s $\color{#35bf28}+0.44\%$
test_vmap_mlp_speed_decorator[True-False] 0.8608ms 0.6610ms 1.5129 KOps/s 1.5079 KOps/s $\color{#35bf28}+0.33\%$
test_vmap_mlp_speed_decorator[False-True] 0.7589ms 0.5783ms 1.7291 KOps/s 1.7279 KOps/s $\color{#35bf28}+0.07\%$
test_vmap_mlp_speed_decorator[False-False] 0.7810ms 0.5821ms 1.7178 KOps/s 1.7266 KOps/s $\color{#d91a1a}-0.51\%$
test_vmap_transformer_speed_decorator[True-True] 19.4123ms 19.0125ms 52.5971 Ops/s 52.6883 Ops/s $\color{#d91a1a}-0.17\%$
test_vmap_transformer_speed_decorator[True-False] 19.2397ms 19.0099ms 52.6042 Ops/s 52.5981 Ops/s $\color{#35bf28}+0.01\%$
test_vmap_transformer_speed_decorator[False-True] 19.1731ms 18.8795ms 52.9676 Ops/s 53.1178 Ops/s $\color{#d91a1a}-0.28\%$
test_vmap_transformer_speed_decorator[False-False] 19.2046ms 18.8750ms 52.9802 Ops/s 52.6553 Ops/s $\color{#35bf28}+0.62\%$
test_to_module_speed[True] 1.4757ms 0.9570ms 1.0449 KOps/s 1.0457 KOps/s $\color{#d91a1a}-0.07\%$
test_to_module_speed[False] 1.3281ms 0.9462ms 1.0569 KOps/s 1.0639 KOps/s $\color{#d91a1a}-0.65\%$
test_tc_init 71.2210μs 34.2584μs 29.1900 KOps/s 25.8693 KOps/s $\textbf{\color{#35bf28}+12.84\%}$
test_tc_init_nested 0.2539ms 69.6717μs 14.3530 KOps/s 12.5843 KOps/s $\textbf{\color{#35bf28}+14.05\%}$
test_tc_first_layer_tensor 27.2861μs 0.7179μs 1.3929 MOps/s 1.3959 MOps/s $\color{#d91a1a}-0.22\%$
test_tc_first_layer_nontensor 18.9310μs 2.4453μs 408.9441 KOps/s 436.9755 KOps/s $\textbf{\color{#d91a1a}-6.41\%}$
test_tc_second_layer_tensor 9.1527μs 1.4537μs 687.9199 KOps/s 683.2213 KOps/s $\color{#35bf28}+0.69\%$
test_tc_second_layer_nontensor 41.4810μs 3.1917μs 313.3079 KOps/s 322.5501 KOps/s $\color{#d91a1a}-2.87\%$
test_unbind 0.2043s 11.1398ms 89.7684 Ops/s 88.3935 Ops/s $\color{#35bf28}+1.56\%$
test_full_like 0.7905ms 0.5763ms 1.7353 KOps/s 1.7388 KOps/s $\color{#d91a1a}-0.20\%$
test_zeros_like 0.3624ms 0.1982ms 5.0450 KOps/s 5.0438 KOps/s $\color{#35bf28}+0.02\%$
test_ones_like 0.3469ms 0.1981ms 5.0483 KOps/s 5.0468 KOps/s $\color{#35bf28}+0.03\%$
test_clone 0.5550ms 0.4153ms 2.4080 KOps/s 2.4067 KOps/s $\color{#35bf28}+0.06\%$
test_squeeze 0.1228ms 9.4100μs 106.2700 KOps/s 106.0284 KOps/s $\color{#35bf28}+0.23\%$
test_unsqueeze 0.2168ms 72.3942μs 13.8133 KOps/s 13.5006 KOps/s $\color{#35bf28}+2.32\%$
test_split 0.4019ms 0.1647ms 6.0712 KOps/s 5.9016 KOps/s $\color{#35bf28}+2.87\%$
test_permute 0.2822ms 0.1754ms 5.7007 KOps/s 5.6767 KOps/s $\color{#35bf28}+0.42\%$
test_stack 1.2757ms 0.8697ms 1.1498 KOps/s 1.2025 KOps/s $\color{#d91a1a}-4.38\%$
test_cat 1.3439ms 1.2317ms 811.8582 Ops/s 811.9529 Ops/s $\color{#d91a1a}-0.01\%$
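
The Change column is not defined anywhere in this comment. Judging from the rows above, it appears to be the relative difference between the Ops and Ops on Repo HEAD columns, after converting both to the same unit (some rows mix `KOps/s` and `Ops/s`). The sketch below is an inference from the numbers, not the benchmark bot's actual code; `UNIT_SCALE`, `parse_ops`, and `percent_change` are hypothetical names used only for illustration.

```python
# Scale factors for the throughput units that appear in the table.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(text: str) -> float:
    """Convert a cell such as '1.0970 KOps/s' to plain operations per second."""
    value, unit = text.split()
    return float(value) * UNIT_SCALE[unit]


def percent_change(ops_pr: str, ops_head: str) -> float:
    """Relative throughput difference of the PR versus repo HEAD, in percent.

    Assumed formula: (ops_pr - ops_head) / ops_head * 100.
    """
    pr, head = parse_ops(ops_pr), parse_ops(ops_head)
    return (pr - head) / head * 100.0


# Rows copied from the table above.
rows = {
    "test_set": ("24.7654 KOps/s", "23.7098 KOps/s"),
    "test_distributed": ("5.5831 KOps/s", "8.7140 KOps/s"),
    "test_mod_wrap_and_backward[compile-overhead]": ("1.0970 KOps/s", "955.8758 Ops/s"),
}

for name, (ops_pr, ops_head) in rows.items():
    print(f"{name}: {percent_change(ops_pr, ops_head):+.2f}%")
# test_set: +4.45%
# test_distributed: -35.93%
# test_mod_wrap_and_backward[compile-overhead]: +14.76%
```

Because the inputs are already rounded to four decimals, a recomputed value can differ from the reported one in the last digit for some rows.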

vmoens added a commit that referenced this pull request Nov 4, 2024
ghstack-source-id: 5e9c5a974e5a5c73b033e5b85c3eb70c2f433512
Pull Request resolved: #1069

(cherry picked from commit 082d542)