[Quality] non_blocking_pin instead of pin_memory #915

Merged: 2 commits, Jul 24, 2024

vmoens (Contributor) commented Jul 24, 2024

No description provided.

facebook-github-bot added the CLA Signed label Jul 24, 2024 (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.)

github-actions bot commented Jul 24, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 213. Improved: 11. Worsened: 37.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 72.6850μs 23.1171μs 43.2581 KOps/s 45.4790 KOps/s $\color{#d91a1a}-4.88\%$
test_plain_set_stack_nested 0.2744ms 23.6661μs 42.2546 KOps/s 45.4808 KOps/s $\textbf{\color{#d91a1a}-7.09\%}$
test_plain_set_nested_inplace 0.1292ms 25.2768μs 39.5620 KOps/s 41.9194 KOps/s $\textbf{\color{#d91a1a}-5.62\%}$
test_plain_set_stack_nested_inplace 73.1860μs 25.2273μs 39.6396 KOps/s 41.3173 KOps/s $\color{#d91a1a}-4.06\%$
test_items 23.8950μs 2.6734μs 374.0508 KOps/s 383.3853 KOps/s $\color{#d91a1a}-2.43\%$
test_items_nested 0.5005ms 0.3731ms 2.6803 KOps/s 2.7111 KOps/s $\color{#d91a1a}-1.14\%$
test_items_nested_locked 0.9105ms 0.3756ms 2.6624 KOps/s 2.6959 KOps/s $\color{#d91a1a}-1.24\%$
test_items_nested_leaf 0.1554ms 91.9657μs 10.8736 KOps/s 11.1883 KOps/s $\color{#d91a1a}-2.81\%$
test_items_stack_nested 1.2655ms 0.3874ms 2.5813 KOps/s 2.6879 KOps/s $\color{#d91a1a}-3.97\%$
test_items_stack_nested_leaf 0.1945ms 90.0090μs 11.1100 KOps/s 11.0071 KOps/s $\color{#35bf28}+0.93\%$
test_items_stack_nested_locked 0.9289ms 0.3764ms 2.6567 KOps/s 2.6918 KOps/s $\color{#d91a1a}-1.30\%$
test_keys 24.3750μs 3.8789μs 257.8019 KOps/s 259.7033 KOps/s $\color{#d91a1a}-0.73\%$
test_keys_nested 0.2496ms 0.1431ms 6.9894 KOps/s 6.9935 KOps/s $\color{#d91a1a}-0.06\%$
test_keys_nested_locked 0.7100ms 0.1487ms 6.7247 KOps/s 6.7111 KOps/s $\color{#35bf28}+0.20\%$
test_keys_nested_leaf 0.2253ms 0.1250ms 8.0020 KOps/s 8.1352 KOps/s $\color{#d91a1a}-1.64\%$
test_keys_stack_nested 0.5197ms 0.1424ms 7.0237 KOps/s 6.9533 KOps/s $\color{#35bf28}+1.01\%$
test_keys_stack_nested_leaf 0.2113ms 0.1204ms 8.3053 KOps/s 8.0869 KOps/s $\color{#35bf28}+2.70\%$
test_keys_stack_nested_locked 0.2542ms 0.1474ms 6.7827 KOps/s 6.6492 KOps/s $\color{#35bf28}+2.01\%$
test_values 8.5083μs 1.1738μs 851.9238 KOps/s 808.6286 KOps/s $\textbf{\color{#35bf28}+5.35\%}$
test_values_nested 0.1459ms 50.3932μs 19.8440 KOps/s 19.5078 KOps/s $\color{#35bf28}+1.72\%$
test_values_nested_locked 98.5150μs 50.2357μs 19.9062 KOps/s 20.0102 KOps/s $\color{#d91a1a}-0.52\%$
test_values_nested_leaf 90.4680μs 44.8729μs 22.2852 KOps/s 22.2430 KOps/s $\color{#35bf28}+0.19\%$
test_values_stack_nested 95.1770μs 51.0976μs 19.5704 KOps/s 19.6871 KOps/s $\color{#d91a1a}-0.59\%$
test_values_stack_nested_leaf 90.7990μs 44.6199μs 22.4115 KOps/s 19.7015 KOps/s $\textbf{\color{#35bf28}+13.76\%}$
test_values_stack_nested_locked 94.3250μs 51.8332μs 19.2927 KOps/s 19.7406 KOps/s $\color{#d91a1a}-2.27\%$
test_membership 2.3284μs 0.7311μs 1.3677 MOps/s 1.0929 MOps/s $\textbf{\color{#35bf28}+25.14\%}$
test_membership_nested 28.0030μs 2.7876μs 358.7261 KOps/s 370.8150 KOps/s $\color{#d91a1a}-3.26\%$
test_membership_nested_leaf 33.5520μs 2.7954μs 357.7324 KOps/s 372.0273 KOps/s $\color{#d91a1a}-3.84\%$
test_membership_stacked_nested 18.0240μs 2.7851μs 359.0501 KOps/s 369.6890 KOps/s $\color{#d91a1a}-2.88\%$
test_membership_stacked_nested_leaf 28.7330μs 2.8359μs 352.6188 KOps/s 369.3270 KOps/s $\color{#d91a1a}-4.52\%$
test_membership_nested_last 21.2200μs 4.1291μs 242.1856 KOps/s 249.7053 KOps/s $\color{#d91a1a}-3.01\%$
test_membership_nested_leaf_last 36.9490μs 4.1888μs 238.7326 KOps/s 247.0728 KOps/s $\color{#d91a1a}-3.38\%$
test_membership_stacked_nested_last 46.4260μs 12.9214μs 77.3908 KOps/s 252.3698 KOps/s $\textbf{\color{#d91a1a}-69.33\%}$
test_membership_stacked_nested_leaf_last 39.3530μs 13.0380μs 76.6989 KOps/s 251.5317 KOps/s $\textbf{\color{#d91a1a}-69.51\%}$
test_nested_getleaf 40.0350μs 10.7272μs 93.2212 KOps/s 94.7051 KOps/s $\color{#d91a1a}-1.57\%$
test_nested_get 43.2010μs 10.1284μs 98.7320 KOps/s 99.7340 KOps/s $\color{#d91a1a}-1.00\%$
test_stacked_getleaf 33.4720μs 10.5960μs 94.3752 KOps/s 96.3329 KOps/s $\color{#d91a1a}-2.03\%$
test_stacked_get 38.1310μs 10.0540μs 99.4625 KOps/s 100.8745 KOps/s $\color{#d91a1a}-1.40\%$
test_nested_getitemleaf 30.3560μs 11.2627μs 88.7885 KOps/s 85.7223 KOps/s $\color{#35bf28}+3.58\%$
test_nested_getitem 29.9260μs 10.3586μs 96.5378 KOps/s 96.3731 KOps/s $\color{#35bf28}+0.17\%$
test_stacked_getitemleaf 51.9060μs 11.0655μs 90.3712 KOps/s 92.1522 KOps/s $\color{#d91a1a}-1.93\%$
test_stacked_getitem 28.4130μs 10.2534μs 97.5286 KOps/s 99.0718 KOps/s $\color{#d91a1a}-1.56\%$
test_lock_nested 78.8765ms 0.5824ms 1.7170 KOps/s 1.9838 KOps/s $\textbf{\color{#d91a1a}-13.45\%}$
test_lock_stack_nested 0.7124ms 0.4585ms 2.1810 KOps/s 2.1209 KOps/s $\color{#35bf28}+2.83\%$
test_unlock_nested 91.7708ms 0.5212ms 1.9187 KOps/s 2.3560 KOps/s $\textbf{\color{#d91a1a}-18.56\%}$
test_unlock_stack_nested 0.6537ms 0.3722ms 2.6869 KOps/s 2.5850 KOps/s $\color{#35bf28}+3.94\%$
test_flatten_speed 0.5695ms 0.1121ms 8.9234 KOps/s 9.2027 KOps/s $\color{#d91a1a}-3.04\%$
test_unflatten_speed 0.7445ms 0.4438ms 2.2532 KOps/s 2.2756 KOps/s $\color{#d91a1a}-0.98\%$
test_common_ops 4.8276ms 1.1587ms 863.0460 Ops/s 918.3347 Ops/s $\textbf{\color{#d91a1a}-6.02\%}$
test_creation 55.7340μs 2.5727μs 388.6928 KOps/s 398.2948 KOps/s $\color{#d91a1a}-2.41\%$
test_creation_empty 64.1190μs 21.4007μs 46.7274 KOps/s 54.5153 KOps/s $\textbf{\color{#d91a1a}-14.29\%}$
test_creation_nested_1 69.1390μs 25.2293μs 39.6365 KOps/s 45.5032 KOps/s $\textbf{\color{#d91a1a}-12.89\%}$
test_creation_nested_2 74.7890μs 28.7641μs 34.7655 KOps/s 38.2463 KOps/s $\textbf{\color{#d91a1a}-9.10\%}$
test_clone 64.6500μs 16.9693μs 58.9301 KOps/s 59.9937 KOps/s $\color{#d91a1a}-1.77\%$
test_getitem[int] 0.8396ms 16.7495μs 59.7032 KOps/s 59.1465 KOps/s $\color{#35bf28}+0.94\%$
test_getitem[slice_int] 0.1476ms 31.7649μs 31.4813 KOps/s 32.3158 KOps/s $\color{#d91a1a}-2.58\%$
test_getitem[range] 0.1615ms 57.3833μs 17.4267 KOps/s 17.0382 KOps/s $\color{#35bf28}+2.28\%$
test_getitem[tuple] 0.1366ms 26.2972μs 38.0269 KOps/s 39.3123 KOps/s $\color{#d91a1a}-3.27\%$
test_getitem[list] 0.2057ms 52.2755μs 19.1294 KOps/s 18.9348 KOps/s $\color{#35bf28}+1.03\%$
test_setitem_dim[int] 99.4960μs 45.5916μs 21.9339 KOps/s 24.4416 KOps/s $\textbf{\color{#d91a1a}-10.26\%}$
test_setitem_dim[slice_int] 0.1343ms 73.5796μs 13.5907 KOps/s 14.1870 KOps/s $\color{#d91a1a}-4.20\%$
test_setitem_dim[range] 0.1519ms 93.1465μs 10.7358 KOps/s 10.7177 KOps/s $\color{#35bf28}+0.17\%$
test_setitem_dim[tuple] 0.1025ms 60.9146μs 16.4164 KOps/s 17.2699 KOps/s $\color{#d91a1a}-4.94\%$
test_setitem 76.8730μs 30.7217μs 32.5503 KOps/s 34.6074 KOps/s $\textbf{\color{#d91a1a}-5.94\%}$
test_set 0.1524ms 30.3294μs 32.9713 KOps/s 35.4208 KOps/s $\textbf{\color{#d91a1a}-6.92\%}$
test_set_shared 1.3283ms 0.2133ms 4.6873 KOps/s 4.6865 KOps/s $\color{#35bf28}+0.02\%$
test_update 0.1533ms 38.4518μs 26.0066 KOps/s 28.0460 KOps/s $\textbf{\color{#d91a1a}-7.27\%}$
test_update_nested 0.1317ms 48.5275μs 20.6069 KOps/s 22.0563 KOps/s $\textbf{\color{#d91a1a}-6.57\%}$
test_update__nested 92.1720μs 34.0698μs 29.3515 KOps/s 30.1909 KOps/s $\color{#d91a1a}-2.78\%$
test_set_nested 0.1271ms 32.3943μs 30.8697 KOps/s 32.5786 KOps/s $\textbf{\color{#d91a1a}-5.25\%}$
test_set_nested_new 0.1461ms 38.0070μs 26.3110 KOps/s 27.8168 KOps/s $\textbf{\color{#d91a1a}-5.41\%}$
test_select 0.1964ms 55.9677μs 17.8675 KOps/s 19.1855 KOps/s $\textbf{\color{#d91a1a}-6.87\%}$
test_select_nested 0.1120ms 61.1622μs 16.3500 KOps/s 16.6123 KOps/s $\color{#d91a1a}-1.58\%$
test_exclude_nested 0.1813ms 80.4336μs 12.4326 KOps/s 12.4297 KOps/s $\color{#35bf28}+0.02\%$
test_empty[True] 0.8709ms 0.3449ms 2.8992 KOps/s 2.9314 KOps/s $\color{#d91a1a}-1.10\%$
test_empty[False] 6.9253μs 1.2593μs 794.0752 KOps/s 792.5370 KOps/s $\color{#35bf28}+0.19\%$
test_unbind_speed 0.6425ms 0.3185ms 3.1394 KOps/s 3.1269 KOps/s $\color{#35bf28}+0.40\%$
test_unbind_speed_stack0 0.4320ms 0.3001ms 3.3321 KOps/s 3.2193 KOps/s $\color{#35bf28}+3.51\%$
test_unbind_speed_stack1 80.3194ms 0.7788ms 1.2840 KOps/s 1.3500 KOps/s $\color{#d91a1a}-4.89\%$
test_split 84.6066ms 2.1662ms 461.6304 Ops/s 440.8178 Ops/s $\color{#35bf28}+4.72\%$
test_chunk 82.1525ms 2.1548ms 464.0790 Ops/s 463.4854 Ops/s $\color{#35bf28}+0.13\%$
test_creation[device0] 0.2543ms 0.1180ms 8.4759 KOps/s 8.5082 KOps/s $\color{#d91a1a}-0.38\%$
test_creation_from_tensor 4.5680ms 0.1209ms 8.2723 KOps/s 8.3707 KOps/s $\color{#d91a1a}-1.18\%$
test_add_one[memmap_tensor0] 0.1430ms 7.7735μs 128.6422 KOps/s 128.7025 KOps/s $\color{#d91a1a}-0.05\%$
test_contiguous[memmap_tensor0] 24.9160μs 2.0373μs 490.8554 KOps/s 498.0471 KOps/s $\color{#d91a1a}-1.44\%$
test_stack[memmap_tensor0] 39.9540μs 5.6732μs 176.2662 KOps/s 168.8984 KOps/s $\color{#35bf28}+4.36\%$
test_memmaptd_index 1.0746ms 0.4084ms 2.4486 KOps/s 2.4436 KOps/s $\color{#35bf28}+0.21\%$
test_memmaptd_index_astensor 0.9323ms 0.4904ms 2.0390 KOps/s 2.0527 KOps/s $\color{#d91a1a}-0.67\%$
test_memmaptd_index_op 1.5276ms 1.0762ms 929.2380 Ops/s 955.4818 Ops/s $\color{#d91a1a}-2.75\%$
test_serialize_model 0.1324s 0.1253s 7.9780 Ops/s 7.1449 Ops/s $\textbf{\color{#35bf28}+11.66\%}$
test_serialize_model_pickle 0.5532s 0.4167s 2.4000 Ops/s 2.5052 Ops/s $\color{#d91a1a}-4.20\%$
test_serialize_weights 0.1328s 0.1254s 7.9767 Ops/s 8.0777 Ops/s $\color{#d91a1a}-1.25\%$
test_serialize_weights_returnearly 0.1805s 0.1680s 5.9536 Ops/s 5.9845 Ops/s $\color{#d91a1a}-0.52\%$
test_serialize_weights_pickle 0.4842s 0.4082s 2.4495 Ops/s 2.4032 Ops/s $\color{#35bf28}+1.93\%$
test_serialize_weights_filesystem 0.1511s 0.1435s 6.9703 Ops/s 6.2678 Ops/s $\textbf{\color{#35bf28}+11.21\%}$
test_serialize_model_filesystem 0.1579s 0.1526s 6.5517 Ops/s 6.3966 Ops/s $\color{#35bf28}+2.43\%$
test_reshape_pytree 0.1010ms 40.8717μs 24.4668 KOps/s 24.8278 KOps/s $\color{#d91a1a}-1.45\%$
test_reshape_td 0.1171ms 47.4002μs 21.0970 KOps/s 20.7631 KOps/s $\color{#35bf28}+1.61\%$
test_view_pytree 0.1184ms 40.4558μs 24.7184 KOps/s 24.8316 KOps/s $\color{#d91a1a}-0.46\%$
test_view_td 0.1555ms 53.0295μs 18.8574 KOps/s 18.5887 KOps/s $\color{#35bf28}+1.45\%$
test_unbind_pytree 79.9280μs 37.6129μs 26.5866 KOps/s 26.6223 KOps/s $\color{#d91a1a}-0.13\%$
test_unbind_td 0.3628ms 47.0993μs 21.2317 KOps/s 20.8696 KOps/s $\color{#35bf28}+1.74\%$
test_split_pytree 89.0550μs 41.2575μs 24.2380 KOps/s 24.7232 KOps/s $\color{#d91a1a}-1.96\%$
test_split_td 0.1816ms 60.5937μs 16.5034 KOps/s 16.6568 KOps/s $\color{#d91a1a}-0.92\%$
test_add_pytree 98.5630μs 48.7052μs 20.5317 KOps/s 20.9865 KOps/s $\color{#d91a1a}-2.17\%$
test_add_td 0.2461ms 87.3539μs 11.4477 KOps/s 11.1907 KOps/s $\color{#35bf28}+2.30\%$
test_compile_add_one_nested[tensordict-compile] 0.1063ms 54.9636μs 18.1939 KOps/s 17.1245 KOps/s $\textbf{\color{#35bf28}+6.24\%}$
test_compile_add_one_nested[tensordict-eager] 0.9326ms 0.2079ms 4.8097 KOps/s 4.4011 KOps/s $\textbf{\color{#35bf28}+9.28\%}$
test_compile_add_one_nested[pytree-compile] 0.1372ms 55.5384μs 18.0055 KOps/s 18.3109 KOps/s $\color{#d91a1a}-1.67\%$
test_compile_add_one_nested[pytree-eager] 0.2905ms 0.1493ms 6.6996 KOps/s 6.7716 KOps/s $\color{#d91a1a}-1.06\%$
test_compile_copy_nested[tensordict-compile] 98.6620μs 20.5330μs 48.7021 KOps/s 49.8713 KOps/s $\color{#d91a1a}-2.34\%$
test_compile_copy_nested[tensordict-eager] 0.1741ms 66.3523μs 15.0711 KOps/s 15.1215 KOps/s $\color{#d91a1a}-0.33\%$
test_compile_copy_nested[pytree-compile] 0.1664ms 82.6231μs 12.1032 KOps/s 12.5155 KOps/s $\color{#d91a1a}-3.29\%$
test_compile_copy_nested[pytree-eager] 0.1412ms 73.8880μs 13.5340 KOps/s 13.7865 KOps/s $\color{#d91a1a}-1.83\%$
test_compile_add_one_flat[tensordict-compile] 0.7925ms 0.1772ms 5.6440 KOps/s 5.7497 KOps/s $\color{#d91a1a}-1.84\%$
test_compile_add_one_flat[tensordict-eager] 0.4416ms 0.1974ms 5.0655 KOps/s 5.2091 KOps/s $\color{#d91a1a}-2.76\%$
test_compile_add_one_flat[tensorclass-compile] 0.2344ms 39.0026μs 25.6393 KOps/s 25.4756 KOps/s $\color{#35bf28}+0.64\%$
test_compile_add_one_flat[tensorclass-eager] 0.4648ms 70.6922μs 14.1458 KOps/s 14.2954 KOps/s $\color{#d91a1a}-1.05\%$
test_compile_add_one_flat[pytree-compile] 0.3861ms 0.1740ms 5.7459 KOps/s 5.8047 KOps/s $\color{#d91a1a}-1.01\%$
test_compile_add_one_flat[pytree-eager] 0.4456ms 0.2946ms 3.3947 KOps/s 3.4027 KOps/s $\color{#d91a1a}-0.24\%$
test_compile_add_self_flat[tensordict-eager] 0.3625ms 0.2125ms 4.7067 KOps/s 4.7715 KOps/s $\color{#d91a1a}-1.36\%$
test_compile_add_self_flat[tensordict-compile] 0.7514ms 0.1820ms 5.4953 KOps/s 5.6821 KOps/s $\color{#d91a1a}-3.29\%$
test_compile_add_self_flat[tensorclass-eager] 0.2095ms 63.9818μs 15.6294 KOps/s 15.8999 KOps/s $\color{#d91a1a}-1.70\%$
test_compile_add_self_flat[tensorclass-compile] 0.1044ms 40.9687μs 24.4089 KOps/s 25.1996 KOps/s $\color{#d91a1a}-3.14\%$
test_compile_add_self_flat[pytree-eager] 0.3896ms 0.2419ms 4.1341 KOps/s 4.2004 KOps/s $\color{#d91a1a}-1.58\%$
test_compile_add_self_flat[pytree-compile] 0.2615ms 0.1743ms 5.7358 KOps/s 5.7079 KOps/s $\color{#35bf28}+0.49\%$
test_compile_copy_flat[tensordict-compile] 0.2002ms 0.1081ms 9.2529 KOps/s 9.2457 KOps/s $\color{#35bf28}+0.08\%$
test_compile_copy_flat[tensordict-eager] 0.1186ms 55.5373μs 18.0059 KOps/s 17.1389 KOps/s $\textbf{\color{#35bf28}+5.06\%}$
test_compile_copy_flat[pytree-compile] 0.1764ms 82.9818μs 12.0508 KOps/s 12.5652 KOps/s $\color{#d91a1a}-4.09\%$
test_compile_copy_flat[pytree-eager] 0.1624ms 74.0538μs 13.5037 KOps/s 14.2644 KOps/s $\textbf{\color{#d91a1a}-5.33\%}$
test_compile_assign_and_add[tensordict-compile] 0.2903ms 0.1931ms 5.1788 KOps/s 5.2176 KOps/s $\color{#d91a1a}-0.74\%$
test_compile_assign_and_add[tensordict-eager] 2.8627ms 1.6450ms 607.8848 Ops/s 599.6552 Ops/s $\color{#35bf28}+1.37\%$
test_compile_assign_and_add[pytree-compile] 0.3887ms 0.1921ms 5.2043 KOps/s 5.4138 KOps/s $\color{#d91a1a}-3.87\%$
test_compile_assign_and_add[pytree-eager] 1.2751ms 1.0838ms 922.6737 Ops/s 923.4705 Ops/s $\color{#d91a1a}-0.09\%$
test_compile_assign_and_add_stack[compile] 0.6261ms 0.4276ms 2.3385 KOps/s 2.3542 KOps/s $\color{#d91a1a}-0.67\%$
test_compile_assign_and_add_stack[eager] 5.7238ms 4.0200ms 248.7543 Ops/s 262.5470 Ops/s $\textbf{\color{#d91a1a}-5.25\%}$
test_compile_indexing[tensor-tensordict-compile] 0.1017ms 33.2804μs 30.0477 KOps/s 30.7065 KOps/s $\color{#d91a1a}-2.15\%$
test_compile_indexing[tensor-tensordict-eager] 1.4593ms 50.6430μs 19.7461 KOps/s 21.4459 KOps/s $\textbf{\color{#d91a1a}-7.93\%}$
test_compile_indexing[tensor-tensorclass-compile] 0.1513ms 28.9768μs 34.5104 KOps/s 35.4138 KOps/s $\color{#d91a1a}-2.55\%$
test_compile_indexing[tensor-tensorclass-eager] 0.1298ms 32.0266μs 31.2241 KOps/s 32.5706 KOps/s $\color{#d91a1a}-4.13\%$
test_compile_indexing[tensor-pytree-compile] 0.1051ms 29.0704μs 34.3993 KOps/s 35.6812 KOps/s $\color{#d91a1a}-3.59\%$
test_compile_indexing[tensor-pytree-eager] 0.2122ms 31.8328μs 31.4142 KOps/s 32.4688 KOps/s $\color{#d91a1a}-3.25\%$
test_compile_indexing[slice-tensordict-compile] 0.1955ms 73.9599μs 13.5208 KOps/s 13.8202 KOps/s $\color{#d91a1a}-2.17\%$
test_compile_indexing[slice-tensordict-eager] 0.5052ms 28.8945μs 34.6087 KOps/s 35.5291 KOps/s $\color{#d91a1a}-2.59\%$
test_compile_indexing[slice-tensorclass-compile] 0.1340ms 69.8491μs 14.3166 KOps/s 14.6516 KOps/s $\color{#d91a1a}-2.29\%$
test_compile_indexing[slice-tensorclass-eager] 0.1329ms 25.4516μs 39.2902 KOps/s 41.5088 KOps/s $\textbf{\color{#d91a1a}-5.34\%}$
test_compile_indexing[slice-pytree-compile] 0.1406ms 68.4289μs 14.6137 KOps/s 14.7512 KOps/s $\color{#d91a1a}-0.93\%$
test_compile_indexing[slice-pytree-eager] 0.1153ms 25.3096μs 39.5107 KOps/s 41.3837 KOps/s $\color{#d91a1a}-4.53\%$
test_compile_indexing[int-tensordict-compile] 0.1615ms 73.0813μs 13.6834 KOps/s 13.8713 KOps/s $\color{#d91a1a}-1.35\%$
test_compile_indexing[int-tensordict-eager] 0.8410ms 28.4558μs 35.1422 KOps/s 36.5513 KOps/s $\color{#d91a1a}-3.86\%$
test_compile_indexing[int-tensorclass-compile] 0.1545ms 68.2315μs 14.6560 KOps/s 14.7760 KOps/s $\color{#d91a1a}-0.81\%$
test_compile_indexing[int-tensorclass-eager] 72.7250μs 25.0734μs 39.8829 KOps/s 40.8672 KOps/s $\color{#d91a1a}-2.41\%$
test_compile_indexing[int-pytree-compile] 0.1464ms 67.8704μs 14.7340 KOps/s 14.7713 KOps/s $\color{#d91a1a}-0.25\%$
test_compile_indexing[int-pytree-eager] 94.5560μs 25.1650μs 39.7377 KOps/s 40.7010 KOps/s $\color{#d91a1a}-2.37\%$
test_mod_add[eager] 0.1097ms 26.9722μs 37.0752 KOps/s 41.0227 KOps/s $\textbf{\color{#d91a1a}-9.62\%}$
test_mod_add[compile] 0.1166ms 39.3980μs 25.3820 KOps/s 27.1325 KOps/s $\textbf{\color{#d91a1a}-6.45\%}$
test_mod_add[compile-overhead] 0.1101ms 38.7653μs 25.7963 KOps/s 27.0527 KOps/s $\color{#d91a1a}-4.64\%$
test_mod_wrap[eager] 0.4382ms 0.2132ms 4.6909 KOps/s 4.7003 KOps/s $\color{#d91a1a}-0.20\%$
test_mod_wrap[compile] 1.8607ms 0.2348ms 4.2590 KOps/s 4.2586 KOps/s $+0.01\%$
test_mod_wrap[compile-overhead] 0.3798ms 0.2277ms 4.3927 KOps/s 4.3514 KOps/s $\color{#35bf28}+0.95\%$
test_mod_wrap_and_backward[eager] 16.8025ms 12.1059ms 82.6041 Ops/s 92.9324 Ops/s $\textbf{\color{#d91a1a}-11.11\%}$
test_mod_wrap_and_backward[compile] 23.2017ms 13.0002ms 76.9217 Ops/s 88.2780 Ops/s $\textbf{\color{#d91a1a}-12.86\%}$
test_mod_wrap_and_backward[compile-overhead] 21.0767ms 13.0030ms 76.9053 Ops/s 89.5144 Ops/s $\textbf{\color{#d91a1a}-14.09\%}$
test_seq_add[eager] 0.4797ms 94.0379μs 10.6340 KOps/s 11.1209 KOps/s $\color{#d91a1a}-4.38\%$
test_seq_add[compile] 0.1625ms 63.5544μs 15.7346 KOps/s 16.4757 KOps/s $\color{#d91a1a}-4.50\%$
test_seq_add[compile-overhead] 0.1468ms 61.0813μs 16.3716 KOps/s 16.8678 KOps/s $\color{#d91a1a}-2.94\%$
test_seq_wrap[eager] 0.6686ms 0.3908ms 2.5588 KOps/s 2.6615 KOps/s $\color{#d91a1a}-3.86\%$
test_seq_wrap[compile] 0.6437ms 0.2639ms 3.7893 KOps/s 3.7685 KOps/s $\color{#35bf28}+0.55\%$
test_seq_wrap[compile-overhead] 0.3668ms 0.2638ms 3.7911 KOps/s 3.8182 KOps/s $\color{#d91a1a}-0.71\%$
test_func_call_runtime[False-eager] 0.6236ms 0.5159ms 1.9382 KOps/s 1.9054 KOps/s $\color{#35bf28}+1.72\%$
test_func_call_runtime[False-compile] 0.6761ms 0.5004ms 1.9985 KOps/s 2.0128 KOps/s $\color{#d91a1a}-0.71\%$
test_func_call_runtime[False-compile-overhead] 0.9958ms 0.4992ms 2.0033 KOps/s 2.0100 KOps/s $\color{#d91a1a}-0.34\%$
test_func_call_runtime[True-eager] 1.2111ms 0.8283ms 1.2073 KOps/s 1.1889 KOps/s $\color{#35bf28}+1.54\%$
test_func_call_runtime[True-compile] 0.8411ms 0.5157ms 1.9390 KOps/s 1.9135 KOps/s $\color{#35bf28}+1.33\%$
test_func_call_runtime[True-compile-overhead] 1.0435ms 0.5141ms 1.9453 KOps/s 1.9649 KOps/s $\color{#d91a1a}-0.99\%$
test_distributed 0.2656ms 0.1319ms 7.5842 KOps/s 7.6153 KOps/s $\color{#d91a1a}-0.41\%$
test_tdmodule 98.4330μs 19.1929μs 52.1025 KOps/s 57.5384 KOps/s $\textbf{\color{#d91a1a}-9.45\%}$
test_tdmodule_dispatch 70.0310μs 39.7864μs 25.1342 KOps/s 27.3585 KOps/s $\textbf{\color{#d91a1a}-8.13\%}$
test_tdseq 35.6460μs 21.3566μs 46.8239 KOps/s 51.8056 KOps/s $\textbf{\color{#d91a1a}-9.62\%}$
test_tdseq_dispatch 65.6130μs 44.9770μs 22.2336 KOps/s 25.1044 KOps/s $\textbf{\color{#d91a1a}-11.44\%}$
test_instantiation_functorch 2.2478ms 1.6530ms 604.9705 Ops/s 618.2678 Ops/s $\color{#d91a1a}-2.15\%$
test_instantiation_td 2.0369ms 1.2182ms 820.8681 Ops/s 852.0128 Ops/s $\color{#d91a1a}-3.66\%$
test_exec_functorch 0.4365ms 0.1829ms 5.4689 KOps/s 5.5808 KOps/s $\color{#d91a1a}-2.01\%$
test_exec_functional_call 0.3152ms 0.1711ms 5.8450 KOps/s 5.8754 KOps/s $\color{#d91a1a}-0.52\%$
test_exec_td 0.3301ms 0.1761ms 5.6790 KOps/s 5.7906 KOps/s $\color{#d91a1a}-1.93\%$
test_exec_td_decorator 1.1452ms 0.2568ms 3.8944 KOps/s 3.9232 KOps/s $\color{#d91a1a}-0.73\%$
test_vmap_mlp_speed[True-True] 1.1144ms 0.5973ms 1.6742 KOps/s 1.6856 KOps/s $\color{#d91a1a}-0.68\%$
test_vmap_mlp_speed[True-False] 1.2537ms 0.5949ms 1.6809 KOps/s 1.6789 KOps/s $\color{#35bf28}+0.11\%$
test_vmap_mlp_speed[False-True] 0.9174ms 0.4839ms 2.0664 KOps/s 2.0459 KOps/s $\color{#35bf28}+1.00\%$
test_vmap_mlp_speed[False-False] 0.7718ms 0.4856ms 2.0592 KOps/s 1.9842 KOps/s $\color{#35bf28}+3.78\%$
test_vmap_mlp_speed_decorator[True-True] 0.9233ms 0.6917ms 1.4458 KOps/s 1.4674 KOps/s $\color{#d91a1a}-1.47\%$
test_vmap_mlp_speed_decorator[True-False] 1.3053ms 0.6937ms 1.4415 KOps/s 1.4700 KOps/s $\color{#d91a1a}-1.94\%$
test_vmap_mlp_speed_decorator[False-True] 1.0385ms 0.5767ms 1.7340 KOps/s 1.7725 KOps/s $\color{#d91a1a}-2.17\%$
test_vmap_mlp_speed_decorator[False-False] 0.8010ms 0.5672ms 1.7630 KOps/s 1.7555 KOps/s $\color{#35bf28}+0.43\%$
test_to_module_speed[True] 2.5846ms 1.8328ms 545.6119 Ops/s 546.5551 Ops/s $\color{#d91a1a}-0.17\%$
test_to_module_speed[False] 2.9120ms 1.8347ms 545.0405 Ops/s 561.1067 Ops/s $\color{#d91a1a}-2.86\%$
test_tc_init 0.1036ms 49.9522μs 20.0191 KOps/s 22.4928 KOps/s $\textbf{\color{#d91a1a}-11.00\%}$
test_tc_init_nested 0.1941ms 98.1005μs 10.1936 KOps/s 11.2468 KOps/s $\textbf{\color{#d91a1a}-9.36\%}$
test_tc_first_layer_tensor 30.8580μs 1.4309μs 698.8725 KOps/s 703.3218 KOps/s $\color{#d91a1a}-0.63\%$
test_tc_first_layer_nontensor 45.2340μs 4.3007μs 232.5186 KOps/s 236.2398 KOps/s $\color{#d91a1a}-1.58\%$
test_tc_second_layer_tensor 28.6030μs 2.6915μs 371.5437 KOps/s 376.3142 KOps/s $\color{#d91a1a}-1.27\%$
test_tc_second_layer_nontensor 53.9080μs 5.3980μs 185.2554 KOps/s 182.4829 KOps/s $\color{#35bf28}+1.52\%$
test_unbind 0.4377s 13.6472ms 73.2750 Ops/s 73.1688 Ops/s $\color{#35bf28}+0.15\%$
test_full_like 8.5656ms 7.0216ms 142.4179 Ops/s 126.3692 Ops/s $\textbf{\color{#35bf28}+12.70\%}$
test_zeros_like 14.1919ms 5.9166ms 169.0149 Ops/s 335.5140 Ops/s $\textbf{\color{#d91a1a}-49.63\%}$
test_ones_like 13.4817ms 7.3000ms 136.9863 Ops/s 319.9674 Ops/s $\textbf{\color{#d91a1a}-57.19\%}$
test_clone 12.6724ms 8.6675ms 115.3736 Ops/s 203.2316 Ops/s $\textbf{\color{#d91a1a}-43.23\%}$
test_squeeze 94.0650μs 13.0143μs 76.8385 KOps/s 78.3923 KOps/s $\color{#d91a1a}-1.98\%$
test_unsqueeze 0.1840ms 94.7740μs 10.5514 KOps/s 10.8098 KOps/s $\color{#d91a1a}-2.39\%$
test_split 0.4660ms 0.2055ms 4.8658 KOps/s 5.1271 KOps/s $\textbf{\color{#d91a1a}-5.10\%}$
test_permute 0.4399ms 0.2226ms 4.4930 KOps/s 4.5585 KOps/s $\color{#d91a1a}-1.44\%$
test_stack 31.4380ms 23.5875ms 42.3954 Ops/s 35.9755 Ops/s $\textbf{\color{#35bf28}+17.85\%}$
test_cat 32.2794ms 23.2648ms 42.9833 Ops/s 36.0954 Ops/s $\textbf{\color{#35bf28}+19.08\%}$
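
The Change column in the table above appears to be the relative difference between Ops (this PR) and Ops on Repo HEAD, and Ops appears to be the reciprocal of Mean. The bot's exact formula is not shown in this thread, so the following is a minimal sketch under that assumption; the helper names `ops_per_second` and `relative_change_pct` are hypothetical and the sample values are copied from two rows of the CPU table.

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean run time (assumed: Ops = 1 / Mean)."""
    return 1.0 / mean_seconds


def relative_change_pct(ops: float, ops_head: float) -> float:
    """Percent change of this PR's throughput relative to repo HEAD (assumed formula)."""
    return (ops - ops_head) / ops_head * 100.0


# (name, Ops, Ops on Repo HEAD), both values in the same unit per row
rows = [
    ("test_plain_set_nested", 43.2581, 45.4790),  # KOps/s
    ("test_membership", 1.3677, 1.0929),          # MOps/s
]

for name, ops, ops_head in rows:
    print(f"{name}: {relative_change_pct(ops, ops_head):+.2f}%")

# Mean of 23.1171 microseconds for test_plain_set_nested, converted to KOps/s
print(f"{ops_per_second(23.1171e-6) / 1e3:.2f} KOps/s")
```

Under this reading, a negative Change means the PR branch ran fewer operations per second than HEAD. Small mismatches in the last digit are expected because the table values are already rounded.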

github-actions bot commented Jul 24, 2024

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 219. Improved: 20. Worsened: 12.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 32.9610μs 16.9581μs 58.9688 KOps/s 57.2030 KOps/s $\color{#35bf28}+3.09\%$
test_plain_set_stack_nested 37.2310μs 16.7870μs 59.5697 KOps/s 57.2888 KOps/s $\color{#35bf28}+3.98\%$
test_plain_set_nested_inplace 0.1617ms 17.7836μs 56.2316 KOps/s 53.7027 KOps/s $\color{#35bf28}+4.71\%$
test_plain_set_stack_nested_inplace 38.7810μs 17.8109μs 56.1455 KOps/s 53.7147 KOps/s $\color{#35bf28}+4.53\%$
test_items 61.1210μs 4.6461μs 215.2343 KOps/s 216.3742 KOps/s $\color{#d91a1a}-0.53\%$
test_items_nested 0.4177ms 0.3954ms 2.5290 KOps/s 2.5349 KOps/s $\color{#d91a1a}-0.23\%$
test_items_nested_locked 0.4344ms 0.3996ms 2.5027 KOps/s 2.5047 KOps/s $\color{#d91a1a}-0.08\%$
test_items_nested_leaf 0.1247ms 86.1198μs 11.6117 KOps/s 11.6570 KOps/s $\color{#d91a1a}-0.39\%$
test_items_stack_nested 0.4243ms 0.3985ms 2.5093 KOps/s 2.5356 KOps/s $\color{#d91a1a}-1.04\%$
test_items_stack_nested_leaf 0.1122ms 86.1150μs 11.6124 KOps/s 11.7217 KOps/s $\color{#d91a1a}-0.93\%$
test_items_stack_nested_locked 0.4525ms 0.4007ms 2.4954 KOps/s 2.5248 KOps/s $\color{#d91a1a}-1.16\%$
test_keys 0.1678ms 4.3515μs 229.8048 KOps/s 230.1496 KOps/s $\color{#d91a1a}-0.15\%$
test_keys_nested 83.8920μs 67.1046μs 14.9021 KOps/s 15.1166 KOps/s $\color{#d91a1a}-1.42\%$
test_keys_nested_locked 0.7011ms 72.7002μs 13.7551 KOps/s 13.6721 KOps/s $\color{#35bf28}+0.61\%$
test_keys_nested_leaf 75.9410μs 57.4921μs 17.3937 KOps/s 17.7474 KOps/s $\color{#d91a1a}-1.99\%$
test_keys_stack_nested 0.1110ms 66.6265μs 15.0090 KOps/s 14.8659 KOps/s $\color{#35bf28}+0.96\%$
test_keys_stack_nested_leaf 0.2007ms 58.0098μs 17.2385 KOps/s 17.2443 KOps/s $\color{#d91a1a}-0.03\%$
test_keys_stack_nested_locked 0.1424ms 70.3636μs 14.2119 KOps/s 13.8102 KOps/s $\color{#35bf28}+2.91\%$
test_values 61.9547μs 1.7560μs 569.4627 KOps/s 570.0057 KOps/s $\color{#d91a1a}-0.10\%$
test_values_nested 0.2317ms 34.4546μs 29.0237 KOps/s 29.4662 KOps/s $\color{#d91a1a}-1.50\%$
test_values_nested_locked 57.9110μs 36.1826μs 27.6376 KOps/s 28.0650 KOps/s $\color{#d91a1a}-1.52\%$
test_values_nested_leaf 0.1528ms 30.5245μs 32.7606 KOps/s 33.2466 KOps/s $\color{#d91a1a}-1.46\%$
test_values_stack_nested 58.0810μs 35.2409μs 28.3761 KOps/s 29.1166 KOps/s $\color{#d91a1a}-2.54\%$
test_values_stack_nested_leaf 0.1255ms 31.3657μs 31.8820 KOps/s 32.2402 KOps/s $\color{#d91a1a}-1.11\%$
test_values_stack_nested_locked 64.4820μs 36.9167μs 27.0880 KOps/s 27.7317 KOps/s $\color{#d91a1a}-2.32\%$
test_membership 1.4806μs 0.5443μs 1.8373 MOps/s 1.8409 MOps/s $\color{#d91a1a}-0.20\%$
test_membership_nested 17.6700μs 2.0884μs 478.8411 KOps/s 490.5438 KOps/s $\color{#d91a1a}-2.39\%$
test_membership_nested_leaf 14.1355μs 2.0524μs 487.2334 KOps/s 486.1398 KOps/s $\color{#35bf28}+0.22\%$
test_membership_stacked_nested 20.3210μs 2.1144μs 472.9463 KOps/s 463.4576 KOps/s $\color{#35bf28}+2.05\%$
test_membership_stacked_nested_leaf 16.5910μs 2.0853μs 479.5459 KOps/s 473.0334 KOps/s $\color{#35bf28}+1.38\%$
test_membership_nested_last 26.1100μs 3.0424μs 328.6928 KOps/s 324.8163 KOps/s $\color{#35bf28}+1.19\%$
test_membership_nested_leaf_last 23.5510μs 3.0571μs 327.1086 KOps/s 327.8999 KOps/s $\color{#d91a1a}-0.24\%$
test_membership_stacked_nested_last 18.7710μs 4.0604μs 246.2838 KOps/s 108.0536 KOps/s $\textbf{\color{#35bf28}+127.93\%}$
test_membership_stacked_nested_leaf_last 22.8800μs 4.0947μs 244.2208 KOps/s 108.2886 KOps/s $\textbf{\color{#35bf28}+125.53\%}$
test_nested_getleaf 29.5110μs 8.0675μs 123.9547 KOps/s 124.3664 KOps/s $\color{#d91a1a}-0.33\%$
test_nested_get 24.7710μs 7.5667μs 132.1581 KOps/s 132.5340 KOps/s $\color{#d91a1a}-0.28\%$
test_stacked_getleaf 26.2600μs 8.0592μs 124.0824 KOps/s 124.0039 KOps/s $\color{#35bf28}+0.06\%$
test_stacked_get 29.3210μs 7.5625μs 132.2317 KOps/s 132.5493 KOps/s $\color{#d91a1a}-0.24\%$
test_nested_getitemleaf 23.0000μs 8.2024μs 121.9153 KOps/s 122.3617 KOps/s $\color{#d91a1a}-0.36\%$
test_nested_getitem 22.8500μs 7.6874μs 130.0824 KOps/s 129.9240 KOps/s $\color{#35bf28}+0.12\%$
test_stacked_getitemleaf 26.7400μs 8.1811μs 122.2332 KOps/s 122.0609 KOps/s $\color{#35bf28}+0.14\%$
test_stacked_getitem 21.8200μs 7.6651μs 130.4607 KOps/s 130.3420 KOps/s $\color{#35bf28}+0.09\%$
test_lock_nested 0.9498ms 0.4758ms 2.1018 KOps/s 2.0795 KOps/s $\color{#35bf28}+1.07\%$
test_lock_stack_nested 0.5442ms 0.4370ms 2.2884 KOps/s 2.2859 KOps/s $\color{#35bf28}+0.11\%$
test_unlock_nested 0.8783ms 0.3914ms 2.5546 KOps/s 2.4963 KOps/s $\color{#35bf28}+2.34\%$
test_unlock_stack_nested 0.4783ms 0.3542ms 2.8233 KOps/s 2.7969 KOps/s $\color{#35bf28}+0.94\%$
test_flatten_speed 91.9856ms 0.1179ms 8.4841 KOps/s 9.4564 KOps/s $\textbf{\color{#d91a1a}-10.28\%}$
test_unflatten_speed 0.3246ms 0.2943ms 3.3984 KOps/s 3.4116 KOps/s $\color{#d91a1a}-0.39\%$
test_common_ops 1.5121ms 1.2897ms 775.3876 Ops/s 757.1727 Ops/s $\color{#35bf28}+2.41\%$
test_creation 17.3900μs 2.0089μs 497.7780 KOps/s 493.4641 KOps/s $\color{#35bf28}+0.87\%$
test_creation_empty 33.7810μs 17.2351μs 58.0213 KOps/s 54.2045 KOps/s $\textbf{\color{#35bf28}+7.04\%}$
test_creation_nested_1 0.1945ms 19.0118μs 52.5990 KOps/s 48.8380 KOps/s $\textbf{\color{#35bf28}+7.70\%}$
test_creation_nested_2 0.1828ms 21.7574μs 45.9614 KOps/s 42.9114 KOps/s $\textbf{\color{#35bf28}+7.11\%}$
test_clone 0.1785ms 28.9776μs 34.5094 KOps/s 33.8107 KOps/s $\color{#35bf28}+2.07\%$
test_getitem[int] 1.1464ms 17.2065μs 58.1174 KOps/s 56.6441 KOps/s $\color{#35bf28}+2.60\%$
test_getitem[slice_int] 0.2292ms 29.8706μs 33.4777 KOps/s 33.5567 KOps/s $\color{#d91a1a}-0.24\%$
test_getitem[range] 0.2598ms 0.1147ms 8.7154 KOps/s 8.7642 KOps/s $\color{#d91a1a}-0.56\%$
test_getitem[tuple] 0.2221ms 25.8013μs 38.7577 KOps/s 38.3312 KOps/s $\color{#35bf28}+1.11\%$
test_getitem[list] 0.2997ms 0.1040ms 9.6196 KOps/s 9.6793 KOps/s $\color{#d91a1a}-0.62\%$
test_setitem_dim[int] 72.0910μs 53.3835μs 18.7324 KOps/s 17.8271 KOps/s $\textbf{\color{#35bf28}+5.08\%}$
test_setitem_dim[slice_int] 0.2197ms 78.9795μs 12.6615 KOps/s 12.4025 KOps/s $\color{#35bf28}+2.09\%$
test_setitem_dim[range] 0.3117ms 0.1416ms 7.0616 KOps/s 6.9571 KOps/s $\color{#35bf28}+1.50\%$
test_setitem_dim[tuple] 0.1936ms 69.9242μs 14.3012 KOps/s 13.5340 KOps/s $\textbf{\color{#35bf28}+5.67\%}$
test_setitem 0.1940ms 43.0945μs 23.2048 KOps/s 22.0117 KOps/s $\textbf{\color{#35bf28}+5.42\%}$
test_set 0.2019ms 41.7215μs 23.9685 KOps/s 21.9050 KOps/s $\textbf{\color{#35bf28}+9.42\%}$
test_set_shared 0.3630ms 53.0655μs 18.8446 KOps/s 18.5815 KOps/s $\color{#35bf28}+1.42\%$
test_update 0.2018ms 51.0798μs 19.5772 KOps/s 19.2024 KOps/s $\color{#35bf28}+1.95\%$
test_update_nested 0.2095ms 59.2452μs 16.8790 KOps/s 16.5921 KOps/s $\color{#35bf28}+1.73\%$
test_update__nested 0.2442ms 60.9239μs 16.4139 KOps/s 16.5288 KOps/s $\color{#d91a1a}-0.69\%$
test_set_nested 0.1965ms 44.7688μs 22.3370 KOps/s 21.8830 KOps/s $\color{#35bf28}+2.07\%$
test_set_nested_new 0.2004ms 49.2662μs 20.2979 KOps/s 20.0343 KOps/s $\color{#35bf28}+1.32\%$
test_select 0.2028ms 63.8062μs 15.6725 KOps/s 14.8236 KOps/s $\textbf{\color{#35bf28}+5.73\%}$
test_select_nested 75.5120μs 52.3454μs 19.1039 KOps/s 19.0261 KOps/s $\color{#35bf28}+0.41\%$
test_exclude_nested 0.1038ms 72.2332μs 13.8440 KOps/s 14.1250 KOps/s $\color{#d91a1a}-1.99\%$
test_empty[True] 0.3222ms 0.2962ms 3.3759 KOps/s 3.3158 KOps/s $\color{#35bf28}+1.81\%$
test_empty[False] 2.7100μs 0.9319μs 1.0731 MOps/s 1.0448 MOps/s $\color{#35bf28}+2.71\%$
test_to 0.1277ms 38.2025μs 26.1763 KOps/s 24.9746 KOps/s $\color{#35bf28}+4.81\%$
test_to_nonblocking 44.0810μs 23.9348μs 41.7802 KOps/s 41.6520 KOps/s $\color{#35bf28}+0.31\%$
test_unbind_speed 0.3784ms 0.3088ms 3.2381 KOps/s 3.1913 KOps/s $\color{#35bf28}+1.47\%$
test_unbind_speed_stack0 0.3394ms 0.3045ms 3.2844 KOps/s 3.2384 KOps/s $\color{#35bf28}+1.42\%$
test_unbind_speed_stack1 93.3476ms 0.7800ms 1.2820 KOps/s 1.2693 KOps/s $\color{#35bf28}+1.00\%$
test_split 95.2664ms 2.3395ms 427.4358 Ops/s 431.3142 Ops/s $\color{#d91a1a}-0.90\%$
test_chunk 95.4299ms 2.3374ms 427.8322 Ops/s 471.0555 Ops/s $\textbf{\color{#d91a1a}-9.18\%}$
test_creation[device0] 0.2524ms 0.1066ms 9.3847 KOps/s 9.4162 KOps/s $\color{#d91a1a}-0.33\%$
test_creation_from_tensor 0.2500ms 0.1027ms 9.7398 KOps/s 9.2538 KOps/s $\textbf{\color{#35bf28}+5.25\%}$
test_add_one[memmap_tensor0] 0.1086ms 8.6150μs 116.0761 KOps/s 113.7286 KOps/s $\color{#35bf28}+2.06\%$
test_contiguous[memmap_tensor0] 32.9500μs 2.2560μs 443.2693 KOps/s 451.8587 KOps/s $\color{#d91a1a}-1.90\%$
test_stack[memmap_tensor0] 33.7700μs 6.8958μs 145.0166 KOps/s 147.5274 KOps/s $\color{#d91a1a}-1.70\%$
test_memmaptd_index 1.1673ms 0.4412ms 2.2665 KOps/s 1.9972 KOps/s $\textbf{\color{#35bf28}+13.49\%}$
test_memmaptd_index_astensor 0.8156ms 0.5057ms 1.9774 KOps/s 1.9829 KOps/s $\color{#d91a1a}-0.28\%$
test_memmaptd_index_op 1.4552ms 1.0456ms 956.4040 Ops/s 930.3521 Ops/s $\color{#35bf28}+2.80\%$
test_serialize_model 0.1005s 97.0979ms 10.2989 Ops/s 9.9490 Ops/s $\color{#35bf28}+3.52\%$
test_serialize_model_pickle 1.3523s 1.2372s 0.8083 Ops/s 0.8079 Ops/s $\color{#35bf28}+0.05\%$
test_serialize_weights 0.2019s 0.1054s 9.4853 Ops/s 10.3313 Ops/s $\textbf{\color{#d91a1a}-8.19\%}$
test_serialize_weights_returnearly 0.3098s 90.0652ms 11.1031 Ops/s 14.0801 Ops/s $\textbf{\color{#d91a1a}-21.14\%}$
test_serialize_weights_pickle 1.3987s 1.2434s 0.8042 Ops/s 0.8082 Ops/s $\color{#d91a1a}-0.50\%$
test_reshape_pytree 0.1476ms 39.2840μs 25.4557 KOps/s 25.3309 KOps/s $\color{#35bf28}+0.49\%$
test_reshape_td 0.1554ms 45.3305μs 22.0602 KOps/s 22.3781 KOps/s $\color{#d91a1a}-1.42\%$
test_view_pytree 0.1875ms 38.6689μs 25.8606 KOps/s 25.7415 KOps/s $\color{#35bf28}+0.46\%$
test_view_td 0.1849ms 52.6960μs 18.9768 KOps/s 19.4894 KOps/s $\color{#d91a1a}-2.63\%$
test_unbind_pytree 0.1696ms 37.1538μs 26.9151 KOps/s 26.8087 KOps/s $\color{#35bf28}+0.40\%$
test_unbind_td 0.3877ms 45.5799μs 21.9395 KOps/s 20.5235 KOps/s $\textbf{\color{#35bf28}+6.90\%}$
test_split_pytree 0.3352ms 49.6094μs 20.1575 KOps/s 19.8230 KOps/s $\color{#35bf28}+1.69\%$
test_split_td 0.1716ms 60.4171μs 16.5516 KOps/s 14.3782 KOps/s $\textbf{\color{#35bf28}+15.12\%}$
test_add_pytree 0.2225ms 58.3681μs 17.1326 KOps/s 16.7616 KOps/s $\color{#35bf28}+2.21\%$
test_add_td 0.3204ms 97.7244μs 10.2329 KOps/s 9.9390 KOps/s $\color{#35bf28}+2.96\%$
test_compile_add_one_nested[tensordict-compile] 0.4300ms 0.2175ms 4.5978 KOps/s 4.6788 KOps/s $\color{#d91a1a}-1.73\%$
test_compile_add_one_nested[tensordict-eager] 0.3758ms 0.1761ms 5.6781 KOps/s 5.7381 KOps/s $\color{#d91a1a}-1.04\%$
test_compile_add_one_nested[pytree-compile] 0.3001ms 0.1485ms 6.7348 KOps/s 6.8363 KOps/s $\color{#d91a1a}-1.48\%$
test_compile_add_one_nested[pytree-eager] 0.3866ms 0.1935ms 5.1677 KOps/s 5.0909 KOps/s $\color{#35bf28}+1.51\%$
test_compile_copy_nested[tensordict-compile] 0.1804ms 22.9103μs 43.6484 KOps/s 46.0804 KOps/s $\textbf{\color{#d91a1a}-5.28\%}$
test_compile_copy_nested[tensordict-eager] 0.1955ms 49.8941μs 20.0425 KOps/s 20.6256 KOps/s $\color{#d91a1a}-2.83\%$
test_compile_copy_nested[pytree-compile] 0.2305ms 72.3454μs 13.8226 KOps/s 13.8022 KOps/s $\color{#35bf28}+0.15\%$
test_compile_copy_nested[pytree-eager] 85.8420μs 59.2278μs 16.8840 KOps/s 16.8024 KOps/s $\color{#35bf28}+0.49\%$
test_compile_add_one_flat[tensordict-compile] 0.5135ms 0.3346ms 2.9890 KOps/s 3.0129 KOps/s $\color{#d91a1a}-0.79\%$
test_compile_add_one_flat[tensordict-eager] 0.3691ms 0.2234ms 4.4764 KOps/s 4.4972 KOps/s $\color{#d91a1a}-0.46\%$
test_compile_add_one_flat[tensorclass-compile] 0.2850ms 0.1370ms 7.2993 KOps/s 7.5961 KOps/s $\color{#d91a1a}-3.91\%$
test_compile_add_one_flat[tensorclass-eager] 0.2312ms 66.1144μs 15.1253 KOps/s 14.9705 KOps/s $\color{#35bf28}+1.03\%$
test_compile_add_one_flat[pytree-compile] 0.4864ms 0.3329ms 3.0041 KOps/s 3.0298 KOps/s $\color{#d91a1a}-0.85\%$
test_compile_add_one_flat[pytree-eager] 0.7988ms 0.6250ms 1.6001 KOps/s 1.5562 KOps/s $\color{#35bf28}+2.82\%$
test_compile_add_self_flat[tensordict-eager] 0.4199ms 0.2712ms 3.6867 KOps/s 3.6685 KOps/s $\color{#35bf28}+0.49\%$
test_compile_add_self_flat[tensordict-compile] 0.4401ms 0.3345ms 2.9897 KOps/s 3.0052 KOps/s $\color{#d91a1a}-0.52\%$
test_compile_add_self_flat[tensorclass-eager] 0.2287ms 78.0022μs 12.8201 KOps/s 12.4903 KOps/s $\color{#35bf28}+2.64\%$
test_compile_add_self_flat[tensorclass-compile] 0.2890ms 0.1378ms 7.2592 KOps/s 7.4884 KOps/s $\color{#d91a1a}-3.06\%$
test_compile_add_self_flat[pytree-eager] 0.7117ms 0.5336ms 1.8742 KOps/s 1.8337 KOps/s $\color{#35bf28}+2.21\%$
test_compile_add_self_flat[pytree-compile] 0.4315ms 0.3311ms 3.0201 KOps/s 3.0394 KOps/s $\color{#d91a1a}-0.64\%$
test_compile_copy_flat[tensordict-compile] 0.1658ms 18.3769μs 54.4161 KOps/s 51.4993 KOps/s $\textbf{\color{#35bf28}+5.66\%}$
test_compile_copy_flat[tensordict-eager] 54.3610μs 31.8989μs 31.3490 KOps/s 31.0217 KOps/s $\color{#35bf28}+1.06\%$
test_compile_copy_flat[pytree-compile] 0.2094ms 75.4900μs 13.2468 KOps/s 13.2132 KOps/s $\color{#35bf28}+0.25\%$
test_compile_copy_flat[pytree-eager] 0.1248ms 59.9674μs 16.6757 KOps/s 16.4802 KOps/s $\color{#35bf28}+1.19\%$
test_compile_assign_and_add[tensordict-compile] 2.5909ms 0.9453ms 1.0579 KOps/s 1.0728 KOps/s $\color{#d91a1a}-1.38\%$
test_compile_assign_and_add[tensordict-eager] 3.6011ms 3.3590ms 297.7096 Ops/s 289.8693 Ops/s $\color{#35bf28}+2.70\%$
test_compile_assign_and_add[pytree-compile] 2.5652ms 0.9413ms 1.0624 KOps/s 1.0726 KOps/s $\color{#d91a1a}-0.96\%$
test_compile_assign_and_add[pytree-eager] 3.6348ms 3.2800ms 304.8748 Ops/s 300.4275 Ops/s $\color{#35bf28}+1.48\%$
test_compile_indexing[tensor-tensordict-compile] 0.2777ms 0.1131ms 8.8441 KOps/s 8.9435 KOps/s $\color{#d91a1a}-1.11\%$
test_compile_indexing[tensor-tensordict-eager] 0.2427ms 64.9765μs 15.3902 KOps/s 15.8714 KOps/s $\color{#d91a1a}-3.03\%$
test_compile_indexing[tensor-tensorclass-compile] 0.2842ms 0.1060ms 9.4316 KOps/s 9.3940 KOps/s $\color{#35bf28}+0.40\%$
test_compile_indexing[tensor-tensorclass-eager] 0.2333ms 48.1609μs 20.7637 KOps/s 22.0798 KOps/s $\textbf{\color{#d91a1a}-5.96\%}$
test_compile_indexing[tensor-pytree-compile] 0.2931ms 0.1111ms 8.9973 KOps/s 9.5411 KOps/s $\textbf{\color{#d91a1a}-5.70\%}$
test_compile_indexing[tensor-pytree-eager] 0.1991ms 48.6852μs 20.5401 KOps/s 21.9539 KOps/s $\textbf{\color{#d91a1a}-6.44\%}$
test_compile_indexing[slice-tensordict-compile] 0.2926ms 0.1427ms 7.0073 KOps/s 7.0864 KOps/s $\color{#d91a1a}-1.12\%$
test_compile_indexing[slice-tensordict-eager] 0.2209ms 27.8308μs 35.9314 KOps/s 37.6451 KOps/s $\color{#d91a1a}-4.55\%$
test_compile_indexing[slice-tensorclass-compile] 0.2614ms 0.1342ms 7.4530 KOps/s 7.5039 KOps/s $\color{#d91a1a}-0.68\%$
test_compile_indexing[slice-tensorclass-eager] 0.1595ms 22.9269μs 43.6168 KOps/s 43.4175 KOps/s $\color{#35bf28}+0.46\%$
test_compile_indexing[slice-pytree-compile] 0.2894ms 0.1398ms 7.1508 KOps/s 7.5138 KOps/s $\color{#d91a1a}-4.83\%$
test_compile_indexing[slice-pytree-eager] 54.2620μs 23.0104μs 43.4586 KOps/s 43.1774 KOps/s $\color{#35bf28}+0.65\%$
test_compile_indexing[int-tensordict-compile] 0.3441ms 0.1413ms 7.0748 KOps/s 7.1352 KOps/s $\color{#d91a1a}-0.85\%$
test_compile_indexing[int-tensordict-eager] 0.5083ms 26.0872μs 38.3330 KOps/s 37.6662 KOps/s $\color{#35bf28}+1.77\%$
test_compile_indexing[int-tensorclass-compile] 0.2878ms 0.1368ms 7.3116 KOps/s 7.4022 KOps/s $\color{#d91a1a}-1.22\%$
test_compile_indexing[int-tensorclass-eager] 0.1678ms 22.9178μs 43.6342 KOps/s 44.0687 KOps/s $\color{#d91a1a}-0.99\%$
test_compile_indexing[int-pytree-compile] 0.3162ms 0.1333ms 7.5027 KOps/s 7.5127 KOps/s $\color{#d91a1a}-0.13\%$
test_compile_indexing[int-pytree-eager] 0.1767ms 23.2144μs 43.0767 KOps/s 43.6165 KOps/s $\color{#d91a1a}-1.24\%$
test_mod_add[eager] 0.2199ms 37.9965μs 26.3182 KOps/s 25.2394 KOps/s $\color{#35bf28}+4.27\%$
test_mod_add[compile] 0.2674ms 71.8161μs 13.9244 KOps/s 14.0165 KOps/s $\color{#d91a1a}-0.66\%$
test_mod_add[compile-overhead] 0.2822ms 0.1534ms 6.5187 KOps/s 6.8010 KOps/s $\color{#d91a1a}-4.15\%$
test_mod_wrap[eager] 0.4430ms 0.2742ms 3.6475 KOps/s 3.7599 KOps/s $\color{#d91a1a}-2.99\%$
test_mod_wrap[compile] 0.5042ms 0.3142ms 3.1829 KOps/s 3.2849 KOps/s $\color{#d91a1a}-3.10\%$
test_mod_wrap[compile-overhead] 7.7877ms 4.2103ms 237.5149 Ops/s 234.3184 Ops/s $\color{#35bf28}+1.36\%$
test_mod_wrap_and_backward[eager] 1.6831ms 1.4735ms 678.6382 Ops/s 726.3950 Ops/s $\textbf{\color{#d91a1a}-6.57\%}$
test_mod_wrap_and_backward[compile] 1.8121ms 1.4848ms 673.4713 Ops/s 727.9457 Ops/s $\textbf{\color{#d91a1a}-7.48\%}$
test_mod_wrap_and_backward[compile-overhead] 1.4702ms 1.0089ms 991.1714 Ops/s 1.1079 KOps/s $\textbf{\color{#d91a1a}-10.54\%}$
test_seq_add[eager] 0.2915ms 0.1112ms 8.9945 KOps/s 8.7271 KOps/s $\color{#35bf28}+3.06\%$
test_seq_add[compile] 0.2324ms 87.0117μs 11.4927 KOps/s 11.4563 KOps/s $\color{#35bf28}+0.32\%$
test_seq_add[compile-overhead] 0.2713ms 0.1248ms 8.0142 KOps/s 7.8408 KOps/s $\color{#35bf28}+2.21\%$
test_seq_wrap[eager] 0.6049ms 0.4261ms 2.3469 KOps/s 2.2143 KOps/s $\textbf{\color{#35bf28}+5.99\%}$
test_seq_wrap[compile] 0.6865ms 0.3347ms 2.9878 KOps/s 2.9670 KOps/s $\color{#35bf28}+0.70\%$
test_seq_wrap[compile-overhead] 0.3216s 0.1516s 6.5958 Ops/s 6.5842 Ops/s $\color{#35bf28}+0.18\%$
test_func_call_runtime[False-eager] 0.9333ms 0.7531ms 1.3279 KOps/s 1.2896 KOps/s $\color{#35bf28}+2.97\%$
test_func_call_runtime[False-compile] 1.0031ms 0.8261ms 1.2105 KOps/s 1.2108 KOps/s $\color{#d91a1a}-0.03\%$
test_func_call_runtime[False-compile-overhead] 0.5245ms 0.3685ms 2.7136 KOps/s 2.7223 KOps/s $\color{#d91a1a}-0.32\%$
test_func_call_runtime[True-eager] 1.1873ms 1.0011ms 998.9010 Ops/s 989.6971 Ops/s $\color{#35bf28}+0.93\%$
test_func_call_runtime[True-compile] 1.0244ms 0.8683ms 1.1517 KOps/s 1.1436 KOps/s $\color{#35bf28}+0.71\%$
test_func_call_runtime[True-compile-overhead] 0.5703ms 0.4110ms 2.4334 KOps/s 2.4242 KOps/s $\color{#35bf28}+0.38\%$
test_distributed 3.4709ms 73.9131μs 13.5294 KOps/s 13.4913 KOps/s $\color{#35bf28}+0.28\%$
test_tdmodule 69.6310μs 16.5187μs 60.5373 KOps/s 55.2892 KOps/s $\textbf{\color{#35bf28}+9.49\%}$
test_tdmodule_dispatch 0.1653ms 34.1451μs 29.2868 KOps/s 27.3010 KOps/s $\textbf{\color{#35bf28}+7.27\%}$
test_tdseq 33.4000μs 17.2030μs 58.1294 KOps/s 53.6287 KOps/s $\textbf{\color{#35bf28}+8.39\%}$
test_tdseq_dispatch 53.6210μs 36.0184μs 27.7636 KOps/s 26.2028 KOps/s $\textbf{\color{#35bf28}+5.96\%}$
test_instantiation_functorch 2.2321ms 2.0020ms 499.4995 Ops/s 498.5295 Ops/s $\color{#35bf28}+0.19\%$
test_instantiation_td 1.9937ms 1.3028ms 767.5906 Ops/s 769.4617 Ops/s $\color{#d91a1a}-0.24\%$
test_exec_functorch 0.4267ms 0.2344ms 4.2667 KOps/s 4.2554 KOps/s $\color{#35bf28}+0.27\%$
test_exec_functional_call 0.4188ms 0.2331ms 4.2892 KOps/s 4.5030 KOps/s $\color{#d91a1a}-4.75\%$
test_exec_td 0.3942ms 0.2348ms 4.2588 KOps/s 4.5516 KOps/s $\textbf{\color{#d91a1a}-6.43\%}$
test_exec_td_decorator 0.4682ms 0.3096ms 3.2298 KOps/s 3.3535 KOps/s $\color{#d91a1a}-3.69\%$
test_vmap_mlp_speed[True-True] 1.1183ms 0.6958ms 1.4373 KOps/s 1.4281 KOps/s $\color{#35bf28}+0.64\%$
test_vmap_mlp_speed[True-False] 0.8765ms 0.6827ms 1.4648 KOps/s 1.4124 KOps/s $\color{#35bf28}+3.71\%$
test_vmap_mlp_speed[False-True] 0.7966ms 0.6111ms 1.6363 KOps/s 1.6118 KOps/s $\color{#35bf28}+1.52\%$
test_vmap_mlp_speed[False-False] 0.7833ms 0.6127ms 1.6320 KOps/s 1.6111 KOps/s $\color{#35bf28}+1.30\%$
test_vmap_mlp_speed_decorator[True-True] 0.9692ms 0.7510ms 1.3315 KOps/s 1.3163 KOps/s $\color{#35bf28}+1.16\%$
test_vmap_mlp_speed_decorator[True-False] 0.9366ms 0.7523ms 1.3292 KOps/s 1.3295 KOps/s $\color{#d91a1a}-0.02\%$
test_vmap_mlp_speed_decorator[False-True] 0.8339ms 0.6563ms 1.5236 KOps/s 1.4897 KOps/s $\color{#35bf28}+2.27\%$
test_vmap_mlp_speed_decorator[False-False] 0.8317ms 0.6678ms 1.4974 KOps/s 1.4658 KOps/s $\color{#35bf28}+2.16\%$
test_vmap_transformer_speed[True-True] 9.2923ms 8.8940ms 112.4350 Ops/s 112.6453 Ops/s $\color{#d91a1a}-0.19\%$
test_vmap_transformer_speed[True-False] 9.3734ms 8.8979ms 112.3858 Ops/s 112.9544 Ops/s $\color{#d91a1a}-0.50\%$
test_vmap_transformer_speed[False-True] 9.1483ms 8.8012ms 113.6204 Ops/s 113.8446 Ops/s $\color{#d91a1a}-0.20\%$
test_vmap_transformer_speed[False-False] 9.0937ms 8.7397ms 114.4205 Ops/s 114.3445 Ops/s $\color{#35bf28}+0.07\%$
test_vmap_transformer_speed_decorator[True-True] 21.7856ms 21.1285ms 47.3295 Ops/s 47.1948 Ops/s $\color{#35bf28}+0.29\%$
test_vmap_transformer_speed_decorator[True-False] 21.6959ms 21.1444ms 47.2938 Ops/s 47.3874 Ops/s $\color{#d91a1a}-0.20\%$
test_vmap_transformer_speed_decorator[False-True] 21.5274ms 20.9142ms 47.8145 Ops/s 47.5928 Ops/s $\color{#35bf28}+0.47\%$
test_vmap_transformer_speed_decorator[False-False] 21.8792ms 21.0227ms 47.5676 Ops/s 47.5417 Ops/s $\color{#35bf28}+0.05\%$
test_to_module_speed[True] 2.9815ms 1.4997ms 666.8188 Ops/s 679.5471 Ops/s $\color{#d91a1a}-1.87\%$
test_to_module_speed[False] 1.9680ms 1.4796ms 675.8790 Ops/s 685.7575 Ops/s $\color{#d91a1a}-1.44\%$
test_tc_init 78.9720μs 38.1392μs 26.2197 KOps/s 25.3523 KOps/s $\color{#35bf28}+3.42\%$
test_tc_init_nested 0.1256ms 78.8239μs 12.6865 KOps/s 12.2926 KOps/s $\color{#35bf28}+3.20\%$
test_tc_first_layer_tensor 3.5450μs 0.7999μs 1.2502 MOps/s 1.2522 MOps/s $\color{#d91a1a}-0.16\%$
test_tc_first_layer_nontensor 15.7910μs 2.5748μs 388.3834 KOps/s 391.4653 KOps/s $\color{#d91a1a}-0.79\%$
test_tc_second_layer_tensor 6.9267μs 1.6326μs 612.5333 KOps/s 611.9309 KOps/s $\color{#35bf28}+0.10\%$
test_tc_second_layer_nontensor 23.5300μs 3.3868μs 295.2646 KOps/s 294.0078 KOps/s $\color{#35bf28}+0.43\%$
test_unbind 0.3276s 12.8506ms 77.8172 Ops/s 76.7024 Ops/s $\color{#35bf28}+1.45\%$
test_full_like 0.7646ms 0.5795ms 1.7255 KOps/s 1.7288 KOps/s $\color{#d91a1a}-0.19\%$
test_zeros_like 0.3511ms 0.1982ms 5.0462 KOps/s 5.0475 KOps/s $\color{#d91a1a}-0.03\%$
test_ones_like 0.3716ms 0.1979ms 5.0521 KOps/s 5.0538 KOps/s $\color{#d91a1a}-0.03\%$
test_clone 0.6143ms 0.4151ms 2.4090 KOps/s 2.4085 KOps/s $\color{#35bf28}+0.02\%$
test_squeeze 37.5010μs 11.3206μs 88.3348 KOps/s 87.8854 KOps/s $\color{#35bf28}+0.51\%$
test_unsqueeze 0.2559ms 80.3090μs 12.4519 KOps/s 12.4369 KOps/s $\color{#35bf28}+0.12\%$
test_split 0.5125ms 0.1795ms 5.5704 KOps/s 5.7861 KOps/s $\color{#d91a1a}-3.73\%$
test_permute 0.3860ms 0.1899ms 5.2672 KOps/s 5.1432 KOps/s $\color{#35bf28}+2.41\%$
test_stack 1.3593ms 0.9012ms 1.1096 KOps/s 1.1070 KOps/s $\color{#35bf28}+0.24\%$
test_cat 1.3832ms 1.2321ms 811.6428 Ops/s 811.1853 Ops/s $\color{#35bf28}+0.06\%$

@vmoens vmoens merged commit 27402a5 into main Jul 24, 2024
8 of 22 checks passed
@vmoens vmoens deleted the rename-pinmem branch July 24, 2024 15:56