[BugFix] remove inplace updates when using td as a decorator #796

Merged
vmoens merged 1 commit into main from fix-decorator-inplace on May 27, 2024

Conversation

vmoens
Contributor

@vmoens vmoens commented May 27, 2024

No description provided.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 27, 2024
@vmoens vmoens merged commit 85b0204 into main May 27, 2024
36 of 38 checks passed

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: $\large\color{#35bf28}11$. Worsened: $\large\color{#d91a1a}13$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 36.0770μs 16.6264μs 60.1454 KOps/s 61.3415 KOps/s $\color{#d91a1a}-1.95\%$
test_plain_set_stack_nested 51.3050μs 16.7642μs 59.6510 KOps/s 60.9553 KOps/s $\color{#d91a1a}-2.14\%$
test_plain_set_nested_inplace 49.5220μs 19.1098μs 52.3291 KOps/s 52.3965 KOps/s $\color{#d91a1a}-0.13\%$
test_plain_set_stack_nested_inplace 51.4560μs 19.0593μs 52.4678 KOps/s 53.9400 KOps/s $\color{#d91a1a}-2.73\%$
test_items 15.5590μs 2.5255μs 395.9598 KOps/s 393.8502 KOps/s $\color{#35bf28}+0.54\%$
test_items_nested 0.4215ms 0.2590ms 3.8613 KOps/s 3.7580 KOps/s $\color{#35bf28}+2.75\%$
test_items_nested_locked 1.1214ms 0.2637ms 3.7924 KOps/s 3.6869 KOps/s $\color{#35bf28}+2.86\%$
test_items_nested_leaf 0.1511ms 76.7021μs 13.0375 KOps/s 12.9575 KOps/s $\color{#35bf28}+0.62\%$
test_items_stack_nested 0.4011ms 0.2629ms 3.8031 KOps/s 3.6974 KOps/s $\color{#35bf28}+2.86\%$
test_items_stack_nested_leaf 0.1531ms 78.5717μs 12.7272 KOps/s 12.6484 KOps/s $\color{#35bf28}+0.62\%$
test_items_stack_nested_locked 0.4241ms 0.2663ms 3.7546 KOps/s 3.6911 KOps/s $\color{#35bf28}+1.72\%$
test_keys 19.1660μs 3.8554μs 259.3760 KOps/s 262.5280 KOps/s $\color{#d91a1a}-1.20\%$
test_keys_nested 0.2579ms 0.1358ms 7.3663 KOps/s 7.2424 KOps/s $\color{#35bf28}+1.71\%$
test_keys_nested_locked 0.6679ms 0.1405ms 7.1163 KOps/s 7.0466 KOps/s $\color{#35bf28}+0.99\%$
test_keys_nested_leaf 0.1763ms 0.1136ms 8.8001 KOps/s 8.6124 KOps/s $\color{#35bf28}+2.18\%$
test_keys_stack_nested 0.2504ms 0.1353ms 7.3889 KOps/s 7.3031 KOps/s $\color{#35bf28}+1.18\%$
test_keys_stack_nested_leaf 0.2006ms 0.1140ms 8.7753 KOps/s 8.5889 KOps/s $\color{#35bf28}+2.17\%$
test_keys_stack_nested_locked 0.3109ms 0.1412ms 7.0805 KOps/s 7.0513 KOps/s $\color{#35bf28}+0.41\%$
test_values 8.7613μs 1.1592μs 862.6760 KOps/s 863.5318 KOps/s $\color{#d91a1a}-0.10\%$
test_values_nested 0.1022ms 50.2421μs 19.9036 KOps/s 19.7791 KOps/s $\color{#35bf28}+0.63\%$
test_values_nested_locked 0.1081ms 50.0313μs 19.9875 KOps/s 19.8979 KOps/s $\color{#35bf28}+0.45\%$
test_values_nested_leaf 0.1031ms 45.1077μs 22.1691 KOps/s 21.8273 KOps/s $\color{#35bf28}+1.57\%$
test_values_stack_nested 0.1048ms 51.1514μs 19.5498 KOps/s 19.3183 KOps/s $\color{#35bf28}+1.20\%$
test_values_stack_nested_leaf 96.8600μs 44.7717μs 22.3355 KOps/s 22.0130 KOps/s $\color{#35bf28}+1.47\%$
test_values_stack_nested_locked 93.5540μs 50.7467μs 19.7057 KOps/s 19.4757 KOps/s $\color{#35bf28}+1.18\%$
test_membership 15.4990μs 1.3201μs 757.5261 KOps/s 742.5469 KOps/s $\color{#35bf28}+2.02\%$
test_membership_nested 37.0690μs 3.4351μs 291.1107 KOps/s 288.6640 KOps/s $\color{#35bf28}+0.85\%$
test_membership_nested_leaf 31.7590μs 3.4443μs 290.3306 KOps/s 285.6046 KOps/s $\color{#35bf28}+1.65\%$
test_membership_stacked_nested 35.3860μs 3.4527μs 289.6287 KOps/s 286.4016 KOps/s $\color{#35bf28}+1.13\%$
test_membership_stacked_nested_leaf 17.7420μs 3.4544μs 289.4859 KOps/s 286.9823 KOps/s $\color{#35bf28}+0.87\%$
test_membership_nested_last 34.1340μs 4.2214μs 236.8906 KOps/s 213.0825 KOps/s $\textbf{\color{#35bf28}+11.17\%}$
test_membership_nested_leaf_last 25.8880μs 4.1894μs 238.6962 KOps/s 236.8380 KOps/s $\color{#35bf28}+0.78\%$
test_membership_stacked_nested_last 37.0190μs 5.7662μs 173.4240 KOps/s 190.0648 KOps/s $\textbf{\color{#d91a1a}-8.76\%}$
test_membership_stacked_nested_leaf_last 40.5250μs 5.7609μs 173.5836 KOps/s 185.8135 KOps/s $\textbf{\color{#d91a1a}-6.58\%}$
test_nested_getleaf 41.1160μs 10.5584μs 94.7113 KOps/s 90.0767 KOps/s $\textbf{\color{#35bf28}+5.15\%}$
test_nested_get 41.7680μs 10.1035μs 98.9754 KOps/s 96.7879 KOps/s $\color{#35bf28}+2.26\%$
test_stacked_getleaf 31.1580μs 10.5290μs 94.9761 KOps/s 91.6780 KOps/s $\color{#35bf28}+3.60\%$
test_stacked_get 48.7310μs 10.0061μs 99.9391 KOps/s 97.4725 KOps/s $\color{#35bf28}+2.53\%$
test_nested_getitemleaf 40.5750μs 11.1149μs 89.9692 KOps/s 85.6543 KOps/s $\textbf{\color{#35bf28}+5.04\%}$
test_nested_getitem 52.0270μs 10.1638μs 98.3887 KOps/s 92.6856 KOps/s $\textbf{\color{#35bf28}+6.15\%}$
test_stacked_getitemleaf 45.5450μs 10.9513μs 91.3138 KOps/s 87.8238 KOps/s $\color{#35bf28}+3.97\%$
test_stacked_getitem 33.4220μs 10.2581μs 97.4843 KOps/s 94.8254 KOps/s $\color{#35bf28}+2.80\%$
test_lock_nested 45.3264ms 0.4020ms 2.4878 KOps/s 2.8712 KOps/s $\textbf{\color{#d91a1a}-13.35\%}$
test_lock_stack_nested 0.4732ms 0.3079ms 3.2477 KOps/s 3.2439 KOps/s $\color{#35bf28}+0.11\%$
test_unlock_nested 0.7559ms 0.3527ms 2.8351 KOps/s 2.4989 KOps/s $\textbf{\color{#35bf28}+13.45\%}$
test_unlock_stack_nested 0.5101ms 0.3158ms 3.1665 KOps/s 3.1604 KOps/s $\color{#35bf28}+0.19\%$
test_flatten_speed 0.1972ms 97.0291μs 10.3062 KOps/s 10.3264 KOps/s $\color{#d91a1a}-0.20\%$
test_unflatten_speed 0.7185ms 0.4105ms 2.4361 KOps/s 2.3394 KOps/s $\color{#35bf28}+4.13\%$
test_common_ops 1.4378ms 0.6918ms 1.4455 KOps/s 1.4764 KOps/s $\color{#d91a1a}-2.09\%$
test_creation 27.9520μs 1.9220μs 520.2855 KOps/s 532.4833 KOps/s $\color{#d91a1a}-2.29\%$
test_creation_empty 28.1720μs 9.9139μs 100.8683 KOps/s 112.9277 KOps/s $\textbf{\color{#d91a1a}-10.68\%}$
test_creation_nested_1 40.2440μs 12.6404μs 79.1112 KOps/s 85.2265 KOps/s $\textbf{\color{#d91a1a}-7.18\%}$
test_creation_nested_2 39.3230μs 16.0729μs 62.2167 KOps/s 66.9231 KOps/s $\textbf{\color{#d91a1a}-7.03\%}$
test_clone 89.8170μs 13.2114μs 75.6924 KOps/s 74.4858 KOps/s $\color{#35bf28}+1.62\%$
test_getitem[int] 30.8070μs 11.4969μs 86.9799 KOps/s 86.1730 KOps/s $\color{#35bf28}+0.94\%$
test_getitem[slice_int] 59.9210μs 22.9777μs 43.5206 KOps/s 42.6225 KOps/s $\color{#35bf28}+2.11\%$
test_getitem[range] 78.6860μs 58.2080μs 17.1798 KOps/s 16.5756 KOps/s $\color{#35bf28}+3.65\%$
test_getitem[tuple] 51.0240μs 18.8451μs 53.0641 KOps/s 50.5683 KOps/s $\color{#35bf28}+4.94\%$
test_getitem[list] 0.1056ms 40.6501μs 24.6002 KOps/s 24.1126 KOps/s $\color{#35bf28}+2.02\%$
test_setitem_dim[int] 60.5230μs 33.5250μs 29.8285 KOps/s 29.8343 KOps/s $\color{#d91a1a}-0.02\%$
test_setitem_dim[slice_int] 97.8810μs 59.9365μs 16.6843 KOps/s 15.4304 KOps/s $\textbf{\color{#35bf28}+8.13\%}$
test_setitem_dim[range] 0.1341ms 83.3738μs 11.9942 KOps/s 11.1501 KOps/s $\textbf{\color{#35bf28}+7.57\%}$
test_setitem_dim[tuple] 80.5400μs 48.2393μs 20.7300 KOps/s 19.8883 KOps/s $\color{#35bf28}+4.23\%$
test_setitem 56.9160μs 19.9282μs 50.1800 KOps/s 52.9062 KOps/s $\textbf{\color{#d91a1a}-5.15\%}$
test_set 52.8180μs 19.4589μs 51.3904 KOps/s 54.0419 KOps/s $\color{#d91a1a}-4.91\%$
test_set_shared 3.0270ms 0.1414ms 7.0717 KOps/s 6.9235 KOps/s $\color{#35bf28}+2.14\%$
test_update 0.1342ms 21.1766μs 47.2218 KOps/s 51.0061 KOps/s $\textbf{\color{#d91a1a}-7.42\%}$
test_update_nested 75.7500μs 29.5114μs 33.8852 KOps/s 36.0717 KOps/s $\textbf{\color{#d91a1a}-6.06\%}$
test_update__nested 60.5730μs 25.3453μs 39.4551 KOps/s 39.0802 KOps/s $\color{#35bf28}+0.96\%$
test_set_nested 58.7790μs 21.5264μs 46.4545 KOps/s 48.3434 KOps/s $\color{#d91a1a}-3.91\%$
test_set_nested_new 55.7840μs 25.1563μs 39.7514 KOps/s 39.9007 KOps/s $\color{#d91a1a}-0.37\%$
test_select 1.0543ms 40.5702μs 24.6486 KOps/s 24.7371 KOps/s $\color{#d91a1a}-0.36\%$
test_select_nested 0.1126ms 59.3928μs 16.8371 KOps/s 16.3309 KOps/s $\color{#35bf28}+3.10\%$
test_exclude_nested 0.2018ms 0.1196ms 8.3612 KOps/s 8.0589 KOps/s $\color{#35bf28}+3.75\%$
test_empty[True] 0.6177ms 0.3878ms 2.5789 KOps/s 2.5015 KOps/s $\color{#35bf28}+3.09\%$
test_empty[False] 9.3416μs 1.1444μs 873.8016 KOps/s 861.4894 KOps/s $\color{#35bf28}+1.43\%$
test_unbind_speed 0.4844ms 0.2613ms 3.8267 KOps/s 3.8765 KOps/s $\color{#d91a1a}-1.28\%$
test_unbind_speed_stack0 0.3930ms 0.2557ms 3.9110 KOps/s 3.9671 KOps/s $\color{#d91a1a}-1.41\%$
test_unbind_speed_stack1 64.6638ms 0.7585ms 1.3183 KOps/s 1.3097 KOps/s $\color{#35bf28}+0.66\%$
test_split 66.1914ms 1.5844ms 631.1513 Ops/s 618.3391 Ops/s $\color{#35bf28}+2.07\%$
test_chunk 60.9171ms 1.5793ms 633.1722 Ops/s 618.5913 Ops/s $\color{#35bf28}+2.36\%$
test_creation[device0] 3.5171ms 86.2278μs 11.5972 KOps/s 11.6813 KOps/s $\color{#d91a1a}-0.72\%$
test_creation_from_tensor 0.1680ms 85.6406μs 11.6767 KOps/s 11.5115 KOps/s $\color{#35bf28}+1.44\%$
test_add_one[memmap_tensor0] 60.9830μs 5.2965μs 188.8053 KOps/s 181.1228 KOps/s $\color{#35bf28}+4.24\%$
test_contiguous[memmap_tensor0] 6.1510μs 0.6364μs 1.5714 MOps/s 1.5520 MOps/s $\color{#35bf28}+1.25\%$
test_stack[memmap_tensor0] 21.7900μs 3.5177μs 284.2772 KOps/s 276.9000 KOps/s $\color{#35bf28}+2.66\%$
test_memmaptd_index 1.0506ms 0.2508ms 3.9869 KOps/s 3.8931 KOps/s $\color{#35bf28}+2.41\%$
test_memmaptd_index_astensor 0.5309ms 0.3243ms 3.0832 KOps/s 3.0569 KOps/s $\color{#35bf28}+0.86\%$
test_memmaptd_index_op 0.9303ms 0.5912ms 1.6914 KOps/s 1.7049 KOps/s $\color{#d91a1a}-0.79\%$
test_serialize_model 0.1698s 0.1155s 8.6578 Ops/s 8.3599 Ops/s $\color{#35bf28}+3.56\%$
test_serialize_model_pickle 0.4510s 0.3786s 2.6411 Ops/s 2.6549 Ops/s $\color{#d91a1a}-0.52\%$
test_serialize_weights 0.1662s 0.1116s 8.9569 Ops/s 8.8279 Ops/s $\color{#35bf28}+1.46\%$
test_serialize_weights_returnearly 0.1914s 0.1346s 7.4275 Ops/s 7.0918 Ops/s $\color{#35bf28}+4.73\%$
test_serialize_weights_pickle 1.0556s 0.6101s 1.6391 Ops/s 2.3188 Ops/s $\textbf{\color{#d91a1a}-29.31\%}$
test_serialize_weights_filesystem 97.0289ms 92.0905ms 10.8589 Ops/s 10.4861 Ops/s $\color{#35bf28}+3.56\%$
test_serialize_model_filesystem 0.1017s 92.5041ms 10.8103 Ops/s 10.0789 Ops/s $\textbf{\color{#35bf28}+7.26\%}$
test_reshape_pytree 61.9250μs 25.3895μs 39.3864 KOps/s 38.8305 KOps/s $\color{#35bf28}+1.43\%$
test_reshape_td 0.1052ms 34.3397μs 29.1208 KOps/s 28.5553 KOps/s $\color{#35bf28}+1.98\%$
test_view_pytree 77.2430μs 24.9930μs 40.0112 KOps/s 39.1539 KOps/s $\color{#35bf28}+2.19\%$
test_view_td 88.8850μs 38.7885μs 25.7808 KOps/s 25.8699 KOps/s $\color{#d91a1a}-0.34\%$
test_unbind_pytree 70.7510μs 29.5640μs 33.8250 KOps/s 34.3121 KOps/s $\color{#d91a1a}-1.42\%$
test_unbind_td 0.4213ms 38.0848μs 26.2572 KOps/s 26.1034 KOps/s $\color{#35bf28}+0.59\%$
test_split_pytree 61.3240μs 29.4742μs 33.9280 KOps/s 34.4032 KOps/s $\color{#d91a1a}-1.38\%$
test_split_td 0.1262ms 40.9813μs 24.4014 KOps/s 24.7333 KOps/s $\color{#d91a1a}-1.34\%$
test_add_pytree 0.1066ms 33.5743μs 29.7847 KOps/s 28.8130 KOps/s $\color{#35bf28}+3.37\%$
test_add_td 0.1195ms 52.7081μs 18.9724 KOps/s 18.4598 KOps/s $\color{#35bf28}+2.78\%$
test_distributed 0.2137ms 99.3720μs 10.0632 KOps/s 9.7127 KOps/s $\color{#35bf28}+3.61\%$
test_tdmodule 31.2480μs 16.5974μs 60.2505 KOps/s 59.1349 KOps/s $\color{#35bf28}+1.89\%$
test_tdmodule_dispatch 61.8450μs 33.0285μs 30.2769 KOps/s 30.3330 KOps/s $\color{#d91a1a}-0.19\%$
test_tdseq 45.9650μs 19.3894μs 51.5745 KOps/s 50.0548 KOps/s $\color{#35bf28}+3.04\%$
test_tdseq_dispatch 55.1620μs 38.3679μs 26.0634 KOps/s 25.8408 KOps/s $\color{#35bf28}+0.86\%$
test_instantiation_functorch 1.9339ms 1.2959ms 771.6425 Ops/s 748.3052 Ops/s $\color{#35bf28}+3.12\%$
test_instantiation_td 1.4512ms 1.0105ms 989.5759 Ops/s 986.3358 Ops/s $\color{#35bf28}+0.33\%$
test_exec_functorch 0.3158ms 0.1557ms 6.4233 KOps/s 6.1480 KOps/s $\color{#35bf28}+4.48\%$
test_exec_functional_call 0.3035ms 0.1490ms 6.7103 KOps/s 6.6919 KOps/s $\color{#35bf28}+0.27\%$
test_exec_td 0.2833ms 0.1447ms 6.9089 KOps/s 6.6817 KOps/s $\color{#35bf28}+3.40\%$
test_exec_td_decorator 0.8267ms 0.2195ms 4.5560 KOps/s 4.5039 KOps/s $\color{#35bf28}+1.16\%$
test_vmap_mlp_speed[True-True] 0.6533ms 0.4798ms 2.0841 KOps/s 2.0552 KOps/s $\color{#35bf28}+1.40\%$
test_vmap_mlp_speed[True-False] 0.8525ms 0.4789ms 2.0881 KOps/s 2.0684 KOps/s $\color{#35bf28}+0.95\%$
test_vmap_mlp_speed[False-True] 0.7735ms 0.3943ms 2.5364 KOps/s 2.4921 KOps/s $\color{#35bf28}+1.78\%$
test_vmap_mlp_speed[False-False] 0.6591ms 0.3943ms 2.5364 KOps/s 2.5224 KOps/s $\color{#35bf28}+0.55\%$
test_vmap_mlp_speed_decorator[True-True] 0.9493ms 0.5502ms 1.8174 KOps/s 1.8130 KOps/s $\color{#35bf28}+0.24\%$
test_vmap_mlp_speed_decorator[True-False] 0.7585ms 0.5493ms 1.8205 KOps/s 1.8187 KOps/s $\color{#35bf28}+0.09\%$
test_vmap_mlp_speed_decorator[False-True] 0.7350ms 0.4559ms 2.1936 KOps/s 2.1782 KOps/s $\color{#35bf28}+0.71\%$
test_vmap_mlp_speed_decorator[False-False] 0.8506ms 0.4570ms 2.1884 KOps/s 2.1815 KOps/s $\color{#35bf28}+0.31\%$
test_to_module_speed[True] 2.5338ms 1.6767ms 596.4135 Ops/s 583.2646 Ops/s $\color{#35bf28}+2.25\%$
test_to_module_speed[False] 1.9765ms 1.6425ms 608.8304 Ops/s 596.7746 Ops/s $\color{#35bf28}+2.02\%$
test_tc_init 56.9150μs 27.1345μs 36.8535 KOps/s 39.9541 KOps/s $\textbf{\color{#d91a1a}-7.76\%}$
test_tc_init_nested 0.1111ms 55.2586μs 18.0967 KOps/s 20.2900 KOps/s $\textbf{\color{#d91a1a}-10.81\%}$
test_tc_first_layer_tensor 5.8211μs 0.6804μs 1.4697 MOps/s 1.4202 MOps/s $\color{#35bf28}+3.49\%$
test_tc_first_layer_nontensor 1.8845μs 0.6528μs 1.5318 MOps/s 1.4536 MOps/s $\textbf{\color{#35bf28}+5.38\%}$
test_tc_second_layer_tensor 28.3930μs 1.8269μs 547.3611 KOps/s 531.8053 KOps/s $\color{#35bf28}+2.93\%$
test_tc_second_layer_nontensor 13.3050μs 1.6076μs 622.0620 KOps/s 649.3007 KOps/s $\color{#d91a1a}-4.20\%$
test_unbind 5.5329ms 5.1715ms 193.3659 Ops/s 142.4122 Ops/s $\textbf{\color{#35bf28}+35.78\%}$
test_full_like 17.1019ms 11.5154ms 86.8406 Ops/s 130.2399 Ops/s $\textbf{\color{#d91a1a}-33.32\%}$
test_zeros_like 11.4868ms 5.9739ms 167.3958 Ops/s 173.4469 Ops/s $\color{#d91a1a}-3.49\%$
test_ones_like 13.9096ms 6.5227ms 153.3097 Ops/s 161.3338 Ops/s $\color{#d91a1a}-4.97\%$
test_clone 14.2174ms 7.5964ms 131.6411 Ops/s 130.4626 Ops/s $\color{#35bf28}+0.90\%$
test_squeeze 88.3140μs 13.7668μs 72.6387 KOps/s 69.9679 KOps/s $\color{#35bf28}+3.82\%$
test_unsqueeze 0.1248ms 67.8432μs 14.7399 KOps/s 14.2068 KOps/s $\color{#35bf28}+3.75\%$
test_split 0.4422ms 0.1089ms 9.1835 KOps/s 8.7282 KOps/s $\textbf{\color{#35bf28}+5.22\%}$
test_permute 0.3123ms 0.1350ms 7.4093 KOps/s 7.2218 KOps/s $\color{#35bf28}+2.60\%$
test_stack 28.9768ms 21.3566ms 46.8240 Ops/s 46.0072 Ops/s $\color{#35bf28}+1.78\%$
test_cat 27.0671ms 21.6524ms 46.1842 Ops/s 45.6588 Ops/s $\color{#35bf28}+1.15\%$
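
The columns in the table above appear to be related by simple arithmetic: `Ops` looks like the reciprocal of `Mean`, and `Change` looks like the relative difference between `Ops` and `Ops on Repo HEAD`. The following is a minimal sketch under that assumption, checked against the `test_plain_set_nested` row. The function names are hypothetical and are not taken from the benchmark tooling.

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean runtime (assumed: Ops = 1 / Mean)."""
    return 1.0 / mean_seconds


def percent_change(ops_pr: float, ops_head: float) -> float:
    """Relative throughput change of the PR versus repo HEAD, in percent.

    Assumed definition: (Ops - Ops on Repo HEAD) / Ops on Repo HEAD * 100.
    """
    return (ops_pr - ops_head) / ops_head * 100.0


# Values copied from the CPU row for test_plain_set_nested.
mean_s = 16.6264e-6   # Mean: 16.6264 microseconds
ops_pr = 60.1454e3    # Ops: 60.1454 KOps/s
ops_head = 61.3415e3  # Ops on Repo HEAD: 61.3415 KOps/s

implied_kops = ops_per_second(mean_s) / 1e3
change = percent_change(ops_pr, ops_head)

print(f"implied throughput: {implied_kops:.2f} KOps/s")
print(f"change vs HEAD: {change:+.2f}%")
```

Under this reading, a negative `Change` means the PR run was slower than HEAD for that benchmark. Differences of a few percent on microsecond-scale tests may well be measurement noise rather than an effect of the change.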


$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 152. Improved: $\large\color{#35bf28}14$. Worsened: $\large\color{#d91a1a}5$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.5217ms 13.5560μs 73.7681 KOps/s 73.2226 KOps/s $\color{#35bf28}+0.74\%$
test_plain_set_stack_nested 32.0400μs 13.5663μs 73.7119 KOps/s 72.7300 KOps/s $\color{#35bf28}+1.35\%$
test_plain_set_nested_inplace 84.3510μs 14.7286μs 67.8950 KOps/s 66.9349 KOps/s $\color{#35bf28}+1.43\%$
test_plain_set_stack_nested_inplace 55.2710μs 14.8757μs 67.2236 KOps/s 66.6207 KOps/s $\color{#35bf28}+0.90\%$
test_items 0.1968ms 4.7042μs 212.5772 KOps/s 208.8366 KOps/s $\color{#35bf28}+1.79\%$
test_items_nested 0.3864ms 0.3420ms 2.9239 KOps/s 2.9275 KOps/s $\color{#d91a1a}-0.12\%$
test_items_nested_locked 0.3766ms 0.3435ms 2.9111 KOps/s 2.8583 KOps/s $\color{#35bf28}+1.85\%$
test_items_nested_leaf 0.1118ms 83.5191μs 11.9733 KOps/s 11.9070 KOps/s $\color{#35bf28}+0.56\%$
test_items_stack_nested 0.4191ms 0.3447ms 2.9011 KOps/s 2.8622 KOps/s $\color{#35bf28}+1.36\%$
test_items_stack_nested_leaf 0.1085ms 84.9435μs 11.7725 KOps/s 11.7296 KOps/s $\color{#35bf28}+0.37\%$
test_items_stack_nested_locked 0.3773ms 0.3484ms 2.8700 KOps/s 2.9102 KOps/s $\color{#d91a1a}-1.38\%$
test_keys 21.8900μs 4.4279μs 225.8406 KOps/s 227.8274 KOps/s $\color{#d91a1a}-0.87\%$
test_keys_nested 0.1402ms 69.9763μs 14.2905 KOps/s 14.3798 KOps/s $\color{#d91a1a}-0.62\%$
test_keys_nested_locked 0.7650ms 74.6741μs 13.3915 KOps/s 13.4234 KOps/s $\color{#d91a1a}-0.24\%$
test_keys_nested_leaf 87.3810μs 60.4702μs 16.5371 KOps/s 16.5495 KOps/s $\color{#d91a1a}-0.08\%$
test_keys_stack_nested 92.8010μs 69.0111μs 14.4904 KOps/s 14.5483 KOps/s $\color{#d91a1a}-0.40\%$
test_keys_stack_nested_leaf 98.1420μs 60.5929μs 16.5036 KOps/s 16.6989 KOps/s $\color{#d91a1a}-1.17\%$
test_keys_stack_nested_locked 0.1021ms 73.9847μs 13.5163 KOps/s 13.5390 KOps/s $\color{#d91a1a}-0.17\%$
test_values 29.2037μs 1.8609μs 537.3657 KOps/s 542.9509 KOps/s $\color{#d91a1a}-1.03\%$
test_values_nested 66.8910μs 37.2472μs 26.8477 KOps/s 27.2059 KOps/s $\color{#d91a1a}-1.32\%$
test_values_nested_locked 62.3210μs 39.1262μs 25.5583 KOps/s 25.9844 KOps/s $\color{#d91a1a}-1.64\%$
test_values_nested_leaf 65.1010μs 33.0182μs 30.2864 KOps/s 30.5088 KOps/s $\color{#d91a1a}-0.73\%$
test_values_stack_nested 57.1110μs 38.2060μs 26.1739 KOps/s 26.3962 KOps/s $\color{#d91a1a}-0.84\%$
test_values_stack_nested_leaf 69.7100μs 33.9893μs 29.4211 KOps/s 29.7110 KOps/s $\color{#d91a1a}-0.98\%$
test_values_stack_nested_locked 80.8010μs 39.8312μs 25.1060 KOps/s 25.4313 KOps/s $\color{#d91a1a}-1.28\%$
test_membership 4.2257μs 0.7696μs 1.2993 MOps/s 1.3056 MOps/s $\color{#d91a1a}-0.48\%$
test_membership_nested 33.9700μs 2.6899μs 371.7604 KOps/s 371.0268 KOps/s $\color{#35bf28}+0.20\%$
test_membership_nested_leaf 17.4300μs 2.6907μs 371.6456 KOps/s 368.0398 KOps/s $\color{#35bf28}+0.98\%$
test_membership_stacked_nested 47.7810μs 2.6749μs 373.8515 KOps/s 367.5661 KOps/s $\color{#35bf28}+1.71\%$
test_membership_stacked_nested_leaf 19.5410μs 2.6917μs 371.5164 KOps/s 367.8240 KOps/s $\color{#35bf28}+1.00\%$
test_membership_nested_last 44.3700μs 3.2735μs 305.4821 KOps/s 305.6602 KOps/s $\color{#d91a1a}-0.06\%$
test_membership_nested_leaf_last 21.4700μs 3.2640μs 306.3759 KOps/s 305.7440 KOps/s $\color{#35bf28}+0.21\%$
test_membership_stacked_nested_last 20.0100μs 6.4383μs 155.3207 KOps/s 209.0258 KOps/s $\textbf{\color{#d91a1a}-25.69\%}$
test_membership_stacked_nested_leaf_last 83.9800μs 6.5234μs 153.2944 KOps/s 208.8175 KOps/s $\textbf{\color{#d91a1a}-26.59\%}$
test_nested_getleaf 54.4410μs 8.5335μs 117.1855 KOps/s 116.3205 KOps/s $\color{#35bf28}+0.74\%$
test_nested_get 23.7500μs 8.0057μs 124.9111 KOps/s 124.6785 KOps/s $\color{#35bf28}+0.19\%$
test_stacked_getleaf 23.9600μs 8.5935μs 116.3669 KOps/s 116.3375 KOps/s $\color{#35bf28}+0.03\%$
test_stacked_get 39.1700μs 8.0289μs 124.5499 KOps/s 124.4842 KOps/s $\color{#35bf28}+0.05\%$
test_nested_getitemleaf 23.8490μs 8.6705μs 115.3329 KOps/s 114.2321 KOps/s $\color{#35bf28}+0.96\%$
test_nested_getitem 81.5910μs 8.1310μs 122.9863 KOps/s 122.7134 KOps/s $\color{#35bf28}+0.22\%$
test_stacked_getitemleaf 37.3610μs 8.7320μs 114.5219 KOps/s 113.7947 KOps/s $\color{#35bf28}+0.64\%$
test_stacked_getitem 41.9010μs 8.1391μs 122.8644 KOps/s 122.4116 KOps/s $\color{#35bf28}+0.37\%$
test_lock_nested 60.7644ms 0.4330ms 2.3092 KOps/s 2.3175 KOps/s $\color{#d91a1a}-0.36\%$
test_lock_stack_nested 0.3734ms 0.3194ms 3.1307 KOps/s 3.0789 KOps/s $\color{#35bf28}+1.68\%$
test_unlock_nested 0.8795ms 0.3704ms 2.6996 KOps/s 2.6815 KOps/s $\color{#35bf28}+0.67\%$
test_unlock_stack_nested 0.3970ms 0.3251ms 3.0762 KOps/s 3.0180 KOps/s $\color{#35bf28}+1.93\%$
test_flatten_speed 0.2583ms 0.1028ms 9.7264 KOps/s 9.7098 KOps/s $\color{#35bf28}+0.17\%$
test_unflatten_speed 0.3449ms 0.3048ms 3.2812 KOps/s 3.2941 KOps/s $\color{#d91a1a}-0.39\%$
test_common_ops 1.2318ms 0.6198ms 1.6134 KOps/s 1.6043 KOps/s $\color{#35bf28}+0.57\%$
test_creation 34.2800μs 1.7450μs 573.0515 KOps/s 585.1280 KOps/s $\color{#d91a1a}-2.06\%$
test_creation_empty 23.4600μs 9.2622μs 107.9660 KOps/s 105.5154 KOps/s $\color{#35bf28}+2.32\%$
test_creation_nested_1 0.1956ms 11.1398μs 89.7681 KOps/s 87.8252 KOps/s $\color{#35bf28}+2.21\%$
test_creation_nested_2 45.1210μs 13.3537μs 74.8859 KOps/s 73.4515 KOps/s $\color{#35bf28}+1.95\%$
test_clone 57.1000μs 12.0563μs 82.9440 KOps/s 78.5781 KOps/s $\textbf{\color{#35bf28}+5.56\%}$
test_getitem[int] 31.0200μs 11.6920μs 85.5287 KOps/s 86.1695 KOps/s $\color{#d91a1a}-0.74\%$
test_getitem[slice_int] 40.3700μs 21.9867μs 45.4821 KOps/s 45.1785 KOps/s $\color{#35bf28}+0.67\%$
test_getitem[range] 68.1500μs 49.9384μs 20.0247 KOps/s 19.8971 KOps/s $\color{#35bf28}+0.64\%$
test_getitem[tuple] 49.2000μs 19.6476μs 50.8967 KOps/s 50.3051 KOps/s $\color{#35bf28}+1.18\%$
test_getitem[list] 0.1682ms 36.0000μs 27.7778 KOps/s 28.3027 KOps/s $\color{#d91a1a}-1.85\%$
test_setitem_dim[int] 48.0100μs 31.0527μs 32.2033 KOps/s 30.3619 KOps/s $\textbf{\color{#35bf28}+6.06\%}$
test_setitem_dim[slice_int] 0.1007ms 54.1339μs 18.4727 KOps/s 18.6854 KOps/s $\color{#d91a1a}-1.14\%$
test_setitem_dim[range] 95.3810μs 74.3668μs 13.4469 KOps/s 14.0912 KOps/s $\color{#d91a1a}-4.57\%$
test_setitem_dim[tuple] 66.2910μs 47.9266μs 20.8652 KOps/s 20.9598 KOps/s $\color{#d91a1a}-0.45\%$
test_setitem 59.4110μs 17.5051μs 57.1264 KOps/s 55.1142 KOps/s $\color{#35bf28}+3.65\%$
test_set 52.0210μs 16.8381μs 59.3893 KOps/s 56.8277 KOps/s $\color{#35bf28}+4.51\%$
test_set_shared 1.1827ms 0.1009ms 9.9126 KOps/s 9.8573 KOps/s $\color{#35bf28}+0.56\%$
test_update 0.1022ms 19.2456μs 51.9598 KOps/s 49.2714 KOps/s $\textbf{\color{#35bf28}+5.46\%}$
test_update_nested 0.1064ms 24.9637μs 40.0581 KOps/s 39.0831 KOps/s $\color{#35bf28}+2.49\%$
test_update__nested 62.6300μs 23.4131μs 42.7112 KOps/s 38.6984 KOps/s $\textbf{\color{#35bf28}+10.37\%}$
test_set_nested 0.1926ms 18.1778μs 55.0121 KOps/s 49.7178 KOps/s $\textbf{\color{#35bf28}+10.65\%}$
test_set_nested_new 0.1903ms 21.0817μs 47.4345 KOps/s 42.4422 KOps/s $\textbf{\color{#35bf28}+11.76\%}$
test_select 0.2396ms 35.4557μs 28.2042 KOps/s 27.4638 KOps/s $\color{#35bf28}+2.70\%$
test_select_nested 0.8065ms 56.5990μs 17.6682 KOps/s 17.5070 KOps/s $\color{#35bf28}+0.92\%$
test_exclude_nested 0.1334ms 0.1130ms 8.8522 KOps/s 8.7727 KOps/s $\color{#35bf28}+0.91\%$
test_empty[True] 0.5593ms 0.3593ms 2.7832 KOps/s 2.8093 KOps/s $\color{#d91a1a}-0.93\%$
test_empty[False] 6.7780μs 0.9857μs 1.0146 MOps/s 1.0284 MOps/s $\color{#d91a1a}-1.34\%$
test_to 0.1048ms 78.3895μs 12.7568 KOps/s 12.5502 KOps/s $\color{#35bf28}+1.65\%$
test_to_nonblocking 0.2673ms 68.3490μs 14.6308 KOps/s 15.7962 KOps/s $\textbf{\color{#d91a1a}-7.38\%}$
test_unbind_speed 0.3433ms 0.2858ms 3.4987 KOps/s 3.4887 KOps/s $\color{#35bf28}+0.29\%$
test_unbind_speed_stack0 0.4265ms 0.2846ms 3.5140 KOps/s 3.4631 KOps/s $\color{#35bf28}+1.47\%$
test_unbind_speed_stack1 77.2376ms 0.8659ms 1.1549 KOps/s 1.1776 KOps/s $\color{#d91a1a}-1.93\%$
test_split 78.5357ms 1.7487ms 571.8573 Ops/s 564.4338 Ops/s $\color{#35bf28}+1.32\%$
test_chunk 77.7322ms 1.7398ms 574.7784 Ops/s 564.8296 Ops/s $\color{#35bf28}+1.76\%$
test_creation[device0] 0.1859ms 60.1199μs 16.6334 KOps/s 16.6008 KOps/s $\color{#35bf28}+0.20\%$
test_creation_from_tensor 0.1922ms 56.0172μs 17.8517 KOps/s 16.3600 KOps/s $\textbf{\color{#35bf28}+9.12\%}$
test_add_one[memmap_tensor0] 64.5610μs 7.3190μs 136.6301 KOps/s 127.6353 KOps/s $\textbf{\color{#35bf28}+7.05\%}$
test_contiguous[memmap_tensor0] 34.1900μs 0.6936μs 1.4417 MOps/s 1.4231 MOps/s $\color{#35bf28}+1.30\%$
test_stack[memmap_tensor0] 36.7610μs 4.9107μs 203.6354 KOps/s 198.5167 KOps/s $\color{#35bf28}+2.58\%$
test_memmaptd_index 1.0948ms 0.2994ms 3.3405 KOps/s 3.3251 KOps/s $\color{#35bf28}+0.46\%$
test_memmaptd_index_astensor 0.7423ms 0.3740ms 2.6740 KOps/s 2.6709 KOps/s $\color{#35bf28}+0.12\%$
test_memmaptd_index_op 1.2315ms 0.6889ms 1.4516 KOps/s 1.4063 KOps/s $\color{#35bf28}+3.22\%$
test_serialize_model 0.1853s 0.1140s 8.7699 Ops/s 8.5538 Ops/s $\color{#35bf28}+2.53\%$
test_serialize_model_pickle 1.3497s 1.2353s 0.8095 Ops/s 0.8086 Ops/s $\color{#35bf28}+0.11\%$
test_serialize_weights 0.1827s 0.1106s 9.0424 Ops/s 8.5971 Ops/s $\textbf{\color{#35bf28}+5.18\%}$
test_serialize_weights_returnearly 0.2847s 0.1088s 9.1941 Ops/s 11.9003 Ops/s $\textbf{\color{#d91a1a}-22.74\%}$
test_serialize_weights_pickle 1.3500s 1.2484s 0.8010 Ops/s 0.8008 Ops/s $\color{#35bf28}+0.03\%$
test_reshape_pytree 0.1285ms 27.5747μs 36.2651 KOps/s 35.9773 KOps/s $\color{#35bf28}+0.80\%$
test_reshape_td 0.1070ms 32.4813μs 30.7870 KOps/s 29.9408 KOps/s $\color{#35bf28}+2.83\%$
test_view_pytree 0.1255ms 27.3900μs 36.5097 KOps/s 36.4922 KOps/s $\color{#35bf28}+0.05\%$
test_view_td 0.1596ms 36.9585μs 27.0574 KOps/s 26.2608 KOps/s $\color{#35bf28}+3.03\%$
test_unbind_pytree 0.1296ms 33.6816μs 29.6898 KOps/s 29.7211 KOps/s $\color{#d91a1a}-0.11\%$
test_unbind_td 0.4424ms 44.4304μs 22.5071 KOps/s 22.5704 KOps/s $\color{#d91a1a}-0.28\%$
test_split_pytree 0.1338ms 37.1370μs 26.9273 KOps/s 27.5545 KOps/s $\color{#d91a1a}-2.28\%$
test_split_td 0.4652ms 41.7717μs 23.9396 KOps/s 23.9644 KOps/s $\color{#d91a1a}-0.10\%$
test_add_pytree 67.0210μs 40.3713μs 24.7701 KOps/s 24.6371 KOps/s $\color{#35bf28}+0.54\%$
test_add_td 87.3110μs 51.0712μs 19.5805 KOps/s 17.9572 KOps/s $\textbf{\color{#35bf28}+9.04\%}$
test_distributed 2.2232ms 88.6635μs 11.2786 KOps/s 14.8511 KOps/s $\textbf{\color{#d91a1a}-24.06\%}$
test_tdmodule 35.8900μs 15.5720μs 64.2180 KOps/s 65.3039 KOps/s $\color{#d91a1a}-1.66\%$
test_tdmodule_dispatch 0.1154ms 30.7888μs 32.4793 KOps/s 32.9742 KOps/s $\color{#d91a1a}-1.50\%$
test_tdseq 42.9700μs 17.9109μs 55.8318 KOps/s 55.9178 KOps/s $\color{#d91a1a}-0.15\%$
test_tdseq_dispatch 63.8490μs 34.4121μs 29.0596 KOps/s 28.9391 KOps/s $\color{#35bf28}+0.42\%$
test_instantiation_functorch 1.7582ms 1.5754ms 634.7678 Ops/s 627.2571 Ops/s $\color{#35bf28}+1.20\%$
test_instantiation_td 1.7757ms 1.0982ms 910.6199 Ops/s 910.6748 Ops/s $-0.01\%$
test_exec_functorch 0.2297ms 0.1580ms 6.3285 KOps/s 6.2820 KOps/s $\color{#35bf28}+0.74\%$
test_exec_functional_call 0.2762ms 0.1479ms 6.7631 KOps/s 6.7110 KOps/s $\color{#35bf28}+0.78\%$
test_exec_td 0.2518ms 0.1460ms 6.8493 KOps/s 6.8392 KOps/s $\color{#35bf28}+0.15\%$
test_exec_td_decorator 0.5153ms 0.2203ms 4.5386 KOps/s 4.5225 KOps/s $\color{#35bf28}+0.36\%$
test_vmap_mlp_speed[True-True] 0.7750ms 0.6095ms 1.6406 KOps/s 1.6440 KOps/s $\color{#d91a1a}-0.21\%$
test_vmap_mlp_speed[True-False] 0.7481ms 0.6109ms 1.6371 KOps/s 1.6408 KOps/s $\color{#d91a1a}-0.23\%$
test_vmap_mlp_speed[False-True] 0.7208ms 0.5416ms 1.8464 KOps/s 1.8676 KOps/s $\color{#d91a1a}-1.13\%$
test_vmap_mlp_speed[False-False] 0.7395ms 0.5555ms 1.8003 KOps/s 1.8676 KOps/s $\color{#d91a1a}-3.60\%$
test_vmap_mlp_speed_decorator[True-True] 1.4679ms 0.6756ms 1.4802 KOps/s 1.4883 KOps/s $\color{#d91a1a}-0.54\%$
test_vmap_mlp_speed_decorator[True-False] 0.8335ms 0.6736ms 1.4846 KOps/s 1.4888 KOps/s $\color{#d91a1a}-0.28\%$
test_vmap_mlp_speed_decorator[False-True] 0.7829ms 0.6058ms 1.6507 KOps/s 1.6882 KOps/s $\color{#d91a1a}-2.22\%$
test_vmap_mlp_speed_decorator[False-False] 0.7601ms 0.6022ms 1.6607 KOps/s 1.6872 KOps/s $\color{#d91a1a}-1.57\%$
test_vmap_transformer_speed[True-True] 8.3279ms 8.0937ms 123.5526 Ops/s 123.5005 Ops/s $\color{#35bf28}+0.04\%$
test_vmap_transformer_speed[True-False] 8.2804ms 8.0884ms 123.6343 Ops/s 123.4115 Ops/s $\color{#35bf28}+0.18\%$
test_vmap_transformer_speed[False-True] 8.2538ms 8.0244ms 124.6204 Ops/s 124.3395 Ops/s $\color{#35bf28}+0.23\%$
test_vmap_transformer_speed[False-False] 8.3520ms 8.0291ms 124.5476 Ops/s 124.5580 Ops/s $-0.01\%$
test_vmap_transformer_speed_decorator[True-True] 20.4403ms 19.6568ms 50.8730 Ops/s 51.1538 Ops/s $\color{#d91a1a}-0.55\%$
test_vmap_transformer_speed_decorator[True-False] 19.7691ms 19.5514ms 51.1473 Ops/s 51.2480 Ops/s $\color{#d91a1a}-0.20\%$
test_vmap_transformer_speed_decorator[False-True] 20.5055ms 19.6029ms 51.0128 Ops/s 51.3648 Ops/s $\color{#d91a1a}-0.69\%$
test_vmap_transformer_speed_decorator[False-False] 20.1571ms 19.5361ms 51.1872 Ops/s 51.3293 Ops/s $\color{#d91a1a}-0.28\%$
test_to_module_speed[True] 2.1274ms 1.5760ms 634.5003 Ops/s 638.5452 Ops/s $\color{#d91a1a}-0.63\%$
test_to_module_speed[False] 1.6585ms 1.5411ms 648.8972 Ops/s 648.6792 Ops/s $\color{#35bf28}+0.03\%$
test_tc_init 49.9310μs 25.6962μs 38.9162 KOps/s 38.0081 KOps/s $\color{#35bf28}+2.39\%$
test_tc_init_nested 0.1810ms 51.8082μs 19.3020 KOps/s 18.1248 KOps/s $\textbf{\color{#35bf28}+6.49\%}$
test_tc_first_layer_tensor 1.8025μs 0.3638μs 2.7489 MOps/s 2.7396 MOps/s $\color{#35bf28}+0.34\%$
test_tc_first_layer_nontensor 1.7225μs 0.3917μs 2.5527 MOps/s 2.5212 MOps/s $\color{#35bf28}+1.25\%$
test_tc_second_layer_tensor 4.2182μs 0.9917μs 1.0084 MOps/s 988.0687 KOps/s $\color{#35bf28}+2.06\%$
test_tc_second_layer_nontensor 3.9018μs 0.8376μs 1.1939 MOps/s 1.1933 MOps/s $\color{#35bf28}+0.05\%$
test_unbind 5.4616ms 5.3136ms 188.1954 Ops/s 183.9971 Ops/s $\color{#35bf28}+2.28\%$
test_full_like 15.7717ms 13.9141ms 71.8693 Ops/s 74.1829 Ops/s $\color{#d91a1a}-3.12\%$
test_zeros_like 7.2870ms 7.0646ms 141.5518 Ops/s 126.9878 Ops/s $\textbf{\color{#35bf28}+11.47\%}$
test_ones_like 7.3901ms 7.0809ms 141.2256 Ops/s 126.2048 Ops/s $\textbf{\color{#35bf28}+11.90\%}$
test_clone 9.5250ms 8.9342ms 111.9297 Ops/s 103.8672 Ops/s $\textbf{\color{#35bf28}+7.76\%}$
test_squeeze 0.1467ms 11.1905μs 89.3615 KOps/s 87.9753 KOps/s $\color{#35bf28}+1.58\%$
test_unsqueeze 0.1915ms 66.8507μs 14.9587 KOps/s 15.5588 KOps/s $\color{#d91a1a}-3.86\%$
test_split 0.1680ms 0.1043ms 9.5898 KOps/s 9.3596 KOps/s $\color{#35bf28}+2.46\%$
test_permute 0.2014ms 0.1294ms 7.7292 KOps/s 7.5504 KOps/s $\color{#35bf28}+2.37\%$
test_stack 28.4294ms 27.6159ms 36.2110 Ops/s 36.1031 Ops/s $\color{#35bf28}+0.30\%$
test_cat 28.4572ms 27.6580ms 36.1559 Ops/s 35.6138 Ops/s $\color{#35bf28}+1.52\%$

@vmoens vmoens deleted the fix-decorator-inplace branch May 27, 2024 19:36
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
2 participants