
[BugFix] module hook fixes #673

Merged: vmoens merged 1 commit into main from hook-fix on Feb 10, 2024

vmoens (Contributor) commented Feb 10, 2024

No description provided.

facebook-github-bot added the CLA Signed label Feb 10, 2024
vmoens added the bug label Feb 10, 2024
vmoens merged commit 46eef3c into main Feb 10, 2024
24 of 33 checks passed
vmoens deleted the hook-fix branch February 10, 2024, 20:49

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 126. Improved: 24. Worsened: 4.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 42.0190μs 15.7652μs 63.4307 KOps/s 58.9446 KOps/s $\textbf{\color{#35bf28}+7.61\%}$
test_plain_set_stack_nested 0.1768ms 0.1435ms 6.9678 KOps/s 6.9408 KOps/s $\color{#35bf28}+0.39\%$
test_plain_set_nested_inplace 55.7220μs 18.0598μs 55.3715 KOps/s 51.5138 KOps/s $\textbf{\color{#35bf28}+7.49\%}$
test_plain_set_stack_nested_inplace 0.3027ms 0.1776ms 5.6308 KOps/s 5.5897 KOps/s $\color{#35bf28}+0.74\%$
test_items 20.3480μs 2.5208μs 396.7058 KOps/s 397.2808 KOps/s $\color{#d91a1a}-0.14\%$
test_items_nested 0.4087ms 0.2728ms 3.6663 KOps/s 3.7352 KOps/s $\color{#d91a1a}-1.84\%$
test_items_nested_locked 0.4528ms 0.2714ms 3.6841 KOps/s 3.6942 KOps/s $\color{#d91a1a}-0.28\%$
test_items_nested_leaf 0.5617ms 0.1690ms 5.9171 KOps/s 5.9955 KOps/s $\color{#d91a1a}-1.31\%$
test_items_stack_nested 1.5757ms 1.3224ms 756.2284 Ops/s 760.4450 Ops/s $\color{#d91a1a}-0.55\%$
test_items_stack_nested_leaf 1.5288ms 1.1968ms 835.5718 Ops/s 846.1525 Ops/s $\color{#d91a1a}-1.25\%$
test_items_stack_nested_locked 1.1745ms 0.8944ms 1.1181 KOps/s 1.1429 KOps/s $\color{#d91a1a}-2.17\%$
test_keys 25.1870μs 3.8670μs 258.6002 KOps/s 260.5898 KOps/s $\color{#d91a1a}-0.76\%$
test_keys_nested 1.8425ms 0.1457ms 6.8645 KOps/s 6.7186 KOps/s $\color{#35bf28}+2.17\%$
test_keys_nested_locked 0.2913ms 0.1508ms 6.6309 KOps/s 6.6043 KOps/s $\color{#35bf28}+0.40\%$
test_keys_nested_leaf 0.2494ms 0.1289ms 7.7566 KOps/s 7.6776 KOps/s $\color{#35bf28}+1.03\%$
test_keys_stack_nested 1.8100ms 1.2967ms 771.1661 Ops/s 778.8856 Ops/s $\color{#d91a1a}-0.99\%$
test_keys_stack_nested_leaf 1.5317ms 1.2727ms 785.7141 Ops/s 792.0405 Ops/s $\color{#d91a1a}-0.80\%$
test_keys_stack_nested_locked 1.3023ms 0.8206ms 1.2186 KOps/s 1.2494 KOps/s $\color{#d91a1a}-2.46\%$
test_values 10.1290μs 1.1732μs 852.3939 KOps/s 857.6014 KOps/s $\color{#d91a1a}-0.61\%$
test_values_nested 0.1281ms 51.9470μs 19.2504 KOps/s 19.3097 KOps/s $\color{#d91a1a}-0.31\%$
test_values_nested_locked 0.1225ms 52.3612μs 19.0981 KOps/s 19.3199 KOps/s $\color{#d91a1a}-1.15\%$
test_values_nested_leaf 86.1710μs 46.4675μs 21.5204 KOps/s 21.4120 KOps/s $\color{#35bf28}+0.51\%$
test_values_stack_nested 1.6372ms 1.0288ms 972.0153 Ops/s 959.8000 Ops/s $\color{#35bf28}+1.27\%$
test_values_stack_nested_leaf 1.3031ms 1.0152ms 984.9862 Ops/s 969.7380 Ops/s $\color{#35bf28}+1.57\%$
test_values_stack_nested_locked 0.8109ms 0.6026ms 1.6596 KOps/s 1.6748 KOps/s $\color{#d91a1a}-0.91\%$
test_membership 13.6260μs 1.3430μs 744.6095 KOps/s 748.1500 KOps/s $\color{#d91a1a}-0.47\%$
test_membership_nested 52.9390μs 3.4537μs 289.5412 KOps/s 286.7943 KOps/s $\color{#35bf28}+0.96\%$
test_membership_nested_leaf 25.9690μs 3.4912μs 286.4384 KOps/s 288.8277 KOps/s $\color{#d91a1a}-0.83\%$
test_membership_stacked_nested 49.2320μs 11.8620μs 84.3031 KOps/s 83.7436 KOps/s $\color{#35bf28}+0.67\%$
test_membership_stacked_nested_leaf 40.3260μs 11.7948μs 84.7831 KOps/s 80.3773 KOps/s $\textbf{\color{#35bf28}+5.48\%}$
test_membership_nested_last 32.0300μs 6.6263μs 150.9145 KOps/s 151.4305 KOps/s $\color{#d91a1a}-0.34\%$
test_membership_nested_leaf_last 45.8660μs 6.6796μs 149.7086 KOps/s 150.8024 KOps/s $\color{#d91a1a}-0.73\%$
test_membership_stacked_nested_last 0.3971ms 0.1781ms 5.6140 KOps/s 5.6004 KOps/s $\color{#35bf28}+0.24\%$
test_membership_stacked_nested_leaf_last 43.9220μs 13.9652μs 71.6068 KOps/s 71.7731 KOps/s $\color{#d91a1a}-0.23\%$
test_nested_getleaf 47.4010μs 10.6819μs 93.6161 KOps/s 93.3619 KOps/s $\color{#35bf28}+0.27\%$
test_nested_get 36.5390μs 10.0942μs 99.0668 KOps/s 97.7826 KOps/s $\color{#35bf28}+1.31\%$
test_stacked_getleaf 0.4853ms 0.3984ms 2.5103 KOps/s 2.4899 KOps/s $\color{#35bf28}+0.82\%$
test_stacked_get 0.4914ms 0.3677ms 2.7198 KOps/s 2.7259 KOps/s $\color{#d91a1a}-0.23\%$
test_nested_getitemleaf 41.2970μs 12.1466μs 82.3274 KOps/s 81.5118 KOps/s $\color{#35bf28}+1.00\%$
test_nested_getitem 44.2130μs 11.5429μs 86.6330 KOps/s 84.8724 KOps/s $\color{#35bf28}+2.07\%$
test_stacked_getitemleaf 0.5726ms 0.4006ms 2.4965 KOps/s 2.4250 KOps/s $\color{#35bf28}+2.95\%$
test_stacked_getitem 0.5747ms 0.3686ms 2.7133 KOps/s 2.6876 KOps/s $\color{#35bf28}+0.96\%$
test_lock_nested 2.8698ms 0.3410ms 2.9326 KOps/s 2.9462 KOps/s $\color{#d91a1a}-0.46\%$
test_lock_stack_nested 90.4692ms 6.1682ms 162.1222 Ops/s 164.1127 Ops/s $\color{#d91a1a}-1.21\%$
test_unlock_nested 73.1753ms 0.4104ms 2.4369 KOps/s 2.9364 KOps/s $\textbf{\color{#d91a1a}-17.01\%}$
test_unlock_stack_nested 95.6620ms 6.5056ms 153.7146 Ops/s 157.4749 Ops/s $\color{#d91a1a}-2.39\%$
test_flatten_speed 0.7384ms 0.3648ms 2.7414 KOps/s 2.7004 KOps/s $\color{#35bf28}+1.52\%$
test_unflatten_speed 0.6869ms 0.4676ms 2.1385 KOps/s 2.1342 KOps/s $\color{#35bf28}+0.20\%$
test_common_ops 4.7164ms 0.6360ms 1.5723 KOps/s 1.4119 KOps/s $\textbf{\color{#35bf28}+11.36\%}$
test_creation 57.0270μs 1.8459μs 541.7473 KOps/s 525.4360 KOps/s $\color{#35bf28}+3.10\%$
test_creation_empty 53.3130μs 7.4728μs 133.8180 KOps/s 102.2521 KOps/s $\textbf{\color{#35bf28}+30.87\%}$
test_creation_nested_1 43.8920μs 10.1481μs 98.5406 KOps/s 80.0932 KOps/s $\textbf{\color{#35bf28}+23.03\%}$
test_creation_nested_2 52.9000μs 13.1972μs 75.7739 KOps/s 63.2706 KOps/s $\textbf{\color{#35bf28}+19.76\%}$
test_clone 67.2460μs 13.0992μs 76.3408 KOps/s 77.0425 KOps/s $\color{#d91a1a}-0.91\%$
test_getitem[int] 37.2300μs 10.9493μs 91.3303 KOps/s 90.5900 KOps/s $\color{#35bf28}+0.82\%$
test_getitem[slice_int] 90.6000μs 22.1782μs 45.0893 KOps/s 44.1969 KOps/s $\color{#35bf28}+2.02\%$
test_getitem[range] 0.1497ms 43.2827μs 23.1039 KOps/s 23.7130 KOps/s $\color{#d91a1a}-2.57\%$
test_getitem[tuple] 0.1519ms 18.5274μs 53.9741 KOps/s 54.5847 KOps/s $\color{#d91a1a}-1.12\%$
test_getitem[list] 0.2663ms 39.0709μs 25.5945 KOps/s 26.9076 KOps/s $\color{#d91a1a}-4.88\%$
test_setitem_dim[int] 55.5640μs 27.2390μs 36.7121 KOps/s 33.2036 KOps/s $\textbf{\color{#35bf28}+10.57\%}$
test_setitem_dim[slice_int] 98.4950μs 52.0555μs 19.2103 KOps/s 17.7964 KOps/s $\textbf{\color{#35bf28}+7.94\%}$
test_setitem_dim[range] 0.1216ms 74.4592μs 13.4302 KOps/s 12.9155 KOps/s $\color{#35bf28}+3.99\%$
test_setitem_dim[tuple] 76.3730μs 41.4207μs 24.1425 KOps/s 22.5213 KOps/s $\textbf{\color{#35bf28}+7.20\%}$
test_setitem 91.8620μs 18.3026μs 54.6372 KOps/s 50.4158 KOps/s $\textbf{\color{#35bf28}+8.37\%}$
test_set 90.9710μs 17.7179μs 56.4400 KOps/s 52.6239 KOps/s $\textbf{\color{#35bf28}+7.25\%}$
test_set_shared 1.8696ms 0.1399ms 7.1455 KOps/s 6.9445 KOps/s $\color{#35bf28}+2.89\%$
test_update 0.1291ms 19.0160μs 52.5873 KOps/s 46.2622 KOps/s $\textbf{\color{#35bf28}+13.67\%}$
test_update_nested 0.1077ms 26.6164μs 37.5709 KOps/s 33.9998 KOps/s $\textbf{\color{#35bf28}+10.50\%}$
test_set_nested 0.1126ms 19.7206μs 50.7085 KOps/s 46.8162 KOps/s $\textbf{\color{#35bf28}+8.31\%}$
test_set_nested_new 0.1036ms 23.4728μs 42.6025 KOps/s 39.7159 KOps/s $\textbf{\color{#35bf28}+7.27\%}$
test_select 0.1488ms 37.0931μs 26.9592 KOps/s 26.2019 KOps/s $\color{#35bf28}+2.89\%$
test_select_nested 0.1288ms 58.9968μs 16.9501 KOps/s 17.4206 KOps/s $\color{#d91a1a}-2.70\%$
test_exclude_nested 0.2156ms 0.1178ms 8.4856 KOps/s 8.6290 KOps/s $\color{#d91a1a}-1.66\%$
test_empty[True] 5.5056ms 0.4266ms 2.3443 KOps/s 2.4254 KOps/s $\color{#d91a1a}-3.34\%$
test_empty[False] 7.9950μs 1.0650μs 938.9936 KOps/s 954.4091 KOps/s $\color{#d91a1a}-1.62\%$
test_unbind_speed 0.3114ms 0.2459ms 4.0665 KOps/s 4.0522 KOps/s $\color{#35bf28}+0.35\%$
test_unbind_speed_stack0 87.2141ms 3.5241ms 283.7575 Ops/s 306.3701 Ops/s $\textbf{\color{#d91a1a}-7.38\%}$
test_unbind_speed_stack1 35.7660μs 1.9438μs 514.4438 KOps/s 520.2641 KOps/s $\color{#d91a1a}-1.12\%$
test_split 1.5522ms 1.4618ms 684.0675 Ops/s 602.5273 Ops/s $\textbf{\color{#35bf28}+13.53\%}$
test_chunk 79.9509ms 1.5775ms 633.9334 Ops/s 632.4532 Ops/s $\color{#35bf28}+0.23\%$
test_creation[device0] 3.8061ms 0.1047ms 9.5470 KOps/s 9.8014 KOps/s $\color{#d91a1a}-2.60\%$
test_creation_from_tensor 0.2611ms 81.9975μs 12.1955 KOps/s 11.8303 KOps/s $\color{#35bf28}+3.09\%$
test_add_one[memmap_tensor0] 0.3257ms 5.4759μs 182.6197 KOps/s 185.4720 KOps/s $\color{#d91a1a}-1.54\%$
test_contiguous[memmap_tensor0] 31.1390μs 0.6361μs 1.5720 MOps/s 1.5109 MOps/s $\color{#35bf28}+4.05\%$
test_stack[memmap_tensor0] 53.7210μs 3.6971μs 270.4808 KOps/s 279.0552 KOps/s $\color{#d91a1a}-3.07\%$
test_memmaptd_index 0.9950ms 0.2380ms 4.2010 KOps/s 4.1996 KOps/s $\color{#35bf28}+0.03\%$
test_memmaptd_index_astensor 0.6976ms 0.3014ms 3.3180 KOps/s 3.3474 KOps/s $\color{#d91a1a}-0.88\%$
test_memmaptd_index_op 1.0165ms 0.5632ms 1.7757 KOps/s 1.7117 KOps/s $\color{#35bf28}+3.74\%$
test_serialize_model 0.1902s 0.1090s 9.1706 Ops/s 8.6425 Ops/s $\textbf{\color{#35bf28}+6.11\%}$
test_serialize_model_pickle 0.4497s 0.3751s 2.6656 Ops/s 2.6084 Ops/s $\color{#35bf28}+2.19\%$
test_serialize_weights 0.1045s 0.1009s 9.9120 Ops/s 9.7175 Ops/s $\color{#35bf28}+2.00\%$
test_serialize_weights_returnearly 0.2066s 0.1328s 7.5285 Ops/s 7.8763 Ops/s $\color{#d91a1a}-4.42\%$
test_serialize_weights_pickle 1.0403s 0.6088s 1.6426 Ops/s 2.3752 Ops/s $\textbf{\color{#d91a1a}-30.84\%}$
test_serialize_weights_filesystem 99.7693ms 94.0694ms 10.6304 Ops/s 9.5055 Ops/s $\textbf{\color{#35bf28}+11.84\%}$
test_serialize_model_filesystem 0.1813s 0.1055s 9.4751 Ops/s 10.3755 Ops/s $\textbf{\color{#d91a1a}-8.68\%}$
test_reshape_pytree 60.3430μs 20.6303μs 48.4723 KOps/s 48.3882 KOps/s $\color{#35bf28}+0.17\%$
test_reshape_td 66.9660μs 30.4078μs 32.8863 KOps/s 31.9888 KOps/s $\color{#35bf28}+2.81\%$
test_view_pytree 75.4510μs 20.8062μs 48.0625 KOps/s 48.0016 KOps/s $\color{#35bf28}+0.13\%$
test_view_td 91.1384ms 11.9999μs 83.3342 KOps/s 81.4289 KOps/s $\color{#35bf28}+2.34\%$
test_unbind_pytree 58.4790μs 24.1396μs 41.4257 KOps/s 41.4894 KOps/s $\color{#d91a1a}-0.15\%$
test_unbind_td 0.5272ms 36.2814μs 27.5623 KOps/s 28.1909 KOps/s $\color{#d91a1a}-2.23\%$
test_split_pytree 60.2530μs 23.8558μs 41.9186 KOps/s 42.2923 KOps/s $\color{#d91a1a}-0.88\%$
test_split_td 0.1259ms 39.3891μs 25.3877 KOps/s 24.9938 KOps/s $\color{#35bf28}+1.58\%$
test_add_pytree 76.3630μs 30.0275μs 33.3028 KOps/s 34.2791 KOps/s $\color{#d91a1a}-2.85\%$
test_add_td 0.1136ms 48.4701μs 20.6313 KOps/s 19.7707 KOps/s $\color{#35bf28}+4.35\%$
test_distributed 0.2308ms 99.5828μs 10.0419 KOps/s 9.7831 KOps/s $\color{#35bf28}+2.65\%$
test_tdmodule 0.1853ms 20.9845μs 47.6543 KOps/s 44.9547 KOps/s $\textbf{\color{#35bf28}+6.01\%}$
test_tdmodule_dispatch 0.2280ms 41.2342μs 24.2517 KOps/s 23.0311 KOps/s $\textbf{\color{#35bf28}+5.30\%}$
test_tdseq 0.1154ms 23.6662μs 42.2544 KOps/s 39.8439 KOps/s $\textbf{\color{#35bf28}+6.05\%}$
test_tdseq_dispatch 0.1486ms 44.7098μs 22.3664 KOps/s 20.8719 KOps/s $\textbf{\color{#35bf28}+7.16\%}$
test_instantiation_functorch 1.7639ms 1.3131ms 761.5525 Ops/s 772.1585 Ops/s $\color{#d91a1a}-1.37\%$
test_instantiation_td 1.6631ms 1.0093ms 990.8121 Ops/s 993.3930 Ops/s $\color{#d91a1a}-0.26\%$
test_exec_functorch 0.3474ms 0.1622ms 6.1668 KOps/s 6.3665 KOps/s $\color{#d91a1a}-3.14\%$
test_exec_functional_call 0.2758ms 0.1472ms 6.7928 KOps/s 6.6334 KOps/s $\color{#35bf28}+2.40\%$
test_exec_td 0.2733ms 0.1459ms 6.8558 KOps/s 6.8757 KOps/s $\color{#d91a1a}-0.29\%$
test_exec_td_decorator 0.6677ms 0.1763ms 5.6720 KOps/s 5.7268 KOps/s $\color{#d91a1a}-0.96\%$
test_vmap_mlp_speed[True-True] 1.2028ms 0.8747ms 1.1432 KOps/s 1.1124 KOps/s $\color{#35bf28}+2.77\%$
test_vmap_mlp_speed[True-False] 0.7193ms 0.4673ms 2.1397 KOps/s 2.1372 KOps/s $\color{#35bf28}+0.12\%$
test_vmap_mlp_speed[False-True] 1.1864ms 0.7684ms 1.3014 KOps/s 1.2818 KOps/s $\color{#35bf28}+1.53\%$
test_vmap_mlp_speed[False-False] 0.5908ms 0.3907ms 2.5596 KOps/s 2.5788 KOps/s $\color{#d91a1a}-0.74\%$
test_vmap_mlp_speed_decorator[True-True] 1.7724ms 1.5179ms 658.8224 Ops/s 621.8177 Ops/s $\textbf{\color{#35bf28}+5.95\%}$
test_vmap_mlp_speed_decorator[True-False] 1.0903ms 0.5128ms 1.9502 KOps/s 1.9313 KOps/s $\color{#35bf28}+0.98\%$
test_vmap_mlp_speed_decorator[False-True] 1.6268ms 1.2699ms 787.4415 Ops/s 759.7174 Ops/s $\color{#35bf28}+3.65\%$
test_vmap_mlp_speed_decorator[False-False] 0.7088ms 0.4003ms 2.4981 KOps/s 2.5141 KOps/s $\color{#d91a1a}-0.64\%$
test_to_module_speed[True] 1.3205ms 1.1276ms 886.8062 Ops/s 897.5916 Ops/s $\color{#d91a1a}-1.20\%$
test_to_module_speed[False] 1.2194ms 1.1096ms 901.2592 Ops/s 906.9225 Ops/s $\color{#d91a1a}-0.62\%$
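The Ops column appears to be the reciprocal of Mean (for example, 1 / 15.7652 μs ≈ 63.43 KOps/s), and Change appears to be the relative difference between Ops and Ops on Repo HEAD. The sketch below checks both relationships on three rows of the CPU table. `implied_ops` and `change_percent` are hypothetical helpers, and the formulas are inferred from the numbers rather than taken from the benchmark tooling.

```python
# Sketch: how the Ops and Change columns seem to relate to Mean and Ops on Repo HEAD.
# Assumption: these formulas are inferred from the table values; the CI job that
# produced the comment may compute them differently.


def implied_ops(mean_seconds: float) -> float:
    """Calls per second implied by a mean time per call (Ops ~= 1 / Mean)."""
    return 1.0 / mean_seconds


def change_percent(ops: float, ops_head: float) -> float:
    """Throughput change of this PR relative to the repo HEAD, in percent."""
    return (ops - ops_head) / ops_head * 100.0


# Rows copied from the CPU table:
# (name, Mean in seconds, Ops in ops/s, Ops on Repo HEAD in ops/s, reported Change in %)
CPU_ROWS = [
    ("test_plain_set_nested", 15.7652e-6, 63.4307e3, 58.9446e3, 7.61),
    ("test_unlock_nested", 0.4104e-3, 2.4369e3, 2.9364e3, -17.01),
    ("test_creation_empty", 7.4728e-6, 133.8180e3, 102.2521e3, 30.87),
]

for name, mean_s, ops, ops_head, reported in CPU_ROWS:
    print(
        f"{name}: 1/Mean = {implied_ops(mean_s) / 1e3:.2f} KOps/s, "
        f"Change = {change_percent(ops, ops_head):+.2f}% (reported {reported:+.2f}%)"
    )
```

For these rows, the recomputed Change agrees with the reported value to two decimal places. A positive Change therefore means higher throughput (faster) than HEAD.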


⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 134. Improved: 10. Worsened: 8.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 80.9020μs 13.9109μs 71.8859 KOps/s 75.7444 KOps/s $\textbf{\color{#d91a1a}-5.09\%}$
test_plain_set_stack_nested 0.1681ms 0.1204ms 8.3069 KOps/s 8.3351 KOps/s $\color{#d91a1a}-0.34\%$
test_plain_set_nested_inplace 40.2200μs 15.4664μs 64.6564 KOps/s 68.6088 KOps/s $\textbf{\color{#d91a1a}-5.76\%}$
test_plain_set_stack_nested_inplace 0.1897ms 0.1492ms 6.7023 KOps/s 6.6858 KOps/s $\color{#35bf28}+0.25\%$
test_items 22.8000μs 4.6953μs 212.9783 KOps/s 211.1333 KOps/s $\color{#35bf28}+0.87\%$
test_items_nested 0.4623ms 0.3366ms 2.9709 KOps/s 2.9321 KOps/s $\color{#35bf28}+1.32\%$
test_items_nested_locked 0.3608ms 0.3401ms 2.9404 KOps/s 2.8936 KOps/s $\color{#35bf28}+1.62\%$
test_items_nested_leaf 0.2189ms 0.1988ms 5.0293 KOps/s 4.9419 KOps/s $\color{#35bf28}+1.77\%$
test_items_stack_nested 1.3609ms 1.3118ms 762.3059 Ops/s 754.1162 Ops/s $\color{#35bf28}+1.09\%$
test_items_stack_nested_leaf 1.2209ms 1.1597ms 862.2663 Ops/s 861.1480 Ops/s $\color{#35bf28}+0.13\%$
test_items_stack_nested_locked 0.9674ms 0.9124ms 1.0960 KOps/s 1.0925 KOps/s $\color{#35bf28}+0.32\%$
test_keys 22.6400μs 4.5422μs 220.1581 KOps/s 219.3631 KOps/s $\color{#35bf28}+0.36\%$
test_keys_nested 0.9047ms 93.5829μs 10.6857 KOps/s 10.6455 KOps/s $\color{#35bf28}+0.38\%$
test_keys_nested_locked 0.1275ms 96.4318μs 10.3700 KOps/s 10.2576 KOps/s $\color{#35bf28}+1.10\%$
test_keys_nested_leaf 0.1852ms 77.1421μs 12.9631 KOps/s 12.8127 KOps/s $\color{#35bf28}+1.17\%$
test_keys_stack_nested 1.2075ms 1.1547ms 866.0417 Ops/s 858.6120 Ops/s $\color{#35bf28}+0.87\%$
test_keys_stack_nested_leaf 1.6161ms 1.1429ms 874.9491 Ops/s 871.7281 Ops/s $\color{#35bf28}+0.37\%$
test_keys_stack_nested_locked 0.7824ms 0.7293ms 1.3712 KOps/s 1.3633 KOps/s $\color{#35bf28}+0.58\%$
test_values 8.8733μs 1.8757μs 533.1316 KOps/s 527.6168 KOps/s $\color{#35bf28}+1.05\%$
test_values_nested 66.8310μs 44.8424μs 22.3003 KOps/s 21.7436 KOps/s $\color{#35bf28}+2.56\%$
test_values_nested_locked 67.5110μs 47.1274μs 21.2191 KOps/s 20.8665 KOps/s $\color{#35bf28}+1.69\%$
test_values_nested_leaf 59.7810μs 39.3502μs 25.4128 KOps/s 24.9822 KOps/s $\color{#35bf28}+1.72\%$
test_values_stack_nested 1.0136ms 0.9548ms 1.0473 KOps/s 1.0450 KOps/s $\color{#35bf28}+0.22\%$
test_values_stack_nested_leaf 1.0241ms 0.9520ms 1.0504 KOps/s 1.0415 KOps/s $\color{#35bf28}+0.86\%$
test_values_stack_nested_locked 0.6339ms 0.5713ms 1.7503 KOps/s 1.6906 KOps/s $\color{#35bf28}+3.53\%$
test_membership 5.0680μs 0.9434μs 1.0600 MOps/s 949.1530 KOps/s $\textbf{\color{#35bf28}+11.68\%}$
test_membership_nested 32.3000μs 2.8448μs 351.5230 KOps/s 340.7262 KOps/s $\color{#35bf28}+3.17\%$
test_membership_nested_leaf 21.8200μs 2.8678μs 348.6937 KOps/s 339.3515 KOps/s $\color{#35bf28}+2.75\%$
test_membership_stacked_nested 46.1800μs 11.2208μs 89.1205 KOps/s 86.8177 KOps/s $\color{#35bf28}+2.65\%$
test_membership_stacked_nested_leaf 29.5300μs 11.3708μs 87.9448 KOps/s 86.7293 KOps/s $\color{#35bf28}+1.40\%$
test_membership_nested_last 23.3400μs 5.3075μs 188.4144 KOps/s 187.1172 KOps/s $\color{#35bf28}+0.69\%$
test_membership_nested_leaf_last 30.2300μs 5.2905μs 189.0176 KOps/s 188.9033 KOps/s $\color{#35bf28}+0.06\%$
test_membership_stacked_nested_last 0.1797ms 0.1555ms 6.4308 KOps/s 6.3986 KOps/s $\color{#35bf28}+0.50\%$
test_membership_stacked_nested_leaf_last 48.9510μs 13.1683μs 75.9400 KOps/s 74.9850 KOps/s $\color{#35bf28}+1.27\%$
test_nested_getleaf 35.0510μs 8.3641μs 119.5587 KOps/s 117.9652 KOps/s $\color{#35bf28}+1.35\%$
test_nested_get 31.1310μs 7.9238μs 126.2021 KOps/s 125.3103 KOps/s $\color{#35bf28}+0.71\%$
test_stacked_getleaf 0.3933ms 0.3254ms 3.0730 KOps/s 3.0782 KOps/s $\color{#d91a1a}-0.17\%$
test_stacked_get 0.3402ms 0.2926ms 3.4180 KOps/s 3.4102 KOps/s $\color{#35bf28}+0.23\%$
test_nested_getitemleaf 29.2000μs 9.7762μs 102.2894 KOps/s 101.6367 KOps/s $\color{#35bf28}+0.64\%$
test_nested_getitem 30.2100μs 9.2968μs 107.5637 KOps/s 106.1167 KOps/s $\color{#35bf28}+1.36\%$
test_stacked_getitemleaf 0.3562ms 0.3307ms 3.0240 KOps/s 3.0291 KOps/s $\color{#d91a1a}-0.17\%$
test_stacked_getitem 0.3449ms 0.2962ms 3.3760 KOps/s 3.3710 KOps/s $\color{#35bf28}+0.15\%$
test_lock_nested 1.2536ms 0.3507ms 2.8517 KOps/s 2.2245 KOps/s $\textbf{\color{#35bf28}+28.19\%}$
test_lock_stack_nested 0.1050s 6.7482ms 148.1868 Ops/s 148.1417 Ops/s $\color{#35bf28}+0.03\%$
test_unlock_nested 92.1366ms 0.4437ms 2.2537 KOps/s 2.2483 KOps/s $\color{#35bf28}+0.24\%$
test_unlock_stack_nested 0.1100s 6.8357ms 146.2914 Ops/s 147.3188 Ops/s $\color{#d91a1a}-0.70\%$
test_flatten_speed 0.3467ms 0.2571ms 3.8893 KOps/s 3.8346 KOps/s $\color{#35bf28}+1.43\%$
test_unflatten_speed 0.3896ms 0.3589ms 2.7864 KOps/s 2.7807 KOps/s $\color{#35bf28}+0.21\%$
test_common_ops 1.0647ms 0.6040ms 1.6558 KOps/s 1.6602 KOps/s $\color{#d91a1a}-0.27\%$
test_creation 29.4700μs 1.5391μs 649.7378 KOps/s 636.7428 KOps/s $\color{#35bf28}+2.04\%$
test_creation_empty 27.1110μs 8.7009μs 114.9312 KOps/s 133.4783 KOps/s $\textbf{\color{#d91a1a}-13.90\%}$
test_creation_nested_1 35.8000μs 10.4479μs 95.7128 KOps/s 109.0621 KOps/s $\textbf{\color{#d91a1a}-12.24\%}$
test_creation_nested_2 38.3510μs 12.7927μs 78.1698 KOps/s 86.9446 KOps/s $\textbf{\color{#d91a1a}-10.09\%}$
test_clone 54.5910μs 13.5937μs 73.5633 KOps/s 68.8378 KOps/s $\textbf{\color{#35bf28}+6.86\%}$
test_getitem[int] 36.4700μs 11.0848μs 90.2137 KOps/s 92.6769 KOps/s $\color{#d91a1a}-2.66\%$
test_getitem[slice_int] 79.4410μs 21.0022μs 47.6141 KOps/s 46.4012 KOps/s $\color{#35bf28}+2.61\%$
test_getitem[range] 60.1910μs 39.0270μs 25.6233 KOps/s 25.2264 KOps/s $\color{#35bf28}+1.57\%$
test_getitem[tuple] 55.8310μs 18.4498μs 54.2012 KOps/s 52.9683 KOps/s $\color{#35bf28}+2.33\%$
test_getitem[list] 0.2022ms 36.2306μs 27.6010 KOps/s 27.2107 KOps/s $\color{#35bf28}+1.43\%$
test_setitem_dim[int] 43.4410μs 27.2458μs 36.7029 KOps/s 37.6165 KOps/s $\color{#d91a1a}-2.43\%$
test_setitem_dim[slice_int] 73.1910μs 48.7419μs 20.5162 KOps/s 20.6236 KOps/s $\color{#d91a1a}-0.52\%$
test_setitem_dim[range] 0.1027ms 67.1082μs 14.9013 KOps/s 15.2181 KOps/s $\color{#d91a1a}-2.08\%$
test_setitem_dim[tuple] 60.5710μs 42.8722μs 23.3252 KOps/s 24.1658 KOps/s $\color{#d91a1a}-3.48\%$
test_setitem 77.8020μs 18.5147μs 54.0110 KOps/s 53.2603 KOps/s $\color{#35bf28}+1.41\%$
test_set 62.4210μs 18.1356μs 55.1402 KOps/s 54.6214 KOps/s $\color{#35bf28}+0.95\%$
test_set_shared 2.9593ms 0.1027ms 9.7415 KOps/s 9.5014 KOps/s $\color{#35bf28}+2.53\%$
test_update 87.5710μs 20.6628μs 48.3961 KOps/s 49.1553 KOps/s $\color{#d91a1a}-1.54\%$
test_update_nested 81.7810μs 27.0426μs 36.9786 KOps/s 37.1258 KOps/s $\color{#d91a1a}-0.40\%$
test_set_nested 75.1120μs 19.2660μs 51.9050 KOps/s 50.5507 KOps/s $\color{#35bf28}+2.68\%$
test_set_nested_new 67.3810μs 21.7880μs 45.8969 KOps/s 44.9312 KOps/s $\color{#35bf28}+2.15\%$
test_select 79.4410μs 34.3415μs 29.1193 KOps/s 27.4313 KOps/s $\textbf{\color{#35bf28}+6.15\%}$
test_select_nested 69.3710μs 52.7503μs 18.9572 KOps/s 19.0091 KOps/s $\color{#d91a1a}-0.27\%$
test_exclude_nested 0.1408ms 0.1135ms 8.8139 KOps/s 8.9363 KOps/s $\color{#d91a1a}-1.37\%$
test_empty[True] 0.4210ms 0.3862ms 2.5894 KOps/s 2.6260 KOps/s $\color{#d91a1a}-1.40\%$
test_empty[False] 2.2691μs 0.8857μs 1.1291 MOps/s 1.1734 MOps/s $\color{#d91a1a}-3.78\%$
test_to 77.3020μs 55.2830μs 18.0887 KOps/s 17.8485 KOps/s $\color{#35bf28}+1.35\%$
test_to_nonblocking 61.4210μs 33.9444μs 29.4599 KOps/s 28.3947 KOps/s $\color{#35bf28}+3.75\%$
test_unbind_speed 0.3050ms 0.2678ms 3.7345 KOps/s 3.6426 KOps/s $\color{#35bf28}+2.52\%$
test_unbind_speed_stack0 91.5055ms 3.4130ms 292.9946 Ops/s 229.5041 Ops/s $\textbf{\color{#35bf28}+27.66\%}$
test_unbind_speed_stack1 18.2300μs 1.8197μs 549.5267 KOps/s 541.7269 KOps/s $\color{#35bf28}+1.44\%$
test_split 88.7273ms 1.7593ms 568.3983 Ops/s 636.4226 Ops/s $\textbf{\color{#d91a1a}-10.69\%}$
test_chunk 90.7286ms 1.6928ms 590.7310 Ops/s 586.7734 Ops/s $\color{#35bf28}+0.67\%$
test_creation[device0] 0.1369ms 73.0671μs 13.6860 KOps/s 13.2431 KOps/s $\color{#35bf28}+3.34\%$
test_creation_from_tensor 0.1292ms 56.8854μs 17.5792 KOps/s 17.5592 KOps/s $\color{#35bf28}+0.11\%$
test_add_one[memmap_tensor0] 0.1230ms 7.1171μs 140.5073 KOps/s 135.4086 KOps/s $\color{#35bf28}+3.77\%$
test_contiguous[memmap_tensor0] 24.6110μs 0.6638μs 1.5065 MOps/s 1.5412 MOps/s $\color{#d91a1a}-2.25\%$
test_stack[memmap_tensor0] 29.4400μs 4.6450μs 215.2870 KOps/s 214.2582 KOps/s $\color{#35bf28}+0.48\%$
test_memmaptd_index 1.1774ms 0.2620ms 3.8168 KOps/s 3.8279 KOps/s $\color{#d91a1a}-0.29\%$
test_memmaptd_index_astensor 0.5720ms 0.3198ms 3.1266 KOps/s 3.1337 KOps/s $\color{#d91a1a}-0.23\%$
test_memmaptd_index_op 0.9223ms 0.6360ms 1.5722 KOps/s 1.6170 KOps/s $\color{#d91a1a}-2.77\%$
test_serialize_model 92.7156ms 89.2449ms 11.2051 Ops/s 9.4409 Ops/s $\textbf{\color{#35bf28}+18.69\%}$
test_serialize_model_pickle 1.3491s 1.2360s 0.8091 Ops/s 0.8066 Ops/s $\color{#35bf28}+0.30\%$
test_serialize_weights 89.5314ms 86.6980ms 11.5343 Ops/s 10.9571 Ops/s $\textbf{\color{#35bf28}+5.27\%}$
test_serialize_weights_returnearly 63.0392ms 57.6493ms 17.3463 Ops/s 13.4533 Ops/s $\textbf{\color{#35bf28}+28.94\%}$
test_serialize_weights_pickle 1.3464s 1.2488s 0.8007 Ops/s 0.8084 Ops/s $\color{#d91a1a}-0.94\%$
test_reshape_pytree 60.1310μs 24.5485μs 40.7357 KOps/s 40.0021 KOps/s $\color{#35bf28}+1.83\%$
test_reshape_td 0.2445ms 30.8698μs 32.3941 KOps/s 31.5649 KOps/s $\color{#35bf28}+2.63\%$
test_view_pytree 0.1109ms 24.2022μs 41.3186 KOps/s 40.7737 KOps/s $\color{#35bf28}+1.34\%$
test_view_td 90.8999ms 10.9241μs 91.5406 KOps/s 88.8892 KOps/s $\color{#35bf28}+2.98\%$
test_unbind_pytree 73.0010μs 30.4970μs 32.7901 KOps/s 32.2662 KOps/s $\color{#35bf28}+1.62\%$
test_unbind_td 0.2499ms 40.2070μs 24.8713 KOps/s 23.2382 KOps/s $\textbf{\color{#35bf28}+7.03\%}$
test_split_pytree 54.4810μs 28.9178μs 34.5808 KOps/s 33.9613 KOps/s $\color{#35bf28}+1.82\%$
test_split_td 0.1851ms 40.2710μs 24.8317 KOps/s 25.1467 KOps/s $\color{#d91a1a}-1.25\%$
test_add_pytree 0.2639ms 36.2707μs 27.5705 KOps/s 26.8307 KOps/s $\color{#35bf28}+2.76\%$
test_add_td 0.1429ms 51.2575μs 19.5094 KOps/s 19.8191 KOps/s $\color{#d91a1a}-1.56\%$
test_distributed 3.3148ms 92.0640μs 10.8620 KOps/s 14.3490 KOps/s $\textbf{\color{#d91a1a}-24.30\%}$
test_tdmodule 41.5400μs 17.9207μs 55.8015 KOps/s 57.6995 KOps/s $\color{#d91a1a}-3.29\%$
test_tdmodule_dispatch 0.1293ms 36.8499μs 27.1371 KOps/s 27.9376 KOps/s $\color{#d91a1a}-2.87\%$
test_tdseq 39.9810μs 20.8310μs 48.0053 KOps/s 49.2878 KOps/s $\color{#d91a1a}-2.60\%$
test_tdseq_dispatch 59.5310μs 39.4418μs 25.3538 KOps/s 26.3519 KOps/s $\color{#d91a1a}-3.79\%$
test_instantiation_functorch 1.7508ms 1.6553ms 604.1257 Ops/s 600.4161 Ops/s $\color{#35bf28}+0.62\%$
test_instantiation_td 1.6809ms 1.1481ms 871.0135 Ops/s 863.8144 Ops/s $\color{#35bf28}+0.83\%$
test_exec_functorch 0.1829ms 0.1596ms 6.2650 KOps/s 6.1022 KOps/s $\color{#35bf28}+2.67\%$
test_exec_functional_call 0.2122ms 0.1586ms 6.3036 KOps/s 6.0949 KOps/s $\color{#35bf28}+3.42\%$
test_exec_td 0.1876ms 0.1504ms 6.6481 KOps/s 6.4619 KOps/s $\color{#35bf28}+2.88\%$
test_exec_td_decorator 0.7176ms 0.1805ms 5.5416 KOps/s 5.4933 KOps/s $\color{#35bf28}+0.88\%$
test_vmap_mlp_speed[True-True] 1.1001ms 1.0300ms 970.8623 Ops/s 965.8333 Ops/s $\color{#35bf28}+0.52\%$
test_vmap_mlp_speed[True-False] 0.6382ms 0.5965ms 1.6764 KOps/s 1.6777 KOps/s $\color{#d91a1a}-0.08\%$
test_vmap_mlp_speed[False-True] 1.0248ms 0.9485ms 1.0543 KOps/s 1.0499 KOps/s $\color{#35bf28}+0.42\%$
test_vmap_mlp_speed[False-False] 0.5884ms 0.5243ms 1.9072 KOps/s 1.8960 KOps/s $\color{#35bf28}+0.59\%$
test_vmap_mlp_speed_decorator[True-True] 2.2536ms 1.7965ms 556.6461 Ops/s 556.8608 Ops/s $\color{#d91a1a}-0.04\%$
test_vmap_mlp_speed_decorator[True-False] 0.7468ms 0.6294ms 1.5889 KOps/s 1.5871 KOps/s $\color{#35bf28}+0.11\%$
test_vmap_mlp_speed_decorator[False-True] 1.6710ms 1.5592ms 641.3376 Ops/s 635.5100 Ops/s $\color{#35bf28}+0.92\%$
test_vmap_mlp_speed_decorator[False-False] 0.8851ms 0.5329ms 1.8765 KOps/s 1.8557 KOps/s $\color{#35bf28}+1.12\%$
test_vmap_transformer_speed[True-True] 12.3059ms 12.1244ms 82.4783 Ops/s 81.6351 Ops/s $\color{#35bf28}+1.03\%$
test_vmap_transformer_speed[True-False] 8.9001ms 8.0021ms 124.9670 Ops/s 118.8509 Ops/s $\textbf{\color{#35bf28}+5.15\%}$
test_vmap_transformer_speed[False-True] 12.2586ms 12.0922ms 82.6978 Ops/s 80.1971 Ops/s $\color{#35bf28}+3.12\%$
test_vmap_transformer_speed[False-False] 8.0967ms 7.9344ms 126.0334 Ops/s 122.0384 Ops/s $\color{#35bf28}+3.27\%$
test_vmap_transformer_speed_decorator[True-True] 59.5661ms 58.6791ms 17.0418 Ops/s 16.4743 Ops/s $\color{#35bf28}+3.44\%$
test_vmap_transformer_speed_decorator[True-False] 20.0422ms 19.6783ms 50.8174 Ops/s 51.2578 Ops/s $\color{#d91a1a}-0.86\%$
test_vmap_transformer_speed_decorator[False-True] 55.0670ms 53.6974ms 18.6229 Ops/s 18.1505 Ops/s $\color{#35bf28}+2.60\%$
test_vmap_transformer_speed_decorator[False-False] 19.5261ms 18.8790ms 52.9688 Ops/s 52.1727 Ops/s $\color{#35bf28}+1.53\%$
test_to_module_speed[True] 0.1362s 1.1525ms 867.6957 Ops/s 968.0075 Ops/s $\textbf{\color{#d91a1a}-10.36\%}$
test_to_module_speed[False] 1.6017ms 0.9757ms 1.0249 KOps/s 1.0152 KOps/s $\color{#35bf28}+0.95\%$
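In both summaries, the Improved and Worsened counts equal the number of bold rows, and every bold row has a Change larger than 5% in magnitude. The sketch below shows that tally under the assumption of a strict 5% threshold. This threshold is inferred from the tables, not a documented setting of the benchmark job. `summarize` and `THRESHOLD_PERCENT` are hypothetical names, and the sample values are the GPU rows from test_creation through test_clone.

```python
# Assumption: a benchmark counts as improved or worsened only when its Change
# exceeds this threshold in magnitude. Inferred from the bold rows, not from CI config.
THRESHOLD_PERCENT = 5.0


def summarize(changes: dict[str, float], threshold: float = THRESHOLD_PERCENT) -> dict[str, list[str]]:
    """Split benchmark names into improved, worsened and unchanged buckets."""
    summary: dict[str, list[str]] = {"improved": [], "worsened": [], "unchanged": []}
    for name, change in changes.items():
        if change > threshold:
            summary["improved"].append(name)
        elif change < -threshold:
            summary["worsened"].append(name)
        else:
            summary["unchanged"].append(name)
    return summary


# Change values (%) copied from the GPU table.
GPU_CHANGES = {
    "test_creation": 2.04,
    "test_creation_empty": -13.90,
    "test_creation_nested_1": -12.24,
    "test_creation_nested_2": -10.09,
    "test_clone": 6.86,
}

result = summarize(GPU_CHANGES)
print(
    f"Total: {len(GPU_CHANGES)}. "
    f"Improved: {len(result['improved'])}. Worsened: {len(result['worsened'])}."
)
```

Under this reading, small moves such as test_getitem[list] at -4.88% on CPU are reported in the table but not counted as regressions.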

vmoens added a commit that referenced this pull request Feb 26, 2024
2 participants