[Performance] Faster to_module #575

Merged (1 commit) on Nov 24, 2023

vmoens (Contributor) commented Nov 24, 2023

No description provided.

facebook-github-bot added the CLA Signed label Nov 24, 2023

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 113. Improved: $\large\color{#35bf28}5$. Worsened: $\large\color{#d91a1a}6$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 37.1490μs 16.0992μs 62.1148 KOps/s 62.1769 KOps/s $\color{#d91a1a}-0.10\%$
test_plain_set_stack_nested 0.2536ms 0.1454ms 6.8769 KOps/s 6.8277 KOps/s $\color{#35bf28}+0.72\%$
test_plain_set_nested_inplace 46.0060μs 19.3833μs 51.5909 KOps/s 52.1739 KOps/s $\color{#d91a1a}-1.12\%$
test_plain_set_stack_nested_inplace 0.3055ms 0.1771ms 5.6462 KOps/s 5.6618 KOps/s $\color{#d91a1a}-0.28\%$
test_items 0.1294ms 2.6046μs 383.9299 KOps/s 355.0682 KOps/s $\textbf{\color{#35bf28}+8.13\%}$
test_items_nested 0.4759ms 0.2706ms 3.6955 KOps/s 3.7514 KOps/s $\color{#d91a1a}-1.49\%$
test_items_nested_locked 0.9910ms 0.2705ms 3.6963 KOps/s 3.7441 KOps/s $\color{#d91a1a}-1.28\%$
test_items_nested_leaf 0.5848ms 0.1650ms 6.0608 KOps/s 6.0306 KOps/s $\color{#35bf28}+0.50\%$
test_items_stack_nested 1.6879ms 1.4844ms 673.6512 Ops/s 675.2712 Ops/s $\color{#d91a1a}-0.24\%$
test_items_stack_nested_leaf 2.0860ms 1.3681ms 730.9665 Ops/s 739.1634 Ops/s $\color{#d91a1a}-1.11\%$
test_items_stack_nested_locked 0.8691ms 0.7655ms 1.3064 KOps/s 1.2949 KOps/s $\color{#35bf28}+0.89\%$
test_keys 0.1263ms 3.8974μs 256.5800 KOps/s 261.0417 KOps/s $\color{#d91a1a}-1.71\%$
test_keys_nested 1.4512ms 0.1427ms 7.0060 KOps/s 6.6669 KOps/s $\textbf{\color{#35bf28}+5.09\%}$
test_keys_nested_locked 0.1944ms 0.1418ms 7.0499 KOps/s 7.0865 KOps/s $\color{#d91a1a}-0.52\%$
test_keys_nested_leaf 0.3243ms 0.1425ms 7.0152 KOps/s 7.0783 KOps/s $\color{#d91a1a}-0.89\%$
test_keys_stack_nested 2.1390ms 1.4132ms 707.5926 Ops/s 704.4243 Ops/s $\color{#35bf28}+0.45\%$
test_keys_stack_nested_leaf 2.1344ms 1.4081ms 710.1575 Ops/s 704.5780 Ops/s $\color{#35bf28}+0.79\%$
test_keys_stack_nested_locked 1.1629ms 0.6725ms 1.4871 KOps/s 1.4407 KOps/s $\color{#35bf28}+3.22\%$
test_values 8.2378μs 1.1873μs 842.2167 KOps/s 856.3997 KOps/s $\color{#d91a1a}-1.66\%$
test_values_nested 85.7500μs 49.4506μs 20.2222 KOps/s 20.3051 KOps/s $\color{#d91a1a}-0.41\%$
test_values_nested_locked 98.3640μs 49.5034μs 20.2006 KOps/s 20.1147 KOps/s $\color{#35bf28}+0.43\%$
test_values_nested_leaf 55.7240μs 43.9320μs 22.7625 KOps/s 22.5214 KOps/s $\color{#35bf28}+1.07\%$
test_values_stack_nested 1.3934ms 1.1883ms 841.5384 Ops/s 831.9000 Ops/s $\color{#35bf28}+1.16\%$
test_values_stack_nested_leaf 1.2703ms 1.1853ms 843.6415 Ops/s 836.3661 Ops/s $\color{#35bf28}+0.87\%$
test_values_stack_nested_locked 0.8939ms 0.5101ms 1.9605 KOps/s 1.9189 KOps/s $\color{#35bf28}+2.17\%$
test_membership 16.0700μs 1.3606μs 734.9906 KOps/s 739.8790 KOps/s $\color{#d91a1a}-0.66\%$
test_membership_nested 36.8490μs 2.7888μs 358.5803 KOps/s 345.7939 KOps/s $\color{#35bf28}+3.70\%$
test_membership_nested_leaf 21.7200μs 2.8290μs 353.4816 KOps/s 344.4963 KOps/s $\color{#35bf28}+2.61\%$
test_membership_stacked_nested 45.5450μs 11.7055μs 85.4299 KOps/s 84.3855 KOps/s $\color{#35bf28}+1.24\%$
test_membership_stacked_nested_leaf 40.7360μs 11.5619μs 86.4907 KOps/s 83.6561 KOps/s $\color{#35bf28}+3.39\%$
test_membership_nested_last 28.0020μs 5.8693μs 170.3768 KOps/s 162.8009 KOps/s $\color{#35bf28}+4.65\%$
test_membership_nested_leaf_last 39.9240μs 5.9031μs 169.4018 KOps/s 167.2527 KOps/s $\color{#35bf28}+1.28\%$
test_membership_stacked_nested_last 0.3300ms 0.1691ms 5.9145 KOps/s 5.8964 KOps/s $\color{#35bf28}+0.31\%$
test_membership_stacked_nested_leaf_last 32.7510μs 13.5994μs 73.5329 KOps/s 73.7285 KOps/s $\color{#d91a1a}-0.27\%$
test_nested_getleaf 46.5970μs 10.7366μs 93.1396 KOps/s 95.1910 KOps/s $\color{#d91a1a}-2.15\%$
test_nested_get 36.1080μs 10.2612μs 97.4547 KOps/s 99.3129 KOps/s $\color{#d91a1a}-1.87\%$
test_stacked_getleaf 0.8595ms 0.6464ms 1.5471 KOps/s 1.5420 KOps/s $\color{#35bf28}+0.33\%$
test_stacked_get 1.3914ms 0.6133ms 1.6306 KOps/s 1.6055 KOps/s $\color{#35bf28}+1.56\%$
test_nested_getitemleaf 49.9930μs 10.7118μs 93.3549 KOps/s 92.7389 KOps/s $\color{#35bf28}+0.66\%$
test_nested_getitem 39.4340μs 10.1618μs 98.4080 KOps/s 99.0053 KOps/s $\color{#d91a1a}-0.60\%$
test_stacked_getitemleaf 1.0988ms 0.6451ms 1.5502 KOps/s 1.5464 KOps/s $\color{#35bf28}+0.25\%$
test_stacked_getitem 1.1529ms 0.6268ms 1.5953 KOps/s 1.6313 KOps/s $\color{#d91a1a}-2.21\%$
test_lock_nested 55.0275ms 0.5424ms 1.8438 KOps/s 2.0256 KOps/s $\textbf{\color{#d91a1a}-8.97\%}$
test_lock_stack_nested 74.1596ms 8.2480ms 121.2418 Ops/s 124.8375 Ops/s $\color{#d91a1a}-2.88\%$
test_unlock_nested 60.7838ms 0.5023ms 1.9909 KOps/s 1.9536 KOps/s $\color{#35bf28}+1.91\%$
test_unlock_stack_nested 68.5761ms 8.0016ms 124.9752 Ops/s 206.1984 Ops/s $\textbf{\color{#d91a1a}-39.39\%}$
test_flatten_speed 1.1797ms 0.2788ms 3.5870 KOps/s 3.6930 KOps/s $\color{#d91a1a}-2.87\%$
test_unflatten_speed 0.5351ms 0.4677ms 2.1383 KOps/s 2.1598 KOps/s $\color{#d91a1a}-1.00\%$
test_common_ops 4.1997ms 0.6804ms 1.4698 KOps/s 1.4932 KOps/s $\color{#d91a1a}-1.57\%$
test_creation 25.7980μs 2.4049μs 415.8128 KOps/s 418.3835 KOps/s $\color{#d91a1a}-0.61\%$
test_creation_empty 39.6340μs 7.9539μs 125.7243 KOps/s 122.5708 KOps/s $\color{#35bf28}+2.57\%$
test_creation_nested_1 40.5860μs 11.3128μs 88.3954 KOps/s 85.2253 KOps/s $\color{#35bf28}+3.72\%$
test_creation_nested_2 38.5520μs 14.6970μs 68.0412 KOps/s 66.2535 KOps/s $\color{#35bf28}+2.70\%$
test_clone 85.1090μs 13.2138μs 75.6784 KOps/s 74.8756 KOps/s $\color{#35bf28}+1.07\%$
test_getitem[int] 44.7530μs 12.9914μs 76.9741 KOps/s 75.2508 KOps/s $\color{#35bf28}+2.29\%$
test_getitem[slice_int] 72.7760μs 25.0768μs 39.8776 KOps/s 39.5558 KOps/s $\color{#35bf28}+0.81\%$
test_getitem[range] 84.3670μs 45.5076μs 21.9744 KOps/s 21.6114 KOps/s $\color{#35bf28}+1.68\%$
test_getitem[tuple] 66.7840μs 20.4001μs 49.0194 KOps/s 48.4561 KOps/s $\color{#35bf28}+1.16\%$
test_getitem[list] 0.2674ms 40.3544μs 24.7804 KOps/s 24.2328 KOps/s $\color{#35bf28}+2.26\%$
test_setitem_dim[int] 0.1004ms 28.2860μs 35.3532 KOps/s 36.3653 KOps/s $\color{#d91a1a}-2.78\%$
test_setitem_dim[slice_int] 85.9900μs 52.4884μs 19.0518 KOps/s 18.9072 KOps/s $\color{#35bf28}+0.77\%$
test_setitem_dim[range] 0.1118ms 72.4376μs 13.8050 KOps/s 13.6967 KOps/s $\color{#35bf28}+0.79\%$
test_setitem_dim[tuple] 87.0130μs 41.1120μs 24.3238 KOps/s 24.6220 KOps/s $\color{#d91a1a}-1.21\%$
test_setitem 0.1266ms 18.2367μs 54.8345 KOps/s 53.8737 KOps/s $\color{#35bf28}+1.78\%$
test_set 0.1272ms 17.4067μs 57.4490 KOps/s 56.1146 KOps/s $\color{#35bf28}+2.38\%$
test_set_shared 1.9672ms 0.1401ms 7.1383 KOps/s 7.1165 KOps/s $\color{#35bf28}+0.31\%$
test_update 0.1106ms 19.1860μs 52.1215 KOps/s 53.0755 KOps/s $\color{#d91a1a}-1.80\%$
test_update_nested 0.1493ms 26.5327μs 37.6893 KOps/s 38.1510 KOps/s $\color{#d91a1a}-1.21\%$
test_set_nested 0.1175ms 19.9209μs 50.1985 KOps/s 50.7384 KOps/s $\color{#d91a1a}-1.06\%$
test_set_nested_new 0.1160ms 24.7793μs 40.3562 KOps/s 38.8773 KOps/s $\color{#35bf28}+3.80\%$
test_select 0.1217ms 50.3912μs 19.8448 KOps/s 19.9157 KOps/s $\color{#d91a1a}-0.36\%$
test_unbind_speed 0.4404ms 0.3745ms 2.6703 KOps/s 2.6889 KOps/s $\color{#d91a1a}-0.69\%$
test_unbind_speed_stack0 65.5388ms 5.2956ms 188.8346 Ops/s 248.8094 Ops/s $\textbf{\color{#d91a1a}-24.10\%}$
test_unbind_speed_stack1 2.6093μs 0.6335μs 1.5787 MOps/s 1.5718 MOps/s $\color{#35bf28}+0.44\%$
test_split 55.3698ms 1.7556ms 569.6016 Ops/s 557.7401 Ops/s $\color{#35bf28}+2.13\%$
test_chunk 58.6198ms 1.7392ms 574.9751 Ops/s 567.6994 Ops/s $\color{#35bf28}+1.28\%$
test_creation[device0] 5.2488ms 0.2950ms 3.3901 KOps/s 3.2776 KOps/s $\color{#35bf28}+3.43\%$
test_creation_from_tensor 59.5799ms 0.3580ms 2.7933 KOps/s 2.9859 KOps/s $\textbf{\color{#d91a1a}-6.45\%}$
test_add_one[memmap_tensor0] 70.6520μs 25.7879μs 38.7779 KOps/s 40.0790 KOps/s $\color{#d91a1a}-3.25\%$
test_contiguous[memmap_tensor0] 29.4350μs 6.0130μs 166.3074 KOps/s 171.7384 KOps/s $\color{#d91a1a}-3.16\%$
test_stack[memmap_tensor0] 96.9910μs 19.6944μs 50.7759 KOps/s 51.3259 KOps/s $\color{#d91a1a}-1.07\%$
test_memmaptd_index 0.4862ms 0.4016ms 2.4899 KOps/s 2.4456 KOps/s $\color{#35bf28}+1.81\%$
test_memmaptd_index_astensor 0.9022ms 0.4683ms 2.1352 KOps/s 2.0830 KOps/s $\color{#35bf28}+2.51\%$
test_memmaptd_index_op 0.8051ms 0.7051ms 1.4183 KOps/s 1.3855 KOps/s $\color{#35bf28}+2.36\%$
test_reshape_pytree 0.3323ms 23.2215μs 43.0635 KOps/s 41.6877 KOps/s $\color{#35bf28}+3.30\%$
test_reshape_td 89.5580μs 31.6180μs 31.6276 KOps/s 30.5861 KOps/s $\color{#35bf28}+3.41\%$
test_view_pytree 75.1900μs 23.4766μs 42.5956 KOps/s 42.6087 KOps/s $\color{#d91a1a}-0.03\%$
test_view_td 23.1530μs 4.8547μs 205.9853 KOps/s 203.5797 KOps/s $\color{#35bf28}+1.18\%$
test_unbind_pytree 86.3010μs 26.3338μs 37.9740 KOps/s 37.5214 KOps/s $\color{#35bf28}+1.21\%$
test_unbind_td 0.1154ms 59.7871μs 16.7260 KOps/s 16.6501 KOps/s $\color{#35bf28}+0.46\%$
test_split_pytree 89.1680μs 26.2076μs 38.1568 KOps/s 37.5763 KOps/s $\color{#35bf28}+1.54\%$
test_split_td 0.1298ms 46.9680μs 21.2911 KOps/s 20.9089 KOps/s $\color{#35bf28}+1.83\%$
test_add_pytree 70.2610μs 32.3736μs 30.8894 KOps/s 30.8967 KOps/s $\color{#d91a1a}-0.02\%$
test_add_td 0.1073ms 45.0460μs 22.1995 KOps/s 21.9269 KOps/s $\color{#35bf28}+1.24\%$
test_distributed 24.8660μs 6.0096μs 166.4017 KOps/s 166.5763 KOps/s $\color{#d91a1a}-0.10\%$
test_tdmodule 0.1021ms 20.9949μs 47.6305 KOps/s 46.4625 KOps/s $\color{#35bf28}+2.51\%$
test_tdmodule_dispatch 0.1678ms 38.4929μs 25.9788 KOps/s 25.5022 KOps/s $\color{#35bf28}+1.87\%$
test_tdseq 0.3638ms 24.4907μs 40.8318 KOps/s 40.6807 KOps/s $\color{#35bf28}+0.37\%$
test_tdseq_dispatch 0.4113ms 42.3081μs 23.6362 KOps/s 23.4523 KOps/s $\color{#35bf28}+0.78\%$
test_instantiation_functorch 1.9890ms 1.3153ms 760.3111 Ops/s 766.2344 Ops/s $\color{#d91a1a}-0.77\%$
test_instantiation_td 1.5170ms 1.0264ms 974.2583 Ops/s 977.4129 Ops/s $\color{#d91a1a}-0.32\%$
test_exec_functorch 0.2263ms 0.1617ms 6.1849 KOps/s 6.2688 KOps/s $\color{#d91a1a}-1.34\%$
test_exec_functional_call 0.3898ms 0.1465ms 6.8240 KOps/s 6.6695 KOps/s $\color{#35bf28}+2.32\%$
test_exec_td 0.2165ms 0.1423ms 7.0269 KOps/s 6.8472 KOps/s $\color{#35bf28}+2.62\%$
test_exec_td_decorator 1.0105ms 0.1807ms 5.5343 KOps/s 3.6724 KOps/s $\textbf{\color{#35bf28}+50.70\%}$
test_vmap_mlp_speed[True-True] 1.0075ms 0.8958ms 1.1164 KOps/s 1.0970 KOps/s $\color{#35bf28}+1.77\%$
test_vmap_mlp_speed[True-False] 0.7111ms 0.4673ms 2.1398 KOps/s 2.0827 KOps/s $\color{#35bf28}+2.74\%$
test_vmap_mlp_speed[False-True] 1.1664ms 0.7819ms 1.2790 KOps/s 1.2631 KOps/s $\color{#35bf28}+1.26\%$
test_vmap_mlp_speed[False-False] 0.5048ms 0.3861ms 2.5898 KOps/s 2.5397 KOps/s $\color{#35bf28}+1.97\%$
test_vmap_mlp_speed_decorator[True-True] 2.7442ms 1.7742ms 563.6298 Ops/s 623.8952 Ops/s $\textbf{\color{#d91a1a}-9.66\%}$
test_vmap_mlp_speed_decorator[True-False] 0.9940ms 0.5179ms 1.9310 KOps/s 1.7831 KOps/s $\textbf{\color{#35bf28}+8.29\%}$
test_vmap_mlp_speed_decorator[False-True] 2.3741ms 1.4872ms 672.3859 Ops/s 728.0136 Ops/s $\textbf{\color{#d91a1a}-7.64\%}$
test_vmap_mlp_speed_decorator[False-False] 1.2460ms 0.4025ms 2.4844 KOps/s 2.2905 KOps/s $\textbf{\color{#35bf28}+8.47\%}$
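
The Change column is consistent with the relative difference between Ops (this PR) and Ops on Repo HEAD, and the bolded rows match the Improved and Worsened counts in the summary. A minimal sketch of that arithmetic, using values copied from the table above, might look like this. The 5% cutoff is an assumption inferred from which rows are bolded (the bot's actual threshold is not stated here), and the names `relative_change` and `classify` are hypothetical.

```python
def relative_change(ops: float, ops_head: float) -> float:
    """Percent change in throughput of the PR relative to repo HEAD."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold_pct: float = 5.0) -> str:
    """Label a row; threshold_pct is an assumed cutoff, not the bot's documented one."""
    if change_pct >= threshold_pct:
        return "improved"
    if change_pct <= -threshold_pct:
        return "worsened"
    return "unchanged"


# (name, Ops, Ops on Repo HEAD) taken from the CPU table; both values of a
# row share the same unit, so the ratio is unit-free.
rows = [
    ("test_items", 383.9299, 355.0682),
    ("test_unlock_stack_nested", 124.9752, 206.1984),
    ("test_exec_td_decorator", 5.5343, 3.6724),
    ("test_plain_set_nested", 62.1148, 62.1769),
]

for name, ops, ops_head in rows:
    pct = relative_change(ops, ops_head)
    print(f"{name}: {pct:+.2f}% ({classify(pct)})")
```

On these four rows the sketch agrees with the reported percentages to two decimals and flags the first three rows only, which matches the bolding in the table.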

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 127. Improved: $\large\color{#35bf28}9$. Worsened: $\large\color{#d91a1a}6$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.4211ms 12.6294μs 79.1805 KOps/s 78.9239 KOps/s $\color{#35bf28}+0.33\%$
test_plain_set_stack_nested 0.1431ms 0.1154ms 8.6671 KOps/s 8.3673 KOps/s $\color{#35bf28}+3.58\%$
test_plain_set_nested_inplace 39.8720μs 14.8730μs 67.2359 KOps/s 66.2317 KOps/s $\color{#35bf28}+1.52\%$
test_plain_set_stack_nested_inplace 0.1716ms 0.1394ms 7.1752 KOps/s 7.1217 KOps/s $\color{#35bf28}+0.75\%$
test_items 24.0710μs 4.9031μs 203.9512 KOps/s 207.8566 KOps/s $\color{#d91a1a}-1.88\%$
test_items_nested 0.4046ms 0.3358ms 2.9784 KOps/s 2.9564 KOps/s $\color{#35bf28}+0.74\%$
test_items_nested_locked 0.3949ms 0.3365ms 2.9718 KOps/s 2.9811 KOps/s $\color{#d91a1a}-0.31\%$
test_items_nested_leaf 0.2241ms 0.1981ms 5.0482 KOps/s 5.0160 KOps/s $\color{#35bf28}+0.64\%$
test_items_stack_nested 1.5875ms 1.4776ms 676.7658 Ops/s 678.4091 Ops/s $\color{#d91a1a}-0.24\%$
test_items_stack_nested_leaf 1.3704ms 1.3053ms 766.0841 Ops/s 764.0761 Ops/s $\color{#35bf28}+0.26\%$
test_items_stack_nested_locked 1.7633ms 0.7967ms 1.2552 KOps/s 1.2552 KOps/s $+0.00\%$
test_keys 24.3810μs 4.5738μs 218.6377 KOps/s 218.6668 KOps/s $\color{#d91a1a}-0.01\%$
test_keys_nested 0.4960ms 90.3431μs 11.0689 KOps/s 11.1941 KOps/s $\color{#d91a1a}-1.12\%$
test_keys_nested_locked 0.1210ms 89.6516μs 11.1543 KOps/s 11.2335 KOps/s $\color{#d91a1a}-0.71\%$
test_keys_nested_leaf 42.0050ms 86.9698μs 11.4982 KOps/s 12.3349 KOps/s $\textbf{\color{#d91a1a}-6.78\%}$
test_keys_stack_nested 1.3674ms 1.2767ms 783.2743 Ops/s 772.7949 Ops/s $\color{#35bf28}+1.36\%$
test_keys_stack_nested_leaf 1.3863ms 1.2673ms 789.0860 Ops/s 776.1860 Ops/s $\color{#35bf28}+1.66\%$
test_keys_stack_nested_locked 0.6909ms 0.5958ms 1.6783 KOps/s 1.6771 KOps/s $\color{#35bf28}+0.07\%$
test_values 8.2603μs 1.8848μs 530.5684 KOps/s 533.7036 KOps/s $\color{#d91a1a}-0.59\%$
test_values_nested 74.6130μs 42.8407μs 23.3423 KOps/s 23.2731 KOps/s $\color{#35bf28}+0.30\%$
test_values_nested_locked 65.3330μs 43.0486μs 23.2296 KOps/s 22.9672 KOps/s $\color{#35bf28}+1.14\%$
test_values_nested_leaf 64.8340μs 37.1143μs 26.9438 KOps/s 26.5796 KOps/s $\color{#35bf28}+1.37\%$
test_values_stack_nested 1.1897ms 1.1283ms 886.3239 Ops/s 893.0326 Ops/s $\color{#d91a1a}-0.75\%$
test_values_stack_nested_leaf 1.1748ms 1.1087ms 901.9867 Ops/s 899.7830 Ops/s $\color{#35bf28}+0.24\%$
test_values_stack_nested_locked 0.5304ms 0.4728ms 2.1151 KOps/s 2.0980 KOps/s $\color{#35bf28}+0.81\%$
test_membership 4.5818μs 0.9278μs 1.0778 MOps/s 1.0693 MOps/s $\color{#35bf28}+0.80\%$
test_membership_nested 28.3510μs 2.1844μs 457.7935 KOps/s 468.9322 KOps/s $\color{#d91a1a}-2.38\%$
test_membership_nested_leaf 16.0055μs 2.1360μs 468.1577 KOps/s 475.1231 KOps/s $\color{#d91a1a}-1.47\%$
test_membership_stacked_nested 44.0520μs 10.7129μs 93.3450 KOps/s 92.6988 KOps/s $\color{#35bf28}+0.70\%$
test_membership_stacked_nested_leaf 35.3520μs 10.7207μs 93.2775 KOps/s 91.1184 KOps/s $\color{#35bf28}+2.37\%$
test_membership_nested_last 21.5510μs 4.5702μs 218.8085 KOps/s 217.3792 KOps/s $\color{#35bf28}+0.66\%$
test_membership_nested_leaf_last 33.2210μs 4.5652μs 219.0503 KOps/s 218.4525 KOps/s $\color{#35bf28}+0.27\%$
test_membership_stacked_nested_last 0.1649ms 0.1330ms 7.5187 KOps/s 7.4985 KOps/s $\color{#35bf28}+0.27\%$
test_membership_stacked_nested_leaf_last 32.1320μs 12.5882μs 79.4396 KOps/s 79.1724 KOps/s $\color{#35bf28}+0.34\%$
test_nested_getleaf 29.7310μs 8.3609μs 119.6048 KOps/s 118.9282 KOps/s $\color{#35bf28}+0.57\%$
test_nested_get 29.3810μs 7.8730μs 127.0170 KOps/s 125.7927 KOps/s $\color{#35bf28}+0.97\%$
test_stacked_getleaf 0.6145ms 0.5606ms 1.7839 KOps/s 1.7410 KOps/s $\color{#35bf28}+2.46\%$
test_stacked_get 0.5766ms 0.5282ms 1.8932 KOps/s 1.8597 KOps/s $\color{#35bf28}+1.80\%$
test_nested_getitemleaf 31.4010μs 8.3752μs 119.3996 KOps/s 118.1746 KOps/s $\color{#35bf28}+1.04\%$
test_nested_getitem 28.8910μs 7.9380μs 125.9770 KOps/s 124.9850 KOps/s $\color{#35bf28}+0.79\%$
test_stacked_getitemleaf 0.6342ms 0.5613ms 1.7814 KOps/s 1.7608 KOps/s $\color{#35bf28}+1.17\%$
test_stacked_getitem 0.6048ms 0.5378ms 1.8596 KOps/s 1.8769 KOps/s $\color{#d91a1a}-0.92\%$
test_lock_nested 4.3950ms 0.4539ms 2.2033 KOps/s 2.1870 KOps/s $\color{#35bf28}+0.75\%$
test_lock_stack_nested 67.9166ms 6.5143ms 153.5091 Ops/s 151.5574 Ops/s $\color{#35bf28}+1.29\%$
test_unlock_nested 1.2988ms 0.4327ms 2.3108 KOps/s 2.0322 KOps/s $\textbf{\color{#35bf28}+13.71\%}$
test_unlock_stack_nested 63.8541ms 7.2507ms 137.9185 Ops/s 138.2704 Ops/s $\color{#d91a1a}-0.25\%$
test_flatten_speed 0.5201ms 0.1869ms 5.3505 KOps/s 5.3941 KOps/s $\color{#d91a1a}-0.81\%$
test_unflatten_speed 0.4305ms 0.3620ms 2.7625 KOps/s 2.7980 KOps/s $\color{#d91a1a}-1.27\%$
test_common_ops 1.0558ms 0.5823ms 1.7174 KOps/s 1.6958 KOps/s $\color{#35bf28}+1.27\%$
test_creation 13.5710μs 1.9300μs 518.1304 KOps/s 520.0878 KOps/s $\color{#d91a1a}-0.38\%$
test_creation_empty 24.9510μs 6.5387μs 152.9363 KOps/s 141.9366 KOps/s $\textbf{\color{#35bf28}+7.75\%}$
test_creation_nested_1 41.6520μs 8.9675μs 111.5143 KOps/s 106.3376 KOps/s $\color{#35bf28}+4.87\%$
test_creation_nested_2 30.2810μs 11.6067μs 86.1575 KOps/s 83.5301 KOps/s $\color{#35bf28}+3.15\%$
test_clone 0.1066ms 13.7770μs 72.5850 KOps/s 72.5156 KOps/s $\color{#35bf28}+0.10\%$
test_getitem[int] 39.7020μs 11.8738μs 84.2192 KOps/s 83.8293 KOps/s $\color{#35bf28}+0.47\%$
test_getitem[slice_int] 48.3020μs 22.5313μs 44.3828 KOps/s 43.1771 KOps/s $\color{#35bf28}+2.79\%$
test_getitem[range] 61.6730μs 38.7413μs 25.8123 KOps/s 25.1509 KOps/s $\color{#35bf28}+2.63\%$
test_getitem[tuple] 49.5520μs 19.3865μs 51.5824 KOps/s 50.3319 KOps/s $\color{#35bf28}+2.48\%$
test_getitem[list] 0.3019ms 35.5470μs 28.1317 KOps/s 27.2563 KOps/s $\color{#35bf28}+3.21\%$
test_setitem_dim[int] 40.8520μs 24.0906μs 41.5099 KOps/s 38.5385 KOps/s $\textbf{\color{#35bf28}+7.71\%}$
test_setitem_dim[slice_int] 61.1930μs 43.5627μs 22.9554 KOps/s 21.8642 KOps/s $\color{#35bf28}+4.99\%$
test_setitem_dim[range] 83.4740μs 61.2008μs 16.3397 KOps/s 15.8878 KOps/s $\color{#35bf28}+2.84\%$
test_setitem_dim[tuple] 78.6440μs 37.5969μs 26.5980 KOps/s 25.8887 KOps/s $\color{#35bf28}+2.74\%$
test_setitem 0.1126ms 17.5392μs 57.0153 KOps/s 57.2480 KOps/s $\color{#d91a1a}-0.41\%$
test_set 0.1082ms 16.8393μs 59.3850 KOps/s 58.2965 KOps/s $\color{#35bf28}+1.87\%$
test_set_shared 2.6463ms 98.9800μs 10.1030 KOps/s 9.3730 KOps/s $\textbf{\color{#35bf28}+7.79\%}$
test_update 0.1069ms 18.0709μs 55.3375 KOps/s 54.4232 KOps/s $\color{#35bf28}+1.68\%$
test_update_nested 0.1285ms 24.5734μs 40.6944 KOps/s 40.5572 KOps/s $\color{#35bf28}+0.34\%$
test_set_nested 0.1069ms 18.2419μs 54.8187 KOps/s 55.2669 KOps/s $\color{#d91a1a}-0.81\%$
test_set_nested_new 0.1208ms 22.5770μs 44.2930 KOps/s 43.8337 KOps/s $\color{#35bf28}+1.05\%$
test_select 0.1467ms 46.2471μs 21.6230 KOps/s 21.9435 KOps/s $\color{#d91a1a}-1.46\%$
test_to 73.6140μs 51.1312μs 19.5575 KOps/s 19.7928 KOps/s $\color{#d91a1a}-1.19\%$
test_to_nonblocking 62.2230μs 33.2236μs 30.0990 KOps/s 29.6131 KOps/s $\color{#35bf28}+1.64\%$
test_unbind_speed 0.3799ms 0.3452ms 2.8967 KOps/s 2.8854 KOps/s $\color{#35bf28}+0.39\%$
test_unbind_speed_stack0 60.0775ms 5.0693ms 197.2671 Ops/s 195.3944 Ops/s $\color{#35bf28}+0.96\%$
test_unbind_speed_stack1 1.9831μs 0.5290μs 1.8902 MOps/s 1.9091 MOps/s $\color{#d91a1a}-0.99\%$
test_split 53.0894ms 1.7574ms 569.0289 Ops/s 562.7664 Ops/s $\color{#35bf28}+1.11\%$
test_chunk 53.0076ms 1.7579ms 568.8754 Ops/s 567.4860 Ops/s $\color{#35bf28}+0.24\%$
test_creation[device0] 0.4213ms 0.3093ms 3.2334 KOps/s 3.2287 KOps/s $\color{#35bf28}+0.15\%$
test_creation[device1] 54.9972ms 0.3358ms 2.9781 KOps/s 3.1891 KOps/s $\textbf{\color{#d91a1a}-6.62\%}$
test_creation_from_tensor 0.5787ms 0.3376ms 2.9620 KOps/s 2.6957 KOps/s $\textbf{\color{#35bf28}+9.88\%}$
test_add_one[memmap_tensor0] 70.2840μs 22.6726μs 44.1062 KOps/s 42.6968 KOps/s $\color{#35bf28}+3.30\%$
test_add_one[memmap_tensor1] 0.2095ms 71.9854μs 13.8917 KOps/s 13.9939 KOps/s $\color{#d91a1a}-0.73\%$
test_contiguous[memmap_tensor0] 29.9920μs 5.7976μs 172.4843 KOps/s 174.7782 KOps/s $\color{#d91a1a}-1.31\%$
test_contiguous[memmap_tensor1] 53.0330μs 21.0752μs 47.4491 KOps/s 47.8573 KOps/s $\color{#d91a1a}-0.85\%$
test_stack[memmap_tensor0] 49.0720μs 18.9636μs 52.7326 KOps/s 51.9975 KOps/s $\color{#35bf28}+1.41\%$
test_stack[memmap_tensor1] 0.1528ms 72.1606μs 13.8580 KOps/s 13.4228 KOps/s $\color{#35bf28}+3.24\%$
test_memmaptd_index 0.4609ms 0.4199ms 2.3812 KOps/s 2.3747 KOps/s $\color{#35bf28}+0.28\%$
test_memmaptd_index_astensor 0.5294ms 0.4776ms 2.0939 KOps/s 2.0862 KOps/s $\color{#35bf28}+0.37\%$
test_memmaptd_index_op 0.7716ms 0.7150ms 1.3987 KOps/s 1.3514 KOps/s $\color{#35bf28}+3.50\%$
test_reshape_pytree 42.7710μs 20.6918μs 48.3282 KOps/s 47.9222 KOps/s $\color{#35bf28}+0.85\%$
test_reshape_td 48.2920μs 29.8524μs 33.4982 KOps/s 33.5291 KOps/s $\color{#d91a1a}-0.09\%$
test_view_pytree 35.4120μs 20.4906μs 48.8028 KOps/s 48.3753 KOps/s $\color{#35bf28}+0.88\%$
test_view_td 23.4510μs 4.0882μs 244.6036 KOps/s 245.3667 KOps/s $\color{#d91a1a}-0.31\%$
test_unbind_pytree 49.5920μs 25.4534μs 39.2875 KOps/s 39.3311 KOps/s $\color{#d91a1a}-0.11\%$
test_unbind_td 77.6140μs 54.9088μs 18.2120 KOps/s 17.8613 KOps/s $\color{#35bf28}+1.96\%$
test_split_pytree 0.8458ms 24.1525μs 41.4035 KOps/s 41.6226 KOps/s $\color{#d91a1a}-0.53\%$
test_split_td 73.0030μs 42.3103μs 23.6349 KOps/s 22.4112 KOps/s $\textbf{\color{#35bf28}+5.46\%}$
test_add_pytree 47.7130μs 30.4811μs 32.8073 KOps/s 33.1077 KOps/s $\color{#d91a1a}-0.91\%$
test_add_td 66.9230μs 40.8044μs 24.5071 KOps/s 23.9581 KOps/s $\color{#35bf28}+2.29\%$
test_distributed 17.6010μs 5.5205μs 181.1427 KOps/s 175.9385 KOps/s $\color{#35bf28}+2.96\%$
test_tdmodule 35.4810μs 16.4018μs 60.9690 KOps/s 59.7600 KOps/s $\color{#35bf28}+2.02\%$
test_tdmodule_dispatch 0.2673ms 32.1188μs 31.1344 KOps/s 30.5154 KOps/s $\color{#35bf28}+2.03\%$
test_tdseq 34.6420μs 19.4664μs 51.3705 KOps/s 49.8794 KOps/s $\color{#35bf28}+2.99\%$
test_tdseq_dispatch 0.1510ms 34.8325μs 28.7088 KOps/s 26.9404 KOps/s $\textbf{\color{#35bf28}+6.56\%}$
test_instantiation_functorch 1.7643ms 1.6793ms 595.5019 Ops/s 593.8954 Ops/s $\color{#35bf28}+0.27\%$
test_instantiation_td 1.9085ms 1.1762ms 850.2305 Ops/s 852.1358 Ops/s $\color{#d91a1a}-0.22\%$
test_exec_functorch 0.2046ms 0.1543ms 6.4801 KOps/s 6.4676 KOps/s $\color{#35bf28}+0.19\%$
test_exec_functional_call 0.2163ms 0.1533ms 6.5229 KOps/s 6.5892 KOps/s $\color{#d91a1a}-1.01\%$
test_exec_td 0.1734ms 0.1420ms 7.0409 KOps/s 6.9983 KOps/s $\color{#35bf28}+0.61\%$
test_exec_td_decorator 63.9216ms 0.2010ms 4.9750 KOps/s 4.6120 KOps/s $\textbf{\color{#35bf28}+7.87\%}$
test_vmap_mlp_speed[True-True] 1.5470ms 1.0549ms 947.9602 Ops/s 945.3579 Ops/s $\color{#35bf28}+0.28\%$
test_vmap_mlp_speed[True-False] 0.6434ms 0.5903ms 1.6941 KOps/s 1.6718 KOps/s $\color{#35bf28}+1.34\%$
test_vmap_mlp_speed[False-True] 1.0100ms 0.9586ms 1.0432 KOps/s 1.0312 KOps/s $\color{#35bf28}+1.16\%$
test_vmap_mlp_speed[False-False] 0.5813ms 0.5237ms 1.9095 KOps/s 1.8943 KOps/s $\color{#35bf28}+0.80\%$
test_vmap_mlp_speed_decorator[True-True] 2.7425ms 2.0231ms 494.2838 Ops/s 566.1121 Ops/s $\textbf{\color{#d91a1a}-12.69\%}$
test_vmap_mlp_speed_decorator[True-False] 1.1712ms 0.6410ms 1.5600 KOps/s 1.4916 KOps/s $\color{#35bf28}+4.59\%$
test_vmap_mlp_speed_decorator[False-True] 2.2573ms 1.7400ms 574.7166 Ops/s 630.2028 Ops/s $\textbf{\color{#d91a1a}-8.80\%}$
test_vmap_mlp_speed_decorator[False-False] 1.0054ms 0.5454ms 1.8335 KOps/s 1.7749 KOps/s $\color{#35bf28}+3.30\%$
test_vmap_transformer_speed[True-True] 12.3646ms 12.2601ms 81.5656 Ops/s 81.3220 Ops/s $\color{#35bf28}+0.30\%$
test_vmap_transformer_speed[True-False] 8.0023ms 7.9440ms 125.8818 Ops/s 125.1025 Ops/s $\color{#35bf28}+0.62\%$
test_vmap_transformer_speed[False-True] 12.2011ms 12.1082ms 82.5887 Ops/s 81.2234 Ops/s $\color{#35bf28}+1.68\%$
test_vmap_transformer_speed[False-False] 7.9540ms 7.8556ms 127.2983 Ops/s 126.3453 Ops/s $\color{#35bf28}+0.75\%$
test_vmap_transformer_speed_decorator[True-True] 64.2383ms 63.2098ms 15.8203 Ops/s 23.7193 Ops/s $\textbf{\color{#d91a1a}-33.30\%}$
test_vmap_transformer_speed_decorator[True-False] 20.9064ms 19.2437ms 51.9650 Ops/s 47.1718 Ops/s $\textbf{\color{#35bf28}+10.16\%}$
test_vmap_transformer_speed_decorator[False-True] 58.8411ms 57.6712ms 17.3397 Ops/s 23.9580 Ops/s $\textbf{\color{#d91a1a}-27.62\%}$
test_vmap_transformer_speed_decorator[False-False] 99.0540ms 20.4384ms 48.9275 Ops/s 48.1389 Ops/s $\color{#35bf28}+1.64\%$
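
The Ops column appears to be the reciprocal of the Mean column, scaled to Ops/s, KOps/s, or MOps/s. The sketch below checks that reading against a few rows of the GPU table. It is an illustration under that assumption rather than the bot's actual code: the helper names `parse_duration` and `format_ops` are hypothetical, and because the Mean values shown are already rounded, the recomputed figures can differ from the table in the last digit.

```python
# Seconds per unit; both the Greek mu and the micro sign are accepted.
UNIT_SECONDS = {"ms": 1e-3, "μs": 1e-6, "µs": 1e-6, "ns": 1e-9, "s": 1.0}


def parse_duration(text: str) -> float:
    """Convert a string such as '12.6294μs' or '0.1154ms' to seconds."""
    # Try two-character suffixes before the bare 's' so 'ms' is not misread.
    for unit in sorted(UNIT_SECONDS, key=len, reverse=True):
        if text.endswith(unit):
            return float(text[: -len(unit)]) * UNIT_SECONDS[unit]
    raise ValueError(f"unrecognized duration: {text!r}")


def format_ops(ops_per_second: float) -> str:
    """Render a throughput using the same unit steps as the table."""
    if ops_per_second >= 1e6:
        return f"{ops_per_second / 1e6:.4f} MOps/s"
    if ops_per_second >= 1e3:
        return f"{ops_per_second / 1e3:.4f} KOps/s"
    return f"{ops_per_second:.4f} Ops/s"


# (name, Mean) pairs copied from the GPU table.
means = [
    ("test_membership", "0.9278μs"),
    ("test_plain_set_nested", "12.6294μs"),
    ("test_vmap_transformer_speed_decorator[True-True]", "63.2098ms"),
]

for name, mean in means:
    ops = 1.0 / parse_duration(mean)
    print(f"{name}: {format_ops(ops)}")
```

Reading the table this way also explains why a row with a large Max but a small Mean, such as test_exec_td_decorator, can still report high throughput: only the mean enters the Ops figure.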

vmoens marked this pull request as ready for review November 24, 2023 12:33
vmoens merged commit 4dbabc6 into main Nov 24, 2023 (42 of 45 checks passed)
vmoens deleted the faster-to-module branch November 24, 2023 12:34
Labels: CLA Signed, Performance