[Performance] Make copy_ a no-op if tensors are identical #588

Merged on Dec 4, 2023 (1 commit)

Conversation

@vmoens (Contributor) commented Dec 4, 2023

Updated version of #510
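
The optimization named in the title can be illustrated with a minimal sketch. The `Buffer` class and its `copy_count` attribute are hypothetical stand-ins, not tensordict or PyTorch code. The sketch assumes only that `copy_` mutates the destination in place and returns it, and that "identical" means "the same object".

```python
class Buffer:
    """Hypothetical stand-in for a tensor: holds data and counts real copies."""

    def __init__(self, values):
        self.values = list(values)
        self.copy_count = 0  # number of element-wise copies actually performed

    def copy_(self, src):
        # Identity check: copying an object onto itself changes nothing,
        # so return early and skip the element-wise work.
        if src is self:
            return self
        if len(src.values) != len(self.values):
            raise ValueError("size mismatch between source and destination")
        self.values[:] = src.values  # in-place overwrite of the destination
        self.copy_count += 1
        return self


a = Buffer([1, 2, 3])
b = Buffer([4, 5, 6])

a.copy_(a)  # same object: no-op, copy_count stays at 0
a.copy_(b)  # distinct object: one real copy
```

Whether the merged change compares object identity, storage pointers, or something else is not stated in this thread; the sketch uses object identity because it is the cheapest check.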

@facebook-github-bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Dec 4, 2023
@vmoens marked this pull request as ready for review on December 4, 2023, 11:42
@vmoens merged commit 86c239f into main on Dec 4, 2023
19 of 33 checks passed
@vmoens deleted the fix_id_copy2 branch on December 4, 2023, 11:44

github-actions bot commented Dec 4, 2023

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 113. Improved: $\large\color{#35bf28}15$. Worsened: $\large\color{#d91a1a}2$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 34.3240μs 15.5137μs 64.4592 KOps/s 63.5857 KOps/s $\color{#35bf28}+1.37\%$
test_plain_set_stack_nested 0.2026ms 0.1433ms 6.9806 KOps/s 7.0415 KOps/s $\color{#d91a1a}-0.86\%$
test_plain_set_nested_inplace 57.7370μs 18.1546μs 55.0825 KOps/s 52.2228 KOps/s $\textbf{\color{#35bf28}+5.48\%}$
test_plain_set_stack_nested_inplace 0.2440ms 0.1784ms 5.6064 KOps/s 5.7409 KOps/s $\color{#d91a1a}-2.34\%$
test_items 19.2260μs 2.5591μs 390.7686 KOps/s 401.0244 KOps/s $\color{#d91a1a}-2.56\%$
test_items_nested 0.4745ms 0.2686ms 3.7234 KOps/s 3.6970 KOps/s $\color{#35bf28}+0.71\%$
test_items_nested_locked 1.0656ms 0.2713ms 3.6854 KOps/s 3.6823 KOps/s $\color{#35bf28}+0.09\%$
test_items_nested_leaf 0.5145ms 0.1667ms 5.9994 KOps/s 6.0642 KOps/s $\color{#d91a1a}-1.07\%$
test_items_stack_nested 2.5592ms 1.4885ms 671.8376 Ops/s 667.6541 Ops/s $\color{#35bf28}+0.63\%$
test_items_stack_nested_leaf 1.8129ms 1.3581ms 736.3088 Ops/s 733.9149 Ops/s $\color{#35bf28}+0.33\%$
test_items_stack_nested_locked 1.9579ms 0.7751ms 1.2901 KOps/s 1.2752 KOps/s $\color{#35bf28}+1.17\%$
test_keys 18.1040μs 3.8704μs 258.3680 KOps/s 257.4328 KOps/s $\color{#35bf28}+0.36\%$
test_keys_nested 0.5422ms 0.1425ms 7.0185 KOps/s 6.6830 KOps/s $\textbf{\color{#35bf28}+5.02\%}$
test_keys_nested_locked 0.2480ms 0.1407ms 7.1094 KOps/s 7.0322 KOps/s $\color{#35bf28}+1.10\%$
test_keys_nested_leaf 0.3780ms 0.1404ms 7.1223 KOps/s 6.9889 KOps/s $\color{#35bf28}+1.91\%$
test_keys_stack_nested 2.0937ms 1.4088ms 709.8155 Ops/s 698.5194 Ops/s $\color{#35bf28}+1.62\%$
test_keys_stack_nested_leaf 1.7722ms 1.4154ms 706.5152 Ops/s 705.2572 Ops/s $\color{#35bf28}+0.18\%$
test_keys_stack_nested_locked 1.1328ms 0.6918ms 1.4456 KOps/s 1.4269 KOps/s $\color{#35bf28}+1.31\%$
test_values 6.6272μs 1.1789μs 848.2218 KOps/s 863.7460 KOps/s $\color{#d91a1a}-1.80\%$
test_values_nested 96.3800μs 49.7084μs 20.1173 KOps/s 20.4129 KOps/s $\color{#d91a1a}-1.45\%$
test_values_nested_locked 95.1530μs 50.3636μs 19.8556 KOps/s 20.0471 KOps/s $\color{#d91a1a}-0.96\%$
test_values_nested_leaf 72.8160μs 44.3006μs 22.5730 KOps/s 22.7943 KOps/s $\color{#d91a1a}-0.97\%$
test_values_stack_nested 1.5174ms 1.2014ms 832.3292 Ops/s 822.6289 Ops/s $\color{#35bf28}+1.18\%$
test_values_stack_nested_leaf 1.3292ms 1.1976ms 835.0236 Ops/s 826.9352 Ops/s $\color{#35bf28}+0.98\%$
test_values_stack_nested_locked 0.7989ms 0.5235ms 1.9104 KOps/s 1.8809 KOps/s $\color{#35bf28}+1.57\%$
test_membership 15.1280μs 1.3785μs 725.4074 KOps/s 749.2656 KOps/s $\color{#d91a1a}-3.18\%$
test_membership_nested 21.1790μs 2.8088μs 356.0206 KOps/s 359.2406 KOps/s $\color{#d91a1a}-0.90\%$
test_membership_nested_leaf 26.1180μs 2.8173μs 354.9481 KOps/s 358.1263 KOps/s $\color{#d91a1a}-0.89\%$
test_membership_stacked_nested 31.6590μs 11.9764μs 83.4975 KOps/s 84.9651 KOps/s $\color{#d91a1a}-1.73\%$
test_membership_stacked_nested_leaf 30.7970μs 12.0269μs 83.1471 KOps/s 84.0226 KOps/s $\color{#d91a1a}-1.04\%$
test_membership_nested_last 33.1720μs 5.9636μs 167.6843 KOps/s 168.8428 KOps/s $\color{#d91a1a}-0.69\%$
test_membership_nested_leaf_last 26.5800μs 5.9849μs 167.0860 KOps/s 161.9192 KOps/s $\color{#35bf28}+3.19\%$
test_membership_stacked_nested_last 0.2534ms 0.1699ms 5.8848 KOps/s 5.9290 KOps/s $\color{#d91a1a}-0.74\%$
test_membership_stacked_nested_leaf_last 53.8910μs 13.8193μs 72.3626 KOps/s 72.8154 KOps/s $\color{#d91a1a}-0.62\%$
test_nested_getleaf 54.1200μs 10.5773μs 94.5418 KOps/s 92.2632 KOps/s $\color{#35bf28}+2.47\%$
test_nested_get 52.6180μs 10.1162μs 98.8509 KOps/s 98.5666 KOps/s $\color{#35bf28}+0.29\%$
test_stacked_getleaf 0.7195ms 0.6362ms 1.5719 KOps/s 1.5519 KOps/s $\color{#35bf28}+1.29\%$
test_stacked_get 0.6781ms 0.6066ms 1.6485 KOps/s 1.6517 KOps/s $\color{#d91a1a}-0.20\%$
test_nested_getitemleaf 48.1390μs 10.7243μs 93.2466 KOps/s 92.1082 KOps/s $\color{#35bf28}+1.24\%$
test_nested_getitem 60.2030μs 10.2084μs 97.9585 KOps/s 97.5673 KOps/s $\color{#35bf28}+0.40\%$
test_stacked_getitemleaf 0.9300ms 0.6353ms 1.5740 KOps/s 1.5514 KOps/s $\color{#35bf28}+1.46\%$
test_stacked_getitem 0.8889ms 0.6080ms 1.6447 KOps/s 1.6405 KOps/s $\color{#35bf28}+0.25\%$
test_lock_nested 59.4649ms 0.6113ms 1.6360 KOps/s 1.8052 KOps/s $\textbf{\color{#d91a1a}-9.37\%}$
test_lock_stack_nested 7.7154ms 5.0260ms 198.9673 Ops/s 199.6216 Ops/s $\color{#d91a1a}-0.33\%$
test_unlock_nested 0.8615ms 0.4372ms 2.2871 KOps/s 2.2740 KOps/s $\color{#35bf28}+0.58\%$
test_unlock_stack_nested 72.9064ms 6.9381ms 144.1312 Ops/s 143.8644 Ops/s $\color{#35bf28}+0.19\%$
test_flatten_speed 0.3284ms 0.2676ms 3.7370 KOps/s 3.7325 KOps/s $\color{#35bf28}+0.12\%$
test_unflatten_speed 0.5485ms 0.4621ms 2.1641 KOps/s 2.1900 KOps/s $\color{#d91a1a}-1.18\%$
test_common_ops 5.0402ms 0.6618ms 1.5111 KOps/s 1.4568 KOps/s $\color{#35bf28}+3.73\%$
test_creation 19.3960μs 2.6128μs 382.7299 KOps/s 409.7604 KOps/s $\textbf{\color{#d91a1a}-6.60\%}$
test_creation_empty 30.1660μs 7.9780μs 125.3449 KOps/s 119.0596 KOps/s $\textbf{\color{#35bf28}+5.28\%}$
test_creation_nested_1 88.3640μs 11.5635μs 86.4790 KOps/s 85.7299 KOps/s $\color{#35bf28}+0.87\%$
test_creation_nested_2 56.8060μs 15.0170μs 66.5911 KOps/s 66.3560 KOps/s $\color{#35bf28}+0.35\%$
test_clone 79.7080μs 13.6276μs 73.3803 KOps/s 74.7041 KOps/s $\color{#d91a1a}-1.77\%$
test_getitem[int] 39.9550μs 13.1453μs 76.0730 KOps/s 77.5432 KOps/s $\color{#d91a1a}-1.90\%$
test_getitem[slice_int] 55.7630μs 24.4740μs 40.8597 KOps/s 40.8222 KOps/s $\color{#35bf28}+0.09\%$
test_getitem[range] 87.0720μs 43.4578μs 23.0108 KOps/s 22.3807 KOps/s $\color{#35bf28}+2.82\%$
test_getitem[tuple] 54.7020μs 20.3588μs 49.1188 KOps/s 50.0770 KOps/s $\color{#d91a1a}-1.91\%$
test_getitem[list] 0.2000ms 38.9722μs 25.6593 KOps/s 25.2817 KOps/s $\color{#35bf28}+1.49\%$
test_setitem_dim[int] 49.6130μs 27.0287μs 36.9977 KOps/s 35.9833 KOps/s $\color{#35bf28}+2.82\%$
test_setitem_dim[slice_int] 0.1069ms 51.3309μs 19.4814 KOps/s 19.4232 KOps/s $\color{#35bf28}+0.30\%$
test_setitem_dim[range] 0.1593ms 69.6675μs 14.3539 KOps/s 13.8871 KOps/s $\color{#35bf28}+3.36\%$
test_setitem_dim[tuple] 94.9970μs 40.6165μs 24.6206 KOps/s 24.8547 KOps/s $\color{#d91a1a}-0.94\%$
test_setitem 80.5700μs 18.2625μs 54.7571 KOps/s 52.9372 KOps/s $\color{#35bf28}+3.44\%$
test_set 81.9930μs 17.7308μs 56.3992 KOps/s 54.3842 KOps/s $\color{#35bf28}+3.71\%$
test_set_shared 3.0258ms 0.1423ms 7.0287 KOps/s 6.9274 KOps/s $\color{#35bf28}+1.46\%$
test_update 90.4380μs 18.4985μs 54.0584 KOps/s 49.1942 KOps/s $\textbf{\color{#35bf28}+9.89\%}$
test_update_nested 0.1086ms 25.5289μs 39.1712 KOps/s 35.8460 KOps/s $\textbf{\color{#35bf28}+9.28\%}$
test_set_nested 70.6110μs 19.1250μs 52.2875 KOps/s 49.4471 KOps/s $\textbf{\color{#35bf28}+5.74\%}$
test_set_nested_new 84.7880μs 24.2503μs 41.2367 KOps/s 39.0173 KOps/s $\textbf{\color{#35bf28}+5.69\%}$
test_select 0.1115ms 49.3472μs 20.2646 KOps/s 19.6802 KOps/s $\color{#35bf28}+2.97\%$
test_unbind_speed 0.6768ms 0.3736ms 2.6768 KOps/s 2.7039 KOps/s $\color{#d91a1a}-1.00\%$
test_unbind_speed_stack0 69.5982ms 4.8639ms 205.5959 Ops/s 212.6370 Ops/s $\color{#d91a1a}-3.31\%$
test_unbind_speed_stack1 2.6489μs 0.6606μs 1.5137 MOps/s 1.5262 MOps/s $\color{#d91a1a}-0.81\%$
test_split 58.5186ms 1.7578ms 568.8790 Ops/s 559.7347 Ops/s $\color{#35bf28}+1.63\%$
test_chunk 59.8944ms 1.7252ms 579.6269 Ops/s 571.2625 Ops/s $\color{#35bf28}+1.46\%$
test_creation[device0] 4.7992ms 0.2946ms 3.3943 KOps/s 3.4630 KOps/s $\color{#d91a1a}-1.98\%$
test_creation_from_tensor 0.8214ms 0.3235ms 3.0914 KOps/s 3.0566 KOps/s $\color{#35bf28}+1.14\%$
test_add_one[memmap_tensor0] 82.1630μs 25.2613μs 39.5862 KOps/s 38.4226 KOps/s $\color{#35bf28}+3.03\%$
test_contiguous[memmap_tensor0] 44.3330μs 5.9114μs 169.1646 KOps/s 172.8253 KOps/s $\color{#d91a1a}-2.12\%$
test_stack[memmap_tensor0] 77.3540μs 19.4255μs 51.4786 KOps/s 51.1712 KOps/s $\color{#35bf28}+0.60\%$
test_memmaptd_index 0.3624ms 0.1982ms 5.0445 KOps/s 5.0749 KOps/s $\color{#d91a1a}-0.60\%$
test_memmaptd_index_astensor 0.3379ms 0.2568ms 3.8946 KOps/s 3.9201 KOps/s $\color{#d91a1a}-0.65\%$
test_memmaptd_index_op 0.5845ms 0.4893ms 2.0439 KOps/s 1.9961 KOps/s $\color{#35bf28}+2.39\%$
test_reshape_pytree 55.5940μs 23.1095μs 43.2723 KOps/s 42.8726 KOps/s $\color{#35bf28}+0.93\%$
test_reshape_td 87.2220μs 32.5483μs 30.7236 KOps/s 31.5252 KOps/s $\color{#d91a1a}-2.54\%$
test_view_pytree 66.5740μs 23.1569μs 43.1837 KOps/s 42.2972 KOps/s $\color{#35bf28}+2.10\%$
test_view_td 21.6510μs 4.9570μs 201.7352 KOps/s 210.0338 KOps/s $\color{#d91a1a}-3.95\%$
test_unbind_pytree 58.8690μs 26.6528μs 37.5195 KOps/s 37.8758 KOps/s $\color{#d91a1a}-0.94\%$
test_unbind_td 0.1195ms 59.3196μs 16.8578 KOps/s 16.9087 KOps/s $\color{#d91a1a}-0.30\%$
test_split_pytree 0.5197ms 26.2803μs 38.0513 KOps/s 37.5366 KOps/s $\color{#35bf28}+1.37\%$
test_split_td 97.8620μs 45.9944μs 21.7418 KOps/s 21.5416 KOps/s $\color{#35bf28}+0.93\%$
test_add_pytree 64.4600μs 31.9839μs 31.2657 KOps/s 28.9720 KOps/s $\textbf{\color{#35bf28}+7.92\%}$
test_add_td 0.1014ms 44.2918μs 22.5776 KOps/s 21.3024 KOps/s $\textbf{\color{#35bf28}+5.99\%}$
test_distributed 19.5360μs 6.0735μs 164.6496 KOps/s 168.9517 KOps/s $\color{#d91a1a}-2.55\%$
test_tdmodule 0.7940ms 20.8883μs 47.8736 KOps/s 43.9300 KOps/s $\textbf{\color{#35bf28}+8.98\%}$
test_tdmodule_dispatch 0.2019ms 37.9662μs 26.3392 KOps/s 24.3469 KOps/s $\textbf{\color{#35bf28}+8.18\%}$
test_tdseq 49.7330μs 22.8677μs 43.7298 KOps/s 40.1651 KOps/s $\textbf{\color{#35bf28}+8.88\%}$
test_tdseq_dispatch 0.1305ms 41.4886μs 24.1030 KOps/s 22.8316 KOps/s $\textbf{\color{#35bf28}+5.57\%}$
test_instantiation_functorch 1.6540ms 1.2925ms 773.7077 Ops/s 755.1587 Ops/s $\color{#35bf28}+2.46\%$
test_instantiation_td 1.5557ms 1.0296ms 971.2156 Ops/s 911.5993 Ops/s $\textbf{\color{#35bf28}+6.54\%}$
test_exec_functorch 0.2291ms 0.1616ms 6.1878 KOps/s 6.2471 KOps/s $\color{#d91a1a}-0.95\%$
test_exec_functional_call 0.2486ms 0.1477ms 6.7709 KOps/s 6.5906 KOps/s $\color{#35bf28}+2.74\%$
test_exec_td 0.2870ms 0.1452ms 6.8876 KOps/s 6.8940 KOps/s $\color{#d91a1a}-0.09\%$
test_exec_td_decorator 0.8330ms 0.1777ms 5.6270 KOps/s 5.6048 KOps/s $\color{#35bf28}+0.40\%$
test_vmap_mlp_speed[True-True] 1.2983ms 0.8773ms 1.1399 KOps/s 1.0741 KOps/s $\textbf{\color{#35bf28}+6.12\%}$
test_vmap_mlp_speed[True-False] 0.7235ms 0.4719ms 2.1193 KOps/s 2.0876 KOps/s $\color{#35bf28}+1.52\%$
test_vmap_mlp_speed[False-True] 1.2206ms 0.7749ms 1.2904 KOps/s 1.2358 KOps/s $\color{#35bf28}+4.42\%$
test_vmap_mlp_speed[False-False] 0.5036ms 0.3871ms 2.5835 KOps/s 2.5322 KOps/s $\color{#35bf28}+2.03\%$
test_vmap_mlp_speed_decorator[True-True] 2.3704ms 1.7310ms 577.6935 Ops/s 552.3910 Ops/s $\color{#35bf28}+4.58\%$
test_vmap_mlp_speed_decorator[True-False] 0.9645ms 0.5130ms 1.9491 KOps/s 1.8999 KOps/s $\color{#35bf28}+2.59\%$
test_vmap_mlp_speed_decorator[False-True] 1.9684ms 1.4463ms 691.4158 Ops/s 660.7043 Ops/s $\color{#35bf28}+4.65\%$
test_vmap_mlp_speed_decorator[False-False] 0.9798ms 0.3976ms 2.5151 KOps/s 2.4738 KOps/s $\color{#35bf28}+1.67\%$
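
The columns of the CPU table above can be related with a short sketch. The function names are hypothetical and the formulas are inferred from the reported numbers rather than taken from the benchmark bot's source: Ops looks like the reciprocal of Mean, and Change looks like the relative difference between Ops and Ops on Repo HEAD. The 5% threshold is also an assumption. It is chosen because the bolded rows, and the "Improved: 15 / Worsened: 2" summary, match the rows whose change is about 5% or more in magnitude.

```python
def ops_per_second(mean_seconds):
    """Throughput implied by a mean run time given in seconds."""
    return 1.0 / mean_seconds


def relative_change(ops_pr, ops_head):
    """Percentage change of this PR's throughput relative to repo HEAD."""
    return 100.0 * (ops_pr - ops_head) / ops_head


def classify(change_pct, threshold=5.0):
    """Label a row; the 5% threshold is assumed, not documented in the thread."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


# test_plain_set_nested (CPU): mean 15.5137 us, 64.4592 vs 63.5857 KOps/s
mean_ops = ops_per_second(15.5137e-6)
small_gain = relative_change(64.4592, 63.5857)

# test_lock_nested (CPU): 1.6360 vs 1.8052 KOps/s
regression = relative_change(1.6360, 1.8052)

print(f"{mean_ops / 1e3:.4f} KOps/s, {small_gain:+.2f}% ({classify(small_gain)})")
print(f"{regression:+.2f}% ({classify(regression)})")
```

Under these assumptions, most rows fall inside the ±5% band and are not counted as improved or worsened in the summary line.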


github-actions bot commented Dec 4, 2023

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 127. Improved: $\large\color{#35bf28}29$. Worsened: $\large\color{#d91a1a}1$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.5487ms 12.8039μs 78.1014 KOps/s 78.4907 KOps/s $\color{#d91a1a}-0.50\%$
test_plain_set_stack_nested 0.1362ms 0.1153ms 8.6736 KOps/s 8.2624 KOps/s $\color{#35bf28}+4.98\%$
test_plain_set_nested_inplace 38.2610μs 14.1859μs 70.4927 KOps/s 66.2438 KOps/s $\textbf{\color{#35bf28}+6.41\%}$
test_plain_set_stack_nested_inplace 0.1703ms 0.1433ms 6.9800 KOps/s 7.0190 KOps/s $\color{#d91a1a}-0.55\%$
test_items 18.8000μs 4.6349μs 215.7559 KOps/s 211.9976 KOps/s $\color{#35bf28}+1.77\%$
test_items_nested 0.3910ms 0.3361ms 2.9754 KOps/s 2.9626 KOps/s $\color{#35bf28}+0.43\%$
test_items_nested_locked 0.3932ms 0.3377ms 2.9616 KOps/s 2.9304 KOps/s $\color{#35bf28}+1.07\%$
test_items_nested_leaf 0.2326ms 0.1975ms 5.0631 KOps/s 5.0263 KOps/s $\color{#35bf28}+0.73\%$
test_items_stack_nested 1.5613ms 1.4754ms 677.7858 Ops/s 663.8340 Ops/s $\color{#35bf28}+2.10\%$
test_items_stack_nested_leaf 1.3634ms 1.3023ms 767.8583 Ops/s 750.1647 Ops/s $\color{#35bf28}+2.36\%$
test_items_stack_nested_locked 0.8778ms 0.8236ms 1.2142 KOps/s 1.1748 KOps/s $\color{#35bf28}+3.35\%$
test_keys 20.4400μs 4.5679μs 218.9206 KOps/s 218.8186 KOps/s $\color{#35bf28}+0.05\%$
test_keys_nested 3.4048ms 90.8237μs 11.0103 KOps/s 11.0846 KOps/s $\color{#d91a1a}-0.67\%$
test_keys_nested_locked 0.1146ms 90.6584μs 11.0304 KOps/s 11.1620 KOps/s $\color{#d91a1a}-1.18\%$
test_keys_nested_leaf 42.8885ms 86.8877μs 11.5091 KOps/s 12.1897 KOps/s $\textbf{\color{#d91a1a}-5.58\%}$
test_keys_stack_nested 1.3792ms 1.2853ms 778.0457 Ops/s 757.7148 Ops/s $\color{#35bf28}+2.68\%$
test_keys_stack_nested_leaf 1.3551ms 1.2823ms 779.8545 Ops/s 761.8074 Ops/s $\color{#35bf28}+2.37\%$
test_keys_stack_nested_locked 0.6964ms 0.6334ms 1.5789 KOps/s 1.5455 KOps/s $\color{#35bf28}+2.16\%$
test_values 6.7037μs 1.8960μs 527.4275 KOps/s 521.5566 KOps/s $\color{#35bf28}+1.13\%$
test_values_nested 69.5010μs 42.7003μs 23.4190 KOps/s 23.1621 KOps/s $\color{#35bf28}+1.11\%$
test_values_nested_locked 65.0210μs 44.9033μs 22.2701 KOps/s 22.0585 KOps/s $\color{#35bf28}+0.96\%$
test_values_nested_leaf 57.2110μs 37.1368μs 26.9274 KOps/s 26.6648 KOps/s $\color{#35bf28}+0.99\%$
test_values_stack_nested 1.2320ms 1.1274ms 887.0293 Ops/s 868.2667 Ops/s $\color{#35bf28}+2.16\%$
test_values_stack_nested_leaf 1.1763ms 1.1100ms 900.9005 Ops/s 884.4522 Ops/s $\color{#35bf28}+1.86\%$
test_values_stack_nested_locked 0.5598ms 0.5052ms 1.9796 KOps/s 1.9399 KOps/s $\color{#35bf28}+2.04\%$
test_membership 4.0062μs 0.9318μs 1.0732 MOps/s 1.0569 MOps/s $\color{#35bf28}+1.54\%$
test_membership_nested 21.2410μs 2.1860μs 457.4467 KOps/s 447.5371 KOps/s $\color{#35bf28}+2.21\%$
test_membership_nested_leaf 10.4200μs 2.0835μs 479.9538 KOps/s 470.6439 KOps/s $\color{#35bf28}+1.98\%$
test_membership_stacked_nested 39.5010μs 10.7390μs 93.1186 KOps/s 90.7875 KOps/s $\color{#35bf28}+2.57\%$
test_membership_stacked_nested_leaf 34.7820μs 10.7329μs 93.1713 KOps/s 91.5096 KOps/s $\color{#35bf28}+1.82\%$
test_membership_nested_last 33.1310μs 4.5268μs 220.9073 KOps/s 218.0709 KOps/s $\color{#35bf28}+1.30\%$
test_membership_nested_leaf_last 22.7900μs 4.5028μs 222.0825 KOps/s 217.5531 KOps/s $\color{#35bf28}+2.08\%$
test_membership_stacked_nested_last 0.1605ms 0.1330ms 7.5199 KOps/s 7.4277 KOps/s $\color{#35bf28}+1.24\%$
test_membership_stacked_nested_leaf_last 29.3910μs 12.5498μs 79.6823 KOps/s 79.2798 KOps/s $\color{#35bf28}+0.51\%$
test_nested_getleaf 31.9610μs 8.3613μs 119.5987 KOps/s 118.5932 KOps/s $\color{#35bf28}+0.85\%$
test_nested_get 23.0800μs 7.8725μs 127.0241 KOps/s 125.3909 KOps/s $\color{#35bf28}+1.30\%$
test_stacked_getleaf 0.6136ms 0.5612ms 1.7818 KOps/s 1.7635 KOps/s $\color{#35bf28}+1.04\%$
test_stacked_get 0.6167ms 0.5342ms 1.8720 KOps/s 1.8727 KOps/s $\color{#d91a1a}-0.04\%$
test_nested_getitemleaf 23.3700μs 8.4071μs 118.9470 KOps/s 117.6476 KOps/s $\color{#35bf28}+1.10\%$
test_nested_getitem 29.2500μs 7.9108μs 126.4092 KOps/s 123.6611 KOps/s $\color{#35bf28}+2.22\%$
test_stacked_getitemleaf 0.8524ms 0.5657ms 1.7678 KOps/s 1.7602 KOps/s $\color{#35bf28}+0.43\%$
test_stacked_getitem 0.6204ms 0.5301ms 1.8865 KOps/s 1.8821 KOps/s $\color{#35bf28}+0.23\%$
test_lock_nested 3.2228ms 0.5505ms 1.8165 KOps/s 1.7662 KOps/s $\color{#35bf28}+2.84\%$
test_lock_stack_nested 82.3490ms 7.2251ms 138.4070 Ops/s 133.6922 Ops/s $\color{#35bf28}+3.53\%$
test_unlock_nested 2.3443ms 0.4278ms 2.3378 KOps/s 2.2603 KOps/s $\color{#35bf28}+3.43\%$
test_unlock_stack_nested 69.0773ms 6.2740ms 159.3887 Ops/s 157.8591 Ops/s $\color{#35bf28}+0.97\%$
test_flatten_speed 0.2246ms 0.1860ms 5.3759 KOps/s 5.3303 KOps/s $\color{#35bf28}+0.85\%$
test_unflatten_speed 0.3971ms 0.3639ms 2.7480 KOps/s 2.7303 KOps/s $\color{#35bf28}+0.64\%$
test_common_ops 1.1183ms 0.5919ms 1.6895 KOps/s 1.5927 KOps/s $\textbf{\color{#35bf28}+6.07\%}$
test_creation 30.9200μs 2.0432μs 489.4399 KOps/s 465.0022 KOps/s $\textbf{\color{#35bf28}+5.26\%}$
test_creation_empty 18.6600μs 7.0231μs 142.3875 KOps/s 138.2920 KOps/s $\color{#35bf28}+2.96\%$
test_creation_nested_1 31.5910μs 9.3284μs 107.1998 KOps/s 105.3544 KOps/s $\color{#35bf28}+1.75\%$
test_creation_nested_2 29.3100μs 11.9502μs 83.6805 KOps/s 81.4590 KOps/s $\color{#35bf28}+2.73\%$
test_clone 79.2410μs 13.7816μs 72.5604 KOps/s 64.7868 KOps/s $\textbf{\color{#35bf28}+12.00\%}$
test_getitem[int] 26.6210μs 12.0851μs 82.7466 KOps/s 77.7683 KOps/s $\textbf{\color{#35bf28}+6.40\%}$
test_getitem[slice_int] 47.0300μs 22.8566μs 43.7511 KOps/s 40.2683 KOps/s $\textbf{\color{#35bf28}+8.65\%}$
test_getitem[range] 63.2610μs 40.2241μs 24.8607 KOps/s 22.3154 KOps/s $\textbf{\color{#35bf28}+11.41\%}$
test_getitem[tuple] 41.6120μs 19.4752μs 51.3475 KOps/s 46.2564 KOps/s $\textbf{\color{#35bf28}+11.01\%}$
test_getitem[list] 0.2767ms 36.6105μs 27.3146 KOps/s 24.6156 KOps/s $\textbf{\color{#35bf28}+10.96\%}$
test_setitem_dim[int] 44.1410μs 26.5411μs 37.6775 KOps/s 35.5083 KOps/s $\textbf{\color{#35bf28}+6.11\%}$
test_setitem_dim[slice_int] 64.3410μs 46.7091μs 21.4091 KOps/s 20.3892 KOps/s $\textbf{\color{#35bf28}+5.00\%}$
test_setitem_dim[range] 83.2220μs 64.5080μs 15.5019 KOps/s 15.0643 KOps/s $\color{#35bf28}+2.90\%$
test_setitem_dim[tuple] 68.5710μs 39.4679μs 25.3371 KOps/s 24.1922 KOps/s $\color{#35bf28}+4.73\%$
test_setitem 70.6710μs 17.7113μs 56.4610 KOps/s 51.5546 KOps/s $\textbf{\color{#35bf28}+9.52\%}$
test_set 70.0310μs 17.2349μs 58.0217 KOps/s 52.5466 KOps/s $\textbf{\color{#35bf28}+10.42\%}$
test_set_shared 2.7865ms 0.1037ms 9.6386 KOps/s 8.1589 KOps/s $\textbf{\color{#35bf28}+18.14\%}$
test_update 95.4920μs 18.6072μs 53.7425 KOps/s 48.6966 KOps/s $\textbf{\color{#35bf28}+10.36\%}$
test_update_nested 89.7120μs 25.2409μs 39.6182 KOps/s 37.1785 KOps/s $\textbf{\color{#35bf28}+6.56\%}$
test_set_nested 81.5910μs 18.5839μs 53.8101 KOps/s 50.3833 KOps/s $\textbf{\color{#35bf28}+6.80\%}$
test_set_nested_new 75.2210μs 22.8331μs 43.7960 KOps/s 40.7562 KOps/s $\textbf{\color{#35bf28}+7.46\%}$
test_select 92.0010μs 44.7440μs 22.3494 KOps/s 20.9956 KOps/s $\textbf{\color{#35bf28}+6.45\%}$
test_to 74.5520μs 52.3221μs 19.1124 KOps/s 18.6552 KOps/s $\color{#35bf28}+2.45\%$
test_to_nonblocking 64.5910μs 34.6905μs 28.8264 KOps/s 27.9686 KOps/s $\color{#35bf28}+3.07\%$
test_unbind_speed 0.4061ms 0.3556ms 2.8124 KOps/s 2.7079 KOps/s $\color{#35bf28}+3.86\%$
test_unbind_speed_stack0 62.8848ms 4.2907ms 233.0621 Ops/s 239.4317 Ops/s $\color{#d91a1a}-2.66\%$
test_unbind_speed_stack1 1.3995μs 0.5329μs 1.8766 MOps/s 1.9019 MOps/s $\color{#d91a1a}-1.33\%$
test_split 56.7696ms 1.7481ms 572.0379 Ops/s 549.6593 Ops/s $\color{#35bf28}+4.07\%$
test_chunk 53.1605ms 1.7247ms 579.8175 Ops/s 556.3689 Ops/s $\color{#35bf28}+4.21\%$
test_creation[device0] 0.3802ms 0.3116ms 3.2091 KOps/s 3.2392 KOps/s $\color{#d91a1a}-0.93\%$
test_creation[device1] 0.6818ms 0.3149ms 3.1756 KOps/s 3.2090 KOps/s $\color{#d91a1a}-1.04\%$
test_creation_from_tensor 0.6600ms 0.3405ms 2.9372 KOps/s 2.9619 KOps/s $\color{#d91a1a}-0.83\%$
test_add_one[memmap_tensor0] 71.9610μs 23.4717μs 42.6045 KOps/s 39.7465 KOps/s $\textbf{\color{#35bf28}+7.19\%}$
test_add_one[memmap_tensor1] 0.2073ms 72.6632μs 13.7621 KOps/s 13.4872 KOps/s $\color{#35bf28}+2.04\%$
test_contiguous[memmap_tensor0] 36.1710μs 5.8779μs 170.1278 KOps/s 168.4631 KOps/s $\color{#35bf28}+0.99\%$
test_contiguous[memmap_tensor1] 49.7110μs 21.3424μs 46.8550 KOps/s 45.2468 KOps/s $\color{#35bf28}+3.55\%$
test_stack[memmap_tensor0] 48.7810μs 19.8176μs 50.4602 KOps/s 48.2755 KOps/s $\color{#35bf28}+4.53\%$
test_stack[memmap_tensor1] 0.1588ms 75.4282μs 13.2576 KOps/s 13.3119 KOps/s $\color{#d91a1a}-0.41\%$
test_memmaptd_index 0.2787ms 0.2312ms 4.3253 KOps/s 4.0514 KOps/s $\textbf{\color{#35bf28}+6.76\%}$
test_memmaptd_index_astensor 0.3179ms 0.2886ms 3.4656 KOps/s 3.2567 KOps/s $\textbf{\color{#35bf28}+6.41\%}$
test_memmaptd_index_op 0.6270ms 0.5645ms 1.7715 KOps/s 1.6534 KOps/s $\textbf{\color{#35bf28}+7.14\%}$
test_reshape_pytree 54.2610μs 20.5521μs 48.6569 KOps/s 46.5656 KOps/s $\color{#35bf28}+4.49\%$
test_reshape_td 50.1910μs 30.1735μs 33.1417 KOps/s 31.6320 KOps/s $\color{#35bf28}+4.77\%$
test_view_pytree 0.3601ms 20.5351μs 48.6972 KOps/s 46.9877 KOps/s $\color{#35bf28}+3.64\%$
test_view_td 18.4400μs 4.0356μs 247.7927 KOps/s 248.7242 KOps/s $\color{#d91a1a}-0.37\%$
test_unbind_pytree 45.0800μs 25.2305μs 39.6346 KOps/s 37.8337 KOps/s $\color{#35bf28}+4.76\%$
test_unbind_td 84.2410μs 56.3395μs 17.7495 KOps/s 17.0004 KOps/s $\color{#35bf28}+4.41\%$
test_split_pytree 40.2200μs 23.7962μs 42.0235 KOps/s 40.3359 KOps/s $\color{#35bf28}+4.18\%$
test_split_td 69.1110μs 42.8427μs 23.3412 KOps/s 21.9260 KOps/s $\textbf{\color{#35bf28}+6.45\%}$
test_add_pytree 51.6500μs 31.6844μs 31.5613 KOps/s 29.2840 KOps/s $\textbf{\color{#35bf28}+7.78\%}$
test_add_td 75.5020μs 45.6111μs 21.9245 KOps/s 20.5375 KOps/s $\textbf{\color{#35bf28}+6.75\%}$
test_distributed 20.4700μs 5.5633μs 179.7489 KOps/s 182.1728 KOps/s $\color{#d91a1a}-1.33\%$
test_tdmodule 33.0200μs 16.6078μs 60.2126 KOps/s 58.3780 KOps/s $\color{#35bf28}+3.14\%$
test_tdmodule_dispatch 0.1334ms 32.8319μs 30.4582 KOps/s 29.7790 KOps/s $\color{#35bf28}+2.28\%$
test_tdseq 35.5600μs 19.8340μs 50.4184 KOps/s 49.7682 KOps/s $\color{#35bf28}+1.31\%$
test_tdseq_dispatch 56.5610μs 36.2428μs 27.5917 KOps/s 27.1695 KOps/s $\color{#35bf28}+1.55\%$
test_instantiation_functorch 1.9746ms 1.6563ms 603.7714 Ops/s 582.8394 Ops/s $\color{#35bf28}+3.59\%$
test_instantiation_td 1.7000ms 1.1544ms 866.2190 Ops/s 844.6354 Ops/s $\color{#35bf28}+2.56\%$
test_exec_functorch 0.2036ms 0.1551ms 6.4464 KOps/s 6.1126 KOps/s $\textbf{\color{#35bf28}+5.46\%}$
test_exec_functional_call 0.2129ms 0.1568ms 6.3763 KOps/s 6.1643 KOps/s $\color{#35bf28}+3.44\%$
test_exec_td 0.1793ms 0.1483ms 6.7432 KOps/s 6.5354 KOps/s $\color{#35bf28}+3.18\%$
test_exec_td_decorator 0.8534ms 0.1844ms 5.4238 KOps/s 5.1998 KOps/s $\color{#35bf28}+4.31\%$
test_vmap_mlp_speed[True-True] 1.2860ms 1.0891ms 918.1929 Ops/s 915.4228 Ops/s $\color{#35bf28}+0.30\%$
test_vmap_mlp_speed[True-False] 0.6753ms 0.6234ms 1.6040 KOps/s 1.5898 KOps/s $\color{#35bf28}+0.89\%$
test_vmap_mlp_speed[False-True] 1.1643ms 1.0006ms 999.4174 Ops/s 946.7073 Ops/s $\textbf{\color{#35bf28}+5.57\%}$
test_vmap_mlp_speed[False-False] 0.6119ms 0.5535ms 1.8068 KOps/s 1.7157 KOps/s $\textbf{\color{#35bf28}+5.31\%}$
test_vmap_mlp_speed_decorator[True-True] 3.0797ms 2.0732ms 482.3367 Ops/s 468.6838 Ops/s $\color{#35bf28}+2.91\%$
test_vmap_mlp_speed_decorator[True-False] 1.2251ms 0.6684ms 1.4962 KOps/s 1.4974 KOps/s $\color{#d91a1a}-0.08\%$
test_vmap_mlp_speed_decorator[False-True] 2.2328ms 1.8018ms 555.0051 Ops/s 546.1029 Ops/s $\color{#35bf28}+1.63\%$
test_vmap_mlp_speed_decorator[False-False] 1.0516ms 0.5678ms 1.7613 KOps/s 1.7433 KOps/s $\color{#35bf28}+1.03\%$
test_vmap_transformer_speed[True-True] 12.9594ms 12.8713ms 77.6919 Ops/s 77.4304 Ops/s $\color{#35bf28}+0.34\%$
test_vmap_transformer_speed[True-False] 8.4914ms 8.3931ms 119.1451 Ops/s 119.0909 Ops/s $\color{#35bf28}+0.05\%$
test_vmap_transformer_speed[False-True] 12.8107ms 12.7070ms 78.6971 Ops/s 78.5455 Ops/s $\color{#35bf28}+0.19\%$
test_vmap_transformer_speed[False-False] 8.3901ms 8.3066ms 120.3868 Ops/s 118.9734 Ops/s $\color{#35bf28}+1.19\%$
test_vmap_transformer_speed_decorator[True-True] 0.1441s 70.5959ms 14.1651 Ops/s 14.8834 Ops/s $\color{#d91a1a}-4.83\%$
test_vmap_transformer_speed_decorator[True-False] 22.5718ms 20.3111ms 49.2342 Ops/s 49.5731 Ops/s $\color{#d91a1a}-0.68\%$
test_vmap_transformer_speed_decorator[False-True] 60.6943ms 59.5541ms 16.7915 Ops/s 16.5465 Ops/s $\color{#35bf28}+1.48\%$
test_vmap_transformer_speed_decorator[False-False] 22.2465ms 19.8524ms 50.3718 Ops/s 50.5151 Ops/s $\color{#d91a1a}-0.28\%$

Labels: CLA Signed, Performance