[Performance] Make copy_ a no-op if tensors are identical #588
Merged
facebook-github-bot added the CLA Signed label on Dec 4, 2023. This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 34.3240μs | 15.5137μs | 64.4592 KOps/s | 63.5857 KOps/s | |
test_plain_set_stack_nested | 0.2026ms | 0.1433ms | 6.9806 KOps/s | 7.0415 KOps/s | |
test_plain_set_nested_inplace | 57.7370μs | 18.1546μs | 55.0825 KOps/s | 52.2228 KOps/s | |
test_plain_set_stack_nested_inplace | 0.2440ms | 0.1784ms | 5.6064 KOps/s | 5.7409 KOps/s | |
test_items | 19.2260μs | 2.5591μs | 390.7686 KOps/s | 401.0244 KOps/s | |
test_items_nested | 0.4745ms | 0.2686ms | 3.7234 KOps/s | 3.6970 KOps/s | |
test_items_nested_locked | 1.0656ms | 0.2713ms | 3.6854 KOps/s | 3.6823 KOps/s | |
test_items_nested_leaf | 0.5145ms | 0.1667ms | 5.9994 KOps/s | 6.0642 KOps/s | |
test_items_stack_nested | 2.5592ms | 1.4885ms | 671.8376 Ops/s | 667.6541 Ops/s | |
test_items_stack_nested_leaf | 1.8129ms | 1.3581ms | 736.3088 Ops/s | 733.9149 Ops/s | |
test_items_stack_nested_locked | 1.9579ms | 0.7751ms | 1.2901 KOps/s | 1.2752 KOps/s | |
test_keys | 18.1040μs | 3.8704μs | 258.3680 KOps/s | 257.4328 KOps/s | |
test_keys_nested | 0.5422ms | 0.1425ms | 7.0185 KOps/s | 6.6830 KOps/s | |
test_keys_nested_locked | 0.2480ms | 0.1407ms | 7.1094 KOps/s | 7.0322 KOps/s | |
test_keys_nested_leaf | 0.3780ms | 0.1404ms | 7.1223 KOps/s | 6.9889 KOps/s | |
test_keys_stack_nested | 2.0937ms | 1.4088ms | 709.8155 Ops/s | 698.5194 Ops/s | |
test_keys_stack_nested_leaf | 1.7722ms | 1.4154ms | 706.5152 Ops/s | 705.2572 Ops/s | |
test_keys_stack_nested_locked | 1.1328ms | 0.6918ms | 1.4456 KOps/s | 1.4269 KOps/s | |
test_values | 6.6272μs | 1.1789μs | 848.2218 KOps/s | 863.7460 KOps/s | |
test_values_nested | 96.3800μs | 49.7084μs | 20.1173 KOps/s | 20.4129 KOps/s | |
test_values_nested_locked | 95.1530μs | 50.3636μs | 19.8556 KOps/s | 20.0471 KOps/s | |
test_values_nested_leaf | 72.8160μs | 44.3006μs | 22.5730 KOps/s | 22.7943 KOps/s | |
test_values_stack_nested | 1.5174ms | 1.2014ms | 832.3292 Ops/s | 822.6289 Ops/s | |
test_values_stack_nested_leaf | 1.3292ms | 1.1976ms | 835.0236 Ops/s | 826.9352 Ops/s | |
test_values_stack_nested_locked | 0.7989ms | 0.5235ms | 1.9104 KOps/s | 1.8809 KOps/s | |
test_membership | 15.1280μs | 1.3785μs | 725.4074 KOps/s | 749.2656 KOps/s | |
test_membership_nested | 21.1790μs | 2.8088μs | 356.0206 KOps/s | 359.2406 KOps/s | |
test_membership_nested_leaf | 26.1180μs | 2.8173μs | 354.9481 KOps/s | 358.1263 KOps/s | |
test_membership_stacked_nested | 31.6590μs | 11.9764μs | 83.4975 KOps/s | 84.9651 KOps/s | |
test_membership_stacked_nested_leaf | 30.7970μs | 12.0269μs | 83.1471 KOps/s | 84.0226 KOps/s | |
test_membership_nested_last | 33.1720μs | 5.9636μs | 167.6843 KOps/s | 168.8428 KOps/s | |
test_membership_nested_leaf_last | 26.5800μs | 5.9849μs | 167.0860 KOps/s | 161.9192 KOps/s | |
test_membership_stacked_nested_last | 0.2534ms | 0.1699ms | 5.8848 KOps/s | 5.9290 KOps/s | |
test_membership_stacked_nested_leaf_last | 53.8910μs | 13.8193μs | 72.3626 KOps/s | 72.8154 KOps/s | |
test_nested_getleaf | 54.1200μs | 10.5773μs | 94.5418 KOps/s | 92.2632 KOps/s | |
test_nested_get | 52.6180μs | 10.1162μs | 98.8509 KOps/s | 98.5666 KOps/s | |
test_stacked_getleaf | 0.7195ms | 0.6362ms | 1.5719 KOps/s | 1.5519 KOps/s | |
test_stacked_get | 0.6781ms | 0.6066ms | 1.6485 KOps/s | 1.6517 KOps/s | |
test_nested_getitemleaf | 48.1390μs | 10.7243μs | 93.2466 KOps/s | 92.1082 KOps/s | |
test_nested_getitem | 60.2030μs | 10.2084μs | 97.9585 KOps/s | 97.5673 KOps/s | |
test_stacked_getitemleaf | 0.9300ms | 0.6353ms | 1.5740 KOps/s | 1.5514 KOps/s | |
test_stacked_getitem | 0.8889ms | 0.6080ms | 1.6447 KOps/s | 1.6405 KOps/s | |
test_lock_nested | 59.4649ms | 0.6113ms | 1.6360 KOps/s | 1.8052 KOps/s | |
test_lock_stack_nested | 7.7154ms | 5.0260ms | 198.9673 Ops/s | 199.6216 Ops/s | |
test_unlock_nested | 0.8615ms | 0.4372ms | 2.2871 KOps/s | 2.2740 KOps/s | |
test_unlock_stack_nested | 72.9064ms | 6.9381ms | 144.1312 Ops/s | 143.8644 Ops/s | |
test_flatten_speed | 0.3284ms | 0.2676ms | 3.7370 KOps/s | 3.7325 KOps/s | |
test_unflatten_speed | 0.5485ms | 0.4621ms | 2.1641 KOps/s | 2.1900 KOps/s | |
test_common_ops | 5.0402ms | 0.6618ms | 1.5111 KOps/s | 1.4568 KOps/s | |
test_creation | 19.3960μs | 2.6128μs | 382.7299 KOps/s | 409.7604 KOps/s | |
test_creation_empty | 30.1660μs | 7.9780μs | 125.3449 KOps/s | 119.0596 KOps/s | |
test_creation_nested_1 | 88.3640μs | 11.5635μs | 86.4790 KOps/s | 85.7299 KOps/s | |
test_creation_nested_2 | 56.8060μs | 15.0170μs | 66.5911 KOps/s | 66.3560 KOps/s | |
test_clone | 79.7080μs | 13.6276μs | 73.3803 KOps/s | 74.7041 KOps/s | |
test_getitem[int] | 39.9550μs | 13.1453μs | 76.0730 KOps/s | 77.5432 KOps/s | |
test_getitem[slice_int] | 55.7630μs | 24.4740μs | 40.8597 KOps/s | 40.8222 KOps/s | |
test_getitem[range] | 87.0720μs | 43.4578μs | 23.0108 KOps/s | 22.3807 KOps/s | |
test_getitem[tuple] | 54.7020μs | 20.3588μs | 49.1188 KOps/s | 50.0770 KOps/s | |
test_getitem[list] | 0.2000ms | 38.9722μs | 25.6593 KOps/s | 25.2817 KOps/s | |
test_setitem_dim[int] | 49.6130μs | 27.0287μs | 36.9977 KOps/s | 35.9833 KOps/s | |
test_setitem_dim[slice_int] | 0.1069ms | 51.3309μs | 19.4814 KOps/s | 19.4232 KOps/s | |
test_setitem_dim[range] | 0.1593ms | 69.6675μs | 14.3539 KOps/s | 13.8871 KOps/s | |
test_setitem_dim[tuple] | 94.9970μs | 40.6165μs | 24.6206 KOps/s | 24.8547 KOps/s | |
test_setitem | 80.5700μs | 18.2625μs | 54.7571 KOps/s | 52.9372 KOps/s | |
test_set | 81.9930μs | 17.7308μs | 56.3992 KOps/s | 54.3842 KOps/s | |
test_set_shared | 3.0258ms | 0.1423ms | 7.0287 KOps/s | 6.9274 KOps/s | |
test_update | 90.4380μs | 18.4985μs | 54.0584 KOps/s | 49.1942 KOps/s | |
test_update_nested | 0.1086ms | 25.5289μs | 39.1712 KOps/s | 35.8460 KOps/s | |
test_set_nested | 70.6110μs | 19.1250μs | 52.2875 KOps/s | 49.4471 KOps/s | |
test_set_nested_new | 84.7880μs | 24.2503μs | 41.2367 KOps/s | 39.0173 KOps/s | |
test_select | 0.1115ms | 49.3472μs | 20.2646 KOps/s | 19.6802 KOps/s | |
test_unbind_speed | 0.6768ms | 0.3736ms | 2.6768 KOps/s | 2.7039 KOps/s | |
test_unbind_speed_stack0 | 69.5982ms | 4.8639ms | 205.5959 Ops/s | 212.6370 Ops/s | |
test_unbind_speed_stack1 | 2.6489μs | 0.6606μs | 1.5137 MOps/s | 1.5262 MOps/s | |
test_split | 58.5186ms | 1.7578ms | 568.8790 Ops/s | 559.7347 Ops/s | |
test_chunk | 59.8944ms | 1.7252ms | 579.6269 Ops/s | 571.2625 Ops/s | |
test_creation[device0] | 4.7992ms | 0.2946ms | 3.3943 KOps/s | 3.4630 KOps/s | |
test_creation_from_tensor | 0.8214ms | 0.3235ms | 3.0914 KOps/s | 3.0566 KOps/s | |
test_add_one[memmap_tensor0] | 82.1630μs | 25.2613μs | 39.5862 KOps/s | 38.4226 KOps/s | |
test_contiguous[memmap_tensor0] | 44.3330μs | 5.9114μs | 169.1646 KOps/s | 172.8253 KOps/s | |
test_stack[memmap_tensor0] | 77.3540μs | 19.4255μs | 51.4786 KOps/s | 51.1712 KOps/s | |
test_memmaptd_index | 0.3624ms | 0.1982ms | 5.0445 KOps/s | 5.0749 KOps/s | |
test_memmaptd_index_astensor | 0.3379ms | 0.2568ms | 3.8946 KOps/s | 3.9201 KOps/s | |
test_memmaptd_index_op | 0.5845ms | 0.4893ms | 2.0439 KOps/s | 1.9961 KOps/s | |
test_reshape_pytree | 55.5940μs | 23.1095μs | 43.2723 KOps/s | 42.8726 KOps/s | |
test_reshape_td | 87.2220μs | 32.5483μs | 30.7236 KOps/s | 31.5252 KOps/s | |
test_view_pytree | 66.5740μs | 23.1569μs | 43.1837 KOps/s | 42.2972 KOps/s | |
test_view_td | 21.6510μs | 4.9570μs | 201.7352 KOps/s | 210.0338 KOps/s | |
test_unbind_pytree | 58.8690μs | 26.6528μs | 37.5195 KOps/s | 37.8758 KOps/s | |
test_unbind_td | 0.1195ms | 59.3196μs | 16.8578 KOps/s | 16.9087 KOps/s | |
test_split_pytree | 0.5197ms | 26.2803μs | 38.0513 KOps/s | 37.5366 KOps/s | |
test_split_td | 97.8620μs | 45.9944μs | 21.7418 KOps/s | 21.5416 KOps/s | |
test_add_pytree | 64.4600μs | 31.9839μs | 31.2657 KOps/s | 28.9720 KOps/s | |
test_add_td | 0.1014ms | 44.2918μs | 22.5776 KOps/s | 21.3024 KOps/s | |
test_distributed | 19.5360μs | 6.0735μs | 164.6496 KOps/s | 168.9517 KOps/s | |
test_tdmodule | 0.7940ms | 20.8883μs | 47.8736 KOps/s | 43.9300 KOps/s | |
test_tdmodule_dispatch | 0.2019ms | 37.9662μs | 26.3392 KOps/s | 24.3469 KOps/s | |
test_tdseq | 49.7330μs | 22.8677μs | 43.7298 KOps/s | 40.1651 KOps/s | |
test_tdseq_dispatch | 0.1305ms | 41.4886μs | 24.1030 KOps/s | 22.8316 KOps/s | |
test_instantiation_functorch | 1.6540ms | 1.2925ms | 773.7077 Ops/s | 755.1587 Ops/s | |
test_instantiation_td | 1.5557ms | 1.0296ms | 971.2156 Ops/s | 911.5993 Ops/s | |
test_exec_functorch | 0.2291ms | 0.1616ms | 6.1878 KOps/s | 6.2471 KOps/s | |
test_exec_functional_call | 0.2486ms | 0.1477ms | 6.7709 KOps/s | 6.5906 KOps/s | |
test_exec_td | 0.2870ms | 0.1452ms | 6.8876 KOps/s | 6.8940 KOps/s | |
test_exec_td_decorator | 0.8330ms | 0.1777ms | 5.6270 KOps/s | 5.6048 KOps/s | |
test_vmap_mlp_speed[True-True] | 1.2983ms | 0.8773ms | 1.1399 KOps/s | 1.0741 KOps/s | |
test_vmap_mlp_speed[True-False] | 0.7235ms | 0.4719ms | 2.1193 KOps/s | 2.0876 KOps/s | |
test_vmap_mlp_speed[False-True] | 1.2206ms | 0.7749ms | 1.2904 KOps/s | 1.2358 KOps/s | |
test_vmap_mlp_speed[False-False] | 0.5036ms | 0.3871ms | 2.5835 KOps/s | 2.5322 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 2.3704ms | 1.7310ms | 577.6935 Ops/s | 552.3910 Ops/s | |
test_vmap_mlp_speed_decorator[True-False] | 0.9645ms | 0.5130ms | 1.9491 KOps/s | 1.8999 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 1.9684ms | 1.4463ms | 691.4158 Ops/s | 660.7043 Ops/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.9798ms | 0.3976ms | 2.5151 KOps/s | 2.4738 KOps/s |
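
The Change column in the table above is empty. As a rough sketch, the relative change between the "Ops" and "Ops on Repo HEAD" columns could be computed as below. The helper names `parse_ops` and `relative_change` are hypothetical and are not part of the benchmark tooling; the input values are taken from the `test_update` row.

```python
# Hypothetical helpers for reading the throughput cells of the table.
# Multipliers for the unit suffixes that appear in the "Ops" columns.
UNITS = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(cell: str) -> float:
    """Convert a cell such as '54.0584 KOps/s' to operations per second."""
    value, unit = cell.split()
    return float(value) * UNITS[unit]


def relative_change(ops: str, ops_head: str) -> float:
    """Relative throughput change of this branch versus repo HEAD."""
    head = parse_ops(ops_head)
    return (parse_ops(ops) - head) / head


# Values from the test_update row of the table above.
change = relative_change("54.0584 KOps/s", "49.1942 KOps/s")
print(f"test_update: {change:+.1%}")
```

A positive value means the branch is faster than HEAD for that benchmark. Single-run differences of a few percent may be within measurement noise.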

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 0.5487ms | 12.8039μs | 78.1014 KOps/s | 78.4907 KOps/s | |
test_plain_set_stack_nested | 0.1362ms | 0.1153ms | 8.6736 KOps/s | 8.2624 KOps/s | |
test_plain_set_nested_inplace | 38.2610μs | 14.1859μs | 70.4927 KOps/s | 66.2438 KOps/s | |
test_plain_set_stack_nested_inplace | 0.1703ms | 0.1433ms | 6.9800 KOps/s | 7.0190 KOps/s | |
test_items | 18.8000μs | 4.6349μs | 215.7559 KOps/s | 211.9976 KOps/s | |
test_items_nested | 0.3910ms | 0.3361ms | 2.9754 KOps/s | 2.9626 KOps/s | |
test_items_nested_locked | 0.3932ms | 0.3377ms | 2.9616 KOps/s | 2.9304 KOps/s | |
test_items_nested_leaf | 0.2326ms | 0.1975ms | 5.0631 KOps/s | 5.0263 KOps/s | |
test_items_stack_nested | 1.5613ms | 1.4754ms | 677.7858 Ops/s | 663.8340 Ops/s | |
test_items_stack_nested_leaf | 1.3634ms | 1.3023ms | 767.8583 Ops/s | 750.1647 Ops/s | |
test_items_stack_nested_locked | 0.8778ms | 0.8236ms | 1.2142 KOps/s | 1.1748 KOps/s | |
test_keys | 20.4400μs | 4.5679μs | 218.9206 KOps/s | 218.8186 KOps/s | |
test_keys_nested | 3.4048ms | 90.8237μs | 11.0103 KOps/s | 11.0846 KOps/s | |
test_keys_nested_locked | 0.1146ms | 90.6584μs | 11.0304 KOps/s | 11.1620 KOps/s | |
test_keys_nested_leaf | 42.8885ms | 86.8877μs | 11.5091 KOps/s | 12.1897 KOps/s | |
test_keys_stack_nested | 1.3792ms | 1.2853ms | 778.0457 Ops/s | 757.7148 Ops/s | |
test_keys_stack_nested_leaf | 1.3551ms | 1.2823ms | 779.8545 Ops/s | 761.8074 Ops/s | |
test_keys_stack_nested_locked | 0.6964ms | 0.6334ms | 1.5789 KOps/s | 1.5455 KOps/s | |
test_values | 6.7037μs | 1.8960μs | 527.4275 KOps/s | 521.5566 KOps/s | |
test_values_nested | 69.5010μs | 42.7003μs | 23.4190 KOps/s | 23.1621 KOps/s | |
test_values_nested_locked | 65.0210μs | 44.9033μs | 22.2701 KOps/s | 22.0585 KOps/s | |
test_values_nested_leaf | 57.2110μs | 37.1368μs | 26.9274 KOps/s | 26.6648 KOps/s | |
test_values_stack_nested | 1.2320ms | 1.1274ms | 887.0293 Ops/s | 868.2667 Ops/s | |
test_values_stack_nested_leaf | 1.1763ms | 1.1100ms | 900.9005 Ops/s | 884.4522 Ops/s | |
test_values_stack_nested_locked | 0.5598ms | 0.5052ms | 1.9796 KOps/s | 1.9399 KOps/s | |
test_membership | 4.0062μs | 0.9318μs | 1.0732 MOps/s | 1.0569 MOps/s | |
test_membership_nested | 21.2410μs | 2.1860μs | 457.4467 KOps/s | 447.5371 KOps/s | |
test_membership_nested_leaf | 10.4200μs | 2.0835μs | 479.9538 KOps/s | 470.6439 KOps/s | |
test_membership_stacked_nested | 39.5010μs | 10.7390μs | 93.1186 KOps/s | 90.7875 KOps/s | |
test_membership_stacked_nested_leaf | 34.7820μs | 10.7329μs | 93.1713 KOps/s | 91.5096 KOps/s | |
test_membership_nested_last | 33.1310μs | 4.5268μs | 220.9073 KOps/s | 218.0709 KOps/s | |
test_membership_nested_leaf_last | 22.7900μs | 4.5028μs | 222.0825 KOps/s | 217.5531 KOps/s | |
test_membership_stacked_nested_last | 0.1605ms | 0.1330ms | 7.5199 KOps/s | 7.4277 KOps/s | |
test_membership_stacked_nested_leaf_last | 29.3910μs | 12.5498μs | 79.6823 KOps/s | 79.2798 KOps/s | |
test_nested_getleaf | 31.9610μs | 8.3613μs | 119.5987 KOps/s | 118.5932 KOps/s | |
test_nested_get | 23.0800μs | 7.8725μs | 127.0241 KOps/s | 125.3909 KOps/s | |
test_stacked_getleaf | 0.6136ms | 0.5612ms | 1.7818 KOps/s | 1.7635 KOps/s | |
test_stacked_get | 0.6167ms | 0.5342ms | 1.8720 KOps/s | 1.8727 KOps/s | |
test_nested_getitemleaf | 23.3700μs | 8.4071μs | 118.9470 KOps/s | 117.6476 KOps/s | |
test_nested_getitem | 29.2500μs | 7.9108μs | 126.4092 KOps/s | 123.6611 KOps/s | |
test_stacked_getitemleaf | 0.8524ms | 0.5657ms | 1.7678 KOps/s | 1.7602 KOps/s | |
test_stacked_getitem | 0.6204ms | 0.5301ms | 1.8865 KOps/s | 1.8821 KOps/s | |
test_lock_nested | 3.2228ms | 0.5505ms | 1.8165 KOps/s | 1.7662 KOps/s | |
test_lock_stack_nested | 82.3490ms | 7.2251ms | 138.4070 Ops/s | 133.6922 Ops/s | |
test_unlock_nested | 2.3443ms | 0.4278ms | 2.3378 KOps/s | 2.2603 KOps/s | |
test_unlock_stack_nested | 69.0773ms | 6.2740ms | 159.3887 Ops/s | 157.8591 Ops/s | |
test_flatten_speed | 0.2246ms | 0.1860ms | 5.3759 KOps/s | 5.3303 KOps/s | |
test_unflatten_speed | 0.3971ms | 0.3639ms | 2.7480 KOps/s | 2.7303 KOps/s | |
test_common_ops | 1.1183ms | 0.5919ms | 1.6895 KOps/s | 1.5927 KOps/s | |
test_creation | 30.9200μs | 2.0432μs | 489.4399 KOps/s | 465.0022 KOps/s | |
test_creation_empty | 18.6600μs | 7.0231μs | 142.3875 KOps/s | 138.2920 KOps/s | |
test_creation_nested_1 | 31.5910μs | 9.3284μs | 107.1998 KOps/s | 105.3544 KOps/s | |
test_creation_nested_2 | 29.3100μs | 11.9502μs | 83.6805 KOps/s | 81.4590 KOps/s | |
test_clone | 79.2410μs | 13.7816μs | 72.5604 KOps/s | 64.7868 KOps/s | |
test_getitem[int] | 26.6210μs | 12.0851μs | 82.7466 KOps/s | 77.7683 KOps/s | |
test_getitem[slice_int] | 47.0300μs | 22.8566μs | 43.7511 KOps/s | 40.2683 KOps/s | |
test_getitem[range] | 63.2610μs | 40.2241μs | 24.8607 KOps/s | 22.3154 KOps/s | |
test_getitem[tuple] | 41.6120μs | 19.4752μs | 51.3475 KOps/s | 46.2564 KOps/s | |
test_getitem[list] | 0.2767ms | 36.6105μs | 27.3146 KOps/s | 24.6156 KOps/s | |
test_setitem_dim[int] | 44.1410μs | 26.5411μs | 37.6775 KOps/s | 35.5083 KOps/s | |
test_setitem_dim[slice_int] | 64.3410μs | 46.7091μs | 21.4091 KOps/s | 20.3892 KOps/s | |
test_setitem_dim[range] | 83.2220μs | 64.5080μs | 15.5019 KOps/s | 15.0643 KOps/s | |
test_setitem_dim[tuple] | 68.5710μs | 39.4679μs | 25.3371 KOps/s | 24.1922 KOps/s | |
test_setitem | 70.6710μs | 17.7113μs | 56.4610 KOps/s | 51.5546 KOps/s | |
test_set | 70.0310μs | 17.2349μs | 58.0217 KOps/s | 52.5466 KOps/s | |
test_set_shared | 2.7865ms | 0.1037ms | 9.6386 KOps/s | 8.1589 KOps/s | |
test_update | 95.4920μs | 18.6072μs | 53.7425 KOps/s | 48.6966 KOps/s | |
test_update_nested | 89.7120μs | 25.2409μs | 39.6182 KOps/s | 37.1785 KOps/s | |
test_set_nested | 81.5910μs | 18.5839μs | 53.8101 KOps/s | 50.3833 KOps/s | |
test_set_nested_new | 75.2210μs | 22.8331μs | 43.7960 KOps/s | 40.7562 KOps/s | |
test_select | 92.0010μs | 44.7440μs | 22.3494 KOps/s | 20.9956 KOps/s | |
test_to | 74.5520μs | 52.3221μs | 19.1124 KOps/s | 18.6552 KOps/s | |
test_to_nonblocking | 64.5910μs | 34.6905μs | 28.8264 KOps/s | 27.9686 KOps/s | |
test_unbind_speed | 0.4061ms | 0.3556ms | 2.8124 KOps/s | 2.7079 KOps/s | |
test_unbind_speed_stack0 | 62.8848ms | 4.2907ms | 233.0621 Ops/s | 239.4317 Ops/s | |
test_unbind_speed_stack1 | 1.3995μs | 0.5329μs | 1.8766 MOps/s | 1.9019 MOps/s | |
test_split | 56.7696ms | 1.7481ms | 572.0379 Ops/s | 549.6593 Ops/s | |
test_chunk | 53.1605ms | 1.7247ms | 579.8175 Ops/s | 556.3689 Ops/s | |
test_creation[device0] | 0.3802ms | 0.3116ms | 3.2091 KOps/s | 3.2392 KOps/s | |
test_creation[device1] | 0.6818ms | 0.3149ms | 3.1756 KOps/s | 3.2090 KOps/s | |
test_creation_from_tensor | 0.6600ms | 0.3405ms | 2.9372 KOps/s | 2.9619 KOps/s | |
test_add_one[memmap_tensor0] | 71.9610μs | 23.4717μs | 42.6045 KOps/s | 39.7465 KOps/s | |
test_add_one[memmap_tensor1] | 0.2073ms | 72.6632μs | 13.7621 KOps/s | 13.4872 KOps/s | |
test_contiguous[memmap_tensor0] | 36.1710μs | 5.8779μs | 170.1278 KOps/s | 168.4631 KOps/s | |
test_contiguous[memmap_tensor1] | 49.7110μs | 21.3424μs | 46.8550 KOps/s | 45.2468 KOps/s | |
test_stack[memmap_tensor0] | 48.7810μs | 19.8176μs | 50.4602 KOps/s | 48.2755 KOps/s | |
test_stack[memmap_tensor1] | 0.1588ms | 75.4282μs | 13.2576 KOps/s | 13.3119 KOps/s | |
test_memmaptd_index | 0.2787ms | 0.2312ms | 4.3253 KOps/s | 4.0514 KOps/s | |
test_memmaptd_index_astensor | 0.3179ms | 0.2886ms | 3.4656 KOps/s | 3.2567 KOps/s | |
test_memmaptd_index_op | 0.6270ms | 0.5645ms | 1.7715 KOps/s | 1.6534 KOps/s | |
test_reshape_pytree | 54.2610μs | 20.5521μs | 48.6569 KOps/s | 46.5656 KOps/s | |
test_reshape_td | 50.1910μs | 30.1735μs | 33.1417 KOps/s | 31.6320 KOps/s | |
test_view_pytree | 0.3601ms | 20.5351μs | 48.6972 KOps/s | 46.9877 KOps/s | |
test_view_td | 18.4400μs | 4.0356μs | 247.7927 KOps/s | 248.7242 KOps/s | |
test_unbind_pytree | 45.0800μs | 25.2305μs | 39.6346 KOps/s | 37.8337 KOps/s | |
test_unbind_td | 84.2410μs | 56.3395μs | 17.7495 KOps/s | 17.0004 KOps/s | |
test_split_pytree | 40.2200μs | 23.7962μs | 42.0235 KOps/s | 40.3359 KOps/s | |
test_split_td | 69.1110μs | 42.8427μs | 23.3412 KOps/s | 21.9260 KOps/s | |
test_add_pytree | 51.6500μs | 31.6844μs | 31.5613 KOps/s | 29.2840 KOps/s | |
test_add_td | 75.5020μs | 45.6111μs | 21.9245 KOps/s | 20.5375 KOps/s | |
test_distributed | 20.4700μs | 5.5633μs | 179.7489 KOps/s | 182.1728 KOps/s | |
test_tdmodule | 33.0200μs | 16.6078μs | 60.2126 KOps/s | 58.3780 KOps/s | |
test_tdmodule_dispatch | 0.1334ms | 32.8319μs | 30.4582 KOps/s | 29.7790 KOps/s | |
test_tdseq | 35.5600μs | 19.8340μs | 50.4184 KOps/s | 49.7682 KOps/s | |
test_tdseq_dispatch | 56.5610μs | 36.2428μs | 27.5917 KOps/s | 27.1695 KOps/s | |
test_instantiation_functorch | 1.9746ms | 1.6563ms | 603.7714 Ops/s | 582.8394 Ops/s | |
test_instantiation_td | 1.7000ms | 1.1544ms | 866.2190 Ops/s | 844.6354 Ops/s | |
test_exec_functorch | 0.2036ms | 0.1551ms | 6.4464 KOps/s | 6.1126 KOps/s | |
test_exec_functional_call | 0.2129ms | 0.1568ms | 6.3763 KOps/s | 6.1643 KOps/s | |
test_exec_td | 0.1793ms | 0.1483ms | 6.7432 KOps/s | 6.5354 KOps/s | |
test_exec_td_decorator | 0.8534ms | 0.1844ms | 5.4238 KOps/s | 5.1998 KOps/s | |
test_vmap_mlp_speed[True-True] | 1.2860ms | 1.0891ms | 918.1929 Ops/s | 915.4228 Ops/s | |
test_vmap_mlp_speed[True-False] | 0.6753ms | 0.6234ms | 1.6040 KOps/s | 1.5898 KOps/s | |
test_vmap_mlp_speed[False-True] | 1.1643ms | 1.0006ms | 999.4174 Ops/s | 946.7073 Ops/s | |
test_vmap_mlp_speed[False-False] | 0.6119ms | 0.5535ms | 1.8068 KOps/s | 1.7157 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 3.0797ms | 2.0732ms | 482.3367 Ops/s | 468.6838 Ops/s | |
test_vmap_mlp_speed_decorator[True-False] | 1.2251ms | 0.6684ms | 1.4962 KOps/s | 1.4974 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 2.2328ms | 1.8018ms | 555.0051 Ops/s | 546.1029 Ops/s | |
test_vmap_mlp_speed_decorator[False-False] | 1.0516ms | 0.5678ms | 1.7613 KOps/s | 1.7433 KOps/s | |
test_vmap_transformer_speed[True-True] | 12.9594ms | 12.8713ms | 77.6919 Ops/s | 77.4304 Ops/s | |
test_vmap_transformer_speed[True-False] | 8.4914ms | 8.3931ms | 119.1451 Ops/s | 119.0909 Ops/s | |
test_vmap_transformer_speed[False-True] | 12.8107ms | 12.7070ms | 78.6971 Ops/s | 78.5455 Ops/s | |
test_vmap_transformer_speed[False-False] | 8.3901ms | 8.3066ms | 120.3868 Ops/s | 118.9734 Ops/s | |
test_vmap_transformer_speed_decorator[True-True] | 0.1441s | 70.5959ms | 14.1651 Ops/s | 14.8834 Ops/s | |
test_vmap_transformer_speed_decorator[True-False] | 22.5718ms | 20.3111ms | 49.2342 Ops/s | 49.5731 Ops/s | |
test_vmap_transformer_speed_decorator[False-True] | 60.6943ms | 59.5541ms | 16.7915 Ops/s | 16.5465 Ops/s | |
test_vmap_transformer_speed_decorator[False-False] | 22.2465ms | 19.8524ms | 50.3718 Ops/s | 50.5151 Ops/s |
Labels: CLA Signed, Performance
Updated version of #510
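
The PR title describes making `copy_` a no-op when the source and destination are identical. A minimal sketch of that idea follows, using plain Python lists as a stand-in for tensors. The function name `copy_inplace` is hypothetical, and this is an illustration under those assumptions, not the code merged in this PR.

```python
from typing import List


def copy_inplace(dest: List[float], src: List[float]) -> List[float]:
    """Copy the contents of src into dest in place and return dest.

    Hypothetical stand-in for an in-place copy: if dest and src are the
    very same object, there is nothing to copy, so return early.
    """
    if dest is src:
        # Identity check: skip the element-wise copy entirely.
        return dest
    dest[:] = src  # in-place overwrite; dest keeps its identity
    return dest


a = [1.0, 2.0, 3.0]
b = [0.0, 0.0, 0.0]

same = copy_inplace(a, a)  # no-op path
other = copy_inplace(b, a)  # regular copy path
```

The identity check costs one comparison, whereas the copy it avoids scales with the size of the data, which is presumably why the change is labeled as a performance improvement.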