[BugFix] Faster to_module #670
Merged
facebook-github-bot added the CLA Signed label on Feb 7, 2024
vmoens added the bug and Performance labels and removed the CLA Signed label on Feb 7, 2024
Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 53.6000μs | 18.0033μs | 55.5453 KOps/s | 58.8017 KOps/s | |
test_plain_set_stack_nested | 0.2750ms | 0.1456ms | 6.8704 KOps/s | 6.7871 KOps/s | |
test_plain_set_nested_inplace | 92.3620μs | 20.2030μs | 49.4975 KOps/s | 50.8877 KOps/s | |
test_plain_set_stack_nested_inplace | 0.3514ms | 0.1808ms | 5.5313 KOps/s | 5.5650 KOps/s | |
test_items | 24.1750μs | 2.5004μs | 399.9429 KOps/s | 381.9039 KOps/s | |
test_items_nested | 0.7460ms | 0.2675ms | 3.7380 KOps/s | 3.6496 KOps/s | |
test_items_nested_locked | 0.4972ms | 0.2657ms | 3.7632 KOps/s | 3.6447 KOps/s | |
test_items_nested_leaf | 0.5889ms | 0.1666ms | 6.0032 KOps/s | 5.9124 KOps/s | |
test_items_stack_nested | 1.4136ms | 1.3049ms | 766.3231 Ops/s | 746.4975 Ops/s | |
test_items_stack_nested_leaf | 2.4271ms | 1.1864ms | 842.8616 Ops/s | 835.4304 Ops/s | |
test_items_stack_nested_locked | 1.5362ms | 0.8696ms | 1.1500 KOps/s | 1.1408 KOps/s | |
test_keys | 30.0160μs | 3.8066μs | 262.7023 KOps/s | 248.7447 KOps/s | |
test_keys_nested | 1.7078ms | 0.1471ms | 6.7977 KOps/s | 6.5215 KOps/s | |
test_keys_nested_locked | 0.2857ms | 0.1497ms | 6.6802 KOps/s | 6.4639 KOps/s | |
test_keys_nested_leaf | 0.2468ms | 0.1296ms | 7.7164 KOps/s | 7.4526 KOps/s | |
test_keys_stack_nested | 1.5248ms | 1.2529ms | 798.1217 Ops/s | 769.1691 Ops/s | |
test_keys_stack_nested_leaf | 1.8772ms | 1.2574ms | 795.2800 Ops/s | 770.9818 Ops/s | |
test_keys_stack_nested_locked | 0.9825ms | 0.8009ms | 1.2486 KOps/s | 1.2276 KOps/s | |
test_values | 6.3995μs | 1.1841μs | 844.5308 KOps/s | 855.4499 KOps/s | |
test_values_nested | 0.1006ms | 52.1098μs | 19.1902 KOps/s | 19.0130 KOps/s | |
test_values_nested_locked | 0.1037ms | 52.5131μs | 19.0429 KOps/s | 19.1352 KOps/s | |
test_values_nested_leaf | 0.1031ms | 46.5383μs | 21.4877 KOps/s | 21.5069 KOps/s | |
test_values_stack_nested | 1.6422ms | 1.0272ms | 973.5244 Ops/s | 912.5977 Ops/s | |
test_values_stack_nested_leaf | 1.2725ms | 1.0150ms | 985.2264 Ops/s | 956.2000 Ops/s | |
test_values_stack_nested_locked | 0.8648ms | 0.6034ms | 1.6573 KOps/s | 1.6952 KOps/s | |
test_membership | 17.0630μs | 1.3878μs | 720.5518 KOps/s | 731.9382 KOps/s | |
test_membership_nested | 22.8530μs | 3.4096μs | 293.2933 KOps/s | 285.6917 KOps/s | |
test_membership_nested_leaf | 25.3070μs | 3.4323μs | 291.3486 KOps/s | 290.3875 KOps/s | |
test_membership_stacked_nested | 41.7580μs | 11.8172μs | 84.6227 KOps/s | 83.1155 KOps/s | |
test_membership_stacked_nested_leaf | 42.7100μs | 11.9351μs | 83.7863 KOps/s | 82.5880 KOps/s | |
test_membership_nested_last | 31.5690μs | 6.6986μs | 149.2840 KOps/s | 149.0288 KOps/s | |
test_membership_nested_leaf_last | 47.1180μs | 6.7002μs | 149.2486 KOps/s | 149.7351 KOps/s | |
test_membership_stacked_nested_last | 0.3447ms | 0.1793ms | 5.5783 KOps/s | 5.5874 KOps/s | |
test_membership_stacked_nested_leaf_last | 37.7010μs | 13.8237μs | 72.3397 KOps/s | 70.1145 KOps/s | |
test_nested_getleaf | 33.1120μs | 11.0866μs | 90.1992 KOps/s | 94.3157 KOps/s | |
test_nested_get | 31.7100μs | 10.4790μs | 95.4287 KOps/s | 99.1645 KOps/s | |
test_stacked_getleaf | 0.6867ms | 0.3943ms | 2.5360 KOps/s | 2.4697 KOps/s | |
test_stacked_get | 0.5973ms | 0.3624ms | 2.7594 KOps/s | 2.7428 KOps/s | |
test_nested_getitemleaf | 66.4550μs | 12.3420μs | 81.0244 KOps/s | 82.2894 KOps/s | |
test_nested_getitem | 44.6640μs | 11.7299μs | 85.2520 KOps/s | 85.2207 KOps/s | |
test_stacked_getitemleaf | 0.7463ms | 0.3991ms | 2.5058 KOps/s | 2.4612 KOps/s | |
test_stacked_getitem | 0.7270ms | 0.3691ms | 2.7093 KOps/s | 2.6773 KOps/s | |
test_lock_nested | 0.9024ms | 0.3399ms | 2.9423 KOps/s | 2.9301 KOps/s | |
test_lock_stack_nested | 95.7549ms | 6.3271ms | 158.0491 Ops/s | 156.3939 Ops/s | |
test_unlock_nested | 79.4882ms | 0.4189ms | 2.3871 KOps/s | 2.9215 KOps/s | |
test_unlock_stack_nested | 92.3185ms | 6.0636ms | 164.9179 Ops/s | 153.3544 Ops/s | |
test_flatten_speed | 0.6471ms | 0.3744ms | 2.6708 KOps/s | 2.7107 KOps/s | |
test_unflatten_speed | 0.5807ms | 0.4650ms | 2.1504 KOps/s | 2.1532 KOps/s | |
test_common_ops | 1.1686ms | 0.7069ms | 1.4146 KOps/s | 1.4413 KOps/s | |
test_creation | 39.9450μs | 1.8907μs | 528.9084 KOps/s | 543.5454 KOps/s | |
test_creation_empty | 38.1310μs | 11.4694μs | 87.1885 KOps/s | 96.9829 KOps/s | |
test_creation_nested_1 | 35.9470μs | 14.1706μs | 70.5687 KOps/s | 78.8994 KOps/s | |
test_creation_nested_2 | 47.8090μs | 17.6539μs | 56.6446 KOps/s | 63.8874 KOps/s | |
test_clone | 73.3680μs | 13.2312μs | 75.5791 KOps/s | 77.0275 KOps/s | |
test_getitem[int] | 33.7030μs | 11.0547μs | 90.4593 KOps/s | 90.2676 KOps/s | |
test_getitem[slice_int] | 59.8220μs | 22.0785μs | 45.2929 KOps/s | 45.0496 KOps/s | |
test_getitem[range] | 0.1389ms | 41.6026μs | 24.0369 KOps/s | 23.6794 KOps/s | |
test_getitem[tuple] | 54.2110μs | 18.0195μs | 55.4954 KOps/s | 55.2040 KOps/s | |
test_getitem[list] | 0.1356ms | 36.9422μs | 27.0693 KOps/s | 27.0457 KOps/s | |
test_setitem_dim[int] | 58.0890μs | 32.7008μs | 30.5803 KOps/s | 33.6619 KOps/s | |
test_setitem_dim[slice_int] | 96.8110μs | 57.3292μs | 17.4431 KOps/s | 18.2491 KOps/s | |
test_setitem_dim[range] | 0.1561ms | 77.2227μs | 12.9496 KOps/s | 13.1665 KOps/s | |
test_setitem_dim[tuple] | 87.1530μs | 47.1029μs | 21.2301 KOps/s | 22.1517 KOps/s | |
test_setitem | 68.4590μs | 20.6473μs | 48.4324 KOps/s | 52.0020 KOps/s | |
test_set | 60.5840μs | 19.5944μs | 51.0350 KOps/s | 54.0395 KOps/s | |
test_set_shared | 1.5759ms | 0.1372ms | 7.2895 KOps/s | 7.0461 KOps/s | |
test_update | 91.0100μs | 23.1683μs | 43.1625 KOps/s | 46.5333 KOps/s | |
test_update_nested | 0.1360ms | 32.2294μs | 31.0276 KOps/s | 34.6737 KOps/s | |
test_set_nested | 0.1332ms | 21.9861μs | 45.4833 KOps/s | 48.3093 KOps/s | |
test_set_nested_new | 0.1112ms | 25.7565μs | 38.8251 KOps/s | 41.3613 KOps/s | |
test_select | 97.0410μs | 38.2721μs | 26.1287 KOps/s | 26.9904 KOps/s | |
test_select_nested | 0.1146ms | 59.1150μs | 16.9162 KOps/s | 17.3817 KOps/s | |
test_exclude_nested | 0.2120ms | 0.1175ms | 8.5102 KOps/s | 8.5480 KOps/s | |
test_empty[True] | 0.7388ms | 0.4175ms | 2.3952 KOps/s | 2.4371 KOps/s | |
test_empty[False] | 4.2680μs | 1.0444μs | 957.4836 KOps/s | 953.0688 KOps/s | |
test_unbind_speed | 0.4250ms | 0.2455ms | 4.0726 KOps/s | 4.0746 KOps/s | |
test_unbind_speed_stack0 | 74.9682ms | 3.3293ms | 300.3653 Ops/s | 323.8962 Ops/s | |
test_unbind_speed_stack1 | 35.2750μs | 1.9495μs | 512.9616 KOps/s | 497.7862 KOps/s | |
test_split | 2.2595ms | 1.4399ms | 694.4733 Ops/s | 612.3385 Ops/s | |
test_chunk | 70.6711ms | 1.5413ms | 648.7906 Ops/s | 640.8012 Ops/s | |
test_creation[device0] | 0.1788ms | 0.1002ms | 9.9801 KOps/s | 9.8505 KOps/s | |
test_creation_from_tensor | 3.8031ms | 80.0424μs | 12.4934 KOps/s | 11.9530 KOps/s | |
test_add_one[memmap_tensor0] | 0.2187ms | 5.3652μs | 186.3861 KOps/s | 189.1750 KOps/s | |
test_contiguous[memmap_tensor0] | 10.2900μs | 0.6317μs | 1.5830 MOps/s | 1.5611 MOps/s | |
test_stack[memmap_tensor0] | 53.5100μs | 3.7521μs | 266.5181 KOps/s | 276.0825 KOps/s | |
test_memmaptd_index | 0.9795ms | 0.2371ms | 4.2176 KOps/s | 4.2081 KOps/s | |
test_memmaptd_index_astensor | 0.5339ms | 0.2999ms | 3.3345 KOps/s | 3.3416 KOps/s | |
test_memmaptd_index_op | 1.0394ms | 0.6257ms | 1.5982 KOps/s | 1.7065 KOps/s | |
test_serialize_model | 0.1822s | 0.1077s | 9.2891 Ops/s | 8.5473 Ops/s | |
test_serialize_model_pickle | 0.4507s | 0.3791s | 2.6377 Ops/s | 2.5697 Ops/s | |
test_serialize_weights | 0.1719s | 0.1065s | 9.3929 Ops/s | 8.9890 Ops/s | |
test_serialize_weights_returnearly | 0.1986s | 0.1298s | 7.7018 Ops/s | 8.1739 Ops/s | |
test_serialize_weights_pickle | 1.2485s | 0.5569s | 1.7958 Ops/s | 2.2942 Ops/s | |
test_serialize_weights_filesystem | 96.4647ms | 91.6051ms | 10.9164 Ops/s | 9.7001 Ops/s | |
test_serialize_model_filesystem | 99.5745ms | 95.0330ms | 10.5227 Ops/s | 10.3290 Ops/s | |
test_reshape_pytree | 46.4070μs | 20.9921μs | 47.6370 KOps/s | 46.6372 KOps/s | |
test_reshape_td | 78.7880μs | 31.2921μs | 31.9569 KOps/s | 31.7944 KOps/s | |
test_view_pytree | 57.3570μs | 20.8999μs | 47.8470 KOps/s | 48.2074 KOps/s | |
test_view_td | 74.9954ms | 10.6995μs | 93.4625 KOps/s | 87.7485 KOps/s | |
test_unbind_pytree | 52.2680μs | 24.1843μs | 41.3491 KOps/s | 41.4045 KOps/s | |
test_unbind_td | 0.1159ms | 35.7538μs | 27.9690 KOps/s | 27.3610 KOps/s | |
test_split_pytree | 65.8430μs | 24.0229μs | 41.6269 KOps/s | 42.1983 KOps/s | |
test_split_td | 0.5103ms | 39.5599μs | 25.2781 KOps/s | 25.2680 KOps/s | |
test_add_pytree | 65.7730μs | 29.8738μs | 33.4741 KOps/s | 33.2455 KOps/s | |
test_add_td | 0.1276ms | 55.7139μs | 17.9488 KOps/s | 19.3541 KOps/s | |
test_distributed | 0.1946ms | 99.1730μs | 10.0834 KOps/s | 9.8160 KOps/s | |
test_tdmodule | 0.2712ms | 23.9995μs | 41.6675 KOps/s | 44.9528 KOps/s | |
test_tdmodule_dispatch | 0.2220ms | 46.3633μs | 21.5688 KOps/s | 22.4329 KOps/s | |
test_tdseq | 69.4190μs | 26.9914μs | 37.0488 KOps/s | 37.6952 KOps/s | |
test_tdseq_dispatch | 0.3933ms | 49.2713μs | 20.2958 KOps/s | 20.3836 KOps/s | |
test_instantiation_functorch | 1.7431ms | 1.3241ms | 755.2149 Ops/s | 756.4994 Ops/s | |
test_instantiation_td | 1.5243ms | 1.0209ms | 979.5112 Ops/s | 977.2675 Ops/s | |
test_exec_functorch | 0.3018ms | 0.1580ms | 6.3278 KOps/s | 6.2944 KOps/s | |
test_exec_functional_call | 0.2514ms | 0.1432ms | 6.9850 KOps/s | 6.8257 KOps/s | |
test_exec_td | 0.1911ms | 0.1385ms | 7.2204 KOps/s | 6.9453 KOps/s | |
test_exec_td_decorator | 0.8384ms | 0.1732ms | 5.7740 KOps/s | 5.0629 KOps/s | |
test_vmap_mlp_speed[True-True] | 1.0955ms | 0.8953ms | 1.1169 KOps/s | 1.1102 KOps/s | |
test_vmap_mlp_speed[True-False] | 0.6856ms | 0.4740ms | 2.1096 KOps/s | 2.1249 KOps/s | |
test_vmap_mlp_speed[False-True] | 0.9114ms | 0.7823ms | 1.2784 KOps/s | 1.2778 KOps/s | |
test_vmap_mlp_speed[False-False] | 0.5996ms | 0.3888ms | 2.5720 KOps/s | 2.5809 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 1.5223ms | 1.3915ms | 718.6696 Ops/s | 425.8795 Ops/s | |
test_vmap_mlp_speed_decorator[True-False] | 0.8213ms | 0.5195ms | 1.9248 KOps/s | 1.8233 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 1.5247ms | 1.1248ms | 889.0603 Ops/s | 520.1488 Ops/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.5871ms | 0.3949ms | 2.5326 KOps/s | 2.3464 KOps/s | |
test_to_module_speed[True] | 1.3470ms | 1.1185ms | 894.0604 Ops/s | 11.7540 Ops/s | |
test_to_module_speed[False] | 1.1918ms | 1.1005ms | 908.6912 Ops/s | 558.9516 Ops/s |

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 0.1185ms | 13.5603μs | 73.7446 KOps/s | 70.4983 KOps/s | |
test_plain_set_stack_nested | 0.1814ms | 0.1214ms | 8.2365 KOps/s | 8.2401 KOps/s | |
test_plain_set_nested_inplace | 43.2320μs | 14.7441μs | 67.8237 KOps/s | 64.5315 KOps/s | |
test_plain_set_stack_nested_inplace | 0.1757ms | 0.1486ms | 6.7306 KOps/s | 6.7380 KOps/s | |
test_items | 24.1910μs | 4.7506μs | 210.4990 KOps/s | 207.6720 KOps/s | |
test_items_nested | 0.3591ms | 0.3384ms | 2.9552 KOps/s | 2.9368 KOps/s | |
test_items_nested_locked | 0.3813ms | 0.3419ms | 2.9245 KOps/s | 2.9020 KOps/s | |
test_items_nested_leaf | 0.2468ms | 0.1992ms | 5.0200 KOps/s | 4.9757 KOps/s | |
test_items_stack_nested | 1.3672ms | 1.3155ms | 760.1552 Ops/s | 757.7172 Ops/s | |
test_items_stack_nested_leaf | 1.2118ms | 1.1513ms | 868.5805 Ops/s | 851.0511 Ops/s | |
test_items_stack_nested_locked | 1.1275ms | 0.9001ms | 1.1110 KOps/s | 1.1001 KOps/s | |
test_keys | 28.2610μs | 4.5985μs | 217.4621 KOps/s | 206.6301 KOps/s | |
test_keys_nested | 0.8460ms | 95.3862μs | 10.4837 KOps/s | 10.4982 KOps/s | |
test_keys_nested_locked | 0.1330ms | 98.9851μs | 10.1025 KOps/s | 10.1135 KOps/s | |
test_keys_nested_leaf | 0.1807ms | 79.2926μs | 12.6115 KOps/s | 12.7041 KOps/s | |
test_keys_stack_nested | 1.2282ms | 1.1561ms | 865.0130 Ops/s | 865.2477 Ops/s | |
test_keys_stack_nested_leaf | 1.2155ms | 1.1329ms | 882.6615 Ops/s | 879.7479 Ops/s | |
test_keys_stack_nested_locked | 0.7710ms | 0.7202ms | 1.3886 KOps/s | 1.3845 KOps/s | |
test_values | 9.4740μs | 1.9047μs | 525.0140 KOps/s | 524.7419 KOps/s | |
test_values_nested | 73.0630μs | 45.2728μs | 22.0883 KOps/s | 21.9477 KOps/s | |
test_values_nested_locked | 69.6030μs | 47.5646μs | 21.0241 KOps/s | 20.7860 KOps/s | |
test_values_nested_leaf | 61.3530μs | 39.3949μs | 25.3840 KOps/s | 25.1331 KOps/s | |
test_values_stack_nested | 1.0317ms | 0.9659ms | 1.0353 KOps/s | 1.0369 KOps/s | |
test_values_stack_nested_leaf | 1.2919ms | 0.9584ms | 1.0434 KOps/s | 1.0413 KOps/s | |
test_values_stack_nested_locked | 0.6345ms | 0.5723ms | 1.7474 KOps/s | 1.7377 KOps/s | |
test_membership | 6.0204μs | 0.9623μs | 1.0391 MOps/s | 1.0370 MOps/s | |
test_membership_nested | 30.8310μs | 2.9709μs | 336.5995 KOps/s | 341.4024 KOps/s | |
test_membership_nested_leaf | 18.6910μs | 2.9674μs | 336.9961 KOps/s | 340.3234 KOps/s | |
test_membership_stacked_nested | 45.1620μs | 11.2740μs | 88.7000 KOps/s | 87.6082 KOps/s | |
test_membership_stacked_nested_leaf | 41.5520μs | 11.2340μs | 89.0152 KOps/s | 86.8838 KOps/s | |
test_membership_nested_last | 36.8610μs | 5.3673μs | 186.3136 KOps/s | 185.9201 KOps/s | |
test_membership_nested_leaf_last | 34.0420μs | 5.3809μs | 185.8436 KOps/s | 185.7053 KOps/s | |
test_membership_stacked_nested_last | 0.1907ms | 0.1573ms | 6.3561 KOps/s | 6.3462 KOps/s | |
test_membership_stacked_nested_leaf_last | 51.2630μs | 13.2763μs | 75.3219 KOps/s | 74.2949 KOps/s | |
test_nested_getleaf | 32.0210μs | 8.4654μs | 118.1281 KOps/s | 118.4940 KOps/s | |
test_nested_get | 32.3010μs | 7.9947μs | 125.0834 KOps/s | 125.2686 KOps/s | |
test_stacked_getleaf | 0.3775ms | 0.3297ms | 3.0330 KOps/s | 3.0268 KOps/s | |
test_stacked_get | 0.3394ms | 0.3010ms | 3.3220 KOps/s | 3.3761 KOps/s | |
test_nested_getitemleaf | 32.6410μs | 9.8272μs | 101.7588 KOps/s | 101.6861 KOps/s | |
test_nested_getitem | 37.9610μs | 9.3828μs | 106.5782 KOps/s | 106.5135 KOps/s | |
test_stacked_getitemleaf | 0.3729ms | 0.3339ms | 2.9948 KOps/s | 3.0053 KOps/s | |
test_stacked_getitem | 0.3342ms | 0.3017ms | 3.3143 KOps/s | 3.3532 KOps/s | |
test_lock_nested | 1.2830ms | 0.3516ms | 2.8440 KOps/s | 2.8018 KOps/s | |
test_lock_stack_nested | 86.5822ms | 6.3648ms | 157.1140 Ops/s | 158.6337 Ops/s | |
test_unlock_nested | 78.8698ms | 0.4313ms | 2.3183 KOps/s | 2.8641 KOps/s | |
test_unlock_stack_nested | 86.9765ms | 6.4529ms | 154.9702 Ops/s | 154.2172 Ops/s | |
test_flatten_speed | 0.3521ms | 0.2620ms | 3.8174 KOps/s | 3.8483 KOps/s | |
test_unflatten_speed | 0.3970ms | 0.3639ms | 2.7478 KOps/s | 2.7788 KOps/s | |
test_common_ops | 1.0726ms | 0.5875ms | 1.7020 KOps/s | 1.6185 KOps/s | |
test_creation | 17.3710μs | 1.5946μs | 627.1148 KOps/s | 637.8006 KOps/s | |
test_creation_empty | 29.2520μs | 7.7463μs | 129.0941 KOps/s | 107.6475 KOps/s | |
test_creation_nested_1 | 27.9820μs | 9.4385μs | 105.9486 KOps/s | 91.6434 KOps/s | |
test_creation_nested_2 | 43.6220μs | 11.8500μs | 84.3881 KOps/s | 75.0658 KOps/s | |
test_clone | 66.9730μs | 13.9705μs | 71.5793 KOps/s | 74.0051 KOps/s | |
test_getitem[int] | 63.1730μs | 10.8536μs | 92.1356 KOps/s | 92.4172 KOps/s | |
test_getitem[slice_int] | 40.7820μs | 21.2742μs | 47.0052 KOps/s | 47.1444 KOps/s | |
test_getitem[range] | 0.1084ms | 39.9886μs | 25.0071 KOps/s | 24.8483 KOps/s | |
test_getitem[tuple] | 39.0920μs | 18.7100μs | 53.4473 KOps/s | 54.0126 KOps/s | |
test_getitem[list] | 0.1426ms | 35.8717μs | 27.8772 KOps/s | 27.2076 KOps/s | |
test_setitem_dim[int] | 41.2520μs | 25.4094μs | 39.3555 KOps/s | 36.7651 KOps/s | |
test_setitem_dim[slice_int] | 64.3420μs | 47.2320μs | 21.1721 KOps/s | 20.1791 KOps/s | |
test_setitem_dim[range] | 84.2840μs | 66.3289μs | 15.0764 KOps/s | 14.4743 KOps/s | |
test_setitem_dim[tuple] | 57.1130μs | 40.4486μs | 24.7228 KOps/s | 24.3979 KOps/s | |
test_setitem | 52.2830μs | 18.2281μs | 54.8603 KOps/s | 51.9647 KOps/s | |
test_set | 49.5320μs | 18.5398μs | 53.9380 KOps/s | 51.9823 KOps/s | |
test_set_shared | 2.9129ms | 0.1048ms | 9.5448 KOps/s | 9.3641 KOps/s | |
test_update | 99.7450μs | 19.8421μs | 50.3978 KOps/s | 43.9893 KOps/s | |
test_update_nested | 95.1650μs | 26.5095μs | 37.7223 KOps/s | 34.4984 KOps/s | |
test_set_nested | 57.2130μs | 18.9737μs | 52.7045 KOps/s | 51.0296 KOps/s | |
test_set_nested_new | 63.6930μs | 21.7872μs | 45.8985 KOps/s | 44.6587 KOps/s | |
test_select | 80.5640μs | 34.2480μs | 29.1988 KOps/s | 27.3312 KOps/s | |
test_select_nested | 75.0530μs | 53.0505μs | 18.8500 KOps/s | 18.7605 KOps/s | |
test_exclude_nested | 0.1467ms | 0.1138ms | 8.7873 KOps/s | 8.8556 KOps/s | |
test_empty[True] | 0.4277ms | 0.3852ms | 2.5962 KOps/s | 2.5883 KOps/s | |
test_empty[False] | 2.7591μs | 0.8631μs | 1.1586 MOps/s | 1.1811 MOps/s | |
test_to | 73.9730μs | 56.1026μs | 17.8245 KOps/s | 18.5376 KOps/s | |
test_to_nonblocking | 70.2130μs | 34.4637μs | 29.0160 KOps/s | 29.1478 KOps/s | |
test_unbind_speed | 0.3043ms | 0.2720ms | 3.6765 KOps/s | 3.7364 KOps/s | |
test_unbind_speed_stack0 | 87.1292ms | 3.7809ms | 264.4848 Ops/s | 284.6167 Ops/s | |
test_unbind_speed_stack1 | 37.3420μs | 1.8025μs | 554.7903 KOps/s | 568.7409 KOps/s | |
test_split | 81.1855ms | 1.7195ms | 581.5630 Ops/s | 656.5390 Ops/s | |
test_chunk | 1.5590ms | 1.5246ms | 655.9261 Ops/s | 607.7340 Ops/s | |
test_creation[device0] | 0.1294ms | 73.6648μs | 13.5750 KOps/s | 13.5229 KOps/s | |
test_creation_from_tensor | 0.1365ms | 54.2956μs | 18.4177 KOps/s | 18.2793 KOps/s | |
test_add_one[memmap_tensor0] | 0.1374ms | 7.1947μs | 138.9910 KOps/s | 138.0654 KOps/s | |
test_contiguous[memmap_tensor0] | 11.5810μs | 0.6436μs | 1.5537 MOps/s | 1.5096 MOps/s | |
test_stack[memmap_tensor0] | 38.9210μs | 4.4718μs | 223.6258 KOps/s | 214.1885 KOps/s | |
test_memmaptd_index | 1.0231ms | 0.2688ms | 3.7196 KOps/s | 3.7549 KOps/s | |
test_memmaptd_index_astensor | 0.6525ms | 0.3253ms | 3.0738 KOps/s | 3.1002 KOps/s | |
test_memmaptd_index_op | 0.8777ms | 0.6113ms | 1.6359 KOps/s | 1.5635 KOps/s | |
test_serialize_model | 93.2023ms | 89.0878ms | 11.2249 Ops/s | 9.6930 Ops/s | |
test_serialize_model_pickle | 1.3688s | 1.2388s | 0.8072 Ops/s | 0.8084 Ops/s | |
test_serialize_weights | 0.1723s | 96.0652ms | 10.4096 Ops/s | 10.0019 Ops/s | |
test_serialize_weights_returnearly | 0.1593s | 70.3277ms | 14.2191 Ops/s | 11.9568 Ops/s | |
test_serialize_weights_pickle | 1.3506s | 1.2490s | 0.8006 Ops/s | 0.8089 Ops/s | |
test_reshape_pytree | 55.7320μs | 24.5720μs | 40.6968 KOps/s | 40.3372 KOps/s | |
test_reshape_td | 0.1308ms | 30.6840μs | 32.5903 KOps/s | 32.3698 KOps/s | |
test_view_pytree | 0.1644ms | 25.2434μs | 39.6143 KOps/s | 40.8953 KOps/s | |
test_view_td | 0.3971ms | 6.9119μs | 144.6784 KOps/s | 146.5344 KOps/s | |
test_unbind_pytree | 79.7730μs | 30.3341μs | 32.9662 KOps/s | 32.3667 KOps/s | |
test_unbind_td | 73.7330μs | 40.2314μs | 24.8562 KOps/s | 23.7868 KOps/s | |
test_split_pytree | 55.0120μs | 28.3597μs | 35.2613 KOps/s | 33.2212 KOps/s | |
test_split_td | 0.1045ms | 38.7021μs | 25.8384 KOps/s | 25.3732 KOps/s | |
test_add_pytree | 58.6920μs | 36.5124μs | 27.3880 KOps/s | 27.3375 KOps/s | |
test_add_td | 90.0830μs | 49.2869μs | 20.2894 KOps/s | 19.5456 KOps/s | |
test_distributed | 1.8397ms | 73.6521μs | 13.5773 KOps/s | 14.3901 KOps/s | |
test_tdmodule | 74.9830μs | 17.5820μs | 56.8764 KOps/s | 54.2518 KOps/s | |
test_tdmodule_dispatch | 0.2476ms | 36.3898μs | 27.4802 KOps/s | 25.9127 KOps/s | |
test_tdseq | 39.7820μs | 20.4829μs | 48.8212 KOps/s | 47.0474 KOps/s | |
test_tdseq_dispatch | 59.0720μs | 38.3361μs | 26.0850 KOps/s | 24.8688 KOps/s | |
test_instantiation_functorch | 1.7893ms | 1.6665ms | 600.0746 Ops/s | 598.9359 Ops/s | |
test_instantiation_td | 1.7224ms | 1.1493ms | 870.0820 Ops/s | 860.0573 Ops/s | |
test_exec_functorch | 0.2115ms | 0.1630ms | 6.1355 KOps/s | 6.2441 KOps/s | |
test_exec_functional_call | 0.2119ms | 0.1588ms | 6.2961 KOps/s | 6.3527 KOps/s | |
test_exec_td | 0.1782ms | 0.1485ms | 6.7339 KOps/s | 6.7927 KOps/s | |
test_exec_td_decorator | 0.7917ms | 0.1818ms | 5.5018 KOps/s | 4.9022 KOps/s | |
test_vmap_mlp_speed[True-True] | 1.1123ms | 1.0450ms | 956.9394 Ops/s | 956.7862 Ops/s | |
test_vmap_mlp_speed[True-False] | 0.6498ms | 0.6023ms | 1.6603 KOps/s | 1.6546 KOps/s | |
test_vmap_mlp_speed[False-True] | 0.9913ms | 0.9635ms | 1.0379 KOps/s | 1.0075 KOps/s | |
test_vmap_mlp_speed[False-False] | 0.5627ms | 0.5362ms | 1.8648 KOps/s | 1.8179 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 1.9606ms | 1.5442ms | 647.5781 Ops/s | 421.9918 Ops/s | |
test_vmap_mlp_speed_decorator[True-False] | 0.8118ms | 0.6456ms | 1.5488 KOps/s | 1.4960 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 1.6147ms | 1.3174ms | 759.0493 Ops/s | 504.9767 Ops/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.6539ms | 0.5463ms | 1.8304 KOps/s | 1.7363 KOps/s | |
test_vmap_transformer_speed[True-True] | 12.7322ms | 12.4508ms | 80.3164 Ops/s | 81.5044 Ops/s | |
test_vmap_transformer_speed[True-False] | 8.3226ms | 8.1220ms | 123.1217 Ops/s | 119.9567 Ops/s | |
test_vmap_transformer_speed[False-True] | 12.4387ms | 12.2310ms | 81.7591 Ops/s | 82.5461 Ops/s | |
test_vmap_transformer_speed[False-False] | 8.1931ms | 8.0555ms | 124.1383 Ops/s | 124.1002 Ops/s | |
test_vmap_transformer_speed_decorator[True-True] | 55.0276ms | 53.5612ms | 18.6702 Ops/s | 13.5474 Ops/s | |
test_vmap_transformer_speed_decorator[True-False] | 19.9144ms | 19.4227ms | 51.4860 Ops/s | 50.8020 Ops/s | |
test_vmap_transformer_speed_decorator[False-True] | 49.1920ms | 47.9874ms | 20.8388 Ops/s | 15.0706 Ops/s | |
test_vmap_transformer_speed_decorator[False-False] | 19.0065ms | 18.8867ms | 52.9473 Ops/s | 45.7768 Ops/s | |
test_to_module_speed[True] | 1.1365ms | 1.0201ms | 980.3184 Ops/s | 12.1016 Ops/s | |
test_to_module_speed[False] | 1.1010ms | 0.9909ms | 1.0091 KOps/s | 579.8780 Ops/s |
facebook-github-bot added the CLA Signed label on Feb 7, 2024
@matteobettini this is some serious speedup
vmoens added a commit that referenced this pull request on Feb 26, 2024
Labels: bug, CLA Signed, Performance
This is a quick fix for to_module's run speed until we find a better way of doing this.
cc @matteobettini
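
To put a number on the speedup, here is a rough sketch that divides the "Ops" column by the "Ops on Repo HEAD" column for the two test_to_module_speed rows of each benchmark table above. The figures are copied from those tables; the dictionary keys, the "table 1" / "table 2" labels and the helper name are only illustrative and are not part of the benchmark suite.

```python
# Throughput figures copied from the two benchmark tables in this PR,
# expressed in operations per second.
# Each entry maps a label to (ops/s with this PR, ops/s on repo HEAD).
# The "table 1" / "table 2" labels are illustrative: they only refer to the
# order in which the tables appear in the conversation.
RESULTS = {
    "table 1, test_to_module_speed[True]": (894.0604, 11.7540),
    "table 1, test_to_module_speed[False]": (908.6912, 558.9516),
    "table 2, test_to_module_speed[True]": (980.3184, 12.1016),
    "table 2, test_to_module_speed[False]": (1009.1, 579.8780),  # 1.0091 KOps/s
}


def speedup(pr_ops: float, head_ops: float) -> float:
    """Return how many times faster the PR is than repo HEAD."""
    return pr_ops / head_ops


# Ratio of PR throughput to HEAD throughput for every row above.
SPEEDUPS = {name: speedup(pr, head) for name, (pr, head) in RESULTS.items()}

if __name__ == "__main__":
    for name, ratio in SPEEDUPS.items():
        print(f"{name}: {ratio:.1f}x")
```

By this measure the [True] variant is roughly 76x to 81x faster than HEAD, while the [False] variant gains roughly 1.6x to 1.7x. These are single benchmark runs, so the ratios are indicative rather than exact.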