[BugFix] Faster to_module #670

Merged on Feb 7, 2024 (7 commits)

vmoens (Contributor) commented on Feb 7, 2024

This is a quick fix for the runtime speed of to_module until we find a better way of doing this.

cc @matteobettini

facebook-github-bot added the CLA Signed label on Feb 7, 2024
vmoens added the bug and Performance labels and removed the CLA Signed label on Feb 7, 2024
vmoens linked an issue that may be closed by this pull request on Feb 7, 2024

github-actions bot commented Feb 7, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 126. Improved: $\large\color{#35bf28}14$. Worsened: $\large\color{#d91a1a}18$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 53.6000μs | 18.0033μs | 55.5453 KOps/s | 58.8017 KOps/s | $\textbf{\color{#d91a1a}-5.54\%}$ |
| test_plain_set_stack_nested | 0.2750ms | 0.1456ms | 6.8704 KOps/s | 6.7871 KOps/s | $\color{#35bf28}+1.23\%$ |
| test_plain_set_nested_inplace | 92.3620μs | 20.2030μs | 49.4975 KOps/s | 50.8877 KOps/s | $\color{#d91a1a}-2.73\%$ |
| test_plain_set_stack_nested_inplace | 0.3514ms | 0.1808ms | 5.5313 KOps/s | 5.5650 KOps/s | $\color{#d91a1a}-0.61\%$ |
| test_items | 24.1750μs | 2.5004μs | 399.9429 KOps/s | 381.9039 KOps/s | $\color{#35bf28}+4.72\%$ |
| test_items_nested | 0.7460ms | 0.2675ms | 3.7380 KOps/s | 3.6496 KOps/s | $\color{#35bf28}+2.42\%$ |
| test_items_nested_locked | 0.4972ms | 0.2657ms | 3.7632 KOps/s | 3.6447 KOps/s | $\color{#35bf28}+3.25\%$ |
| test_items_nested_leaf | 0.5889ms | 0.1666ms | 6.0032 KOps/s | 5.9124 KOps/s | $\color{#35bf28}+1.54\%$ |
| test_items_stack_nested | 1.4136ms | 1.3049ms | 766.3231 Ops/s | 746.4975 Ops/s | $\color{#35bf28}+2.66\%$ |
| test_items_stack_nested_leaf | 2.4271ms | 1.1864ms | 842.8616 Ops/s | 835.4304 Ops/s | $\color{#35bf28}+0.89\%$ |
| test_items_stack_nested_locked | 1.5362ms | 0.8696ms | 1.1500 KOps/s | 1.1408 KOps/s | $\color{#35bf28}+0.80\%$ |
| test_keys | 30.0160μs | 3.8066μs | 262.7023 KOps/s | 248.7447 KOps/s | $\textbf{\color{#35bf28}+5.61\%}$ |
| test_keys_nested | 1.7078ms | 0.1471ms | 6.7977 KOps/s | 6.5215 KOps/s | $\color{#35bf28}+4.24\%$ |
| test_keys_nested_locked | 0.2857ms | 0.1497ms | 6.6802 KOps/s | 6.4639 KOps/s | $\color{#35bf28}+3.35\%$ |
| test_keys_nested_leaf | 0.2468ms | 0.1296ms | 7.7164 KOps/s | 7.4526 KOps/s | $\color{#35bf28}+3.54\%$ |
| test_keys_stack_nested | 1.5248ms | 1.2529ms | 798.1217 Ops/s | 769.1691 Ops/s | $\color{#35bf28}+3.76\%$ |
| test_keys_stack_nested_leaf | 1.8772ms | 1.2574ms | 795.2800 Ops/s | 770.9818 Ops/s | $\color{#35bf28}+3.15\%$ |
| test_keys_stack_nested_locked | 0.9825ms | 0.8009ms | 1.2486 KOps/s | 1.2276 KOps/s | $\color{#35bf28}+1.71\%$ |
| test_values | 6.3995μs | 1.1841μs | 844.5308 KOps/s | 855.4499 KOps/s | $\color{#d91a1a}-1.28\%$ |
| test_values_nested | 0.1006ms | 52.1098μs | 19.1902 KOps/s | 19.0130 KOps/s | $\color{#35bf28}+0.93\%$ |
| test_values_nested_locked | 0.1037ms | 52.5131μs | 19.0429 KOps/s | 19.1352 KOps/s | $\color{#d91a1a}-0.48\%$ |
| test_values_nested_leaf | 0.1031ms | 46.5383μs | 21.4877 KOps/s | 21.5069 KOps/s | $\color{#d91a1a}-0.09\%$ |
| test_values_stack_nested | 1.6422ms | 1.0272ms | 973.5244 Ops/s | 912.5977 Ops/s | $\textbf{\color{#35bf28}+6.68\%}$ |
| test_values_stack_nested_leaf | 1.2725ms | 1.0150ms | 985.2264 Ops/s | 956.2000 Ops/s | $\color{#35bf28}+3.04\%$ |
| test_values_stack_nested_locked | 0.8648ms | 0.6034ms | 1.6573 KOps/s | 1.6952 KOps/s | $\color{#d91a1a}-2.24\%$ |
| test_membership | 17.0630μs | 1.3878μs | 720.5518 KOps/s | 731.9382 KOps/s | $\color{#d91a1a}-1.56\%$ |
| test_membership_nested | 22.8530μs | 3.4096μs | 293.2933 KOps/s | 285.6917 KOps/s | $\color{#35bf28}+2.66\%$ |
| test_membership_nested_leaf | 25.3070μs | 3.4323μs | 291.3486 KOps/s | 290.3875 KOps/s | $\color{#35bf28}+0.33\%$ |
| test_membership_stacked_nested | 41.7580μs | 11.8172μs | 84.6227 KOps/s | 83.1155 KOps/s | $\color{#35bf28}+1.81\%$ |
| test_membership_stacked_nested_leaf | 42.7100μs | 11.9351μs | 83.7863 KOps/s | 82.5880 KOps/s | $\color{#35bf28}+1.45\%$ |
| test_membership_nested_last | 31.5690μs | 6.6986μs | 149.2840 KOps/s | 149.0288 KOps/s | $\color{#35bf28}+0.17\%$ |
| test_membership_nested_leaf_last | 47.1180μs | 6.7002μs | 149.2486 KOps/s | 149.7351 KOps/s | $\color{#d91a1a}-0.32\%$ |
| test_membership_stacked_nested_last | 0.3447ms | 0.1793ms | 5.5783 KOps/s | 5.5874 KOps/s | $\color{#d91a1a}-0.16\%$ |
| test_membership_stacked_nested_leaf_last | 37.7010μs | 13.8237μs | 72.3397 KOps/s | 70.1145 KOps/s | $\color{#35bf28}+3.17\%$ |
| test_nested_getleaf | 33.1120μs | 11.0866μs | 90.1992 KOps/s | 94.3157 KOps/s | $\color{#d91a1a}-4.36\%$ |
| test_nested_get | 31.7100μs | 10.4790μs | 95.4287 KOps/s | 99.1645 KOps/s | $\color{#d91a1a}-3.77\%$ |
| test_stacked_getleaf | 0.6867ms | 0.3943ms | 2.5360 KOps/s | 2.4697 KOps/s | $\color{#35bf28}+2.68\%$ |
| test_stacked_get | 0.5973ms | 0.3624ms | 2.7594 KOps/s | 2.7428 KOps/s | $\color{#35bf28}+0.61\%$ |
| test_nested_getitemleaf | 66.4550μs | 12.3420μs | 81.0244 KOps/s | 82.2894 KOps/s | $\color{#d91a1a}-1.54\%$ |
| test_nested_getitem | 44.6640μs | 11.7299μs | 85.2520 KOps/s | 85.2207 KOps/s | $\color{#35bf28}+0.04\%$ |
| test_stacked_getitemleaf | 0.7463ms | 0.3991ms | 2.5058 KOps/s | 2.4612 KOps/s | $\color{#35bf28}+1.81\%$ |
| test_stacked_getitem | 0.7270ms | 0.3691ms | 2.7093 KOps/s | 2.6773 KOps/s | $\color{#35bf28}+1.20\%$ |
| test_lock_nested | 0.9024ms | 0.3399ms | 2.9423 KOps/s | 2.9301 KOps/s | $\color{#35bf28}+0.42\%$ |
| test_lock_stack_nested | 95.7549ms | 6.3271ms | 158.0491 Ops/s | 156.3939 Ops/s | $\color{#35bf28}+1.06\%$ |
| test_unlock_nested | 79.4882ms | 0.4189ms | 2.3871 KOps/s | 2.9215 KOps/s | $\textbf{\color{#d91a1a}-18.29\%}$ |
| test_unlock_stack_nested | 92.3185ms | 6.0636ms | 164.9179 Ops/s | 153.3544 Ops/s | $\textbf{\color{#35bf28}+7.54\%}$ |
| test_flatten_speed | 0.6471ms | 0.3744ms | 2.6708 KOps/s | 2.7107 KOps/s | $\color{#d91a1a}-1.47\%$ |
| test_unflatten_speed | 0.5807ms | 0.4650ms | 2.1504 KOps/s | 2.1532 KOps/s | $\color{#d91a1a}-0.13\%$ |
| test_common_ops | 1.1686ms | 0.7069ms | 1.4146 KOps/s | 1.4413 KOps/s | $\color{#d91a1a}-1.85\%$ |
| test_creation | 39.9450μs | 1.8907μs | 528.9084 KOps/s | 543.5454 KOps/s | $\color{#d91a1a}-2.69\%$ |
| test_creation_empty | 38.1310μs | 11.4694μs | 87.1885 KOps/s | 96.9829 KOps/s | $\textbf{\color{#d91a1a}-10.10\%}$ |
| test_creation_nested_1 | 35.9470μs | 14.1706μs | 70.5687 KOps/s | 78.8994 KOps/s | $\textbf{\color{#d91a1a}-10.56\%}$ |
| test_creation_nested_2 | 47.8090μs | 17.6539μs | 56.6446 KOps/s | 63.8874 KOps/s | $\textbf{\color{#d91a1a}-11.34\%}$ |
| test_clone | 73.3680μs | 13.2312μs | 75.5791 KOps/s | 77.0275 KOps/s | $\color{#d91a1a}-1.88\%$ |
| test_getitem[int] | 33.7030μs | 11.0547μs | 90.4593 KOps/s | 90.2676 KOps/s | $\color{#35bf28}+0.21\%$ |
| test_getitem[slice_int] | 59.8220μs | 22.0785μs | 45.2929 KOps/s | 45.0496 KOps/s | $\color{#35bf28}+0.54\%$ |
| test_getitem[range] | 0.1389ms | 41.6026μs | 24.0369 KOps/s | 23.6794 KOps/s | $\color{#35bf28}+1.51\%$ |
| test_getitem[tuple] | 54.2110μs | 18.0195μs | 55.4954 KOps/s | 55.2040 KOps/s | $\color{#35bf28}+0.53\%$ |
| test_getitem[list] | 0.1356ms | 36.9422μs | 27.0693 KOps/s | 27.0457 KOps/s | $\color{#35bf28}+0.09\%$ |
| test_setitem_dim[int] | 58.0890μs | 32.7008μs | 30.5803 KOps/s | 33.6619 KOps/s | $\textbf{\color{#d91a1a}-9.15\%}$ |
| test_setitem_dim[slice_int] | 96.8110μs | 57.3292μs | 17.4431 KOps/s | 18.2491 KOps/s | $\color{#d91a1a}-4.42\%$ |
| test_setitem_dim[range] | 0.1561ms | 77.2227μs | 12.9496 KOps/s | 13.1665 KOps/s | $\color{#d91a1a}-1.65\%$ |
| test_setitem_dim[tuple] | 87.1530μs | 47.1029μs | 21.2301 KOps/s | 22.1517 KOps/s | $\color{#d91a1a}-4.16\%$ |
| test_setitem | 68.4590μs | 20.6473μs | 48.4324 KOps/s | 52.0020 KOps/s | $\textbf{\color{#d91a1a}-6.86\%}$ |
| test_set | 60.5840μs | 19.5944μs | 51.0350 KOps/s | 54.0395 KOps/s | $\textbf{\color{#d91a1a}-5.56\%}$ |
| test_set_shared | 1.5759ms | 0.1372ms | 7.2895 KOps/s | 7.0461 KOps/s | $\color{#35bf28}+3.45\%$ |
| test_update | 91.0100μs | 23.1683μs | 43.1625 KOps/s | 46.5333 KOps/s | $\textbf{\color{#d91a1a}-7.24\%}$ |
| test_update_nested | 0.1360ms | 32.2294μs | 31.0276 KOps/s | 34.6737 KOps/s | $\textbf{\color{#d91a1a}-10.52\%}$ |
| test_set_nested | 0.1332ms | 21.9861μs | 45.4833 KOps/s | 48.3093 KOps/s | $\textbf{\color{#d91a1a}-5.85\%}$ |
| test_set_nested_new | 0.1112ms | 25.7565μs | 38.8251 KOps/s | 41.3613 KOps/s | $\textbf{\color{#d91a1a}-6.13\%}$ |
| test_select | 97.0410μs | 38.2721μs | 26.1287 KOps/s | 26.9904 KOps/s | $\color{#d91a1a}-3.19\%$ |
| test_select_nested | 0.1146ms | 59.1150μs | 16.9162 KOps/s | 17.3817 KOps/s | $\color{#d91a1a}-2.68\%$ |
| test_exclude_nested | 0.2120ms | 0.1175ms | 8.5102 KOps/s | 8.5480 KOps/s | $\color{#d91a1a}-0.44\%$ |
| test_empty[True] | 0.7388ms | 0.4175ms | 2.3952 KOps/s | 2.4371 KOps/s | $\color{#d91a1a}-1.72\%$ |
| test_empty[False] | 4.2680μs | 1.0444μs | 957.4836 KOps/s | 953.0688 KOps/s | $\color{#35bf28}+0.46\%$ |
| test_unbind_speed | 0.4250ms | 0.2455ms | 4.0726 KOps/s | 4.0746 KOps/s | $\color{#d91a1a}-0.05\%$ |
| test_unbind_speed_stack0 | 74.9682ms | 3.3293ms | 300.3653 Ops/s | 323.8962 Ops/s | $\textbf{\color{#d91a1a}-7.26\%}$ |
| test_unbind_speed_stack1 | 35.2750μs | 1.9495μs | 512.9616 KOps/s | 497.7862 KOps/s | $\color{#35bf28}+3.05\%$ |
| test_split | 2.2595ms | 1.4399ms | 694.4733 Ops/s | 612.3385 Ops/s | $\textbf{\color{#35bf28}+13.41\%}$ |
| test_chunk | 70.6711ms | 1.5413ms | 648.7906 Ops/s | 640.8012 Ops/s | $\color{#35bf28}+1.25\%$ |
| test_creation[device0] | 0.1788ms | 0.1002ms | 9.9801 KOps/s | 9.8505 KOps/s | $\color{#35bf28}+1.32\%$ |
| test_creation_from_tensor | 3.8031ms | 80.0424μs | 12.4934 KOps/s | 11.9530 KOps/s | $\color{#35bf28}+4.52\%$ |
| test_add_one[memmap_tensor0] | 0.2187ms | 5.3652μs | 186.3861 KOps/s | 189.1750 KOps/s | $\color{#d91a1a}-1.47\%$ |
| test_contiguous[memmap_tensor0] | 10.2900μs | 0.6317μs | 1.5830 MOps/s | 1.5611 MOps/s | $\color{#35bf28}+1.40\%$ |
| test_stack[memmap_tensor0] | 53.5100μs | 3.7521μs | 266.5181 KOps/s | 276.0825 KOps/s | $\color{#d91a1a}-3.46\%$ |
| test_memmaptd_index | 0.9795ms | 0.2371ms | 4.2176 KOps/s | 4.2081 KOps/s | $\color{#35bf28}+0.23\%$ |
| test_memmaptd_index_astensor | 0.5339ms | 0.2999ms | 3.3345 KOps/s | 3.3416 KOps/s | $\color{#d91a1a}-0.21\%$ |
| test_memmaptd_index_op | 1.0394ms | 0.6257ms | 1.5982 KOps/s | 1.7065 KOps/s | $\textbf{\color{#d91a1a}-6.35\%}$ |
| test_serialize_model | 0.1822s | 0.1077s | 9.2891 Ops/s | 8.5473 Ops/s | $\textbf{\color{#35bf28}+8.68\%}$ |
| test_serialize_model_pickle | 0.4507s | 0.3791s | 2.6377 Ops/s | 2.5697 Ops/s | $\color{#35bf28}+2.65\%$ |
| test_serialize_weights | 0.1719s | 0.1065s | 9.3929 Ops/s | 8.9890 Ops/s | $\color{#35bf28}+4.49\%$ |
| test_serialize_weights_returnearly | 0.1986s | 0.1298s | 7.7018 Ops/s | 8.1739 Ops/s | $\textbf{\color{#d91a1a}-5.78\%}$ |
| test_serialize_weights_pickle | 1.2485s | 0.5569s | 1.7958 Ops/s | 2.2942 Ops/s | $\textbf{\color{#d91a1a}-21.73\%}$ |
| test_serialize_weights_filesystem | 96.4647ms | 91.6051ms | 10.9164 Ops/s | 9.7001 Ops/s | $\textbf{\color{#35bf28}+12.54\%}$ |
| test_serialize_model_filesystem | 99.5745ms | 95.0330ms | 10.5227 Ops/s | 10.3290 Ops/s | $\color{#35bf28}+1.88\%$ |
| test_reshape_pytree | 46.4070μs | 20.9921μs | 47.6370 KOps/s | 46.6372 KOps/s | $\color{#35bf28}+2.14\%$ |
| test_reshape_td | 78.7880μs | 31.2921μs | 31.9569 KOps/s | 31.7944 KOps/s | $\color{#35bf28}+0.51\%$ |
| test_view_pytree | 57.3570μs | 20.8999μs | 47.8470 KOps/s | 48.2074 KOps/s | $\color{#d91a1a}-0.75\%$ |
| test_view_td | 74.9954ms | 10.6995μs | 93.4625 KOps/s | 87.7485 KOps/s | $\textbf{\color{#35bf28}+6.51\%}$ |
| test_unbind_pytree | 52.2680μs | 24.1843μs | 41.3491 KOps/s | 41.4045 KOps/s | $\color{#d91a1a}-0.13\%$ |
| test_unbind_td | 0.1159ms | 35.7538μs | 27.9690 KOps/s | 27.3610 KOps/s | $\color{#35bf28}+2.22\%$ |
| test_split_pytree | 65.8430μs | 24.0229μs | 41.6269 KOps/s | 42.1983 KOps/s | $\color{#d91a1a}-1.35\%$ |
| test_split_td | 0.5103ms | 39.5599μs | 25.2781 KOps/s | 25.2680 KOps/s | $\color{#35bf28}+0.04\%$ |
| test_add_pytree | 65.7730μs | 29.8738μs | 33.4741 KOps/s | 33.2455 KOps/s | $\color{#35bf28}+0.69\%$ |
| test_add_td | 0.1276ms | 55.7139μs | 17.9488 KOps/s | 19.3541 KOps/s | $\textbf{\color{#d91a1a}-7.26\%}$ |
| test_distributed | 0.1946ms | 99.1730μs | 10.0834 KOps/s | 9.8160 KOps/s | $\color{#35bf28}+2.72\%$ |
| test_tdmodule | 0.2712ms | 23.9995μs | 41.6675 KOps/s | 44.9528 KOps/s | $\textbf{\color{#d91a1a}-7.31\%}$ |
| test_tdmodule_dispatch | 0.2220ms | 46.3633μs | 21.5688 KOps/s | 22.4329 KOps/s | $\color{#d91a1a}-3.85\%$ |
| test_tdseq | 69.4190μs | 26.9914μs | 37.0488 KOps/s | 37.6952 KOps/s | $\color{#d91a1a}-1.71\%$ |
| test_tdseq_dispatch | 0.3933ms | 49.2713μs | 20.2958 KOps/s | 20.3836 KOps/s | $\color{#d91a1a}-0.43\%$ |
| test_instantiation_functorch | 1.7431ms | 1.3241ms | 755.2149 Ops/s | 756.4994 Ops/s | $\color{#d91a1a}-0.17\%$ |
| test_instantiation_td | 1.5243ms | 1.0209ms | 979.5112 Ops/s | 977.2675 Ops/s | $\color{#35bf28}+0.23\%$ |
| test_exec_functorch | 0.3018ms | 0.1580ms | 6.3278 KOps/s | 6.2944 KOps/s | $\color{#35bf28}+0.53\%$ |
| test_exec_functional_call | 0.2514ms | 0.1432ms | 6.9850 KOps/s | 6.8257 KOps/s | $\color{#35bf28}+2.33\%$ |
| test_exec_td | 0.1911ms | 0.1385ms | 7.2204 KOps/s | 6.9453 KOps/s | $\color{#35bf28}+3.96\%$ |
| test_exec_td_decorator | 0.8384ms | 0.1732ms | 5.7740 KOps/s | 5.0629 KOps/s | $\textbf{\color{#35bf28}+14.04\%}$ |
| test_vmap_mlp_speed[True-True] | 1.0955ms | 0.8953ms | 1.1169 KOps/s | 1.1102 KOps/s | $\color{#35bf28}+0.60\%$ |
| test_vmap_mlp_speed[True-False] | 0.6856ms | 0.4740ms | 2.1096 KOps/s | 2.1249 KOps/s | $\color{#d91a1a}-0.72\%$ |
| test_vmap_mlp_speed[False-True] | 0.9114ms | 0.7823ms | 1.2784 KOps/s | 1.2778 KOps/s | $\color{#35bf28}+0.04\%$ |
| test_vmap_mlp_speed[False-False] | 0.5996ms | 0.3888ms | 2.5720 KOps/s | 2.5809 KOps/s | $\color{#d91a1a}-0.34\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 1.5223ms | 1.3915ms | 718.6696 Ops/s | 425.8795 Ops/s | $\textbf{\color{#35bf28}+68.75\%}$ |
| test_vmap_mlp_speed_decorator[True-False] | 0.8213ms | 0.5195ms | 1.9248 KOps/s | 1.8233 KOps/s | $\textbf{\color{#35bf28}+5.57\%}$ |
| test_vmap_mlp_speed_decorator[False-True] | 1.5247ms | 1.1248ms | 889.0603 Ops/s | 520.1488 Ops/s | $\textbf{\color{#35bf28}+70.92\%}$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.5871ms | 0.3949ms | 2.5326 KOps/s | 2.3464 KOps/s | $\textbf{\color{#35bf28}+7.94\%}$ |
| test_to_module_speed[True] | 1.3470ms | 1.1185ms | 894.0604 Ops/s | 11.7540 Ops/s | $\textbf{\color{#35bf28}+7506.41\%}$ |
| test_to_module_speed[False] | 1.1918ms | 1.1005ms | 908.6912 Ops/s | 558.9516 Ops/s | $\textbf{\color{#35bf28}+62.57\%}$ |
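
The Change figures appear to be the relative difference between the Ops and Ops on Repo HEAD values. The benchmark bot's exact formula is not shown in this thread, so the following is only a sketch under that assumption; `relative_change` is a hypothetical helper name, not part of tensordict or the CI tooling. It reproduces two of the reported CPU figures to within display rounding.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change in throughput of the PR build relative to repo HEAD.

    Assumed formula (not confirmed by the bot output): (PR - HEAD) / HEAD * 100.
    """
    return (ops_pr - ops_head) / ops_head * 100.0


# Values copied from the CPU results for test_to_module_speed[True] (Ops/s).
to_module_true = relative_change(894.0604, 11.7540)

# Values copied from the CPU results for test_unlock_nested.
# Both are in KOps/s, so the unit cancels in the ratio.
unlock_nested = relative_change(2.3871, 2.9215)

print(f"test_to_module_speed[True]: {to_module_true:+.2f}%")
print(f"test_unlock_nested:         {unlock_nested:+.2f}%")
```

The results differ from the reported +7506.41% and -18.29% only in the last digits, which is consistent with the bot computing the change from unrounded measurements.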


github-actions bot commented Feb 7, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 134. Improved: $\large\color{#35bf28}25$. Worsened: $\large\color{#d91a1a}4$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 0.1185ms | 13.5603μs | 73.7446 KOps/s | 70.4983 KOps/s | $\color{#35bf28}+4.60\%$ |
| test_plain_set_stack_nested | 0.1814ms | 0.1214ms | 8.2365 KOps/s | 8.2401 KOps/s | $\color{#d91a1a}-0.04\%$ |
| test_plain_set_nested_inplace | 43.2320μs | 14.7441μs | 67.8237 KOps/s | 64.5315 KOps/s | $\textbf{\color{#35bf28}+5.10\%}$ |
| test_plain_set_stack_nested_inplace | 0.1757ms | 0.1486ms | 6.7306 KOps/s | 6.7380 KOps/s | $\color{#d91a1a}-0.11\%$ |
| test_items | 24.1910μs | 4.7506μs | 210.4990 KOps/s | 207.6720 KOps/s | $\color{#35bf28}+1.36\%$ |
| test_items_nested | 0.3591ms | 0.3384ms | 2.9552 KOps/s | 2.9368 KOps/s | $\color{#35bf28}+0.63\%$ |
| test_items_nested_locked | 0.3813ms | 0.3419ms | 2.9245 KOps/s | 2.9020 KOps/s | $\color{#35bf28}+0.78\%$ |
| test_items_nested_leaf | 0.2468ms | 0.1992ms | 5.0200 KOps/s | 4.9757 KOps/s | $\color{#35bf28}+0.89\%$ |
| test_items_stack_nested | 1.3672ms | 1.3155ms | 760.1552 Ops/s | 757.7172 Ops/s | $\color{#35bf28}+0.32\%$ |
| test_items_stack_nested_leaf | 1.2118ms | 1.1513ms | 868.5805 Ops/s | 851.0511 Ops/s | $\color{#35bf28}+2.06\%$ |
| test_items_stack_nested_locked | 1.1275ms | 0.9001ms | 1.1110 KOps/s | 1.1001 KOps/s | $\color{#35bf28}+1.00\%$ |
| test_keys | 28.2610μs | 4.5985μs | 217.4621 KOps/s | 206.6301 KOps/s | $\textbf{\color{#35bf28}+5.24\%}$ |
| test_keys_nested | 0.8460ms | 95.3862μs | 10.4837 KOps/s | 10.4982 KOps/s | $\color{#d91a1a}-0.14\%$ |
| test_keys_nested_locked | 0.1330ms | 98.9851μs | 10.1025 KOps/s | 10.1135 KOps/s | $\color{#d91a1a}-0.11\%$ |
| test_keys_nested_leaf | 0.1807ms | 79.2926μs | 12.6115 KOps/s | 12.7041 KOps/s | $\color{#d91a1a}-0.73\%$ |
| test_keys_stack_nested | 1.2282ms | 1.1561ms | 865.0130 Ops/s | 865.2477 Ops/s | $\color{#d91a1a}-0.03\%$ |
| test_keys_stack_nested_leaf | 1.2155ms | 1.1329ms | 882.6615 Ops/s | 879.7479 Ops/s | $\color{#35bf28}+0.33\%$ |
| test_keys_stack_nested_locked | 0.7710ms | 0.7202ms | 1.3886 KOps/s | 1.3845 KOps/s | $\color{#35bf28}+0.30\%$ |
| test_values | 9.4740μs | 1.9047μs | 525.0140 KOps/s | 524.7419 KOps/s | $\color{#35bf28}+0.05\%$ |
| test_values_nested | 73.0630μs | 45.2728μs | 22.0883 KOps/s | 21.9477 KOps/s | $\color{#35bf28}+0.64\%$ |
| test_values_nested_locked | 69.6030μs | 47.5646μs | 21.0241 KOps/s | 20.7860 KOps/s | $\color{#35bf28}+1.15\%$ |
| test_values_nested_leaf | 61.3530μs | 39.3949μs | 25.3840 KOps/s | 25.1331 KOps/s | $\color{#35bf28}+1.00\%$ |
| test_values_stack_nested | 1.0317ms | 0.9659ms | 1.0353 KOps/s | 1.0369 KOps/s | $\color{#d91a1a}-0.15\%$ |
| test_values_stack_nested_leaf | 1.2919ms | 0.9584ms | 1.0434 KOps/s | 1.0413 KOps/s | $\color{#35bf28}+0.20\%$ |
| test_values_stack_nested_locked | 0.6345ms | 0.5723ms | 1.7474 KOps/s | 1.7377 KOps/s | $\color{#35bf28}+0.56\%$ |
| test_membership | 6.0204μs | 0.9623μs | 1.0391 MOps/s | 1.0370 MOps/s | $\color{#35bf28}+0.20\%$ |
| test_membership_nested | 30.8310μs | 2.9709μs | 336.5995 KOps/s | 341.4024 KOps/s | $\color{#d91a1a}-1.41\%$ |
| test_membership_nested_leaf | 18.6910μs | 2.9674μs | 336.9961 KOps/s | 340.3234 KOps/s | $\color{#d91a1a}-0.98\%$ |
| test_membership_stacked_nested | 45.1620μs | 11.2740μs | 88.7000 KOps/s | 87.6082 KOps/s | $\color{#35bf28}+1.25\%$ |
| test_membership_stacked_nested_leaf | 41.5520μs | 11.2340μs | 89.0152 KOps/s | 86.8838 KOps/s | $\color{#35bf28}+2.45\%$ |
| test_membership_nested_last | 36.8610μs | 5.3673μs | 186.3136 KOps/s | 185.9201 KOps/s | $\color{#35bf28}+0.21\%$ |
| test_membership_nested_leaf_last | 34.0420μs | 5.3809μs | 185.8436 KOps/s | 185.7053 KOps/s | $\color{#35bf28}+0.07\%$ |
| test_membership_stacked_nested_last | 0.1907ms | 0.1573ms | 6.3561 KOps/s | 6.3462 KOps/s | $\color{#35bf28}+0.16\%$ |
| test_membership_stacked_nested_leaf_last | 51.2630μs | 13.2763μs | 75.3219 KOps/s | 74.2949 KOps/s | $\color{#35bf28}+1.38\%$ |
| test_nested_getleaf | 32.0210μs | 8.4654μs | 118.1281 KOps/s | 118.4940 KOps/s | $\color{#d91a1a}-0.31\%$ |
| test_nested_get | 32.3010μs | 7.9947μs | 125.0834 KOps/s | 125.2686 KOps/s | $\color{#d91a1a}-0.15\%$ |
| test_stacked_getleaf | 0.3775ms | 0.3297ms | 3.0330 KOps/s | 3.0268 KOps/s | $\color{#35bf28}+0.20\%$ |
| test_stacked_get | 0.3394ms | 0.3010ms | 3.3220 KOps/s | 3.3761 KOps/s | $\color{#d91a1a}-1.60\%$ |
| test_nested_getitemleaf | 32.6410μs | 9.8272μs | 101.7588 KOps/s | 101.6861 KOps/s | $\color{#35bf28}+0.07\%$ |
| test_nested_getitem | 37.9610μs | 9.3828μs | 106.5782 KOps/s | 106.5135 KOps/s | $\color{#35bf28}+0.06\%$ |
| test_stacked_getitemleaf | 0.3729ms | 0.3339ms | 2.9948 KOps/s | 3.0053 KOps/s | $\color{#d91a1a}-0.35\%$ |
| test_stacked_getitem | 0.3342ms | 0.3017ms | 3.3143 KOps/s | 3.3532 KOps/s | $\color{#d91a1a}-1.16\%$ |
| test_lock_nested | 1.2830ms | 0.3516ms | 2.8440 KOps/s | 2.8018 KOps/s | $\color{#35bf28}+1.51\%$ |
| test_lock_stack_nested | 86.5822ms | 6.3648ms | 157.1140 Ops/s | 158.6337 Ops/s | $\color{#d91a1a}-0.96\%$ |
| test_unlock_nested | 78.8698ms | 0.4313ms | 2.3183 KOps/s | 2.8641 KOps/s | $\textbf{\color{#d91a1a}-19.05\%}$ |
| test_unlock_stack_nested | 86.9765ms | 6.4529ms | 154.9702 Ops/s | 154.2172 Ops/s | $\color{#35bf28}+0.49\%$ |
| test_flatten_speed | 0.3521ms | 0.2620ms | 3.8174 KOps/s | 3.8483 KOps/s | $\color{#d91a1a}-0.80\%$ |
| test_unflatten_speed | 0.3970ms | 0.3639ms | 2.7478 KOps/s | 2.7788 KOps/s | $\color{#d91a1a}-1.12\%$ |
| test_common_ops | 1.0726ms | 0.5875ms | 1.7020 KOps/s | 1.6185 KOps/s | $\textbf{\color{#35bf28}+5.16\%}$ |
| test_creation | 17.3710μs | 1.5946μs | 627.1148 KOps/s | 637.8006 KOps/s | $\color{#d91a1a}-1.68\%$ |
| test_creation_empty | 29.2520μs | 7.7463μs | 129.0941 KOps/s | 107.6475 KOps/s | $\textbf{\color{#35bf28}+19.92\%}$ |
| test_creation_nested_1 | 27.9820μs | 9.4385μs | 105.9486 KOps/s | 91.6434 KOps/s | $\textbf{\color{#35bf28}+15.61\%}$ |
| test_creation_nested_2 | 43.6220μs | 11.8500μs | 84.3881 KOps/s | 75.0658 KOps/s | $\textbf{\color{#35bf28}+12.42\%}$ |
| test_clone | 66.9730μs | 13.9705μs | 71.5793 KOps/s | 74.0051 KOps/s | $\color{#d91a1a}-3.28\%$ |
| test_getitem[int] | 63.1730μs | 10.8536μs | 92.1356 KOps/s | 92.4172 KOps/s | $\color{#d91a1a}-0.30\%$ |
| test_getitem[slice_int] | 40.7820μs | 21.2742μs | 47.0052 KOps/s | 47.1444 KOps/s | $\color{#d91a1a}-0.30\%$ |
| test_getitem[range] | 0.1084ms | 39.9886μs | 25.0071 KOps/s | 24.8483 KOps/s | $\color{#35bf28}+0.64\%$ |
| test_getitem[tuple] | 39.0920μs | 18.7100μs | 53.4473 KOps/s | 54.0126 KOps/s | $\color{#d91a1a}-1.05\%$ |
| test_getitem[list] | 0.1426ms | 35.8717μs | 27.8772 KOps/s | 27.2076 KOps/s | $\color{#35bf28}+2.46\%$ |
| test_setitem_dim[int] | 41.2520μs | 25.4094μs | 39.3555 KOps/s | 36.7651 KOps/s | $\textbf{\color{#35bf28}+7.05\%}$ |
| test_setitem_dim[slice_int] | 64.3420μs | 47.2320μs | 21.1721 KOps/s | 20.1791 KOps/s | $\color{#35bf28}+4.92\%$ |
| test_setitem_dim[range] | 84.2840μs | 66.3289μs | 15.0764 KOps/s | 14.4743 KOps/s | $\color{#35bf28}+4.16\%$ |
| test_setitem_dim[tuple] | 57.1130μs | 40.4486μs | 24.7228 KOps/s | 24.3979 KOps/s | $\color{#35bf28}+1.33\%$ |
| test_setitem | 52.2830μs | 18.2281μs | 54.8603 KOps/s | 51.9647 KOps/s | $\textbf{\color{#35bf28}+5.57\%}$ |
| test_set | 49.5320μs | 18.5398μs | 53.9380 KOps/s | 51.9823 KOps/s | $\color{#35bf28}+3.76\%$ |
| test_set_shared | 2.9129ms | 0.1048ms | 9.5448 KOps/s | 9.3641 KOps/s | $\color{#35bf28}+1.93\%$ |
| test_update | 99.7450μs | 19.8421μs | 50.3978 KOps/s | 43.9893 KOps/s | $\textbf{\color{#35bf28}+14.57\%}$ |
| test_update_nested | 95.1650μs | 26.5095μs | 37.7223 KOps/s | 34.4984 KOps/s | $\textbf{\color{#35bf28}+9.35\%}$ |
| test_set_nested | 57.2130μs | 18.9737μs | 52.7045 KOps/s | 51.0296 KOps/s | $\color{#35bf28}+3.28\%$ |
| test_set_nested_new | 63.6930μs | 21.7872μs | 45.8985 KOps/s | 44.6587 KOps/s | $\color{#35bf28}+2.78\%$ |
| test_select | 80.5640μs | 34.2480μs | 29.1988 KOps/s | 27.3312 KOps/s | $\textbf{\color{#35bf28}+6.83\%}$ |
| test_select_nested | 75.0530μs | 53.0505μs | 18.8500 KOps/s | 18.7605 KOps/s | $\color{#35bf28}+0.48\%$ |
| test_exclude_nested | 0.1467ms | 0.1138ms | 8.7873 KOps/s | 8.8556 KOps/s | $\color{#d91a1a}-0.77\%$ |
| test_empty[True] | 0.4277ms | 0.3852ms | 2.5962 KOps/s | 2.5883 KOps/s | $\color{#35bf28}+0.31\%$ |
| test_empty[False] | 2.7591μs | 0.8631μs | 1.1586 MOps/s | 1.1811 MOps/s | $\color{#d91a1a}-1.91\%$ |
| test_to | 73.9730μs | 56.1026μs | 17.8245 KOps/s | 18.5376 KOps/s | $\color{#d91a1a}-3.85\%$ |
| test_to_nonblocking | 70.2130μs | 34.4637μs | 29.0160 KOps/s | 29.1478 KOps/s | $\color{#d91a1a}-0.45\%$ |
| test_unbind_speed | 0.3043ms | 0.2720ms | 3.6765 KOps/s | 3.7364 KOps/s | $\color{#d91a1a}-1.60\%$ |
| test_unbind_speed_stack0 | 87.1292ms | 3.7809ms | 264.4848 Ops/s | 284.6167 Ops/s | $\textbf{\color{#d91a1a}-7.07\%}$ |
| test_unbind_speed_stack1 | 37.3420μs | 1.8025μs | 554.7903 KOps/s | 568.7409 KOps/s | $\color{#d91a1a}-2.45\%$ |
| test_split | 81.1855ms | 1.7195ms | 581.5630 Ops/s | 656.5390 Ops/s | $\textbf{\color{#d91a1a}-11.42\%}$ |
| test_chunk | 1.5590ms | 1.5246ms | 655.9261 Ops/s | 607.7340 Ops/s | $\textbf{\color{#35bf28}+7.93\%}$ |
| test_creation[device0] | 0.1294ms | 73.6648μs | 13.5750 KOps/s | 13.5229 KOps/s | $\color{#35bf28}+0.39\%$ |
| test_creation_from_tensor | 0.1365ms | 54.2956μs | 18.4177 KOps/s | 18.2793 KOps/s | $\color{#35bf28}+0.76\%$ |
| test_add_one[memmap_tensor0] | 0.1374ms | 7.1947μs | 138.9910 KOps/s | 138.0654 KOps/s | $\color{#35bf28}+0.67\%$ |
| test_contiguous[memmap_tensor0] | 11.5810μs | 0.6436μs | 1.5537 MOps/s | 1.5096 MOps/s | $\color{#35bf28}+2.92\%$ |
| test_stack[memmap_tensor0] | 38.9210μs | 4.4718μs | 223.6258 KOps/s | 214.1885 KOps/s | $\color{#35bf28}+4.41\%$ |
| test_memmaptd_index | 1.0231ms | 0.2688ms | 3.7196 KOps/s | 3.7549 KOps/s | $\color{#d91a1a}-0.94\%$ |
| test_memmaptd_index_astensor | 0.6525ms | 0.3253ms | 3.0738 KOps/s | 3.1002 KOps/s | $\color{#d91a1a}-0.85\%$ |
| test_memmaptd_index_op | 0.8777ms | 0.6113ms | 1.6359 KOps/s | 1.5635 KOps/s | $\color{#35bf28}+4.63\%$ |
| test_serialize_model | 93.2023ms | 89.0878ms | 11.2249 Ops/s | 9.6930 Ops/s | $\textbf{\color{#35bf28}+15.80\%}$ |
| test_serialize_model_pickle | 1.3688s | 1.2388s | 0.8072 Ops/s | 0.8084 Ops/s | $\color{#d91a1a}-0.14\%$ |
| test_serialize_weights | 0.1723s | 96.0652ms | 10.4096 Ops/s | 10.0019 Ops/s | $\color{#35bf28}+4.08\%$ |
| test_serialize_weights_returnearly | 0.1593s | 70.3277ms | 14.2191 Ops/s | 11.9568 Ops/s | $\textbf{\color{#35bf28}+18.92\%}$ |
| test_serialize_weights_pickle | 1.3506s | 1.2490s | 0.8006 Ops/s | 0.8089 Ops/s | $\color{#d91a1a}-1.03\%$ |
| test_reshape_pytree | 55.7320μs | 24.5720μs | 40.6968 KOps/s | 40.3372 KOps/s | $\color{#35bf28}+0.89\%$ |
| test_reshape_td | 0.1308ms | 30.6840μs | 32.5903 KOps/s | 32.3698 KOps/s | $\color{#35bf28}+0.68\%$ |
| test_view_pytree | 0.1644ms | 25.2434μs | 39.6143 KOps/s | 40.8953 KOps/s | $\color{#d91a1a}-3.13\%$ |
| test_view_td | 0.3971ms | 6.9119μs | 144.6784 KOps/s | 146.5344 KOps/s | $\color{#d91a1a}-1.27\%$ |
| test_unbind_pytree | 79.7730μs | 30.3341μs | 32.9662 KOps/s | 32.3667 KOps/s | $\color{#35bf28}+1.85\%$ |
| test_unbind_td | 73.7330μs | 40.2314μs | 24.8562 KOps/s | 23.7868 KOps/s | $\color{#35bf28}+4.50\%$ |
| test_split_pytree | 55.0120μs | 28.3597μs | 35.2613 KOps/s | 33.2212 KOps/s | $\textbf{\color{#35bf28}+6.14\%}$ |
| test_split_td | 0.1045ms | 38.7021μs | 25.8384 KOps/s | 25.3732 KOps/s | $\color{#35bf28}+1.83\%$ |
| test_add_pytree | 58.6920μs | 36.5124μs | 27.3880 KOps/s | 27.3375 KOps/s | $\color{#35bf28}+0.18\%$ |
| test_add_td | 90.0830μs | 49.2869μs | 20.2894 KOps/s | 19.5456 KOps/s | $\color{#35bf28}+3.81\%$ |
| test_distributed | 1.8397ms | 73.6521μs | 13.5773 KOps/s | 14.3901 KOps/s | $\textbf{\color{#d91a1a}-5.65\%}$ |
| test_tdmodule | 74.9830μs | 17.5820μs | 56.8764 KOps/s | 54.2518 KOps/s | $\color{#35bf28}+4.84\%$ |
| test_tdmodule_dispatch | 0.2476ms | 36.3898μs | 27.4802 KOps/s | 25.9127 KOps/s | $\textbf{\color{#35bf28}+6.05\%}$ |
| test_tdseq | 39.7820μs | 20.4829μs | 48.8212 KOps/s | 47.0474 KOps/s | $\color{#35bf28}+3.77\%$ |
| test_tdseq_dispatch | 59.0720μs | 38.3361μs | 26.0850 KOps/s | 24.8688 KOps/s | $\color{#35bf28}+4.89\%$ |
| test_instantiation_functorch | 1.7893ms | 1.6665ms | 600.0746 Ops/s | 598.9359 Ops/s | $\color{#35bf28}+0.19\%$ |
| test_instantiation_td | 1.7224ms | 1.1493ms | 870.0820 Ops/s | 860.0573 Ops/s | $\color{#35bf28}+1.17\%$ |
| test_exec_functorch | 0.2115ms | 0.1630ms | 6.1355 KOps/s | 6.2441 KOps/s | $\color{#d91a1a}-1.74\%$ |
| test_exec_functional_call | 0.2119ms | 0.1588ms | 6.2961 KOps/s | 6.3527 KOps/s | $\color{#d91a1a}-0.89\%$ |
| test_exec_td | 0.1782ms | 0.1485ms | 6.7339 KOps/s | 6.7927 KOps/s | $\color{#d91a1a}-0.87\%$ |
| test_exec_td_decorator | 0.7917ms | 0.1818ms | 5.5018 KOps/s | 4.9022 KOps/s | $\textbf{\color{#35bf28}+12.23\%}$ |
| test_vmap_mlp_speed[True-True] | 1.1123ms | 1.0450ms | 956.9394 Ops/s | 956.7862 Ops/s | $\color{#35bf28}+0.02\%$ |
| test_vmap_mlp_speed[True-False] | 0.6498ms | 0.6023ms | 1.6603 KOps/s | 1.6546 KOps/s | $\color{#35bf28}+0.34\%$ |
| test_vmap_mlp_speed[False-True] | 0.9913ms | 0.9635ms | 1.0379 KOps/s | 1.0075 KOps/s | $\color{#35bf28}+3.02\%$ |
| test_vmap_mlp_speed[False-False] | 0.5627ms | 0.5362ms | 1.8648 KOps/s | 1.8179 KOps/s | $\color{#35bf28}+2.58\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 1.9606ms | 1.5442ms | 647.5781 Ops/s | 421.9918 Ops/s | $\textbf{\color{#35bf28}+53.46\%}$ |
| test_vmap_mlp_speed_decorator[True-False] | 0.8118ms | 0.6456ms | 1.5488 KOps/s | 1.4960 KOps/s | $\color{#35bf28}+3.53\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 1.6147ms | 1.3174ms | 759.0493 Ops/s | 504.9767 Ops/s | $\textbf{\color{#35bf28}+50.31\%}$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.6539ms | 0.5463ms | 1.8304 KOps/s | 1.7363 KOps/s | $\textbf{\color{#35bf28}+5.42\%}$ |
| test_vmap_transformer_speed[True-True] | 12.7322ms | 12.4508ms | 80.3164 Ops/s | 81.5044 Ops/s | $\color{#d91a1a}-1.46\%$ |
| test_vmap_transformer_speed[True-False] | 8.3226ms | 8.1220ms | 123.1217 Ops/s | 119.9567 Ops/s | $\color{#35bf28}+2.64\%$ |
| test_vmap_transformer_speed[False-True] | 12.4387ms | 12.2310ms | 81.7591 Ops/s | 82.5461 Ops/s | $\color{#d91a1a}-0.95\%$ |
| test_vmap_transformer_speed[False-False] | 8.1931ms | 8.0555ms | 124.1383 Ops/s | 124.1002 Ops/s | $\color{#35bf28}+0.03\%$ |
| test_vmap_transformer_speed_decorator[True-True] | 55.0276ms | 53.5612ms | 18.6702 Ops/s | 13.5474 Ops/s | $\textbf{\color{#35bf28}+37.81\%}$ |
| test_vmap_transformer_speed_decorator[True-False] | 19.9144ms | 19.4227ms | 51.4860 Ops/s | 50.8020 Ops/s | $\color{#35bf28}+1.35\%$ |
| test_vmap_transformer_speed_decorator[False-True] | 49.1920ms | 47.9874ms | 20.8388 Ops/s | 15.0706 Ops/s | $\textbf{\color{#35bf28}+38.27\%}$ |
| test_vmap_transformer_speed_decorator[False-False] | 19.0065ms | 18.8867ms | 52.9473 Ops/s | 45.7768 Ops/s | $\textbf{\color{#35bf28}+15.66\%}$ |
| test_to_module_speed[True] | 1.1365ms | 1.0201ms | 980.3184 Ops/s | 12.1016 Ops/s | $\textbf{\color{#35bf28}+8000.74\%}$ |
| test_to_module_speed[False] | 1.1010ms | 0.9909ms | 1.0091 KOps/s | 579.8780 Ops/s | $\textbf{\color{#35bf28}+74.03\%}$ |
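
The Improved and Worsened totals in the summary line seem to count only the results whose change is marked with `\textbf`, which in these results are the entries with a magnitude of roughly 5% or more. That cutoff is inferred from the figures and is not stated anywhere in this thread. A minimal sketch of such a classification, with `THRESHOLD_PCT` and `classify` as hypothetical names and a handful of GPU values copied from the results above:

```python
from collections import Counter

# Assumed cutoff, inferred from which results are emphasised; not documented here.
THRESHOLD_PCT = 5.0


def classify(change_pct: float, threshold: float = THRESHOLD_PCT) -> str:
    """Bucket a relative change (in percent) as improved, worsened, or unchanged."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


# A few GPU results from this PR (benchmark name -> reported change in percent).
sample = {
    "test_to_module_speed[True]": 8000.74,
    "test_keys": 5.24,
    "test_setitem_dim[slice_int]": 4.92,
    "test_clone": -3.28,
    "test_split": -11.42,
}

summary = Counter(classify(change) for change in sample.values())
print(dict(summary))
```

Under this reading, small swings such as the +4.92% on `test_setitem_dim[slice_int]` are treated as noise and do not contribute to either total.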

facebook-github-bot added the CLA Signed label on Feb 7, 2024

vmoens (Contributor, Author) commented on Feb 7, 2024

[image]

@matteobettini this is some serious speedup

vmoens merged commit 517300a into main on Feb 7, 2024
47 of 48 checks passed
vmoens added a commit that referenced this pull request on Feb 26, 2024
Labels: bug, CLA Signed, Performance
Projects: none yet
Development

Successfully merging this pull request may close these issues.

[BUG] Functional call with TensorDictParams is slow
2 participants