[Feature] from_modules method for MOE / ensemble learning #677

Merged on Feb 15, 2024 (5 commits)

@vmoens (Contributor) commented Feb 15, 2024

@facebook-github-bot added the CLA Signed label Feb 15, 2024

github-actions bot commented Feb 15, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 126. Improved: $\large\color{#35bf28}17$. Worsened: $\large\color{#d91a1a}3$.

Detailed results (columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change):
test_plain_set_nested 36.0470μs 16.8004μs 59.5224 KOps/s 60.2421 KOps/s $\color{#d91a1a}-1.19\%$
test_plain_set_stack_nested 0.1815ms 0.1473ms 6.7880 KOps/s 6.8367 KOps/s $\color{#d91a1a}-0.71\%$
test_plain_set_nested_inplace 43.6720μs 18.9388μs 52.8018 KOps/s 50.9032 KOps/s $\color{#35bf28}+3.73\%$
test_plain_set_stack_nested_inplace 0.3442ms 0.1797ms 5.5637 KOps/s 5.5893 KOps/s $\color{#d91a1a}-0.46\%$
test_items 25.9090μs 2.5751μs 388.3386 KOps/s 400.6723 KOps/s $\color{#d91a1a}-3.08\%$
test_items_nested 0.4697ms 0.2704ms 3.6987 KOps/s 3.6968 KOps/s $\color{#35bf28}+0.05\%$
test_items_nested_locked 0.8942ms 0.2751ms 3.6349 KOps/s 3.7013 KOps/s $\color{#d91a1a}-1.79\%$
test_items_nested_leaf 0.3224ms 0.1682ms 5.9465 KOps/s 5.9892 KOps/s $\color{#d91a1a}-0.71\%$
test_items_stack_nested 1.4319ms 1.2922ms 773.8781 Ops/s 760.1493 Ops/s $\color{#35bf28}+1.81\%$
test_items_stack_nested_leaf 1.8957ms 1.1769ms 849.6768 Ops/s 836.2088 Ops/s $\color{#35bf28}+1.61\%$
test_items_stack_nested_locked 4.7646ms 0.8907ms 1.1227 KOps/s 1.1422 KOps/s $\color{#d91a1a}-1.71\%$
test_keys 24.7060μs 3.8634μs 258.8404 KOps/s 256.9359 KOps/s $\color{#35bf28}+0.74\%$
test_keys_nested 1.5489ms 0.1488ms 6.7209 KOps/s 6.6769 KOps/s $\color{#35bf28}+0.66\%$
test_keys_nested_locked 0.2701ms 0.1523ms 6.5669 KOps/s 6.5708 KOps/s $\color{#d91a1a}-0.06\%$
test_keys_nested_leaf 0.2598ms 0.1297ms 7.7126 KOps/s 7.6516 KOps/s $\color{#35bf28}+0.80\%$
test_keys_stack_nested 2.2334ms 1.2405ms 806.1121 Ops/s 791.6395 Ops/s $\color{#35bf28}+1.83\%$
test_keys_stack_nested_leaf 1.9487ms 1.2432ms 804.3542 Ops/s 793.6043 Ops/s $\color{#35bf28}+1.35\%$
test_keys_stack_nested_locked 1.4059ms 0.7901ms 1.2656 KOps/s 1.2626 KOps/s $\color{#35bf28}+0.24\%$
test_values 4.7970μs 1.1262μs 887.9206 KOps/s 855.9266 KOps/s $\color{#35bf28}+3.74\%$
test_values_nested 92.0620μs 51.8730μs 19.2779 KOps/s 19.1943 KOps/s $\color{#35bf28}+0.44\%$
test_values_nested_locked 93.8250μs 52.0475μs 19.2132 KOps/s 18.9044 KOps/s $\color{#35bf28}+1.63\%$
test_values_nested_leaf 92.4320μs 46.2899μs 21.6030 KOps/s 21.1116 KOps/s $\color{#35bf28}+2.33\%$
test_values_stack_nested 1.3076ms 1.0162ms 984.1000 Ops/s 971.6305 Ops/s $\color{#35bf28}+1.28\%$
test_values_stack_nested_leaf 1.5159ms 1.0039ms 996.1075 Ops/s 978.4493 Ops/s $\color{#35bf28}+1.80\%$
test_values_stack_nested_locked 1.1604ms 0.6025ms 1.6596 KOps/s 1.6743 KOps/s $\color{#d91a1a}-0.87\%$
test_membership 16.5010μs 1.3249μs 754.7489 KOps/s 717.1842 KOps/s $\textbf{\color{#35bf28}+5.24\%}$
test_membership_nested 25.0770μs 3.4502μs 289.8416 KOps/s 287.3597 KOps/s $\color{#35bf28}+0.86\%$
test_membership_nested_leaf 27.7420μs 3.4723μs 287.9899 KOps/s 283.4719 KOps/s $\color{#35bf28}+1.59\%$
test_membership_stacked_nested 41.3170μs 11.6703μs 85.6878 KOps/s 84.0758 KOps/s $\color{#35bf28}+1.92\%$
test_membership_stacked_nested_leaf 35.7260μs 11.5995μs 86.2109 KOps/s 82.0520 KOps/s $\textbf{\color{#35bf28}+5.07\%}$
test_membership_nested_last 28.9440μs 6.5778μs 152.0267 KOps/s 147.1585 KOps/s $\color{#35bf28}+3.31\%$
test_membership_nested_leaf_last 33.5530μs 6.6473μs 150.4367 KOps/s 149.6524 KOps/s $\color{#35bf28}+0.52\%$
test_membership_stacked_nested_last 0.3059ms 0.1761ms 5.6788 KOps/s 5.6964 KOps/s $\color{#d91a1a}-0.31\%$
test_membership_stacked_nested_leaf_last 41.2970μs 13.6060μs 73.4972 KOps/s 69.9110 KOps/s $\textbf{\color{#35bf28}+5.13\%}$
test_nested_getleaf 50.3940μs 10.5687μs 94.6193 KOps/s 94.3113 KOps/s $\color{#35bf28}+0.33\%$
test_nested_get 50.4750μs 9.9977μs 100.0231 KOps/s 99.4120 KOps/s $\color{#35bf28}+0.61\%$
test_stacked_getleaf 0.7258ms 0.3946ms 2.5343 KOps/s 2.5339 KOps/s $\color{#35bf28}+0.01\%$
test_stacked_get 0.5197ms 0.3604ms 2.7746 KOps/s 2.7303 KOps/s $\color{#35bf28}+1.62\%$
test_nested_getitemleaf 36.3880μs 11.8332μs 84.5081 KOps/s 82.4771 KOps/s $\color{#35bf28}+2.46\%$
test_nested_getitem 44.0720μs 11.3262μs 88.2905 KOps/s 86.3916 KOps/s $\color{#35bf28}+2.20\%$
test_stacked_getitemleaf 0.6035ms 0.3972ms 2.5178 KOps/s 2.4580 KOps/s $\color{#35bf28}+2.43\%$
test_stacked_getitem 0.8610ms 0.3692ms 2.7084 KOps/s 2.7236 KOps/s $\color{#d91a1a}-0.56\%$
test_lock_nested 2.8404ms 0.3358ms 2.9784 KOps/s 3.0103 KOps/s $\color{#d91a1a}-1.06\%$
test_lock_stack_nested 75.7950ms 5.4575ms 183.2343 Ops/s 182.9858 Ops/s $\color{#35bf28}+0.14\%$
test_unlock_nested 60.6595ms 0.3935ms 2.5411 KOps/s 3.0104 KOps/s $\textbf{\color{#d91a1a}-15.59\%}$
test_unlock_stack_nested 79.6129ms 5.6268ms 177.7199 Ops/s 178.6593 Ops/s $\color{#d91a1a}-0.53\%$
test_flatten_speed 0.6595ms 0.3610ms 2.7698 KOps/s 2.7587 KOps/s $\color{#35bf28}+0.40\%$
test_unflatten_speed 0.6098ms 0.4620ms 2.1647 KOps/s 2.2021 KOps/s $\color{#d91a1a}-1.70\%$
test_common_ops 4.8115ms 0.6619ms 1.5108 KOps/s 1.4731 KOps/s $\color{#35bf28}+2.56\%$
test_creation 12.4330μs 1.8425μs 542.7313 KOps/s 530.9676 KOps/s $\color{#35bf28}+2.22\%$
test_creation_empty 29.4450μs 8.8891μs 112.4970 KOps/s 101.6193 KOps/s $\textbf{\color{#35bf28}+10.70\%}$
test_creation_nested_1 39.9550μs 11.4064μs 87.6698 KOps/s 80.9084 KOps/s $\textbf{\color{#35bf28}+8.36\%}$
test_creation_nested_2 36.6580μs 14.8316μs 67.4237 KOps/s 62.7774 KOps/s $\textbf{\color{#35bf28}+7.40\%}$
test_clone 50.1640μs 13.0568μs 76.5883 KOps/s 75.2975 KOps/s $\color{#35bf28}+1.71\%$
test_getitem[int] 37.4600μs 11.2548μs 88.8507 KOps/s 91.2594 KOps/s $\color{#d91a1a}-2.64\%$
test_getitem[slice_int] 53.5600μs 22.1775μs 45.0908 KOps/s 44.4957 KOps/s $\color{#35bf28}+1.34\%$
test_getitem[range] 96.8410μs 40.3189μs 24.8023 KOps/s 24.2225 KOps/s $\color{#35bf28}+2.39\%$
test_getitem[tuple] 50.0530μs 18.2447μs 54.8105 KOps/s 55.0083 KOps/s $\color{#d91a1a}-0.36\%$
test_getitem[list] 0.1203ms 36.2943μs 27.5525 KOps/s 27.1099 KOps/s $\color{#35bf28}+1.63\%$
test_setitem_dim[int] 43.2910μs 27.5267μs 36.3284 KOps/s 31.5921 KOps/s $\textbf{\color{#35bf28}+14.99\%}$
test_setitem_dim[slice_int] 95.0470μs 51.4058μs 19.4531 KOps/s 17.7081 KOps/s $\textbf{\color{#35bf28}+9.85\%}$
test_setitem_dim[range] 0.1683ms 71.6020μs 13.9661 KOps/s 13.0045 KOps/s $\textbf{\color{#35bf28}+7.39\%}$
test_setitem_dim[tuple] 89.8180μs 41.6040μs 24.0361 KOps/s 21.6615 KOps/s $\textbf{\color{#35bf28}+10.96\%}$
test_setitem 58.1290μs 18.3878μs 54.3840 KOps/s 51.9954 KOps/s $\color{#35bf28}+4.59\%$
test_set 55.5840μs 17.7979μs 56.1863 KOps/s 53.5683 KOps/s $\color{#35bf28}+4.89\%$
test_set_shared 4.7097ms 0.1377ms 7.2639 KOps/s 7.2796 KOps/s $\color{#d91a1a}-0.22\%$
test_update 0.1421ms 20.1686μs 49.5820 KOps/s 46.9204 KOps/s $\textbf{\color{#35bf28}+5.67\%}$
test_update_nested 72.5860μs 27.7459μs 36.0414 KOps/s 35.0564 KOps/s $\color{#35bf28}+2.81\%$
test_set_nested 63.6490μs 19.6937μs 50.7777 KOps/s 48.4815 KOps/s $\color{#35bf28}+4.74\%$
test_set_nested_new 61.7250μs 23.5951μs 42.3817 KOps/s 41.4826 KOps/s $\color{#35bf28}+2.17\%$
test_select 87.8440μs 36.4030μs 27.4703 KOps/s 27.0891 KOps/s $\color{#35bf28}+1.41\%$
test_select_nested 0.1292ms 60.1275μs 16.6313 KOps/s 17.0832 KOps/s $\color{#d91a1a}-2.65\%$
test_exclude_nested 0.2199ms 0.1191ms 8.3943 KOps/s 8.4708 KOps/s $\color{#d91a1a}-0.90\%$
test_empty[True] 0.7858ms 0.4207ms 2.3772 KOps/s 2.3944 KOps/s $\color{#d91a1a}-0.72\%$
test_empty[False] 5.8148μs 1.0534μs 949.3221 KOps/s 954.3972 KOps/s $\color{#d91a1a}-0.53\%$
test_unbind_speed 0.3843ms 0.2428ms 4.1181 KOps/s 4.0840 KOps/s $\color{#35bf28}+0.83\%$
test_unbind_speed_stack0 78.2066ms 3.3420ms 299.2219 Ops/s 300.0898 Ops/s $\color{#d91a1a}-0.29\%$
test_unbind_speed_stack1 14.7670μs 1.9832μs 504.2247 KOps/s 515.9744 KOps/s $\color{#d91a1a}-2.28\%$
test_split 69.5579ms 1.6296ms 613.6464 Ops/s 607.5244 Ops/s $\color{#35bf28}+1.01\%$
test_chunk 1.6659ms 1.4666ms 681.8495 Ops/s 681.6282 Ops/s $\color{#35bf28}+0.03\%$
test_creation[device0] 0.1758ms 0.1006ms 9.9448 KOps/s 9.9111 KOps/s $\color{#35bf28}+0.34\%$
test_creation_from_tensor 3.6097ms 83.9355μs 11.9139 KOps/s 12.2871 KOps/s $\color{#d91a1a}-3.04\%$
test_add_one[memmap_tensor0] 0.1826ms 5.3839μs 185.7388 KOps/s 182.0780 KOps/s $\color{#35bf28}+2.01\%$
test_contiguous[memmap_tensor0] 18.7650μs 0.6365μs 1.5710 MOps/s 1.5527 MOps/s $\color{#35bf28}+1.18\%$
test_stack[memmap_tensor0] 39.7540μs 3.5725μs 279.9142 KOps/s 277.5367 KOps/s $\color{#35bf28}+0.86\%$
test_memmaptd_index 0.9516ms 0.2314ms 4.3218 KOps/s 4.2103 KOps/s $\color{#35bf28}+2.65\%$
test_memmaptd_index_astensor 71.1809ms 0.3177ms 3.1478 KOps/s 3.3693 KOps/s $\textbf{\color{#d91a1a}-6.57\%}$
test_memmaptd_index_op 0.9465ms 0.5575ms 1.7938 KOps/s 1.6841 KOps/s $\textbf{\color{#35bf28}+6.51\%}$
test_serialize_model 0.1070s 0.1027s 9.7376 Ops/s 9.7551 Ops/s $\color{#d91a1a}-0.18\%$
test_serialize_model_pickle 0.4493s 0.3758s 2.6607 Ops/s 2.6387 Ops/s $\color{#35bf28}+0.83\%$
test_serialize_weights 0.1752s 0.1065s 9.3885 Ops/s 9.2522 Ops/s $\color{#35bf28}+1.47\%$
test_serialize_weights_returnearly 0.1321s 0.1208s 8.2780 Ops/s 7.4026 Ops/s $\textbf{\color{#35bf28}+11.83\%}$
test_serialize_weights_pickle 0.6557s 0.4736s 2.1117 Ops/s 2.3801 Ops/s $\textbf{\color{#d91a1a}-11.28\%}$
test_serialize_weights_filesystem 0.1622s 97.0700ms 10.3018 Ops/s 10.7183 Ops/s $\color{#d91a1a}-3.89\%$
test_serialize_model_filesystem 98.2320ms 92.7483ms 10.7819 Ops/s 9.9955 Ops/s $\textbf{\color{#35bf28}+7.87\%}$
test_reshape_pytree 50.8350μs 21.0958μs 47.4029 KOps/s 47.9314 KOps/s $\color{#d91a1a}-1.10\%$
test_reshape_td 69.2700μs 31.4844μs 31.7617 KOps/s 32.4100 KOps/s $\color{#d91a1a}-2.00\%$
test_view_pytree 57.6780μs 21.0310μs 47.5489 KOps/s 48.3789 KOps/s $\color{#d91a1a}-1.72\%$
test_view_td 76.8841ms 10.9185μs 91.5876 KOps/s 90.2304 KOps/s $\color{#35bf28}+1.50\%$
test_unbind_pytree 49.8430μs 24.0725μs 41.5412 KOps/s 41.8365 KOps/s $\color{#d91a1a}-0.71\%$
test_unbind_td 0.1135ms 35.9912μs 27.7846 KOps/s 28.1322 KOps/s $\color{#d91a1a}-1.24\%$
test_split_pytree 60.7240μs 23.8343μs 41.9564 KOps/s 42.2360 KOps/s $\color{#d91a1a}-0.66\%$
test_split_td 0.1127ms 39.6495μs 25.2210 KOps/s 25.3330 KOps/s $\color{#d91a1a}-0.44\%$
test_add_pytree 71.8040μs 29.6271μs 33.7529 KOps/s 33.2295 KOps/s $\color{#35bf28}+1.58\%$
test_add_td 0.1137ms 47.8606μs 20.8940 KOps/s 18.0371 KOps/s $\textbf{\color{#35bf28}+15.84\%}$
test_distributed 0.1780ms 99.7171μs 10.0284 KOps/s 9.9044 KOps/s $\color{#35bf28}+1.25\%$
test_tdmodule 0.6571ms 22.4046μs 44.6337 KOps/s 44.0292 KOps/s $\color{#35bf28}+1.37\%$
test_tdmodule_dispatch 0.1863ms 42.5263μs 23.5148 KOps/s 23.4998 KOps/s $\color{#35bf28}+0.06\%$
test_tdseq 0.1131ms 24.2603μs 41.2197 KOps/s 37.2199 KOps/s $\textbf{\color{#35bf28}+10.75\%}$
test_tdseq_dispatch 0.3677ms 45.9622μs 21.7570 KOps/s 20.7115 KOps/s $\textbf{\color{#35bf28}+5.05\%}$
test_instantiation_functorch 1.5127ms 1.3079ms 764.6032 Ops/s 760.1991 Ops/s $\color{#35bf28}+0.58\%$
test_instantiation_td 1.4735ms 1.0098ms 990.2614 Ops/s 993.3861 Ops/s $\color{#d91a1a}-0.31\%$
test_exec_functorch 0.2191ms 0.1550ms 6.4510 KOps/s 6.3950 KOps/s $\color{#35bf28}+0.87\%$
test_exec_functional_call 0.2284ms 0.1445ms 6.9225 KOps/s 6.6871 KOps/s $\color{#35bf28}+3.52\%$
test_exec_td 0.3220ms 0.1447ms 6.9101 KOps/s 7.0560 KOps/s $\color{#d91a1a}-2.07\%$
test_exec_td_decorator 0.3160ms 0.1719ms 5.8183 KOps/s 5.7578 KOps/s $\color{#35bf28}+1.05\%$
test_vmap_mlp_speed[True-True] 1.3119ms 0.8866ms 1.1279 KOps/s 1.1165 KOps/s $\color{#35bf28}+1.02\%$
test_vmap_mlp_speed[True-False] 0.6934ms 0.4583ms 2.1822 KOps/s 2.1440 KOps/s $\color{#35bf28}+1.78\%$
test_vmap_mlp_speed[False-True] 1.2748ms 0.7616ms 1.3130 KOps/s 1.2905 KOps/s $\color{#35bf28}+1.74\%$
test_vmap_mlp_speed[False-False] 0.4589ms 0.3740ms 2.6737 KOps/s 2.6230 KOps/s $\color{#35bf28}+1.93\%$
test_vmap_mlp_speed_decorator[True-True] 2.0445ms 1.5368ms 650.6942 Ops/s 647.7761 Ops/s $\color{#35bf28}+0.45\%$
test_vmap_mlp_speed_decorator[True-False] 0.9482ms 0.4984ms 2.0064 KOps/s 1.9471 KOps/s $\color{#35bf28}+3.05\%$
test_vmap_mlp_speed_decorator[False-True] 1.8908ms 1.3080ms 764.5038 Ops/s 777.8397 Ops/s $\color{#d91a1a}-1.71\%$
test_vmap_mlp_speed_decorator[False-False] 0.6341ms 0.3829ms 2.6116 KOps/s 2.5512 KOps/s $\color{#35bf28}+2.37\%$
test_to_module_speed[True] 1.6709ms 1.1262ms 887.9559 Ops/s 889.6968 Ops/s $\color{#d91a1a}-0.20\%$
test_to_module_speed[False] 1.5752ms 1.0924ms 915.3871 Ops/s 921.7000 Ops/s $\color{#d91a1a}-0.68\%$
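
For readers checking the table, the Change column appears to be the relative difference between the PR throughput (Ops) and the throughput on repo HEAD. The sketch below reproduces three rows from the CPU table under that reading. The helper names are hypothetical, and the 5% cutoff for bold entries is an assumption inferred from the rows themselves (the bold rows match the Improved and Worsened counts), not something taken from the benchmark tooling.

```python
# Scale factors for the throughput units that appear in the table.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}

# Assumption: inferred from which rows are bold, not from the benchmark tooling.
SIGNIFICANCE_THRESHOLD_PCT = 5.0


def to_ops_per_second(value: float, unit: str) -> float:
    """Convert a (value, unit) pair such as (59.5224, 'KOps/s') to ops/s."""
    return value * UNIT_SCALE[unit]


def relative_change_pct(ops_pr: float, ops_head: float) -> float:
    """Percent change of the PR throughput relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def is_significant(change_pct: float) -> bool:
    """True when the change exceeds the assumed threshold in either direction."""
    return abs(change_pct) > SIGNIFICANCE_THRESHOLD_PCT


# (Ops, Ops on Repo HEAD) copied from the CPU table above.
rows = {
    "test_plain_set_nested": ((59.5224, "KOps/s"), (60.2421, "KOps/s")),
    "test_unlock_nested": ((2.5411, "KOps/s"), (3.0104, "KOps/s")),
    "test_add_td": ((20.8940, "KOps/s"), (18.0371, "KOps/s")),
}

for name, (pr, head) in rows.items():
    change = relative_change_pct(to_ops_per_second(*pr), to_ops_per_second(*head))
    print(f"{name}: {change:+.2f}% significant={is_significant(change)}")
```

Converting to a common unit first matters because some rows report the two throughputs in different units (for example KOps/s against Ops/s).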

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 134. Improved: $\large\color{#35bf28}6$. Worsened: $\large\color{#d91a1a}4$.

Detailed results (columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change):
test_plain_set_nested 0.1215ms 13.3254μs 75.0445 KOps/s 73.3076 KOps/s $\color{#35bf28}+2.37\%$
test_plain_set_stack_nested 0.1408ms 0.1182ms 8.4584 KOps/s 8.4030 KOps/s $\color{#35bf28}+0.66\%$
test_plain_set_nested_inplace 43.8800μs 14.6049μs 68.4702 KOps/s 66.7570 KOps/s $\color{#35bf28}+2.57\%$
test_plain_set_stack_nested_inplace 0.1944ms 0.1472ms 6.7935 KOps/s 6.7695 KOps/s $\color{#35bf28}+0.35\%$
test_items 21.9600μs 4.8453μs 206.3847 KOps/s 208.1768 KOps/s $\color{#d91a1a}-0.86\%$
test_items_nested 0.4062ms 0.3372ms 2.9657 KOps/s 2.9130 KOps/s $\color{#35bf28}+1.81\%$
test_items_nested_locked 0.3864ms 0.3422ms 2.9226 KOps/s 2.8882 KOps/s $\color{#35bf28}+1.19\%$
test_items_nested_leaf 0.2347ms 0.2016ms 4.9606 KOps/s 4.9402 KOps/s $\color{#35bf28}+0.41\%$
test_items_stack_nested 1.3429ms 1.2900ms 775.2191 Ops/s 759.9864 Ops/s $\color{#35bf28}+2.00\%$
test_items_stack_nested_leaf 1.4915ms 1.1331ms 882.5492 Ops/s 871.2590 Ops/s $\color{#35bf28}+1.30\%$
test_items_stack_nested_locked 0.9451ms 0.8940ms 1.1185 KOps/s 1.1053 KOps/s $\color{#35bf28}+1.20\%$
test_keys 23.1800μs 4.5730μs 218.6742 KOps/s 218.3524 KOps/s $\color{#35bf28}+0.15\%$
test_keys_nested 1.5357ms 94.5610μs 10.5752 KOps/s 10.5576 KOps/s $\color{#35bf28}+0.17\%$
test_keys_nested_locked 0.1662ms 97.9080μs 10.2137 KOps/s 10.2244 KOps/s $\color{#d91a1a}-0.10\%$
test_keys_nested_leaf 0.1787ms 78.0252μs 12.8164 KOps/s 12.7732 KOps/s $\color{#35bf28}+0.34\%$
test_keys_stack_nested 1.2464ms 1.1239ms 889.7583 Ops/s 877.9572 Ops/s $\color{#35bf28}+1.34\%$
test_keys_stack_nested_leaf 1.1864ms 1.1179ms 894.5563 Ops/s 885.8140 Ops/s $\color{#35bf28}+0.99\%$
test_keys_stack_nested_locked 0.7696ms 0.7168ms 1.3952 KOps/s 1.4021 KOps/s $\color{#d91a1a}-0.50\%$
test_values 20.5003μs 1.8948μs 527.7670 KOps/s 526.8235 KOps/s $\color{#35bf28}+0.18\%$
test_values_nested 67.1600μs 45.0419μs 22.2016 KOps/s 22.1734 KOps/s $\color{#35bf28}+0.13\%$
test_values_nested_locked 70.7310μs 47.2074μs 21.1831 KOps/s 21.1759 KOps/s $\color{#35bf28}+0.03\%$
test_values_nested_leaf 65.0610μs 39.4616μs 25.3411 KOps/s 25.0790 KOps/s $\color{#35bf28}+1.04\%$
test_values_stack_nested 1.0009ms 0.9404ms 1.0634 KOps/s 1.0503 KOps/s $\color{#35bf28}+1.24\%$
test_values_stack_nested_leaf 0.9893ms 0.9366ms 1.0677 KOps/s 1.0560 KOps/s $\color{#35bf28}+1.11\%$
test_values_stack_nested_locked 0.6118ms 0.5647ms 1.7709 KOps/s 1.7574 KOps/s $\color{#35bf28}+0.77\%$
test_membership 7.8220μs 0.9547μs 1.0475 MOps/s 1.0354 MOps/s $\color{#35bf28}+1.16\%$
test_membership_nested 17.1400μs 2.9101μs 343.6269 KOps/s 341.9939 KOps/s $\color{#35bf28}+0.48\%$
test_membership_nested_leaf 24.1400μs 2.9206μs 342.4006 KOps/s 344.0994 KOps/s $\color{#d91a1a}-0.49\%$
test_membership_stacked_nested 32.8800μs 11.2724μs 88.7122 KOps/s 89.5531 KOps/s $\color{#d91a1a}-0.94\%$
test_membership_stacked_nested_leaf 30.3400μs 11.3062μs 88.4471 KOps/s 89.4377 KOps/s $\color{#d91a1a}-1.11\%$
test_membership_nested_last 22.8500μs 5.4124μs 184.7625 KOps/s 186.8367 KOps/s $\color{#d91a1a}-1.11\%$
test_membership_nested_leaf_last 30.2400μs 5.4161μs 184.6344 KOps/s 187.1520 KOps/s $\color{#d91a1a}-1.35\%$
test_membership_stacked_nested_last 0.1884ms 0.1558ms 6.4171 KOps/s 6.3663 KOps/s $\color{#35bf28}+0.80\%$
test_membership_stacked_nested_leaf_last 75.7610μs 12.9773μs 77.0575 KOps/s 76.5996 KOps/s $\color{#35bf28}+0.60\%$
test_nested_getleaf 25.7810μs 8.4074μs 118.9429 KOps/s 118.0493 KOps/s $\color{#35bf28}+0.76\%$
test_nested_get 25.1310μs 7.9853μs 125.2302 KOps/s 125.5309 KOps/s $\color{#d91a1a}-0.24\%$
test_stacked_getleaf 0.3799ms 0.3291ms 3.0387 KOps/s 3.0469 KOps/s $\color{#d91a1a}-0.27\%$
test_stacked_get 0.3270ms 0.2980ms 3.3554 KOps/s 3.3861 KOps/s $\color{#d91a1a}-0.90\%$
test_nested_getitemleaf 31.2600μs 9.7689μs 102.3653 KOps/s 101.0930 KOps/s $\color{#35bf28}+1.26\%$
test_nested_getitem 26.3610μs 9.3171μs 107.3290 KOps/s 106.7899 KOps/s $\color{#35bf28}+0.50\%$
test_stacked_getitemleaf 0.3594ms 0.3292ms 3.0373 KOps/s 3.0283 KOps/s $\color{#35bf28}+0.30\%$
test_stacked_getitem 0.3455ms 0.2932ms 3.4102 KOps/s 3.3393 KOps/s $\color{#35bf28}+2.12\%$
test_lock_nested 2.6485ms 0.3483ms 2.8711 KOps/s 2.3499 KOps/s $\textbf{\color{#35bf28}+22.18\%}$
test_lock_stack_nested 96.2637ms 6.4648ms 154.6836 Ops/s 155.9922 Ops/s $\color{#d91a1a}-0.84\%$
test_unlock_nested 0.8186ms 0.3479ms 2.8745 KOps/s 2.8861 KOps/s $\color{#d91a1a}-0.40\%$
test_unlock_stack_nested 95.7152ms 6.4762ms 154.4117 Ops/s 153.3394 Ops/s $\color{#35bf28}+0.70\%$
test_flatten_speed 0.3444ms 0.2613ms 3.8267 KOps/s 3.8234 KOps/s $\color{#35bf28}+0.09\%$
test_unflatten_speed 0.3904ms 0.3616ms 2.7658 KOps/s 2.7808 KOps/s $\color{#d91a1a}-0.54\%$
test_common_ops 1.0323ms 0.5863ms 1.7057 KOps/s 1.6909 KOps/s $\color{#35bf28}+0.88\%$
test_creation 33.1900μs 1.5353μs 651.3189 KOps/s 639.8790 KOps/s $\color{#35bf28}+1.79\%$
test_creation_empty 25.0110μs 7.7950μs 128.2877 KOps/s 124.2766 KOps/s $\color{#35bf28}+3.23\%$
test_creation_nested_1 28.1110μs 9.5088μs 105.1655 KOps/s 100.2298 KOps/s $\color{#35bf28}+4.92\%$
test_creation_nested_2 29.7700μs 11.8189μs 84.6104 KOps/s 81.3765 KOps/s $\color{#35bf28}+3.97\%$
test_clone 58.3610μs 13.8959μs 71.9635 KOps/s 73.5707 KOps/s $\color{#d91a1a}-2.18\%$
test_getitem[int] 29.5900μs 10.8930μs 91.8023 KOps/s 93.8011 KOps/s $\color{#d91a1a}-2.13\%$
test_getitem[slice_int] 42.5410μs 20.9036μs 47.8387 KOps/s 48.1401 KOps/s $\color{#d91a1a}-0.63\%$
test_getitem[range] 0.1346ms 38.8002μs 25.7731 KOps/s 25.4951 KOps/s $\color{#35bf28}+1.09\%$
test_getitem[tuple] 45.9010μs 18.2290μs 54.8575 KOps/s 55.2661 KOps/s $\color{#d91a1a}-0.74\%$
test_getitem[list] 0.1487ms 34.5345μs 28.9566 KOps/s 28.7906 KOps/s $\color{#35bf28}+0.58\%$
test_setitem_dim[int] 41.8910μs 26.0418μs 38.3998 KOps/s 39.3214 KOps/s $\color{#d91a1a}-2.34\%$
test_setitem_dim[slice_int] 64.1210μs 46.0085μs 21.7351 KOps/s 21.4771 KOps/s $\color{#35bf28}+1.20\%$
test_setitem_dim[range] 81.2810μs 63.9180μs 15.6451 KOps/s 15.5615 KOps/s $\color{#35bf28}+0.54\%$
test_setitem_dim[tuple] 57.2800μs 40.1855μs 24.8846 KOps/s 24.8568 KOps/s $\color{#35bf28}+0.11\%$
test_setitem 65.9010μs 18.1019μs 55.2430 KOps/s 55.2798 KOps/s $\color{#d91a1a}-0.07\%$
test_set 59.9010μs 17.9255μs 55.7865 KOps/s 57.9350 KOps/s $\color{#d91a1a}-3.71\%$
test_set_shared 2.7703ms 0.1022ms 9.7846 KOps/s 9.8045 KOps/s $\color{#d91a1a}-0.20\%$
test_update 81.5010μs 19.7715μs 50.5779 KOps/s 50.4563 KOps/s $\color{#35bf28}+0.24\%$
test_update_nested 78.1810μs 27.0593μs 36.9559 KOps/s 38.0896 KOps/s $\color{#d91a1a}-2.98\%$
test_set_nested 71.1100μs 18.9243μs 52.8422 KOps/s 53.3689 KOps/s $\color{#d91a1a}-0.99\%$
test_set_nested_new 68.3510μs 21.4822μs 46.5503 KOps/s 46.9311 KOps/s $\color{#d91a1a}-0.81\%$
test_select 67.9210μs 33.7993μs 29.5864 KOps/s 29.1594 KOps/s $\color{#35bf28}+1.46\%$
test_select_nested 70.9700μs 52.9605μs 18.8820 KOps/s 18.9337 KOps/s $\color{#d91a1a}-0.27\%$
test_exclude_nested 0.1429ms 0.1105ms 9.0485 KOps/s 8.8150 KOps/s $\color{#35bf28}+2.65\%$
test_empty[True] 0.4287ms 0.3830ms 2.6107 KOps/s 2.6185 KOps/s $\color{#d91a1a}-0.30\%$
test_empty[False] 2.4871μs 0.8524μs 1.1731 MOps/s 1.1792 MOps/s $\color{#d91a1a}-0.51\%$
test_to 73.6110μs 52.2386μs 19.1429 KOps/s 18.5499 KOps/s $\color{#35bf28}+3.20\%$
test_to_nonblocking 57.8210μs 33.3505μs 29.9845 KOps/s 30.6436 KOps/s $\color{#d91a1a}-2.15\%$
test_unbind_speed 0.2988ms 0.2637ms 3.7922 KOps/s 3.7654 KOps/s $\color{#35bf28}+0.71\%$
test_unbind_speed_stack0 92.7079ms 3.7395ms 267.4137 Ops/s 240.7729 Ops/s $\textbf{\color{#35bf28}+11.06\%}$
test_unbind_speed_stack1 7.5133μs 1.7027μs 587.3169 KOps/s 543.3973 KOps/s $\textbf{\color{#35bf28}+8.08\%}$
test_split 84.4152ms 1.7552ms 569.7436 Ops/s 663.2665 Ops/s $\textbf{\color{#d91a1a}-14.10\%}$
test_chunk 1.5911ms 1.5127ms 661.0570 Ops/s 611.2366 Ops/s $\textbf{\color{#35bf28}+8.15\%}$
test_creation[device0] 0.1367ms 71.5694μs 13.9725 KOps/s 14.0578 KOps/s $\color{#d91a1a}-0.61\%$
test_creation_from_tensor 0.1295ms 51.8118μs 19.3006 KOps/s 19.1402 KOps/s $\color{#35bf28}+0.84\%$
test_add_one[memmap_tensor0] 0.2096ms 6.7788μs 147.5184 KOps/s 153.8976 KOps/s $\color{#d91a1a}-4.15\%$
test_contiguous[memmap_tensor0] 15.1700μs 0.6134μs 1.6302 MOps/s 1.6199 MOps/s $\color{#35bf28}+0.64\%$
test_stack[memmap_tensor0] 41.6700μs 4.4030μs 227.1165 KOps/s 233.0791 KOps/s $\color{#d91a1a}-2.56\%$
test_memmaptd_index 1.1668ms 0.2536ms 3.9427 KOps/s 3.9078 KOps/s $\color{#35bf28}+0.89\%$
test_memmaptd_index_astensor 83.9739ms 0.3387ms 2.9522 KOps/s 3.2338 KOps/s $\textbf{\color{#d91a1a}-8.71\%}$
test_memmaptd_index_op 0.9000ms 0.5893ms 1.6970 KOps/s 1.6982 KOps/s $\color{#d91a1a}-0.07\%$
test_serialize_model 0.1836s 99.2022ms 10.0804 Ops/s 9.5655 Ops/s $\textbf{\color{#35bf28}+5.38\%}$
test_serialize_model_pickle 1.3689s 1.2385s 0.8074 Ops/s 0.8067 Ops/s $\color{#35bf28}+0.09\%$
test_serialize_weights 90.1453ms 86.3122ms 11.5858 Ops/s 9.6725 Ops/s $\textbf{\color{#35bf28}+19.78\%}$
test_serialize_weights_returnearly 0.3000s 73.5115ms 13.6033 Ops/s 14.1840 Ops/s $\color{#d91a1a}-4.09\%$
test_serialize_weights_pickle 1.3486s 1.2477s 0.8015 Ops/s 0.8545 Ops/s $\textbf{\color{#d91a1a}-6.21\%}$
test_reshape_pytree 0.1574ms 25.0914μs 39.8543 KOps/s 40.9365 KOps/s $\color{#d91a1a}-2.64\%$
test_reshape_td 0.1569ms 31.1347μs 32.1185 KOps/s 32.3849 KOps/s $\color{#d91a1a}-0.82\%$
test_view_pytree 0.2238ms 24.0122μs 41.6454 KOps/s 41.6232 KOps/s $\color{#35bf28}+0.05\%$
test_view_td 91.6040ms 11.4199μs 87.5661 KOps/s 148.3977 KOps/s $\textbf{\color{#d91a1a}-40.99\%}$
test_unbind_pytree 0.2448ms 30.5713μs 32.7104 KOps/s 33.3340 KOps/s $\color{#d91a1a}-1.87\%$
test_unbind_td 0.1186ms 39.4565μs 25.3444 KOps/s 25.3649 KOps/s $\color{#d91a1a}-0.08\%$
test_split_pytree 48.9700μs 28.2311μs 35.4219 KOps/s 35.2747 KOps/s $\color{#35bf28}+0.42\%$
test_split_td 0.4502ms 38.4046μs 26.0386 KOps/s 26.0762 KOps/s $\color{#d91a1a}-0.14\%$
test_add_pytree 63.7400μs 36.0310μs 27.7539 KOps/s 28.3258 KOps/s $\color{#d91a1a}-2.02\%$
test_add_td 0.2608ms 48.9079μs 20.4466 KOps/s 20.4425 KOps/s $\color{#35bf28}+0.02\%$
test_distributed 0.1839ms 71.3877μs 14.0080 KOps/s 14.4348 KOps/s $\color{#d91a1a}-2.96\%$
test_tdmodule 0.1973ms 18.1062μs 55.2296 KOps/s 54.6910 KOps/s $\color{#35bf28}+0.98\%$
test_tdmodule_dispatch 0.2070ms 36.5373μs 27.3693 KOps/s 26.5050 KOps/s $\color{#35bf28}+3.26\%$
test_tdseq 0.2456ms 21.0059μs 47.6058 KOps/s 47.7349 KOps/s $\color{#d91a1a}-0.27\%$
test_tdseq_dispatch 59.0610μs 38.5467μs 25.9425 KOps/s 26.2168 KOps/s $\color{#d91a1a}-1.05\%$
test_instantiation_functorch 1.8652ms 1.6556ms 604.0117 Ops/s 607.3178 Ops/s $\color{#d91a1a}-0.54\%$
test_instantiation_td 1.6833ms 1.1561ms 864.9674 Ops/s 866.4471 Ops/s $\color{#d91a1a}-0.17\%$
test_exec_functorch 0.2033ms 0.1601ms 6.2450 KOps/s 6.4126 KOps/s $\color{#d91a1a}-2.61\%$
test_exec_functional_call 0.3702ms 0.1583ms 6.3168 KOps/s 6.3616 KOps/s $\color{#d91a1a}-0.70\%$
test_exec_td 0.2093ms 0.1487ms 6.7268 KOps/s 6.8953 KOps/s $\color{#d91a1a}-2.44\%$
test_exec_td_decorator 0.3839ms 0.1816ms 5.5058 KOps/s 5.6112 KOps/s $\color{#d91a1a}-1.88\%$
test_vmap_mlp_speed[True-True] 1.3222ms 1.0681ms 936.2412 Ops/s 937.1885 Ops/s $\color{#d91a1a}-0.10\%$
test_vmap_mlp_speed[True-False] 0.8219ms 0.6164ms 1.6224 KOps/s 1.6432 KOps/s $\color{#d91a1a}-1.26\%$
test_vmap_mlp_speed[False-True] 1.1713ms 0.9609ms 1.0407 KOps/s 1.0490 KOps/s $\color{#d91a1a}-0.79\%$
test_vmap_mlp_speed[False-False] 0.7492ms 0.5454ms 1.8334 KOps/s 1.8576 KOps/s $\color{#d91a1a}-1.30\%$
test_vmap_mlp_speed_decorator[True-True] 2.0766ms 1.8305ms 546.2918 Ops/s 545.9901 Ops/s $\color{#35bf28}+0.06\%$
test_vmap_mlp_speed_decorator[True-False] 1.0436ms 0.6438ms 1.5534 KOps/s 1.5525 KOps/s $\color{#35bf28}+0.05\%$
test_vmap_mlp_speed_decorator[False-True] 1.8592ms 1.5733ms 635.6015 Ops/s 638.8065 Ops/s $\color{#d91a1a}-0.50\%$
test_vmap_mlp_speed_decorator[False-False] 0.8216ms 0.5572ms 1.7946 KOps/s 1.8179 KOps/s $\color{#d91a1a}-1.28\%$
test_vmap_transformer_speed[True-True] 12.4900ms 12.2846ms 81.4027 Ops/s 81.5125 Ops/s $\color{#d91a1a}-0.13\%$
test_vmap_transformer_speed[True-False] 8.4633ms 8.1987ms 121.9701 Ops/s 123.3748 Ops/s $\color{#d91a1a}-1.14\%$
test_vmap_transformer_speed[False-True] 12.3811ms 12.1860ms 82.0612 Ops/s 82.3279 Ops/s $\color{#d91a1a}-0.32\%$
test_vmap_transformer_speed[False-False] 8.3386ms 8.1115ms 123.2816 Ops/s 121.1530 Ops/s $\color{#35bf28}+1.76\%$
test_vmap_transformer_speed_decorator[True-True] 59.9032ms 59.0585ms 16.9324 Ops/s 16.5266 Ops/s $\color{#35bf28}+2.46\%$
test_vmap_transformer_speed_decorator[True-False] 20.2408ms 19.7411ms 50.6556 Ops/s 50.5790 Ops/s $\color{#35bf28}+0.15\%$
test_vmap_transformer_speed_decorator[False-True] 53.7617ms 53.1578ms 18.8119 Ops/s 18.4677 Ops/s $\color{#35bf28}+1.86\%$
test_vmap_transformer_speed_decorator[False-False] 19.7131ms 19.2133ms 52.0472 Ops/s 51.5293 Ops/s $\color{#35bf28}+1.01\%$
test_to_module_speed[True] 1.2056ms 1.0106ms 989.5237 Ops/s 964.0772 Ops/s $\color{#35bf28}+2.64\%$
test_to_module_speed[False] 1.2249ms 0.9812ms 1.0192 KOps/s 995.3855 Ops/s $\color{#35bf28}+2.39\%$

@vmoens added the enhancement label Feb 15, 2024
@vmoens merged commit b46f79b into main Feb 15, 2024
47 of 48 checks passed
@vmoens deleted the from_modules branch February 15, 2024 17:16