[Feature] Mean, std, var, prod, sum #751

Merged: 3 commits merged on Apr 25, 2024

@vmoens (Contributor) commented Apr 25, 2024

No description provided.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 25, 2024
@vmoens vmoens added the enhancement New feature or request label Apr 25, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 127. Improved: $\large\color{#35bf28}26$. Worsened: $\large\color{#d91a1a}4$.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 43.6820μs | 15.5941μs | 64.1268 KOps/s | 57.3191 KOps/s | $\textbf{\color{#35bf28}+11.88\%}$ |
| test_plain_set_stack_nested | 48.5300μs | 15.8727μs | 63.0011 KOps/s | 57.1130 KOps/s | $\textbf{\color{#35bf28}+10.31\%}$ |
| test_plain_set_nested_inplace | 63.0680μs | 17.8497μs | 56.0232 KOps/s | 50.6504 KOps/s | $\textbf{\color{#35bf28}+10.61\%}$ |
| test_plain_set_stack_nested_inplace | 44.6040μs | 17.9397μs | 55.7424 KOps/s | 50.3870 KOps/s | $\textbf{\color{#35bf28}+10.63\%}$ |
| test_items | 34.8860μs | 2.5026μs | 399.5775 KOps/s | 388.3435 KOps/s | $\color{#35bf28}+2.89\%$ |
| test_items_nested | 0.3715ms | 0.2682ms | 3.7284 KOps/s | 3.7418 KOps/s | $\color{#d91a1a}-0.36\%$ |
| test_items_nested_locked | 1.1616ms | 0.2685ms | 3.7244 KOps/s | 3.7898 KOps/s | $\color{#d91a1a}-1.73\%$ |
| test_items_nested_leaf | 0.1246ms | 77.4345μs | 12.9141 KOps/s | 12.9023 KOps/s | $\color{#35bf28}+0.09\%$ |
| test_items_stack_nested | 1.2751ms | 0.2728ms | 3.6662 KOps/s | 3.7068 KOps/s | $\color{#d91a1a}-1.09\%$ |
| test_items_stack_nested_leaf | 0.1245ms | 79.2081μs | 12.6250 KOps/s | 12.8871 KOps/s | $\color{#d91a1a}-2.03\%$ |
| test_items_stack_nested_locked | 0.3513ms | 0.2712ms | 3.6867 KOps/s | 3.7454 KOps/s | $\color{#d91a1a}-1.57\%$ |
| test_keys | 29.8960μs | 3.8118μs | 262.3419 KOps/s | 263.1771 KOps/s | $\color{#d91a1a}-0.32\%$ |
| test_keys_nested | 0.2657ms | 0.1397ms | 7.1577 KOps/s | 7.1405 KOps/s | $\color{#35bf28}+0.24\%$ |
| test_keys_nested_locked | 0.6989ms | 0.1453ms | 6.8820 KOps/s | 6.9348 KOps/s | $\color{#d91a1a}-0.76\%$ |
| test_keys_nested_leaf | 0.2273ms | 0.1207ms | 8.2880 KOps/s | 8.3756 KOps/s | $\color{#d91a1a}-1.05\%$ |
| test_keys_stack_nested | 0.2418ms | 0.1386ms | 7.2151 KOps/s | 7.2446 KOps/s | $\color{#d91a1a}-0.41\%$ |
| test_keys_stack_nested_leaf | 0.2009ms | 0.1184ms | 8.4491 KOps/s | 8.5168 KOps/s | $\color{#d91a1a}-0.79\%$ |
| test_keys_stack_nested_locked | 0.2881ms | 0.1435ms | 6.9703 KOps/s | 7.0786 KOps/s | $\color{#d91a1a}-1.53\%$ |
| test_values | 8.4508μs | 1.1820μs | 846.0056 KOps/s | 854.9590 KOps/s | $\color{#d91a1a}-1.05\%$ |
| test_values_nested | 0.1027ms | 51.9874μs | 19.2354 KOps/s | 19.7762 KOps/s | $\color{#d91a1a}-2.73\%$ |
| test_values_nested_locked | 0.1040ms | 52.2741μs | 19.1299 KOps/s | 19.5618 KOps/s | $\color{#d91a1a}-2.21\%$ |
| test_values_nested_leaf | 90.2680μs | 47.2207μs | 21.1771 KOps/s | 21.8647 KOps/s | $\color{#d91a1a}-3.14\%$ |
| test_values_stack_nested | 92.9830μs | 52.9867μs | 18.8726 KOps/s | 19.0310 KOps/s | $\color{#d91a1a}-0.83\%$ |
| test_values_stack_nested_leaf | 99.8860μs | 47.1640μs | 21.2026 KOps/s | 21.9455 KOps/s | $\color{#d91a1a}-3.38\%$ |
| test_values_stack_nested_locked | 96.9210μs | 52.5029μs | 19.0466 KOps/s | 19.0423 KOps/s | $\color{#35bf28}+0.02\%$ |
| test_membership | 38.3420μs | 1.3206μs | 757.2047 KOps/s | 743.9440 KOps/s | $\color{#35bf28}+1.78\%$ |
| test_membership_nested | 36.3970μs | 3.4455μs | 290.2298 KOps/s | 291.6152 KOps/s | $\color{#d91a1a}-0.48\%$ |
| test_membership_nested_leaf | 50.4230μs | 3.4669μs | 288.4397 KOps/s | 288.9608 KOps/s | $\color{#d91a1a}-0.18\%$ |
| test_membership_stacked_nested | 33.4610μs | 3.3715μs | 296.6059 KOps/s | 289.5696 KOps/s | $\color{#35bf28}+2.43\%$ |
| test_membership_stacked_nested_leaf | 51.7790μs | 3.3750μs | 296.2978 KOps/s | 288.9864 KOps/s | $\color{#35bf28}+2.53\%$ |
| test_membership_nested_last | 40.9620μs | 4.1193μs | 242.7597 KOps/s | 239.4789 KOps/s | $\color{#35bf28}+1.37\%$ |
| test_membership_nested_leaf_last | 28.1620μs | 4.1178μs | 242.8485 KOps/s | 237.1295 KOps/s | $\color{#35bf28}+2.41\%$ |
| test_membership_stacked_nested_last | 44.1820μs | 5.8336μs | 171.4222 KOps/s | 239.0564 KOps/s | $\textbf{\color{#d91a1a}-28.29\%}$ |
| test_membership_stacked_nested_leaf_last | 42.1690μs | 5.8471μs | 171.0236 KOps/s | 228.6540 KOps/s | $\textbf{\color{#d91a1a}-25.20\%}$ |
| test_nested_getleaf | 51.4560μs | 10.7764μs | 92.7950 KOps/s | 92.6408 KOps/s | $\color{#35bf28}+0.17\%$ |
| test_nested_get | 36.6180μs | 10.2307μs | 97.7454 KOps/s | 100.2448 KOps/s | $\color{#d91a1a}-2.49\%$ |
| test_stacked_getleaf | 44.0430μs | 10.8321μs | 92.3180 KOps/s | 95.8576 KOps/s | $\color{#d91a1a}-3.69\%$ |
| test_stacked_get | 48.5100μs | 10.1989μs | 98.0499 KOps/s | 101.4168 KOps/s | $\color{#d91a1a}-3.32\%$ |
| test_nested_getitemleaf | 57.9280μs | 11.3090μs | 88.4250 KOps/s | 90.0158 KOps/s | $\color{#d91a1a}-1.77\%$ |
| test_nested_getitem | 50.6050μs | 10.5797μs | 94.5206 KOps/s | 97.9533 KOps/s | $\color{#d91a1a}-3.50\%$ |
| test_stacked_getitemleaf | 36.2680μs | 11.3052μs | 88.4545 KOps/s | 91.0757 KOps/s | $\color{#d91a1a}-2.88\%$ |
| test_stacked_getitem | 44.3720μs | 10.4776μs | 95.4413 KOps/s | 99.2857 KOps/s | $\color{#d91a1a}-3.87\%$ |
| test_lock_nested | 50.0657ms | 0.4025ms | 2.4842 KOps/s | 2.8681 KOps/s | $\textbf{\color{#d91a1a}-13.38\%}$ |
| test_lock_stack_nested | 0.4439ms | 0.3092ms | 3.2345 KOps/s | 3.3332 KOps/s | $\color{#d91a1a}-2.96\%$ |
| test_unlock_nested | 0.6714ms | 0.3500ms | 2.8569 KOps/s | 2.5128 KOps/s | $\textbf{\color{#35bf28}+13.69\%}$ |
| test_unlock_stack_nested | 0.5474ms | 0.3156ms | 3.1682 KOps/s | 3.2277 KOps/s | $\color{#d91a1a}-1.84\%$ |
| test_flatten_speed | 0.1762ms | 96.8101μs | 10.3295 KOps/s | 10.4196 KOps/s | $\color{#d91a1a}-0.87\%$ |
| test_unflatten_speed | 0.4660ms | 0.4094ms | 2.4426 KOps/s | 2.4394 KOps/s | $\color{#35bf28}+0.13\%$ |
| test_common_ops | 3.9118ms | 0.6728ms | 1.4864 KOps/s | 1.3461 KOps/s | $\textbf{\color{#35bf28}+10.42\%}$ |
| test_creation | 17.9030μs | 1.8958μs | 527.4780 KOps/s | 518.2722 KOps/s | $\color{#35bf28}+1.78\%$ |
| test_creation_empty | 28.3830μs | 8.2923μs | 120.5940 KOps/s | 83.7641 KOps/s | $\textbf{\color{#35bf28}+43.97\%}$ |
| test_creation_nested_1 | 33.3320μs | 10.9038μs | 91.7114 KOps/s | 68.8736 KOps/s | $\textbf{\color{#35bf28}+33.16\%}$ |
| test_creation_nested_2 | 38.9020μs | 14.4878μs | 69.0237 KOps/s | 56.0587 KOps/s | $\textbf{\color{#35bf28}+23.13\%}$ |
| test_clone | 80.4500μs | 13.1999μs | 75.7582 KOps/s | 74.3567 KOps/s | $\color{#35bf28}+1.88\%$ |
| test_getitem[int] | 53.3200μs | 11.4845μs | 87.0738 KOps/s | 88.1567 KOps/s | $\color{#d91a1a}-1.23\%$ |
| test_getitem[slice_int] | 60.6330μs | 22.8411μs | 43.7807 KOps/s | 44.8077 KOps/s | $\color{#d91a1a}-2.29\%$ |
| test_getitem[range] | 85.1280μs | 59.9076μs | 16.6924 KOps/s | 16.8666 KOps/s | $\color{#d91a1a}-1.03\%$ |
| test_getitem[tuple] | 67.0050μs | 19.2015μs | 52.0793 KOps/s | 53.2564 KOps/s | $\color{#d91a1a}-2.21\%$ |
| test_getitem[list] | 0.1038ms | 39.7367μs | 25.1656 KOps/s | 24.4149 KOps/s | $\color{#35bf28}+3.07\%$ |
| test_setitem_dim[int] | 63.8890μs | 30.9564μs | 32.3035 KOps/s | 27.3101 KOps/s | $\textbf{\color{#35bf28}+18.28\%}$ |
| test_setitem_dim[slice_int] | 0.1180ms | 57.5301μs | 17.3822 KOps/s | 16.0635 KOps/s | $\textbf{\color{#35bf28}+8.21\%}$ |
| test_setitem_dim[range] | 0.1513ms | 80.3985μs | 12.4380 KOps/s | 11.7002 KOps/s | $\textbf{\color{#35bf28}+6.31\%}$ |
| test_setitem_dim[tuple] | 0.1376ms | 47.7225μs | 20.9545 KOps/s | 19.4062 KOps/s | $\textbf{\color{#35bf28}+7.98\%}$ |
| test_setitem | 57.9280μs | 19.0931μs | 52.3749 KOps/s | 44.6976 KOps/s | $\textbf{\color{#35bf28}+17.18\%}$ |
| test_set | 61.8850μs | 18.3062μs | 54.6263 KOps/s | 50.2670 KOps/s | $\textbf{\color{#35bf28}+8.67\%}$ |
| test_set_shared | 1.5974ms | 0.1398ms | 7.1512 KOps/s | 7.0509 KOps/s | $\color{#35bf28}+1.42\%$ |
| test_update | 92.6730μs | 19.0342μs | 52.5369 KOps/s | 44.1302 KOps/s | $\textbf{\color{#35bf28}+19.05\%}$ |
| test_update_nested | 89.3070μs | 27.7299μs | 36.0622 KOps/s | 31.8276 KOps/s | $\textbf{\color{#35bf28}+13.30\%}$ |
| test_update__nested | 67.7060μs | 25.5174μs | 39.1889 KOps/s | 40.3808 KOps/s | $\color{#d91a1a}-2.95\%$ |
| test_set_nested | 0.1044ms | 20.7640μs | 48.1603 KOps/s | 41.5844 KOps/s | $\textbf{\color{#35bf28}+15.81\%}$ |
| test_set_nested_new | 88.1040μs | 24.7131μs | 40.4644 KOps/s | 38.6964 KOps/s | $\color{#35bf28}+4.57\%$ |
| test_select | 82.6340μs | 39.4738μs | 25.3332 KOps/s | 24.2866 KOps/s | $\color{#35bf28}+4.31\%$ |
| test_select_nested | 0.1161ms | 61.0794μs | 16.3721 KOps/s | 16.4481 KOps/s | $\color{#d91a1a}-0.46\%$ |
| test_exclude_nested | 0.2244ms | 0.1207ms | 8.2859 KOps/s | 8.2563 KOps/s | $\color{#35bf28}+0.36\%$ |
| test_empty[True] | 0.8383ms | 0.3957ms | 2.5270 KOps/s | 2.5223 KOps/s | $\color{#35bf28}+0.19\%$ |
| test_empty[False] | 6.6382μs | 1.0817μs | 924.5047 KOps/s | 907.0324 KOps/s | $\color{#35bf28}+1.93\%$ |
| test_unbind_speed | 0.4697ms | 0.2724ms | 3.6716 KOps/s | 3.8828 KOps/s | $\textbf{\color{#d91a1a}-5.44\%}$ |
| test_unbind_speed_stack0 | 0.4623ms | 0.2532ms | 3.9493 KOps/s | 4.0527 KOps/s | $\color{#d91a1a}-2.55\%$ |
| test_unbind_speed_stack1 | 62.6771ms | 0.7819ms | 1.2789 KOps/s | 1.3198 KOps/s | $\color{#d91a1a}-3.10\%$ |
| test_split | 68.2792ms | 1.6072ms | 622.1874 Ops/s | 631.5247 Ops/s | $\color{#d91a1a}-1.48\%$ |
| test_chunk | 67.7677ms | 1.5992ms | 625.3309 Ops/s | 627.3008 Ops/s | $\color{#d91a1a}-0.31\%$ |
| test_creation[device0] | 0.1903ms | 0.1024ms | 9.7666 KOps/s | 9.4645 KOps/s | $\color{#35bf28}+3.19\%$ |
| test_creation_from_tensor | 3.3669ms | 84.3126μs | 11.8606 KOps/s | 12.1495 KOps/s | $\color{#d91a1a}-2.38\%$ |
| test_add_one[memmap_tensor0] | 84.4280μs | 5.4686μs | 182.8620 KOps/s | 181.2610 KOps/s | $\color{#35bf28}+0.88\%$ |
| test_contiguous[memmap_tensor0] | 13.8150μs | 0.6306μs | 1.5858 MOps/s | 1.5272 MOps/s | $\color{#35bf28}+3.84\%$ |
| test_stack[memmap_tensor0] | 19.7770μs | 3.6279μs | 275.6393 KOps/s | 276.8634 KOps/s | $\color{#d91a1a}-0.44\%$ |
| test_memmaptd_index | 1.0047ms | 0.2441ms | 4.0969 KOps/s | 4.2115 KOps/s | $\color{#d91a1a}-2.72\%$ |
| test_memmaptd_index_astensor | 0.5700ms | 0.3198ms | 3.1269 KOps/s | 3.1964 KOps/s | $\color{#d91a1a}-2.17\%$ |
| test_memmaptd_index_op | 0.9344ms | 0.5724ms | 1.7470 KOps/s | 1.6137 KOps/s | $\textbf{\color{#35bf28}+8.26\%}$ |
| test_serialize_model | 0.1811s | 0.1109s | 9.0144 Ops/s | 8.8222 Ops/s | $\color{#35bf28}+2.18\%$ |
| test_serialize_model_pickle | 0.4478s | 0.3765s | 2.6562 Ops/s | 2.5887 Ops/s | $\color{#35bf28}+2.61\%$ |
| test_serialize_weights | 0.1654s | 0.1091s | 9.1633 Ops/s | 9.2169 Ops/s | $\color{#d91a1a}-0.58\%$ |
| test_serialize_weights_returnearly | 0.2067s | 0.1315s | 7.6063 Ops/s | 7.1782 Ops/s | $\textbf{\color{#35bf28}+5.96\%}$ |
| test_serialize_weights_pickle | 1.0343s | 0.5776s | 1.7314 Ops/s | 1.5453 Ops/s | $\textbf{\color{#35bf28}+12.04\%}$ |
| test_serialize_weights_filesystem | 98.5832ms | 93.9074ms | 10.6488 Ops/s | 10.9545 Ops/s | $\color{#d91a1a}-2.79\%$ |
| test_serialize_model_filesystem | 0.1729s | 0.1010s | 9.8975 Ops/s | 9.9503 Ops/s | $\color{#d91a1a}-0.53\%$ |
| test_reshape_pytree | 71.1930μs | 25.7518μs | 38.8322 KOps/s | 39.0240 KOps/s | $\color{#d91a1a}-0.49\%$ |
| test_reshape_td | 77.9050μs | 33.3648μs | 29.9717 KOps/s | 30.3408 KOps/s | $\color{#d91a1a}-1.22\%$ |
| test_view_pytree | 87.7830μs | 25.6367μs | 39.0066 KOps/s | 39.3201 KOps/s | $\color{#d91a1a}-0.80\%$ |
| test_view_td | 79.9990μs | 37.1757μs | 26.8993 KOps/s | 27.3064 KOps/s | $\color{#d91a1a}-1.49\%$ |
| test_unbind_pytree | 71.8840μs | 29.2293μs | 34.2123 KOps/s | 34.1563 KOps/s | $\color{#35bf28}+0.16\%$ |
| test_unbind_td | 0.4514ms | 38.0342μs | 26.2921 KOps/s | 26.5659 KOps/s | $\color{#d91a1a}-1.03\%$ |
| test_split_pytree | 77.3540μs | 29.3833μs | 34.0329 KOps/s | 33.9965 KOps/s | $\color{#35bf28}+0.11\%$ |
| test_split_td | 0.1291ms | 40.7490μs | 24.5405 KOps/s | 25.1461 KOps/s | $\color{#d91a1a}-2.41\%$ |
| test_add_pytree | 87.6540μs | 34.7568μs | 28.7713 KOps/s | 28.5071 KOps/s | $\color{#35bf28}+0.93\%$ |
| test_add_td | 0.1189ms | 49.0905μs | 20.3705 KOps/s | 18.0353 KOps/s | $\textbf{\color{#35bf28}+12.95\%}$ |
| test_distributed | 0.1852ms | 0.1001ms | 9.9892 KOps/s | 9.7472 KOps/s | $\color{#35bf28}+2.48\%$ |
| test_tdmodule | 79.4480μs | 16.3803μs | 61.0489 KOps/s | 55.0139 KOps/s | $\textbf{\color{#35bf28}+10.97\%}$ |
| test_tdmodule_dispatch | 51.7660μs | 32.3511μs | 30.9108 KOps/s | 27.9575 KOps/s | $\textbf{\color{#35bf28}+10.56\%}$ |
| test_tdseq | 33.3820μs | 18.8501μs | 53.0501 KOps/s | 47.0345 KOps/s | $\textbf{\color{#35bf28}+12.79\%}$ |
| test_tdseq_dispatch | 72.1850μs | 36.5211μs | 27.3814 KOps/s | 23.6562 KOps/s | $\textbf{\color{#35bf28}+15.75\%}$ |
| test_instantiation_functorch | 2.2425ms | 1.3175ms | 759.0398 Ops/s | 752.1198 Ops/s | $\color{#35bf28}+0.92\%$ |
| test_instantiation_td | 1.6292ms | 1.0135ms | 986.6741 Ops/s | 986.3194 Ops/s | $\color{#35bf28}+0.04\%$ |
| test_exec_functorch | 0.2891ms | 0.1589ms | 6.2949 KOps/s | 6.1634 KOps/s | $\color{#35bf28}+2.13\%$ |
| test_exec_functional_call | 0.3013ms | 0.1458ms | 6.8582 KOps/s | 6.6787 KOps/s | $\color{#35bf28}+2.69\%$ |
| test_exec_td | 0.2709ms | 0.1443ms | 6.9295 KOps/s | 6.6676 KOps/s | $\color{#35bf28}+3.93\%$ |
| test_exec_td_decorator | 0.6230ms | 0.2159ms | 4.6310 KOps/s | 4.4367 KOps/s | $\color{#35bf28}+4.38\%$ |
| test_vmap_mlp_speed[True-True] | 0.6902ms | 0.4762ms | 2.1000 KOps/s | 2.0297 KOps/s | $\color{#35bf28}+3.46\%$ |
| test_vmap_mlp_speed[True-False] | 0.9479ms | 0.4896ms | 2.0425 KOps/s | 2.0389 KOps/s | $\color{#35bf28}+0.17\%$ |
| test_vmap_mlp_speed[False-True] | 0.5931ms | 0.3880ms | 2.5771 KOps/s | 2.4961 KOps/s | $\color{#35bf28}+3.25\%$ |
| test_vmap_mlp_speed[False-False] | 0.6355ms | 0.3890ms | 2.5708 KOps/s | 2.4899 KOps/s | $\color{#35bf28}+3.25\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 1.3230ms | 0.5449ms | 1.8351 KOps/s | 1.7778 KOps/s | $\color{#35bf28}+3.22\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 0.8233ms | 0.5467ms | 1.8291 KOps/s | 1.7635 KOps/s | $\color{#35bf28}+3.72\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 0.7053ms | 0.4536ms | 2.2044 KOps/s | 2.1695 KOps/s | $\color{#35bf28}+1.61\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.6467ms | 0.4526ms | 2.2093 KOps/s | 2.1613 KOps/s | $\color{#35bf28}+2.22\%$ |
| test_to_module_speed[True] | 1.8791ms | 1.6873ms | 592.6665 Ops/s | 592.1087 Ops/s | $\color{#35bf28}+0.09\%$ |
| test_to_module_speed[False] | 1.7683ms | 1.6619ms | 601.7201 Ops/s | 602.8972 Ops/s | $\color{#d91a1a}-0.20\%$ |
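
For anyone reading the Change column: it appears to be the relative difference between the Ops value for this PR and the Ops value on repo HEAD. The sketch below reproduces a few rows under that assumption. The helper names (`percent_change`, `classify`) and the 5% cutoff are hypothetical. The cutoff is only inferred from which rows are rendered in bold (for example, -5.44% is bold while +4.57% is not) and is not confirmed by the benchmark bot's output.

```python
def percent_change(ops_pr: float, ops_head: float) -> float:
    """Relative throughput change of the PR build versus repo HEAD, in percent.

    Both arguments must be expressed in the same unit (e.g. both in KOps/s).
    """
    return (ops_pr - ops_head) / ops_head * 100.0


# Assumed cutoff, inferred from the bold rows in the table; not confirmed.
ASSUMED_THRESHOLD = 5.0


def classify(change: float, threshold: float = ASSUMED_THRESHOLD) -> str:
    """Label a change as improved, worsened, or unchanged (within the cutoff)."""
    if change >= threshold:
        return "improved"
    if change <= -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) pairs copied from three rows of the table, in KOps/s.
rows = {
    "test_plain_set_nested": (64.1268, 57.3191),
    "test_membership_stacked_nested_last": (171.4222, 239.0564),
    "test_items_nested": (3.7284, 3.7418),
}

for name, (ops_pr, ops_head) in rows.items():
    change = percent_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
# test_plain_set_nested: +11.88% (improved)
# test_membership_stacked_nested_last: -28.29% (worsened)
# test_items_nested: -0.36% (unchanged)
```

With that cutoff, counting the rows of the table gives 26 improved and 4 worsened, which matches the summary line of the benchmark comment.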

@vmoens vmoens merged commit 3f27d79 into main Apr 25, 2024
37 of 46 checks passed
@vmoens vmoens deleted the sum-mean-prod branch April 25, 2024 14:16