[Performance] Faster to #524

Merged: 11 commits, Oct 9, 2023

@vmoens (Contributor) commented Sep 12, 2023

No description provided.

@facebook-github-bot added the CLA Signed label Sep 12, 2023
remove profile

github-actions bot commented Sep 12, 2023

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 105. Improved: $\large\color{#35bf28}7$. Worsened: $\large\color{#d91a1a}4$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 34.8000μs | 20.2308μs | 49.4297 KOps/s | 50.7316 KOps/s | $\color{#d91a1a}-2.57\%$ |
| test_plain_set_stack_nested | 0.2084ms | 0.1858ms | 5.3811 KOps/s | 5.5108 KOps/s | $\color{#d91a1a}-2.35\%$ |
| test_plain_set_nested_inplace | 49.5000μs | 23.6173μs | 42.3418 KOps/s | 43.1572 KOps/s | $\color{#d91a1a}-1.89\%$ |
| test_plain_set_stack_nested_inplace | 0.3676ms | 0.2203ms | 4.5384 KOps/s | 4.6359 KOps/s | $\color{#d91a1a}-2.10\%$ |
| test_items | 0.1385ms | 3.1021μs | 322.3640 KOps/s | 320.9234 KOps/s | $\color{#35bf28}+0.45\%$ |
| test_items_nested | 0.4512ms | 0.3998ms | 2.5015 KOps/s | 2.6172 KOps/s | $\color{#d91a1a}-4.42\%$ |
| test_items_nested_locked | 0.4854ms | 0.3962ms | 2.5238 KOps/s | 2.7095 KOps/s | $\textbf{\color{#d91a1a}-6.86\%}$ |
| test_items_nested_leaf | 0.8595ms | 0.2377ms | 4.2070 KOps/s | 4.4756 KOps/s | $\textbf{\color{#d91a1a}-6.00\%}$ |
| test_items_stack_nested | 1.9654ms | 1.9045ms | 525.0760 Ops/s | 534.0572 Ops/s | $\color{#d91a1a}-1.68\%$ |
| test_items_stack_nested_leaf | 1.7734ms | 1.7255ms | 579.5478 Ops/s | 592.7155 Ops/s | $\color{#d91a1a}-2.22\%$ |
| test_items_stack_nested_locked | 1.0487ms | 0.9704ms | 1.0305 KOps/s | 1.0670 KOps/s | $\color{#d91a1a}-3.42\%$ |
| test_keys | 30.6000μs | 4.6122μs | 216.8161 KOps/s | 208.2717 KOps/s | $\color{#35bf28}+4.10\%$ |
| test_keys_nested | 0.8103ms | 0.1726ms | 5.7950 KOps/s | 5.4111 KOps/s | $\textbf{\color{#35bf28}+7.10\%}$ |
| test_keys_nested_locked | 0.2472ms | 0.1714ms | 5.8336 KOps/s | 5.8324 KOps/s | $\color{#35bf28}+0.02\%$ |
| test_keys_nested_leaf | 0.2795ms | 0.1696ms | 5.8952 KOps/s | 5.7950 KOps/s | $\color{#35bf28}+1.73\%$ |
| test_keys_stack_nested | 1.8156ms | 1.7437ms | 573.4861 Ops/s | 579.1397 Ops/s | $\color{#d91a1a}-0.98\%$ |
| test_keys_stack_nested_leaf | 1.9169ms | 1.7795ms | 561.9645 Ops/s | 576.1767 Ops/s | $\color{#d91a1a}-2.47\%$ |
| test_keys_stack_nested_locked | 0.8231ms | 0.7776ms | 1.2860 KOps/s | 1.2800 KOps/s | $\color{#35bf28}+0.47\%$ |
| test_values | 32.4000μs | 1.3684μs | 730.7777 KOps/s | 736.8762 KOps/s | $\color{#d91a1a}-0.83\%$ |
| test_values_nested | 0.1701ms | 64.5790μs | 15.4849 KOps/s | 15.3700 KOps/s | $\color{#35bf28}+0.75\%$ |
| test_values_nested_locked | 0.1348ms | 64.7857μs | 15.4355 KOps/s | 15.3902 KOps/s | $\color{#35bf28}+0.29\%$ |
| test_values_nested_leaf | 79.7000μs | 56.0727μs | 17.8340 KOps/s | 17.3484 KOps/s | $\color{#35bf28}+2.80\%$ |
| test_values_stack_nested | 1.5765ms | 1.5218ms | 657.1349 Ops/s | 664.5738 Ops/s | $\color{#d91a1a}-1.12\%$ |
| test_values_stack_nested_leaf | 1.6144ms | 1.5075ms | 663.3497 Ops/s | 669.5039 Ops/s | $\color{#d91a1a}-0.92\%$ |
| test_values_stack_nested_locked | 0.7193ms | 0.6333ms | 1.5790 KOps/s | 1.5821 KOps/s | $\color{#d91a1a}-0.20\%$ |
| test_membership | 17.0000μs | 1.8511μs | 540.2073 KOps/s | 547.0552 KOps/s | $\color{#d91a1a}-1.25\%$ |
| test_membership_nested | 25.7990μs | 3.7146μs | 269.2053 KOps/s | 267.9222 KOps/s | $\color{#35bf28}+0.48\%$ |
| test_membership_nested_leaf | 24.7990μs | 3.7509μs | 266.6061 KOps/s | 267.7303 KOps/s | $\color{#d91a1a}-0.42\%$ |
| test_membership_stacked_nested | 42.5000μs | 14.6363μs | 68.3234 KOps/s | 67.9492 KOps/s | $\color{#35bf28}+0.55\%$ |
| test_membership_stacked_nested_leaf | 68.6000μs | 14.7768μs | 67.6739 KOps/s | 68.0597 KOps/s | $\color{#d91a1a}-0.57\%$ |
| test_membership_nested_last | 29.8000μs | 7.5376μs | 132.6685 KOps/s | 132.0954 KOps/s | $\color{#35bf28}+0.43\%$ |
| test_membership_nested_leaf_last | 25.0000μs | 7.5858μs | 131.8251 KOps/s | 132.9267 KOps/s | $\color{#d91a1a}-0.83\%$ |
| test_membership_stacked_nested_last | 0.2523ms | 0.2239ms | 4.4658 KOps/s | 4.4826 KOps/s | $\color{#d91a1a}-0.37\%$ |
| test_membership_stacked_nested_leaf_last | 43.7000μs | 17.0746μs | 58.5666 KOps/s | 59.0854 KOps/s | $\color{#d91a1a}-0.88\%$ |
| test_nested_getleaf | 59.8990μs | 15.4726μs | 64.6303 KOps/s | 64.8957 KOps/s | $\color{#d91a1a}-0.41\%$ |
| test_nested_get | 38.5000μs | 14.7708μs | 67.7011 KOps/s | 68.3613 KOps/s | $\color{#d91a1a}-0.97\%$ |
| test_stacked_getleaf | 0.9346ms | 0.8226ms | 1.2156 KOps/s | 1.2336 KOps/s | $\color{#d91a1a}-1.46\%$ |
| test_stacked_get | 0.8174ms | 0.7836ms | 1.2762 KOps/s | 1.2918 KOps/s | $\color{#d91a1a}-1.21\%$ |
| test_nested_getitemleaf | 56.3000μs | 15.4703μs | 64.6398 KOps/s | 64.8968 KOps/s | $\color{#d91a1a}-0.40\%$ |
| test_nested_getitem | 36.0000μs | 14.7697μs | 67.7061 KOps/s | 67.9172 KOps/s | $\color{#d91a1a}-0.31\%$ |
| test_stacked_getitemleaf | 0.9371ms | 0.8252ms | 1.2118 KOps/s | 1.2323 KOps/s | $\color{#d91a1a}-1.67\%$ |
| test_stacked_getitem | 0.8163ms | 0.7841ms | 1.2753 KOps/s | 1.2895 KOps/s | $\color{#d91a1a}-1.10\%$ |
| test_lock_nested | 63.6663ms | 1.4560ms | 686.7906 Ops/s | 721.3215 Ops/s | $\color{#d91a1a}-4.79\%$ |
| test_lock_stack_nested | 83.1413ms | 18.7845ms | 53.2355 Ops/s | 52.5114 Ops/s | $\color{#35bf28}+1.38\%$ |
| test_unlock_nested | 59.8542ms | 1.4587ms | 685.5324 Ops/s | 688.3126 Ops/s | $\color{#d91a1a}-0.40\%$ |
| test_unlock_stack_nested | 81.5671ms | 19.2343ms | 51.9904 Ops/s | 51.6508 Ops/s | $\color{#35bf28}+0.66\%$ |
| test_flatten_speed | 0.9984ms | 0.9692ms | 1.0318 KOps/s | 1.0244 KOps/s | $\color{#35bf28}+0.72\%$ |
| test_unflatten_speed | 1.7978ms | 1.6960ms | 589.6255 Ops/s | 580.5905 Ops/s | $\color{#35bf28}+1.56\%$ |
| test_common_ops | 4.6313ms | 1.0200ms | 980.3550 Ops/s | 984.7005 Ops/s | $\color{#d91a1a}-0.44\%$ |
| test_creation | 30.1000μs | 6.0295μs | 165.8515 KOps/s | 165.4722 KOps/s | $\color{#35bf28}+0.23\%$ |
| test_creation_empty | 28.8000μs | 13.3710μs | 74.7885 KOps/s | 74.1870 KOps/s | $\color{#35bf28}+0.81\%$ |
| test_creation_nested_1 | 47.3000μs | 22.6736μs | 44.1042 KOps/s | 43.6574 KOps/s | $\color{#35bf28}+1.02\%$ |
| test_creation_nested_2 | 48.6990μs | 25.6770μs | 38.9453 KOps/s | 38.7466 KOps/s | $\color{#35bf28}+0.51\%$ |
| test_clone | 0.1138ms | 24.1760μs | 41.3634 KOps/s | 41.9313 KOps/s | $\color{#d91a1a}-1.35\%$ |
| test_getitem[int] | 54.6000μs | 26.9507μs | 37.1048 KOps/s | 37.0899 KOps/s | $\color{#35bf28}+0.04\%$ |
| test_getitem[slice_int] | 0.1226ms | 50.5109μs | 19.7977 KOps/s | 20.2308 KOps/s | $\color{#d91a1a}-2.14\%$ |
| test_getitem[range] | 0.1267ms | 76.5767μs | 13.0588 KOps/s | 13.1814 KOps/s | $\color{#d91a1a}-0.93\%$ |
| test_getitem[tuple] | 89.3000μs | 41.3595μs | 24.1782 KOps/s | 24.2878 KOps/s | $\color{#d91a1a}-0.45\%$ |
| test_getitem[list] | 0.2942ms | 71.7151μs | 13.9441 KOps/s | 13.9350 KOps/s | $\color{#35bf28}+0.07\%$ |
| test_setitem_dim[int] | 47.6000μs | 33.8322μs | 29.5576 KOps/s | 31.2000 KOps/s | $\textbf{\color{#d91a1a}-5.26\%}$ |
| test_setitem_dim[slice_int] | 89.9000μs | 59.7446μs | 16.7379 KOps/s | 17.0793 KOps/s | $\color{#d91a1a}-2.00\%$ |
| test_setitem_dim[range] | 0.1172ms | 79.1585μs | 12.6329 KOps/s | 12.8776 KOps/s | $\color{#d91a1a}-1.90\%$ |
| test_setitem_dim[tuple] | 70.0000μs | 50.1106μs | 19.9559 KOps/s | 20.4725 KOps/s | $\color{#d91a1a}-2.52\%$ |
| test_setitem | 0.1043ms | 29.0359μs | 34.4401 KOps/s | 33.4315 KOps/s | $\color{#35bf28}+3.02\%$ |
| test_set | 0.1228ms | 28.1365μs | 35.5411 KOps/s | 33.8765 KOps/s | $\color{#35bf28}+4.91\%$ |
| test_set_shared | 0.3935ms | 0.1732ms | 5.7722 KOps/s | 5.6590 KOps/s | $\color{#35bf28}+2.00\%$ |
| test_update | 0.1905ms | 31.6934μs | 31.5523 KOps/s | 31.4406 KOps/s | $\color{#35bf28}+0.36\%$ |
| test_update_nested | 0.1911ms | 48.8582μs | 20.4674 KOps/s | 20.2422 KOps/s | $\color{#35bf28}+1.11\%$ |
| test_set_nested | 0.1243ms | 30.5258μs | 32.7592 KOps/s | 31.9842 KOps/s | $\color{#35bf28}+2.42\%$ |
| test_set_nested_new | 0.1914ms | 50.3774μs | 19.8502 KOps/s | 20.2185 KOps/s | $\color{#d91a1a}-1.82\%$ |
| test_select | 0.1222ms | 94.3464μs | 10.5992 KOps/s | 10.5109 KOps/s | $\color{#35bf28}+0.84\%$ |
| test_unbind_speed | 0.8246ms | 0.6283ms | 1.5917 KOps/s | 1.5542 KOps/s | $\color{#35bf28}+2.41\%$ |
| test_unbind_speed_stack0 | 72.0030ms | 8.0455ms | 124.2927 Ops/s | 121.6582 Ops/s | $\color{#35bf28}+2.17\%$ |
| test_unbind_speed_stack1 | 15.3600μs | 0.9689μs | 1.0321 MOps/s | 885.6163 KOps/s | $\textbf{\color{#35bf28}+16.53\%}$ |
| test_creation[device0] | 4.9015ms | 0.3340ms | 2.9944 KOps/s | 3.0297 KOps/s | $\color{#d91a1a}-1.17\%$ |
| test_creation_from_tensor | 0.5122ms | 0.3677ms | 2.7193 KOps/s | 2.6565 KOps/s | $\color{#35bf28}+2.37\%$ |
| test_add_one[memmap_tensor0] | 1.5868ms | 30.7769μs | 32.4919 KOps/s | 31.8792 KOps/s | $\color{#35bf28}+1.92\%$ |
| test_contiguous[memmap_tensor0] | 37.7000μs | 8.2932μs | 120.5807 KOps/s | 111.2167 KOps/s | $\textbf{\color{#35bf28}+8.42\%}$ |
| test_stack[memmap_tensor0] | 82.4000μs | 25.5511μs | 39.1373 KOps/s | 38.8655 KOps/s | $\color{#35bf28}+0.70\%$ |
| test_memmaptd_index | 0.3310ms | 0.2936ms | 3.4062 KOps/s | 3.3205 KOps/s | $\color{#35bf28}+2.58\%$ |
| test_memmaptd_index_astensor | 1.2753ms | 1.0864ms | 920.4370 Ops/s | 821.9821 Ops/s | $\textbf{\color{#35bf28}+11.98\%}$ |
| test_memmaptd_index_op | 2.4138ms | 2.3578ms | 424.1268 Ops/s | 414.8448 Ops/s | $\color{#35bf28}+2.24\%$ |
| test_reshape_pytree | 91.3000μs | 31.7804μs | 31.4660 KOps/s | 31.1976 KOps/s | $\color{#35bf28}+0.86\%$ |
| test_reshape_td | 62.1000μs | 39.2508μs | 25.4772 KOps/s | 25.1460 KOps/s | $\color{#35bf28}+1.32\%$ |
| test_view_pytree | 0.1121ms | 30.9828μs | 32.2760 KOps/s | 31.4027 KOps/s | $\color{#35bf28}+2.78\%$ |
| test_view_td | 32.8000μs | 8.8720μs | 112.7141 KOps/s | 116.3612 KOps/s | $\color{#d91a1a}-3.13\%$ |
| test_unbind_pytree | 70.4000μs | 37.3618μs | 26.7653 KOps/s | 26.6408 KOps/s | $\color{#35bf28}+0.47\%$ |
| test_unbind_td | 1.0213ms | 92.0393μs | 10.8649 KOps/s | 10.7145 KOps/s | $\color{#35bf28}+1.40\%$ |
| test_split_pytree | 0.4485ms | 36.4782μs | 27.4136 KOps/s | 27.0955 KOps/s | $\color{#35bf28}+1.17\%$ |
| test_split_td | 0.7934ms | 0.1038ms | 9.6337 KOps/s | 9.5967 KOps/s | $\color{#35bf28}+0.39\%$ |
| test_add_pytree | 5.6608ms | 46.6852μs | 21.4201 KOps/s | 19.6244 KOps/s | $\textbf{\color{#35bf28}+9.15\%}$ |
| test_add_td | 0.2376ms | 72.4988μs | 13.7933 KOps/s | 13.5387 KOps/s | $\color{#35bf28}+1.88\%$ |
| test_distributed | 22.4000μs | 8.1261μs | 123.0610 KOps/s | 120.7648 KOps/s | $\color{#35bf28}+1.90\%$ |
| test_tdmodule | 0.2245ms | 25.5753μs | 39.1002 KOps/s | 38.1865 KOps/s | $\color{#35bf28}+2.39\%$ |
| test_tdmodule_dispatch | 0.3007ms | 48.7413μs | 20.5165 KOps/s | 19.9570 KOps/s | $\color{#35bf28}+2.80\%$ |
| test_tdseq | 0.4393ms | 27.5876μs | 36.2482 KOps/s | 36.3664 KOps/s | $\color{#d91a1a}-0.33\%$ |
| test_tdseq_dispatch | 0.4724ms | 53.7443μs | 18.6066 KOps/s | 18.2213 KOps/s | $\color{#35bf28}+2.11\%$ |
| test_instantiation_functorch | 1.7259ms | 1.5391ms | 649.7093 Ops/s | 645.7889 Ops/s | $\color{#35bf28}+0.61\%$ |
| test_instantiation_td | 1.8468ms | 1.2625ms | 792.0568 Ops/s | 785.8843 Ops/s | $\color{#35bf28}+0.79\%$ |
| test_exec_functorch | 0.2279ms | 0.1868ms | 5.3537 KOps/s | 5.4413 KOps/s | $\color{#d91a1a}-1.61\%$ |
| test_exec_td | 0.2655ms | 0.1781ms | 5.6148 KOps/s | 5.6898 KOps/s | $\color{#d91a1a}-1.32\%$ |
| test_vmap_mlp_speed[True-True] | 47.5839ms | 1.2724ms | 785.8887 Ops/s | 930.8915 Ops/s | $\textbf{\color{#d91a1a}-15.58\%}$ |
| test_vmap_mlp_speed[True-False] | 5.9452ms | 0.5471ms | 1.8278 KOps/s | 1.5329 KOps/s | $\textbf{\color{#35bf28}+19.24\%}$ |
| test_vmap_mlp_speed[False-True] | 6.0389ms | 0.9178ms | 1.0895 KOps/s | 900.4048 Ops/s | $\textbf{\color{#35bf28}+21.00\%}$ |
| test_vmap_mlp_speed[False-False] | 6.1524ms | 0.4381ms | 2.2827 KOps/s | 2.3341 KOps/s | $\color{#d91a1a}-2.20\%$ |
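
For readers checking the numbers: the Change column appears to be the relative difference between the PR's throughput (Ops) and the throughput on repo HEAD. The sketch below recomputes it for a few rows copied from the table. The helper names and the 5% cutoff are assumptions made for illustration (the bold entries and the "Improved: 7 / Worsened: 4" summary are consistent with a cutoff near 5%, but the bot's actual rule is not stated here), and recomputed values can differ from the table in the last digit because the displayed Ops figures are already rounded.

```python
# Hypothetical helpers, not taken from the benchmark bot's source.

def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change of the PR's throughput relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold_pct: float = 5.0) -> str:
    """Label a change; the 5% threshold is an assumption, not a documented rule."""
    if change_pct >= threshold_pct:
        return "improved"
    if change_pct <= -threshold_pct:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD), both expressed in KOps/s; values copied from the table.
rows = {
    "test_plain_set_nested": (49.4297, 50.7316),
    "test_keys_nested": (5.7950, 5.4111),
    # 1.0321 MOps/s converted to KOps/s so both sides share a unit.
    "test_unbind_speed_stack1": (1032.1, 885.6163),
}

for name, (ops_pr, ops_head) in rows.items():
    change = relative_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Note that units must be normalized before dividing: several rows mix Ops/s, KOps/s, and MOps/s between the two throughput columns.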

@vmoens merged commit d445682 into main Oct 9, 2023
39 of 41 checks passed
@vmoens deleted the faster_to branch October 21, 2024 14:03
Labels: CLA Signed, Performance

3 participants