[BugFix, Performance] Fewer imports at root #682

Merged
merged 1 commit into from
Feb 19, 2024
@vmoens vmoens commented Feb 19, 2024

cc @teopir

@facebook-github-bot added the CLA Signed label Feb 19, 2024
@vmoens added the bug and Performance labels Feb 19, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 126. Improved: $\large\color{#35bf28}7$. Worsened: $\large\color{#d91a1a}30$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 37.2600μs 17.1645μs 58.2598 KOps/s 56.0834 KOps/s $\color{#35bf28}+3.88\%$
test_plain_set_stack_nested 0.2833ms 0.1444ms 6.9245 KOps/s 55.7354 KOps/s $\textbf{\color{#d91a1a}-87.58\%}$
test_plain_set_nested_inplace 46.5270μs 19.7451μs 50.6455 KOps/s 49.8567 KOps/s $\color{#35bf28}+1.58\%$
test_plain_set_stack_nested_inplace 0.3161ms 0.1773ms 5.6403 KOps/s 49.5280 KOps/s $\textbf{\color{#d91a1a}-88.61\%}$
test_items 16.2010μs 2.4126μs 414.4907 KOps/s 400.9563 KOps/s $\color{#35bf28}+3.38\%$
test_items_nested 0.4615ms 0.2682ms 3.7288 KOps/s 3.6393 KOps/s $\color{#35bf28}+2.46\%$
test_items_nested_locked 0.4157ms 0.2705ms 3.6967 KOps/s 3.7097 KOps/s $\color{#d91a1a}-0.35\%$
test_items_nested_leaf 0.5102ms 0.1667ms 5.9988 KOps/s 6.0648 KOps/s $\color{#d91a1a}-1.09\%$
test_items_stack_nested 2.9080ms 1.3740ms 727.8053 Ops/s 3.6584 KOps/s $\textbf{\color{#d91a1a}-80.11\%}$
test_items_stack_nested_leaf 2.3515ms 1.1912ms 839.4659 Ops/s 6.0623 KOps/s $\textbf{\color{#d91a1a}-86.15\%}$
test_items_stack_nested_locked 1.8360ms 0.8687ms 1.1511 KOps/s 3.6650 KOps/s $\textbf{\color{#d91a1a}-68.59\%}$
test_keys 17.2920μs 3.8712μs 258.3148 KOps/s 260.5960 KOps/s $\color{#d91a1a}-0.88\%$
test_keys_nested 0.5215ms 0.1467ms 6.8144 KOps/s 6.7342 KOps/s $\color{#35bf28}+1.19\%$
test_keys_nested_locked 0.2484ms 0.1508ms 6.6315 KOps/s 6.5605 KOps/s $\color{#35bf28}+1.08\%$
test_keys_nested_leaf 37.8776ms 0.1346ms 7.4291 KOps/s 7.6920 KOps/s $\color{#d91a1a}-3.42\%$
test_keys_stack_nested 2.4397ms 1.2855ms 777.8943 Ops/s 6.5800 KOps/s $\textbf{\color{#d91a1a}-88.18\%}$
test_keys_stack_nested_leaf 2.0499ms 1.2925ms 773.6872 Ops/s 7.5884 KOps/s $\textbf{\color{#d91a1a}-89.80\%}$
test_keys_stack_nested_locked 0.9204ms 0.8077ms 1.2381 KOps/s 6.3888 KOps/s $\textbf{\color{#d91a1a}-80.62\%}$
test_values 4.9365μs 1.1304μs 884.6652 KOps/s 859.6670 KOps/s $\color{#35bf28}+2.91\%$
test_values_nested 0.1012ms 51.2759μs 19.5024 KOps/s 19.3285 KOps/s $\color{#35bf28}+0.90\%$
test_values_nested_locked 0.2342ms 51.6226μs 19.3714 KOps/s 19.2511 KOps/s $\color{#35bf28}+0.62\%$
test_values_nested_leaf 82.6950μs 45.5526μs 21.9526 KOps/s 21.6470 KOps/s $\color{#35bf28}+1.41\%$
test_values_stack_nested 1.6547ms 1.0354ms 965.8531 Ops/s 19.0726 KOps/s $\textbf{\color{#d91a1a}-94.94\%}$
test_values_stack_nested_leaf 1.2620ms 1.0251ms 975.5385 Ops/s 21.7910 KOps/s $\textbf{\color{#d91a1a}-95.52\%}$
test_values_stack_nested_locked 1.1238ms 0.6035ms 1.6570 KOps/s 19.3248 KOps/s $\textbf{\color{#d91a1a}-91.43\%}$
test_membership 9.6480μs 1.3721μs 728.7922 KOps/s 730.5592 KOps/s $\color{#d91a1a}-0.24\%$
test_membership_nested 21.4410μs 3.4843μs 287.0029 KOps/s 284.6026 KOps/s $\color{#35bf28}+0.84\%$
test_membership_nested_leaf 21.7110μs 3.5193μs 284.1471 KOps/s 292.1272 KOps/s $\color{#d91a1a}-2.73\%$
test_membership_stacked_nested 38.1620μs 11.5301μs 86.7292 KOps/s 292.6972 KOps/s $\textbf{\color{#d91a1a}-70.37\%}$
test_membership_stacked_nested_leaf 38.3720μs 11.5929μs 86.2599 KOps/s 286.4749 KOps/s $\textbf{\color{#d91a1a}-69.89\%}$
test_membership_nested_last 32.0200μs 6.6212μs 151.0295 KOps/s 149.7604 KOps/s $\color{#35bf28}+0.85\%$
test_membership_nested_leaf_last 26.3790μs 6.6489μs 150.4012 KOps/s 149.3949 KOps/s $\color{#35bf28}+0.67\%$
test_membership_stacked_nested_last 0.3214ms 0.1754ms 5.7026 KOps/s 62.9593 KOps/s $\textbf{\color{#d91a1a}-90.94\%}$
test_membership_stacked_nested_leaf_last 65.4460μs 13.6715μs 73.1448 KOps/s 62.0608 KOps/s $\textbf{\color{#35bf28}+17.86\%}$
test_nested_getleaf 41.2470μs 10.8812μs 91.9018 KOps/s 92.1120 KOps/s $\color{#d91a1a}-0.23\%$
test_nested_get 43.4210μs 10.3476μs 96.6408 KOps/s 96.3422 KOps/s $\color{#35bf28}+0.31\%$
test_stacked_getleaf 0.6262ms 0.4023ms 2.4859 KOps/s 93.9845 KOps/s $\textbf{\color{#d91a1a}-97.36\%}$
test_stacked_get 0.5864ms 0.3715ms 2.6918 KOps/s 97.2587 KOps/s $\textbf{\color{#d91a1a}-97.23\%}$
test_nested_getitemleaf 46.0760μs 12.2126μs 81.8826 KOps/s 80.0179 KOps/s $\color{#35bf28}+2.33\%$
test_nested_getitem 45.7360μs 11.6681μs 85.7036 KOps/s 84.3607 KOps/s $\color{#35bf28}+1.59\%$
test_stacked_getitemleaf 0.5467ms 0.4043ms 2.4735 KOps/s 81.2763 KOps/s $\textbf{\color{#d91a1a}-96.96\%}$
test_stacked_getitem 0.6604ms 0.3738ms 2.6752 KOps/s 85.7487 KOps/s $\textbf{\color{#d91a1a}-96.88\%}$
test_lock_nested 0.7426ms 0.3349ms 2.9861 KOps/s 2.4796 KOps/s $\textbf{\color{#35bf28}+20.43\%}$
test_lock_stack_nested 65.4888ms 5.2028ms 192.2060 Ops/s 3.4723 KOps/s $\textbf{\color{#d91a1a}-94.46\%}$
test_unlock_nested 49.9916ms 0.3868ms 2.5852 KOps/s 2.9766 KOps/s $\textbf{\color{#d91a1a}-13.15\%}$
test_unlock_stack_nested 66.3088ms 5.4465ms 183.6046 Ops/s 3.3778 KOps/s $\textbf{\color{#d91a1a}-94.56\%}$
test_flatten_speed 60.8896ms 0.3862ms 2.5894 KOps/s 2.6815 KOps/s $\color{#d91a1a}-3.43\%$
test_unflatten_speed 0.6280ms 0.4661ms 2.1453 KOps/s 2.1542 KOps/s $\color{#d91a1a}-0.41\%$
test_common_ops 6.0381ms 0.7069ms 1.4145 KOps/s 1.3996 KOps/s $\color{#35bf28}+1.07\%$
test_creation 48.1200μs 1.8509μs 540.2662 KOps/s 544.4402 KOps/s $\color{#d91a1a}-0.77\%$
test_creation_empty 28.2430μs 10.9707μs 91.1515 KOps/s 87.2378 KOps/s $\color{#35bf28}+4.49\%$
test_creation_nested_1 33.3730μs 13.7002μs 72.9918 KOps/s 71.0783 KOps/s $\color{#35bf28}+2.69\%$
test_creation_nested_2 44.8040μs 16.7643μs 59.6506 KOps/s 57.4865 KOps/s $\color{#35bf28}+3.76\%$
test_clone 0.1214ms 13.5691μs 73.6968 KOps/s 77.3699 KOps/s $\color{#d91a1a}-4.75\%$
test_getitem[int] 29.5950μs 11.0508μs 90.4911 KOps/s 88.5524 KOps/s $\color{#35bf28}+2.19\%$
test_getitem[slice_int] 57.6980μs 22.3601μs 44.7226 KOps/s 44.3702 KOps/s $\color{#35bf28}+0.79\%$
test_getitem[range] 85.0080μs 40.8823μs 24.4605 KOps/s 23.4104 KOps/s $\color{#35bf28}+4.49\%$
test_getitem[tuple] 49.7330μs 18.3364μs 54.5363 KOps/s 54.3979 KOps/s $\color{#35bf28}+0.25\%$
test_getitem[list] 0.2301ms 36.2807μs 27.5628 KOps/s 26.3589 KOps/s $\color{#35bf28}+4.57\%$
test_setitem_dim[int] 52.9590μs 30.9581μs 32.3017 KOps/s 32.5500 KOps/s $\color{#d91a1a}-0.76\%$
test_setitem_dim[slice_int] 93.2850μs 56.2724μs 17.7707 KOps/s 17.8757 KOps/s $\color{#d91a1a}-0.59\%$
test_setitem_dim[range] 0.1513ms 76.4379μs 13.0825 KOps/s 13.3145 KOps/s $\color{#d91a1a}-1.74\%$
test_setitem_dim[tuple] 0.1033ms 46.1735μs 21.6574 KOps/s 22.2856 KOps/s $\color{#d91a1a}-2.82\%$
test_setitem 0.1011ms 20.4171μs 48.9785 KOps/s 49.6486 KOps/s $\color{#d91a1a}-1.35\%$
test_set 0.1038ms 19.7940μs 50.5202 KOps/s 50.3738 KOps/s $\color{#35bf28}+0.29\%$
test_set_shared 4.0685ms 0.1398ms 7.1539 KOps/s 7.1859 KOps/s $\color{#d91a1a}-0.45\%$
test_update 0.1170ms 23.0167μs 43.4466 KOps/s 43.2105 KOps/s $\color{#35bf28}+0.55\%$
test_update_nested 0.1182ms 30.1338μs 33.1854 KOps/s 32.7050 KOps/s $\color{#35bf28}+1.47\%$
test_set_nested 0.2050ms 21.7898μs 45.8930 KOps/s 46.7163 KOps/s $\color{#d91a1a}-1.76\%$
test_set_nested_new 0.1475ms 25.6260μs 39.0229 KOps/s 39.4173 KOps/s $\color{#d91a1a}-1.00\%$
test_select 0.1231ms 39.2524μs 25.4761 KOps/s 25.5374 KOps/s $\color{#d91a1a}-0.24\%$
test_select_nested 0.1080ms 58.2067μs 17.1801 KOps/s 17.0042 KOps/s $\color{#35bf28}+1.03\%$
test_exclude_nested 0.2249ms 0.1174ms 8.5197 KOps/s 8.4360 KOps/s $\color{#35bf28}+0.99\%$
test_empty[True] 0.5789ms 0.4020ms 2.4876 KOps/s 2.4506 KOps/s $\color{#35bf28}+1.51\%$
test_empty[False] 7.9168μs 1.0342μs 966.8847 KOps/s 940.5425 KOps/s $\color{#35bf28}+2.80\%$
test_unbind_speed 0.3009ms 0.2478ms 4.0349 KOps/s 4.1269 KOps/s $\color{#d91a1a}-2.23\%$
test_unbind_speed_stack0 57.4259ms 3.3123ms 301.9022 Ops/s 4.3187 KOps/s $\textbf{\color{#d91a1a}-93.01\%}$
test_unbind_speed_stack1 24.2050μs 1.9678μs 508.1703 KOps/s 1.4883 KOps/s $\textbf{\color{#35bf28}+34043.84\%}$
test_split 2.1844ms 1.4416ms 693.6564 Ops/s 681.0033 Ops/s $\color{#35bf28}+1.86\%$
test_chunk 53.8952ms 1.6078ms 621.9677 Ops/s 678.2164 Ops/s $\textbf{\color{#d91a1a}-8.29\%}$
test_creation[device0] 0.1787ms 0.1024ms 9.7632 KOps/s 9.9032 KOps/s $\color{#d91a1a}-1.41\%$
test_creation_from_tensor 3.5432ms 83.7657μs 11.9381 KOps/s 12.2820 KOps/s $\color{#d91a1a}-2.80\%$
test_add_one[memmap_tensor0] 0.2943ms 5.4757μs 182.6264 KOps/s 188.7538 KOps/s $\color{#d91a1a}-3.25\%$
test_contiguous[memmap_tensor0] 9.0360μs 0.6290μs 1.5899 MOps/s 1.5390 MOps/s $\color{#35bf28}+3.31\%$
test_stack[memmap_tensor0] 59.6520μs 3.6804μs 271.7109 KOps/s 278.8767 KOps/s $\color{#d91a1a}-2.57\%$
test_memmaptd_index 0.9212ms 0.2360ms 4.2379 KOps/s 4.2195 KOps/s $\color{#35bf28}+0.44\%$
test_memmaptd_index_astensor 0.5250ms 0.2986ms 3.3486 KOps/s 3.3227 KOps/s $\color{#35bf28}+0.78\%$
test_memmaptd_index_op 1.2496ms 0.6146ms 1.6272 KOps/s 1.6403 KOps/s $\color{#d91a1a}-0.80\%$
test_serialize_model 0.1606s 0.1109s 9.0134 Ops/s 9.5440 Ops/s $\textbf{\color{#d91a1a}-5.56\%}$
test_serialize_model_pickle 0.4477s 0.3795s 2.6350 Ops/s 2.0607 Ops/s $\textbf{\color{#35bf28}+27.87\%}$
test_serialize_weights 0.1525s 0.1060s 9.4322 Ops/s 10.0466 Ops/s $\textbf{\color{#d91a1a}-6.12\%}$
test_serialize_weights_returnearly 0.1730s 0.1278s 7.8222 Ops/s 6.7299 Ops/s $\textbf{\color{#35bf28}+16.23\%}$
test_serialize_weights_pickle 1.1521s 0.5959s 1.6781 Ops/s 2.4134 Ops/s $\textbf{\color{#d91a1a}-30.47\%}$
test_serialize_weights_filesystem 0.1540s 97.2459ms 10.2832 Ops/s 10.6835 Ops/s $\color{#d91a1a}-3.75\%$
test_serialize_model_filesystem 97.1307ms 92.9169ms 10.7623 Ops/s 10.2491 Ops/s $\textbf{\color{#35bf28}+5.01\%}$
test_reshape_pytree 48.9310μs 20.6341μs 48.4634 KOps/s 47.5101 KOps/s $\color{#35bf28}+2.01\%$
test_reshape_td 94.7270μs 31.4950μs 31.7510 KOps/s 32.0923 KOps/s $\color{#d91a1a}-1.06\%$
test_view_pytree 48.3600μs 20.6871μs 48.3392 KOps/s 47.7382 KOps/s $\color{#35bf28}+1.26\%$
test_view_td 53.8560ms 10.4375μs 95.8087 KOps/s 15.2758 KOps/s $\textbf{\color{#35bf28}+527.19\%}$
test_unbind_pytree 59.5020μs 23.8244μs 41.9738 KOps/s 41.6142 KOps/s $\color{#35bf28}+0.86\%$
test_unbind_td 0.1121ms 35.5266μs 28.1479 KOps/s 28.2458 KOps/s $\color{#d91a1a}-0.35\%$
test_split_pytree 54.3410μs 23.4214μs 42.6960 KOps/s 42.2287 KOps/s $\color{#35bf28}+1.11\%$
test_split_td 1.0418ms 38.8435μs 25.7443 KOps/s 25.2804 KOps/s $\color{#35bf28}+1.84\%$
test_add_pytree 67.7770μs 28.7999μs 34.7223 KOps/s 33.5719 KOps/s $\color{#35bf28}+3.43\%$
test_add_td 0.1004ms 53.6694μs 18.6326 KOps/s 18.2311 KOps/s $\color{#35bf28}+2.20\%$
test_distributed 0.1754ms 98.8539μs 10.1159 KOps/s 9.8774 KOps/s $\color{#35bf28}+2.41\%$
test_tdmodule 0.2685ms 22.8553μs 43.7536 KOps/s 43.0832 KOps/s $\color{#35bf28}+1.56\%$
test_tdmodule_dispatch 0.1477ms 44.1786μs 22.6354 KOps/s 21.9526 KOps/s $\color{#35bf28}+3.11\%$
test_tdseq 0.1142ms 25.4935μs 39.2257 KOps/s 38.0109 KOps/s $\color{#35bf28}+3.20\%$
test_tdseq_dispatch 0.3862ms 48.0756μs 20.8006 KOps/s 20.4326 KOps/s $\color{#35bf28}+1.80\%$
test_instantiation_functorch 2.8284ms 1.3182ms 758.6058 Ops/s 771.7187 Ops/s $\color{#d91a1a}-1.70\%$
test_instantiation_td 1.4747ms 0.9989ms 1.0011 KOps/s 995.7992 Ops/s $\color{#35bf28}+0.53\%$
test_exec_functorch 0.5222ms 0.1594ms 6.2747 KOps/s 6.4054 KOps/s $\color{#d91a1a}-2.04\%$
test_exec_functional_call 0.3823ms 0.1449ms 6.8991 KOps/s 7.0142 KOps/s $\color{#d91a1a}-1.64\%$
test_exec_td 0.2869ms 0.1422ms 7.0330 KOps/s 7.1251 KOps/s $\color{#d91a1a}-1.29\%$
test_exec_td_decorator 0.2543ms 0.1706ms 5.8610 KOps/s 5.9109 KOps/s $\color{#d91a1a}-0.84\%$
test_vmap_mlp_speed[True-True] 1.2956ms 0.9092ms 1.0999 KOps/s 2.1428 KOps/s $\textbf{\color{#d91a1a}-48.67\%}$
test_vmap_mlp_speed[True-False] 0.9144ms 0.4844ms 2.0644 KOps/s 2.1527 KOps/s $\color{#d91a1a}-4.10\%$
test_vmap_mlp_speed[False-True] 1.1382ms 0.7940ms 1.2594 KOps/s 2.6485 KOps/s $\textbf{\color{#d91a1a}-52.45\%}$
test_vmap_mlp_speed[False-False] 0.6903ms 0.3969ms 2.5196 KOps/s 2.6475 KOps/s $\color{#d91a1a}-4.83\%$
test_vmap_mlp_speed_decorator[True-True] 2.1535ms 1.5603ms 640.8941 Ops/s 1.9534 KOps/s $\textbf{\color{#d91a1a}-67.19\%}$
test_vmap_mlp_speed_decorator[True-False] 0.6402ms 0.5207ms 1.9207 KOps/s 1.9834 KOps/s $\color{#d91a1a}-3.16\%$
test_vmap_mlp_speed_decorator[False-True] 2.1038ms 1.3166ms 759.5201 Ops/s 2.5856 KOps/s $\textbf{\color{#d91a1a}-70.62\%}$
test_vmap_mlp_speed_decorator[False-False] 0.5703ms 0.4010ms 2.4938 KOps/s 2.6015 KOps/s $\color{#d91a1a}-4.14\%$
test_to_module_speed[True] 1.7490ms 1.0990ms 909.9132 Ops/s 872.6712 Ops/s $\color{#35bf28}+4.27\%$
test_to_module_speed[False] 1.1598ms 1.0789ms 926.8314 Ops/s 924.2349 Ops/s $\color{#35bf28}+0.28\%$
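
For anyone reading the table above: the columns appear to be related in a simple way. Ops looks like the reciprocal of the Mean time, and Change looks like the relative difference between Ops on this PR and Ops on Repo HEAD. The sketch below reproduces two rows under that assumption; the function names are hypothetical and are not taken from the benchmark tooling.

```python
def ops_from_mean(mean_seconds: float) -> float:
    """Throughput in operations per second, assuming Ops = 1 / Mean."""
    return 1.0 / mean_seconds


def percent_change(ops_pr: float, ops_head: float) -> float:
    """Relative throughput change of the PR versus repo HEAD, in percent."""
    return (ops_pr - ops_head) / ops_head * 100.0


# test_plain_set_nested: Mean 17.1645 μs, reported as 58.2598 KOps/s
ops = ops_from_mean(17.1645e-6)

# test_plain_set_stack_nested: 6.9245 KOps/s on the PR vs 55.7354 KOps/s on HEAD
change = percent_change(6.9245e3, 55.7354e3)
print(f"{change:+.2f}%")  # prints -87.58%
```

Read this way, the `stack_nested` rows are running at roughly one eighth to one fortieth of their HEAD throughput, while most other rows move by only a few percent.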


$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 134. Improved: $\large\color{#35bf28}9$. Worsened: $\large\color{#d91a1a}40$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.8577ms 14.1874μs 70.4849 KOps/s 68.9072 KOps/s $\color{#35bf28}+2.29\%$
test_plain_set_stack_nested 0.1914ms 0.1209ms 8.2735 KOps/s 68.6553 KOps/s $\textbf{\color{#d91a1a}-87.95\%}$
test_plain_set_nested_inplace 46.9710μs 15.5187μs 64.4385 KOps/s 63.4335 KOps/s $\color{#35bf28}+1.58\%$
test_plain_set_stack_nested_inplace 0.2386ms 0.1509ms 6.6256 KOps/s 63.0898 KOps/s $\textbf{\color{#d91a1a}-89.50\%}$
test_items 32.5520μs 4.7491μs 210.5651 KOps/s 209.4437 KOps/s $\color{#35bf28}+0.54\%$
test_items_nested 0.3763ms 0.3393ms 2.9472 KOps/s 2.9365 KOps/s $\color{#35bf28}+0.37\%$
test_items_nested_locked 0.4016ms 0.3439ms 2.9079 KOps/s 2.9156 KOps/s $\color{#d91a1a}-0.27\%$
test_items_nested_leaf 0.2337ms 0.2004ms 4.9908 KOps/s 4.9707 KOps/s $\color{#35bf28}+0.40\%$
test_items_stack_nested 1.5685ms 1.3235ms 755.5678 Ops/s 2.9213 KOps/s $\textbf{\color{#d91a1a}-74.14\%}$
test_items_stack_nested_leaf 1.2586ms 1.1652ms 858.2354 Ops/s 4.9478 KOps/s $\textbf{\color{#d91a1a}-82.65\%}$
test_items_stack_nested_locked 1.2094ms 0.9180ms 1.0893 KOps/s 2.8977 KOps/s $\textbf{\color{#d91a1a}-62.41\%}$
test_keys 65.1610μs 4.7788μs 209.2555 KOps/s 219.9129 KOps/s $\color{#d91a1a}-4.85\%$
test_keys_nested 44.5541ms 0.1000ms 9.9964 KOps/s 10.5529 KOps/s $\textbf{\color{#d91a1a}-5.27\%}$
test_keys_nested_locked 0.1424ms 98.2525μs 10.1779 KOps/s 10.1862 KOps/s $\color{#d91a1a}-0.08\%$
test_keys_nested_leaf 0.1156ms 77.6393μs 12.8801 KOps/s 12.8440 KOps/s $\color{#35bf28}+0.28\%$
test_keys_stack_nested 1.2416ms 1.1518ms 868.2256 Ops/s 10.6360 KOps/s $\textbf{\color{#d91a1a}-91.84\%}$
test_keys_stack_nested_leaf 1.1968ms 1.1386ms 878.2419 Ops/s 12.8791 KOps/s $\textbf{\color{#d91a1a}-93.18\%}$
test_keys_stack_nested_locked 0.9195ms 0.7268ms 1.3760 KOps/s 10.2061 KOps/s $\textbf{\color{#d91a1a}-86.52\%}$
test_values 9.2000μs 1.8913μs 528.7332 KOps/s 531.1263 KOps/s $\color{#d91a1a}-0.45\%$
test_values_nested 67.4410μs 45.3661μs 22.0429 KOps/s 22.1450 KOps/s $\color{#d91a1a}-0.46\%$
test_values_nested_locked 0.1030ms 47.5076μs 21.0492 KOps/s 20.9138 KOps/s $\color{#35bf28}+0.65\%$
test_values_nested_leaf 61.7110μs 39.8719μs 25.0803 KOps/s 25.2976 KOps/s $\color{#d91a1a}-0.86\%$
test_values_stack_nested 1.0658ms 0.9649ms 1.0364 KOps/s 21.5606 KOps/s $\textbf{\color{#d91a1a}-95.19\%}$
test_values_stack_nested_leaf 1.0413ms 0.9596ms 1.0422 KOps/s 24.7079 KOps/s $\textbf{\color{#d91a1a}-95.78\%}$
test_values_stack_nested_locked 0.6438ms 0.5788ms 1.7277 KOps/s 20.6538 KOps/s $\textbf{\color{#d91a1a}-91.63\%}$
test_membership 4.9020μs 0.9456μs 1.0575 MOps/s 1.0443 MOps/s $\color{#35bf28}+1.26\%$
test_membership_nested 22.0600μs 2.8957μs 345.3360 KOps/s 342.6186 KOps/s $\color{#35bf28}+0.79\%$
test_membership_nested_leaf 23.9500μs 2.8684μs 348.6225 KOps/s 344.4675 KOps/s $\color{#35bf28}+1.21\%$
test_membership_stacked_nested 33.0110μs 11.2395μs 88.9720 KOps/s 345.9554 KOps/s $\textbf{\color{#d91a1a}-74.28\%}$
test_membership_stacked_nested_leaf 35.2310μs 11.3493μs 88.1115 KOps/s 344.8552 KOps/s $\textbf{\color{#d91a1a}-74.45\%}$
test_membership_nested_last 66.9110μs 5.2677μs 189.8364 KOps/s 186.7205 KOps/s $\color{#35bf28}+1.67\%$
test_membership_nested_leaf_last 20.4200μs 5.2975μs 188.7689 KOps/s 186.4738 KOps/s $\color{#35bf28}+1.23\%$
test_membership_stacked_nested_last 0.1970ms 0.1560ms 6.4094 KOps/s 146.8461 KOps/s $\textbf{\color{#d91a1a}-95.64\%}$
test_membership_stacked_nested_leaf_last 45.3110μs 13.1356μs 76.1288 KOps/s 146.9113 KOps/s $\textbf{\color{#d91a1a}-48.18\%}$
test_nested_getleaf 38.0110μs 8.3893μs 119.1991 KOps/s 118.9444 KOps/s $\color{#35bf28}+0.21\%$
test_nested_get 67.3410μs 7.9778μs 125.3486 KOps/s 125.9248 KOps/s $\color{#d91a1a}-0.46\%$
test_stacked_getleaf 0.3742ms 0.3299ms 3.0309 KOps/s 118.9930 KOps/s $\textbf{\color{#d91a1a}-97.45\%}$
test_stacked_get 0.3343ms 0.2973ms 3.3637 KOps/s 126.1084 KOps/s $\textbf{\color{#d91a1a}-97.33\%}$
test_nested_getitemleaf 28.2410μs 9.7800μs 102.2493 KOps/s 102.2508 KOps/s $-0.00\%$
test_nested_getitem 24.4410μs 9.3170μs 107.3309 KOps/s 107.5794 KOps/s $\color{#d91a1a}-0.23\%$
test_stacked_getitemleaf 0.3958ms 0.3323ms 3.0093 KOps/s 102.7099 KOps/s $\textbf{\color{#d91a1a}-97.07\%}$
test_stacked_getitem 0.3257ms 0.2973ms 3.3632 KOps/s 107.5553 KOps/s $\textbf{\color{#d91a1a}-96.87\%}$
test_lock_nested 1.2810ms 0.3575ms 2.7970 KOps/s 2.3220 KOps/s $\textbf{\color{#35bf28}+20.46\%}$
test_lock_stack_nested 72.1149ms 5.7819ms 172.9531 Ops/s 3.2812 KOps/s $\textbf{\color{#d91a1a}-94.73\%}$
test_unlock_nested 0.7214ms 0.3541ms 2.8245 KOps/s 2.8848 KOps/s $\color{#d91a1a}-2.09\%$
test_unlock_stack_nested 75.0942ms 6.0423ms 165.5004 Ops/s 3.1785 KOps/s $\textbf{\color{#d91a1a}-94.79\%}$
test_flatten_speed 0.4854ms 0.2599ms 3.8470 KOps/s 3.8634 KOps/s $\color{#d91a1a}-0.42\%$
test_unflatten_speed 0.3805ms 0.3589ms 2.7865 KOps/s 2.7742 KOps/s $\color{#35bf28}+0.44\%$
test_common_ops 1.1050ms 0.6195ms 1.6141 KOps/s 1.5861 KOps/s $\color{#35bf28}+1.76\%$
test_creation 37.1000μs 1.5600μs 641.0288 KOps/s 636.8811 KOps/s $\color{#35bf28}+0.65\%$
test_creation_empty 39.5310μs 9.3099μs 107.4124 KOps/s 99.6868 KOps/s $\textbf{\color{#35bf28}+7.75\%}$
test_creation_nested_1 29.9100μs 11.1624μs 89.5863 KOps/s 85.0510 KOps/s $\textbf{\color{#35bf28}+5.33\%}$
test_creation_nested_2 37.8100μs 13.7066μs 72.9576 KOps/s 69.9322 KOps/s $\color{#35bf28}+4.33\%$
test_clone 80.9110μs 14.2733μs 70.0610 KOps/s 74.4649 KOps/s $\textbf{\color{#d91a1a}-5.91\%}$
test_getitem[int] 23.7510μs 10.6628μs 93.7836 KOps/s 92.5668 KOps/s $\color{#35bf28}+1.31\%$
test_getitem[slice_int] 38.3800μs 21.0256μs 47.5610 KOps/s 48.0907 KOps/s $\color{#d91a1a}-1.10\%$
test_getitem[range] 69.5810μs 51.0628μs 19.5837 KOps/s 25.7268 KOps/s $\textbf{\color{#d91a1a}-23.88\%}$
test_getitem[tuple] 41.4510μs 18.9320μs 52.8206 KOps/s 54.3796 KOps/s $\color{#d91a1a}-2.87\%$
test_getitem[list] 0.1371ms 36.4524μs 27.4330 KOps/s 27.9522 KOps/s $\color{#d91a1a}-1.86\%$
test_setitem_dim[int] 49.1010μs 29.4311μs 33.9776 KOps/s 34.3142 KOps/s $\color{#d91a1a}-0.98\%$
test_setitem_dim[slice_int] 72.2410μs 49.4907μs 20.2058 KOps/s 20.6954 KOps/s $\color{#d91a1a}-2.37\%$
test_setitem_dim[range] 95.5410μs 68.7816μs 14.5388 KOps/s 14.5314 KOps/s $\color{#35bf28}+0.05\%$
test_setitem_dim[tuple] 66.6010μs 43.6062μs 22.9325 KOps/s 23.3897 KOps/s $\color{#d91a1a}-1.95\%$
test_setitem 47.5500μs 19.9756μs 50.0610 KOps/s 52.9906 KOps/s $\textbf{\color{#d91a1a}-5.53\%}$
test_set 49.5200μs 19.6964μs 50.7706 KOps/s 54.2296 KOps/s $\textbf{\color{#d91a1a}-6.38\%}$
test_set_shared 1.6430ms 0.1037ms 9.6441 KOps/s 9.8943 KOps/s $\color{#d91a1a}-2.53\%$
test_update 84.5120μs 22.0021μs 45.4503 KOps/s 44.9152 KOps/s $\color{#35bf28}+1.19\%$
test_update_nested 72.0310μs 28.6898μs 34.8555 KOps/s 35.1884 KOps/s $\color{#d91a1a}-0.95\%$
test_set_nested 66.3100μs 20.5205μs 48.7318 KOps/s 51.2676 KOps/s $\color{#d91a1a}-4.95\%$
test_set_nested_new 67.2320μs 23.3532μs 42.8207 KOps/s 44.5667 KOps/s $\color{#d91a1a}-3.92\%$
test_select 72.9010μs 37.0420μs 26.9964 KOps/s 28.5709 KOps/s $\textbf{\color{#d91a1a}-5.51\%}$
test_select_nested 84.6820μs 53.5621μs 18.6699 KOps/s 18.8240 KOps/s $\color{#d91a1a}-0.82\%$
test_exclude_nested 0.1609ms 0.1139ms 8.7826 KOps/s 8.8573 KOps/s $\color{#d91a1a}-0.84\%$
test_empty[True] 0.9041ms 0.3878ms 2.5785 KOps/s 2.5974 KOps/s $\color{#d91a1a}-0.73\%$
test_empty[False] 3.0271μs 0.8562μs 1.1679 MOps/s 1.1756 MOps/s $\color{#d91a1a}-0.65\%$
test_to 74.5520μs 54.0528μs 18.5004 KOps/s 18.3787 KOps/s $\color{#35bf28}+0.66\%$
test_to_nonblocking 69.8510μs 34.5352μs 28.9559 KOps/s 27.4972 KOps/s $\textbf{\color{#35bf28}+5.31\%}$
test_unbind_speed 0.3204ms 0.2690ms 3.7173 KOps/s 3.7484 KOps/s $\color{#d91a1a}-0.83\%$
test_unbind_speed_stack0 79.8090ms 3.5498ms 281.7095 Ops/s 3.7564 KOps/s $\textbf{\color{#d91a1a}-92.50\%}$
test_unbind_speed_stack1 16.9500μs 1.8433μs 542.5076 KOps/s 1.2632 KOps/s $\textbf{\color{#35bf28}+42846.64\%}$
test_split 77.0626ms 1.6609ms 602.0663 Ops/s 668.2552 Ops/s $\textbf{\color{#d91a1a}-9.90\%}$
test_chunk 1.8936ms 1.4875ms 672.2768 Ops/s 586.6866 Ops/s $\textbf{\color{#35bf28}+14.59\%}$
test_creation[device0] 0.1704ms 71.9045μs 13.9073 KOps/s 13.8709 KOps/s $\color{#35bf28}+0.26\%$
test_creation_from_tensor 0.1811ms 54.4236μs 18.3744 KOps/s 18.1479 KOps/s $\color{#35bf28}+1.25\%$
test_add_one[memmap_tensor0] 87.7710μs 7.2142μs 138.6159 KOps/s 151.5799 KOps/s $\textbf{\color{#d91a1a}-8.55\%}$
test_contiguous[memmap_tensor0] 25.5800μs 0.6359μs 1.5726 MOps/s 1.6075 MOps/s $\color{#d91a1a}-2.18\%$
test_stack[memmap_tensor0] 19.5010μs 4.4844μs 222.9953 KOps/s 239.7819 KOps/s $\textbf{\color{#d91a1a}-7.00\%}$
test_memmaptd_index 0.9994ms 0.2606ms 3.8375 KOps/s 3.8622 KOps/s $\color{#d91a1a}-0.64\%$
test_memmaptd_index_astensor 0.5666ms 0.3195ms 3.1295 KOps/s 3.1692 KOps/s $\color{#d91a1a}-1.25\%$
test_memmaptd_index_op 0.9820ms 0.6379ms 1.5676 KOps/s 1.5853 KOps/s $\color{#d91a1a}-1.12\%$
test_serialize_model 92.9466ms 90.0860ms 11.1005 Ops/s 10.6910 Ops/s $\color{#35bf28}+3.83\%$
test_serialize_model_pickle 1.3506s 1.2354s 0.8095 Ops/s 0.8084 Ops/s $\color{#35bf28}+0.13\%$
test_serialize_weights 0.1779s 95.6430ms 10.4555 Ops/s 10.9571 Ops/s $\color{#d91a1a}-4.58\%$
test_serialize_weights_returnearly 0.3014s 79.5324ms 12.5735 Ops/s 12.5700 Ops/s $\color{#35bf28}+0.03\%$
test_serialize_weights_pickle 1.4123s 1.2553s 0.7966 Ops/s 0.8036 Ops/s $\color{#d91a1a}-0.87\%$
test_reshape_pytree 46.0710μs 24.6141μs 40.6271 KOps/s 40.7952 KOps/s $\color{#d91a1a}-0.41\%$
test_reshape_td 69.7210μs 30.7051μs 32.5679 KOps/s 31.8591 KOps/s $\color{#35bf28}+2.22\%$
test_view_pytree 43.0100μs 24.3987μs 40.9857 KOps/s 41.2593 KOps/s $\color{#d91a1a}-0.66\%$
test_view_td 91.6331ms 10.7538μs 92.9908 KOps/s 16.1062 KOps/s $\textbf{\color{#35bf28}+477.36\%}$
test_unbind_pytree 0.2963ms 30.2183μs 33.0925 KOps/s 33.3883 KOps/s $\color{#d91a1a}-0.89\%$
test_unbind_td 72.6810μs 40.4580μs 24.7170 KOps/s 25.2779 KOps/s $\color{#d91a1a}-2.22\%$
test_split_pytree 45.6200μs 28.4853μs 35.1058 KOps/s 35.8822 KOps/s $\color{#d91a1a}-2.16\%$
test_split_td 0.1023ms 38.3155μs 26.0991 KOps/s 26.3170 KOps/s $\color{#d91a1a}-0.83\%$
test_add_pytree 66.9310μs 36.6114μs 27.3139 KOps/s 28.9226 KOps/s $\textbf{\color{#d91a1a}-5.56\%}$
test_add_td 0.1297ms 51.7435μs 19.3261 KOps/s 18.8914 KOps/s $\color{#35bf28}+2.30\%$
test_distributed 0.1621ms 69.8398μs 14.3185 KOps/s 12.2467 KOps/s $\textbf{\color{#35bf28}+16.92\%}$
test_tdmodule 50.6800μs 18.5119μs 54.0192 KOps/s 53.6378 KOps/s $\color{#35bf28}+0.71\%$
test_tdmodule_dispatch 0.1470ms 37.9391μs 26.3580 KOps/s 25.6148 KOps/s $\color{#35bf28}+2.90\%$
test_tdseq 41.2200μs 21.4014μs 46.7260 KOps/s 46.2533 KOps/s $\color{#35bf28}+1.02\%$
test_tdseq_dispatch 60.9110μs 40.8167μs 24.4998 KOps/s 23.9831 KOps/s $\color{#35bf28}+2.15\%$
test_instantiation_functorch 2.5199ms 1.6847ms 593.5888 Ops/s 602.4006 Ops/s $\color{#d91a1a}-1.46\%$
test_instantiation_td 1.6718ms 1.1548ms 865.9856 Ops/s 722.5366 Ops/s $\textbf{\color{#35bf28}+19.85\%}$
test_exec_functorch 0.3859ms 0.1625ms 6.1557 KOps/s 6.4168 KOps/s $\color{#d91a1a}-4.07\%$
test_exec_functional_call 0.3655ms 0.1596ms 6.2668 KOps/s 6.4451 KOps/s $\color{#d91a1a}-2.77\%$
test_exec_td 0.1821ms 0.1506ms 6.6384 KOps/s 6.9758 KOps/s $\color{#d91a1a}-4.84\%$
test_exec_td_decorator 0.6551ms 0.1788ms 5.5931 KOps/s 5.7512 KOps/s $\color{#d91a1a}-2.75\%$
test_vmap_mlp_speed[True-True] 1.2470ms 1.0394ms 962.0847 Ops/s 1.6406 KOps/s $\textbf{\color{#d91a1a}-41.36\%}$
test_vmap_mlp_speed[True-False] 0.8214ms 0.6100ms 1.6393 KOps/s 1.6601 KOps/s $\color{#d91a1a}-1.25\%$
test_vmap_mlp_speed[False-True] 1.1851ms 0.9486ms 1.0542 KOps/s 1.8913 KOps/s $\textbf{\color{#d91a1a}-44.26\%}$
test_vmap_mlp_speed[False-False] 0.7665ms 0.5328ms 1.8767 KOps/s 1.8576 KOps/s $\color{#35bf28}+1.03\%$
test_vmap_mlp_speed_decorator[True-True] 2.0326ms 1.7980ms 556.1725 Ops/s 1.5668 KOps/s $\textbf{\color{#d91a1a}-64.50\%}$
test_vmap_mlp_speed_decorator[True-False] 0.8346ms 0.6410ms 1.5601 KOps/s 1.5734 KOps/s $\color{#d91a1a}-0.85\%$
test_vmap_mlp_speed_decorator[False-True] 1.8066ms 1.5859ms 630.5736 Ops/s 1.8587 KOps/s $\textbf{\color{#d91a1a}-66.07\%}$
test_vmap_mlp_speed_decorator[False-False] 0.8506ms 0.5473ms 1.8271 KOps/s 1.8645 KOps/s $\color{#d91a1a}-2.00\%$
test_vmap_transformer_speed[True-True] 12.4155ms 12.1786ms 82.1113 Ops/s 123.3384 Ops/s $\textbf{\color{#d91a1a}-33.43\%}$
test_vmap_transformer_speed[True-False] 8.3124ms 8.0302ms 124.5294 Ops/s 124.2484 Ops/s $\color{#35bf28}+0.23\%$
test_vmap_transformer_speed[False-True] 12.3206ms 12.0447ms 83.0238 Ops/s 124.7350 Ops/s $\textbf{\color{#d91a1a}-33.44\%}$
test_vmap_transformer_speed[False-False] 8.3884ms 7.9806ms 125.3044 Ops/s 125.1664 Ops/s $\color{#35bf28}+0.11\%$
test_vmap_transformer_speed_decorator[True-True] 58.7971ms 58.5729ms 17.0728 Ops/s 50.0057 Ops/s $\textbf{\color{#d91a1a}-65.86\%}$
test_vmap_transformer_speed_decorator[True-False] 19.3554ms 19.0717ms 52.4338 Ops/s 49.9880 Ops/s $\color{#35bf28}+4.89\%$
test_vmap_transformer_speed_decorator[False-True] 56.0120ms 53.4675ms 18.7030 Ops/s 51.3182 Ops/s $\textbf{\color{#d91a1a}-63.55\%}$
test_vmap_transformer_speed_decorator[False-False] 18.9157ms 18.6926ms 53.4972 Ops/s 51.2119 Ops/s $\color{#35bf28}+4.46\%$
test_to_module_speed[True] 2.6158ms 1.0138ms 986.3693 Ops/s 1.0007 KOps/s $\color{#d91a1a}-1.43\%$
test_to_module_speed[False] 1.4720ms 0.9745ms 1.0262 KOps/s 1.0317 KOps/s $\color{#d91a1a}-0.53\%$

@vmoens vmoens merged commit 8485755 into main Feb 19, 2024
47 of 48 checks passed
@vmoens vmoens deleted the resolve-imports branch February 19, 2024 14:17