[BugFix] Fix stack of tensorclasses (and nontensors) #820

Merged: 5 commits, Jun 19, 2024

@vmoens commented Jun 19, 2024

No description provided.

@facebook-github-bot added the CLA Signed label Jun 19, 2024
@vmoens added the bug label and removed the CLA Signed label Jun 19, 2024
@facebook-github-bot added the CLA Signed label Jun 19, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: $\large\color{#35bf28}4$. Worsened: $\large\color{#d91a1a}22$.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 34.0930μs | 17.1897μs | 58.1745 KOps/s | 60.8753 KOps/s | $\color{#d91a1a}-4.44\%$ |
| test_plain_set_stack_nested | 49.8340μs | 17.4352μs | 57.3552 KOps/s | 60.6307 KOps/s | $\textbf{\color{#d91a1a}-5.40\%}$ |
| test_plain_set_nested_inplace | 57.3780μs | 19.5598μs | 51.1253 KOps/s | 53.0998 KOps/s | $\color{#d91a1a}-3.72\%$ |
| test_plain_set_stack_nested_inplace | 73.4380μs | 19.2301μs | 52.0018 KOps/s | 53.5923 KOps/s | $\color{#d91a1a}-2.97\%$ |
| test_items | 36.1380μs | 2.7389μs | 365.1092 KOps/s | 384.6385 KOps/s | $\textbf{\color{#d91a1a}-5.08\%}$ |
| test_items_nested | 0.4142ms | 0.2633ms | 3.7976 KOps/s | 3.7425 KOps/s | $\color{#35bf28}+1.47\%$ |
| test_items_nested_locked | 1.1477ms | 0.2648ms | 3.7764 KOps/s | 3.7382 KOps/s | $\color{#35bf28}+1.02\%$ |
| test_items_nested_leaf | 0.1361ms | 76.9078μs | 13.0026 KOps/s | 12.8973 KOps/s | $\color{#35bf28}+0.82\%$ |
| test_items_stack_nested | 0.4188ms | 0.2666ms | 3.7509 KOps/s | 3.7272 KOps/s | $\color{#35bf28}+0.63\%$ |
| test_items_stack_nested_leaf | 0.1428ms | 77.5607μs | 12.8931 KOps/s | 13.1804 KOps/s | $\color{#d91a1a}-2.18\%$ |
| test_items_stack_nested_locked | 1.1430ms | 0.2674ms | 3.7401 KOps/s | 3.7389 KOps/s | $\color{#35bf28}+0.03\%$ |
| test_keys | 22.5420μs | 4.0271μs | 248.3178 KOps/s | 260.7721 KOps/s | $\color{#d91a1a}-4.78\%$ |
| test_keys_nested | 0.2122ms | 0.1381ms | 7.2434 KOps/s | 7.1114 KOps/s | $\color{#35bf28}+1.86\%$ |
| test_keys_nested_locked | 0.7168ms | 0.1432ms | 6.9824 KOps/s | 6.9589 KOps/s | $\color{#35bf28}+0.34\%$ |
| test_keys_nested_leaf | 0.2075ms | 0.1162ms | 8.6034 KOps/s | 8.4797 KOps/s | $\color{#35bf28}+1.46\%$ |
| test_keys_stack_nested | 0.2378ms | 0.1387ms | 7.2121 KOps/s | 7.3019 KOps/s | $\color{#d91a1a}-1.23\%$ |
| test_keys_stack_nested_leaf | 0.1965ms | 0.1169ms | 8.5561 KOps/s | 8.6310 KOps/s | $\color{#d91a1a}-0.87\%$ |
| test_keys_stack_nested_locked | 0.3057ms | 0.1447ms | 6.9102 KOps/s | 7.1197 KOps/s | $\color{#d91a1a}-2.94\%$ |
| test_values | 9.3298μs | 1.1735μs | 852.1274 KOps/s | 880.8301 KOps/s | $\color{#d91a1a}-3.26\%$ |
| test_values_nested | 0.1048ms | 50.3781μs | 19.8499 KOps/s | 19.4865 KOps/s | $\color{#35bf28}+1.86\%$ |
| test_values_nested_locked | 0.1010ms | 50.3453μs | 19.8628 KOps/s | 19.4414 KOps/s | $\color{#35bf28}+2.17\%$ |
| test_values_nested_leaf | 85.7710μs | 45.9300μs | 21.7723 KOps/s | 21.5985 KOps/s | $\color{#35bf28}+0.80\%$ |
| test_values_stack_nested | 99.8160μs | 51.0966μs | 19.5708 KOps/s | 19.3302 KOps/s | $\color{#35bf28}+1.24\%$ |
| test_values_stack_nested_leaf | 87.6140μs | 45.6974μs | 21.8831 KOps/s | 22.1686 KOps/s | $\color{#d91a1a}-1.29\%$ |
| test_values_stack_nested_locked | 0.1102ms | 50.2708μs | 19.8923 KOps/s | 19.5574 KOps/s | $\color{#35bf28}+1.71\%$ |
| test_membership | 35.9770μs | 1.3268μs | 753.7140 KOps/s | 732.8642 KOps/s | $\color{#35bf28}+2.84\%$ |
| test_membership_nested | 40.6160μs | 3.4294μs | 291.5935 KOps/s | 280.7667 KOps/s | $\color{#35bf28}+3.86\%$ |
| test_membership_nested_leaf | 39.0840μs | 3.4240μs | 292.0522 KOps/s | 280.2631 KOps/s | $\color{#35bf28}+4.21\%$ |
| test_membership_stacked_nested | 38.7630μs | 3.4397μs | 290.7196 KOps/s | 284.0692 KOps/s | $\color{#35bf28}+2.34\%$ |
| test_membership_stacked_nested_leaf | 29.1150μs | 3.4303μs | 291.5191 KOps/s | 284.2234 KOps/s | $\color{#35bf28}+2.57\%$ |
| test_membership_nested_last | 40.6360μs | 4.2312μs | 236.3413 KOps/s | 232.5825 KOps/s | $\color{#35bf28}+1.62\%$ |
| test_membership_nested_leaf_last | 49.7270μs | 4.1845μs | 238.9746 KOps/s | 232.9815 KOps/s | $\color{#35bf28}+2.57\%$ |
| test_membership_stacked_nested_last | 24.2150μs | 4.2971μs | 232.7133 KOps/s | 73.8089 KOps/s | $\textbf{\color{#35bf28}+215.29\%}$ |
| test_membership_stacked_nested_leaf_last | 41.4670μs | 4.2196μs | 236.9900 KOps/s | 74.1847 KOps/s | $\textbf{\color{#35bf28}+219.46\%}$ |
| test_nested_getleaf | 36.7790μs | 10.5047μs | 95.1956 KOps/s | 93.0481 KOps/s | $\color{#35bf28}+2.31\%$ |
| test_nested_get | 47.8800μs | 10.0411μs | 99.5910 KOps/s | 98.6265 KOps/s | $\color{#35bf28}+0.98\%$ |
| test_stacked_getleaf | 52.7480μs | 10.3842μs | 96.3004 KOps/s | 93.7722 KOps/s | $\color{#35bf28}+2.70\%$ |
| test_stacked_get | 45.5850μs | 9.9835μs | 100.1650 KOps/s | 99.9561 KOps/s | $\color{#35bf28}+0.21\%$ |
| test_nested_getitemleaf | 66.7780μs | 11.0328μs | 90.6385 KOps/s | 91.5765 KOps/s | $\color{#d91a1a}-1.02\%$ |
| test_nested_getitem | 46.8870μs | 10.2331μs | 97.7225 KOps/s | 97.9164 KOps/s | $\color{#d91a1a}-0.20\%$ |
| test_stacked_getitemleaf | 45.3250μs | 11.2147μs | 89.1686 KOps/s | 90.4063 KOps/s | $\color{#d91a1a}-1.37\%$ |
| test_stacked_getitem | 31.8900μs | 10.2192μs | 97.8548 KOps/s | 98.1783 KOps/s | $\color{#d91a1a}-0.33\%$ |
| test_lock_nested | 0.8087ms | 0.3447ms | 2.9010 KOps/s | 2.9173 KOps/s | $\color{#d91a1a}-0.56\%$ |
| test_lock_stack_nested | 0.6857ms | 0.3137ms | 3.1881 KOps/s | 3.3438 KOps/s | $\color{#d91a1a}-4.65\%$ |
| test_unlock_nested | 0.9498ms | 0.3477ms | 2.8757 KOps/s | 2.8355 KOps/s | $\color{#35bf28}+1.42\%$ |
| test_unlock_stack_nested | 0.6346ms | 0.3216ms | 3.1099 KOps/s | 3.2646 KOps/s | $\color{#d91a1a}-4.74\%$ |
| test_flatten_speed | 0.5393ms | 93.9752μs | 10.6411 KOps/s | 10.4144 KOps/s | $\color{#35bf28}+2.18\%$ |
| test_unflatten_speed | 0.6190ms | 0.4038ms | 2.4765 KOps/s | 2.4080 KOps/s | $\color{#35bf28}+2.84\%$ |
| test_common_ops | 4.6699ms | 0.7476ms | 1.3376 KOps/s | 1.4180 KOps/s | $\textbf{\color{#d91a1a}-5.67\%}$ |
| test_creation | 51.6870μs | 1.9809μs | 504.8305 KOps/s | 522.7033 KOps/s | $\color{#d91a1a}-3.42\%$ |
| test_creation_empty | 46.6170μs | 11.0524μs | 90.4778 KOps/s | 108.0304 KOps/s | $\textbf{\color{#d91a1a}-16.25\%}$ |
| test_creation_nested_1 | 41.2970μs | 13.7409μs | 72.7752 KOps/s | 82.7492 KOps/s | $\textbf{\color{#d91a1a}-12.05\%}$ |
| test_creation_nested_2 | 58.4290μs | 17.1237μs | 58.3984 KOps/s | 64.9887 KOps/s | $\textbf{\color{#d91a1a}-10.14\%}$ |
| test_clone | 0.1894ms | 13.4275μs | 74.4743 KOps/s | 74.9993 KOps/s | $\color{#d91a1a}-0.70\%$ |
| test_getitem[int] | 52.7880μs | 11.3909μs | 87.7894 KOps/s | 87.8218 KOps/s | $\color{#d91a1a}-0.04\%$ |
| test_getitem[slice_int] | 72.5650μs | 22.8362μs | 43.7900 KOps/s | 43.6397 KOps/s | $\color{#35bf28}+0.34\%$ |
| test_getitem[range] | 86.4310μs | 61.8414μs | 16.1704 KOps/s | 16.6605 KOps/s | $\color{#d91a1a}-2.94\%$ |
| test_getitem[tuple] | 64.7820μs | 18.7414μs | 53.3577 KOps/s | 52.8816 KOps/s | $\color{#35bf28}+0.90\%$ |
| test_getitem[list] | 0.1778ms | 42.7920μs | 23.3688 KOps/s | 24.2160 KOps/s | $\color{#d91a1a}-3.50\%$ |
| test_setitem_dim[int] | 63.5790μs | 34.9728μs | 28.5936 KOps/s | 28.5599 KOps/s | $\color{#35bf28}+0.12\%$ |
| test_setitem_dim[slice_int] | 0.1114ms | 63.1980μs | 15.8233 KOps/s | 15.4684 KOps/s | $\color{#35bf28}+2.29\%$ |
| test_setitem_dim[range] | 0.2618ms | 87.3200μs | 11.4521 KOps/s | 11.7661 KOps/s | $\color{#d91a1a}-2.67\%$ |
| test_setitem_dim[tuple] | 0.1181ms | 51.1064μs | 19.5670 KOps/s | 19.7985 KOps/s | $\color{#d91a1a}-1.17\%$ |
| test_setitem | 71.4140μs | 20.5245μs | 48.7223 KOps/s | 52.1022 KOps/s | $\textbf{\color{#d91a1a}-6.49\%}$ |
| test_set | 64.4410μs | 20.0689μs | 49.8283 KOps/s | 53.6159 KOps/s | $\textbf{\color{#d91a1a}-7.06\%}$ |
| test_set_shared | 4.0178ms | 0.1466ms | 6.8191 KOps/s | 6.9158 KOps/s | $\color{#d91a1a}-1.40\%$ |
| test_update | 0.1681ms | 23.1562μs | 43.1850 KOps/s | 50.8227 KOps/s | $\textbf{\color{#d91a1a}-15.03\%}$ |
| test_update_nested | 96.6210μs | 31.6188μs | 31.6267 KOps/s | 34.6748 KOps/s | $\textbf{\color{#d91a1a}-8.79\%}$ |
| test_update__nested | 76.2330μs | 25.0452μs | 39.9278 KOps/s | 38.5926 KOps/s | $\color{#35bf28}+3.46\%$ |
| test_set_nested | 0.1054ms | 22.1874μs | 45.0707 KOps/s | 47.1343 KOps/s | $\color{#d91a1a}-4.38\%$ |
| test_set_nested_new | 97.8130μs | 26.1974μs | 38.1718 KOps/s | 38.2946 KOps/s | $\color{#d91a1a}-0.32\%$ |
| test_select | 0.1240ms | 41.7797μs | 23.9351 KOps/s | 24.8441 KOps/s | $\color{#d91a1a}-3.66\%$ |
| test_select_nested | 0.1121ms | 59.8738μs | 16.7018 KOps/s | 16.8578 KOps/s | $\color{#d91a1a}-0.93\%$ |
| test_exclude_nested | 0.2918ms | 0.1182ms | 8.4605 KOps/s | 8.3963 KOps/s | $\color{#35bf28}+0.76\%$ |
| test_empty[True] | 0.6454ms | 0.4030ms | 2.4816 KOps/s | 2.5359 KOps/s | $\color{#d91a1a}-2.14\%$ |
| test_empty[False] | 8.1472μs | 1.1681μs | 856.0737 KOps/s | 851.4972 KOps/s | $\color{#35bf28}+0.54\%$ |
| test_unbind_speed | 0.4437ms | 0.2538ms | 3.9405 KOps/s | 3.8822 KOps/s | $\color{#35bf28}+1.50\%$ |
| test_unbind_speed_stack0 | 0.4097ms | 0.2568ms | 3.8940 KOps/s | 4.0572 KOps/s | $\color{#d91a1a}-4.02\%$ |
| test_unbind_speed_stack1 | 0.8171ms | 0.6517ms | 1.5344 KOps/s | 1.4004 KOps/s | $\textbf{\color{#35bf28}+9.57\%}$ |
| test_split | 69.6045ms | 1.6156ms | 618.9544 Ops/s | 625.4742 Ops/s | $\color{#d91a1a}-1.04\%$ |
| test_chunk | 70.8881ms | 1.6148ms | 619.2741 Ops/s | 619.4332 Ops/s | $\color{#d91a1a}-0.03\%$ |
| test_creation[device0] | 0.2052ms | 86.5312μs | 11.5565 KOps/s | 11.7076 KOps/s | $\color{#d91a1a}-1.29\%$ |
| test_creation_from_tensor | 3.5792ms | 87.8421μs | 11.3841 KOps/s | 11.3096 KOps/s | $\color{#35bf28}+0.66\%$ |
| test_add_one[memmap_tensor0] | 0.1261ms | 5.6842μs | 175.9262 KOps/s | 182.0096 KOps/s | $\color{#d91a1a}-3.34\%$ |
| test_contiguous[memmap_tensor0] | 7.5440μs | 0.6337μs | 1.5779 MOps/s | 1.5731 MOps/s | $\color{#35bf28}+0.31\%$ |
| test_stack[memmap_tensor0] | 26.5590μs | 3.7435μs | 267.1298 KOps/s | 282.4426 KOps/s | $\textbf{\color{#d91a1a}-5.42\%}$ |
| test_memmaptd_index | 0.9995ms | 0.2522ms | 3.9651 KOps/s | 4.0007 KOps/s | $\color{#d91a1a}-0.89\%$ |
| test_memmaptd_index_astensor | 0.7350ms | 0.3279ms | 3.0501 KOps/s | 3.0871 KOps/s | $\color{#d91a1a}-1.20\%$ |
| test_memmaptd_index_op | 1.0183ms | 0.6244ms | 1.6014 KOps/s | 1.6947 KOps/s | $\textbf{\color{#d91a1a}-5.51\%}$ |
| test_serialize_model | 0.1821s | 0.1153s | 8.6741 Ops/s | 8.5203 Ops/s | $\color{#35bf28}+1.81\%$ |
| test_serialize_model_pickle | 0.4475s | 0.3799s | 2.6323 Ops/s | 2.4591 Ops/s | $\textbf{\color{#35bf28}+7.05\%}$ |
| test_serialize_weights | 0.1759s | 0.1122s | 8.9154 Ops/s | 8.5751 Ops/s | $\color{#35bf28}+3.97\%$ |
| test_serialize_weights_returnearly | 0.1977s | 0.1352s | 7.3990 Ops/s | 7.1674 Ops/s | $\color{#35bf28}+3.23\%$ |
| test_serialize_weights_pickle | 0.7076s | 0.4853s | 2.0605 Ops/s | 2.4651 Ops/s | $\textbf{\color{#d91a1a}-16.41\%}$ |
| test_serialize_weights_filesystem | 94.6158ms | 91.8466ms | 10.8877 Ops/s | 10.7686 Ops/s | $\color{#35bf28}+1.11\%$ |
| test_serialize_model_filesystem | 0.1631s | 0.1023s | 9.7789 Ops/s | 9.6710 Ops/s | $\color{#35bf28}+1.12\%$ |
| test_reshape_pytree | 53.7000μs | 25.3638μs | 39.4263 KOps/s | 39.0742 KOps/s | $\color{#35bf28}+0.90\%$ |
| test_reshape_td | 78.1660μs | 33.8176μs | 29.5704 KOps/s | 28.5995 KOps/s | $\color{#35bf28}+3.39\%$ |
| test_view_pytree | 68.6990μs | 25.4950μs | 39.2234 KOps/s | 38.8954 KOps/s | $\color{#35bf28}+0.84\%$ |
| test_view_td | 0.1212ms | 38.4239μs | 26.0255 KOps/s | 25.7361 KOps/s | $\color{#35bf28}+1.12\%$ |
| test_unbind_pytree | 73.6980μs | 29.5471μs | 33.8442 KOps/s | 33.9126 KOps/s | $\color{#d91a1a}-0.20\%$ |
| test_unbind_td | 0.3653ms | 37.7611μs | 26.4823 KOps/s | 26.2891 KOps/s | $\color{#35bf28}+0.73\%$ |
| test_split_pytree | 73.0470μs | 29.5196μs | 33.8758 KOps/s | 34.1940 KOps/s | $\color{#d91a1a}-0.93\%$ |
| test_split_td | 0.1237ms | 40.4752μs | 24.7065 KOps/s | 24.8254 KOps/s | $\color{#d91a1a}-0.48\%$ |
| test_add_pytree | 88.3360μs | 35.8756μs | 27.8741 KOps/s | 28.6214 KOps/s | $\color{#d91a1a}-2.61\%$ |
| test_add_td | 0.1196ms | 55.8544μs | 17.9037 KOps/s | 18.4731 KOps/s | $\color{#d91a1a}-3.08\%$ |
| test_distributed | 0.2018ms | 0.1019ms | 9.8172 KOps/s | 9.6132 KOps/s | $\color{#35bf28}+2.12\%$ |
| test_tdmodule | 46.1970μs | 18.4091μs | 54.3210 KOps/s | 58.7876 KOps/s | $\textbf{\color{#d91a1a}-7.60\%}$ |
| test_tdmodule_dispatch | 69.1190μs | 35.2257μs | 28.3883 KOps/s | 29.9371 KOps/s | $\textbf{\color{#d91a1a}-5.17\%}$ |
| test_tdseq | 54.8830μs | 20.6105μs | 48.5191 KOps/s | 51.4226 KOps/s | $\textbf{\color{#d91a1a}-5.65\%}$ |
| test_tdseq_dispatch | 75.7820μs | 40.5059μs | 24.6877 KOps/s | 26.0618 KOps/s | $\textbf{\color{#d91a1a}-5.27\%}$ |
| test_instantiation_functorch | 1.5852ms | 1.2897ms | 775.3525 Ops/s | 753.5812 Ops/s | $\color{#35bf28}+2.89\%$ |
| test_instantiation_td | 1.7091ms | 1.0221ms | 978.4017 Ops/s | 971.2304 Ops/s | $\color{#35bf28}+0.74\%$ |
| test_exec_functorch | 0.3041ms | 0.1617ms | 6.1833 KOps/s | 6.1664 KOps/s | $\color{#35bf28}+0.27\%$ |
| test_exec_functional_call | 0.2813ms | 0.1494ms | 6.6944 KOps/s | 6.6970 KOps/s | $\color{#d91a1a}-0.04\%$ |
| test_exec_td | 0.2485ms | 0.1452ms | 6.8879 KOps/s | 6.8474 KOps/s | $\color{#35bf28}+0.59\%$ |
| test_exec_td_decorator | 0.9500ms | 0.2241ms | 4.4626 KOps/s | 4.3612 KOps/s | $\color{#35bf28}+2.32\%$ |
| test_vmap_mlp_speed[True-True] | 0.7117ms | 0.5000ms | 2.0000 KOps/s | 2.0449 KOps/s | $\color{#d91a1a}-2.19\%$ |
| test_vmap_mlp_speed[True-False] | 0.8166ms | 0.4973ms | 2.0109 KOps/s | 2.0845 KOps/s | $\color{#d91a1a}-3.53\%$ |
| test_vmap_mlp_speed[False-True] | 0.6065ms | 0.4050ms | 2.4691 KOps/s | 2.5202 KOps/s | $\color{#d91a1a}-2.03\%$ |
| test_vmap_mlp_speed[False-False] | 0.8009ms | 0.4090ms | 2.4448 KOps/s | 2.5198 KOps/s | $\color{#d91a1a}-2.97\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 1.3417ms | 0.5733ms | 1.7443 KOps/s | 1.7863 KOps/s | $\color{#d91a1a}-2.35\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 0.7897ms | 0.5682ms | 1.7599 KOps/s | 1.7904 KOps/s | $\color{#d91a1a}-1.70\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 0.7250ms | 0.4682ms | 2.1356 KOps/s | 2.1770 KOps/s | $\color{#d91a1a}-1.90\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.6578ms | 0.4653ms | 2.1492 KOps/s | 2.1687 KOps/s | $\color{#d91a1a}-0.90\%$ |
| test_to_module_speed[True] | 2.3785ms | 1.6990ms | 588.5954 Ops/s | 589.1059 Ops/s | $\color{#d91a1a}-0.09\%$ |
| test_to_module_speed[False] | 72.9227ms | 1.8010ms | 555.2496 Ops/s | 550.5737 Ops/s | $\color{#35bf28}+0.85\%$ |
| test_tc_init | 64.4610μs | 30.1319μs | 33.1874 KOps/s | 38.1334 KOps/s | $\textbf{\color{#d91a1a}-12.97\%}$ |
| test_tc_init_nested | 0.1227ms | 61.1356μs | 16.3571 KOps/s | 18.8585 KOps/s | $\textbf{\color{#d91a1a}-13.26\%}$ |
| test_tc_first_layer_tensor | 3.7749μs | 0.6812μs | 1.4680 MOps/s | 1.4143 MOps/s | $\color{#35bf28}+3.80\%$ |
| test_tc_first_layer_nontensor | 2.1153μs | 0.6657μs | 1.5022 MOps/s | 1.4439 MOps/s | $\color{#35bf28}+4.04\%$ |
| test_tc_second_layer_tensor | 18.7980μs | 1.8719μs | 534.2093 KOps/s | 526.8644 KOps/s | $\color{#35bf28}+1.39\%$ |
| test_tc_second_layer_nontensor | 42.8410μs | 1.6297μs | 613.6222 KOps/s | 653.6085 KOps/s | $\textbf{\color{#d91a1a}-6.12\%}$ |
| test_unbind | 81.9658ms | 7.4390ms | 134.4266 Ops/s | 138.2737 Ops/s | $\color{#d91a1a}-2.78\%$ |
| test_full_like | 15.7704ms | 10.6495ms | 93.9008 Ops/s | 94.6739 Ops/s | $\color{#d91a1a}-0.82\%$ |
| test_zeros_like | 12.1753ms | 6.1137ms | 163.5678 Ops/s | 176.0668 Ops/s | $\textbf{\color{#d91a1a}-7.10\%}$ |
| test_ones_like | 12.2791ms | 6.3496ms | 157.4911 Ops/s | 162.4875 Ops/s | $\color{#d91a1a}-3.07\%$ |
| test_clone | 16.5575ms | 8.0128ms | 124.7999 Ops/s | 128.6842 Ops/s | $\color{#d91a1a}-3.02\%$ |
| test_squeeze | 60.7440μs | 13.6691μs | 73.1580 KOps/s | 69.9096 KOps/s | $\color{#35bf28}+4.65\%$ |
| test_unsqueeze | 0.1257ms | 59.6242μs | 16.7717 KOps/s | 16.7404 KOps/s | $\color{#35bf28}+0.19\%$ |
| test_split | 0.1960ms | 0.1130ms | 8.8484 KOps/s | 8.7849 KOps/s | $\color{#35bf28}+0.72\%$ |
| test_permute | 0.2000ms | 0.1286ms | 7.7780 KOps/s | 7.7739 KOps/s | $\color{#35bf28}+0.05\%$ |
| test_stack | 27.6823ms | 22.6909ms | 44.0705 Ops/s | 44.9535 Ops/s | $\color{#d91a1a}-1.96\%$ |
| test_cat | 28.7136ms | 24.3236ms | 41.1124 Ops/s | 44.9512 Ops/s | $\textbf{\color{#d91a1a}-8.54\%}$ |
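
For readers checking the numbers: the Change column appears to be the relative difference between Ops and Ops on Repo HEAD, and the bold entries (the ones tallied in the Improved/Worsened summary) appear to be those at or beyond roughly ±5%. Both the formula and the 5% cutoff are inferred from the rows above, not stated by the benchmark bot, and the function names below are hypothetical. A minimal sketch under those assumptions:

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change of the PR's throughput relative to repo HEAD.

    Both arguments must be in the same unit (e.g. both in KOps/s);
    some rows in the table mix MOps/s and KOps/s and need converting first.
    """
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Bucket a change; the 5% threshold is an assumption inferred from the table."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "within threshold"


# Values copied from three CPU rows (all in KOps/s).
rows = {
    "test_membership_stacked_nested_last": (232.7133, 73.8089),
    "test_creation_empty": (90.4778, 108.0304),
    "test_keys": (248.3178, 260.7721),
}

for name, (ops_pr, ops_head) in rows.items():
    pct = relative_change(ops_pr, ops_head)
    print(f"{name}: {pct:+.2f}% ({classify(pct)})")
```

Under this reading, only rows classified as "improved" or "worsened" contribute to the summary counts; the remaining rows are colored but not counted.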


$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 152. Improved: $\large\color{#35bf28}7$. Worsened: $\large\color{#d91a1a}6$.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 24.8710μs | 12.5577μs | 79.6323 KOps/s | 80.7910 KOps/s | $\color{#d91a1a}-1.43\%$ |
| test_plain_set_stack_nested | 28.5110μs | 12.8887μs | 77.5873 KOps/s | 78.8146 KOps/s | $\color{#d91a1a}-1.56\%$ |
| test_plain_set_nested_inplace | 37.6120μs | 14.0120μs | 71.3673 KOps/s | 72.8356 KOps/s | $\color{#d91a1a}-2.02\%$ |
| test_plain_set_stack_nested_inplace | 74.2150μs | 14.0182μs | 71.3356 KOps/s | 72.5156 KOps/s | $\color{#d91a1a}-1.63\%$ |
| test_items | 28.7620μs | 4.7223μs | 211.7611 KOps/s | 209.8478 KOps/s | $\color{#35bf28}+0.91\%$ |
| test_items_nested | 0.3633ms | 0.3414ms | 2.9288 KOps/s | 2.9541 KOps/s | $\color{#d91a1a}-0.86\%$ |
| test_items_nested_locked | 0.3674ms | 0.3404ms | 2.9373 KOps/s | 2.9391 KOps/s | $\color{#d91a1a}-0.06\%$ |
| test_items_nested_leaf | 0.1012ms | 82.2934μs | 12.1516 KOps/s | 12.0547 KOps/s | $\color{#35bf28}+0.80\%$ |
| test_items_stack_nested | 0.3585ms | 0.3371ms | 2.9666 KOps/s | 2.9086 KOps/s | $\color{#35bf28}+2.00\%$ |
| test_items_stack_nested_leaf | 0.1110ms | 82.6973μs | 12.0923 KOps/s | 12.0161 KOps/s | $\color{#35bf28}+0.63\%$ |
| test_items_stack_nested_locked | 0.3812ms | 0.3410ms | 2.9322 KOps/s | 2.9262 KOps/s | $\color{#35bf28}+0.21\%$ |
| test_keys | 22.8410μs | 4.3647μs | 229.1095 KOps/s | 229.2003 KOps/s | $\color{#d91a1a}-0.04\%$ |
| test_keys_nested | 88.8450μs | 66.6552μs | 15.0026 KOps/s | 14.7529 KOps/s | $\color{#35bf28}+1.69\%$ |
| test_keys_nested_locked | 2.4762ms | 71.4138μs | 14.0029 KOps/s | 13.7910 KOps/s | $\color{#35bf28}+1.54\%$ |
| test_keys_nested_leaf | 80.8250μs | 57.1412μs | 17.5005 KOps/s | 17.1968 KOps/s | $\color{#35bf28}+1.77\%$ |
| test_keys_stack_nested | 94.4660μs | 65.8606μs | 15.1836 KOps/s | 14.8218 KOps/s | $\color{#35bf28}+2.44\%$ |
| test_keys_stack_nested_leaf | 84.0750μs | 57.0160μs | 17.5389 KOps/s | 17.2275 KOps/s | $\color{#35bf28}+1.81\%$ |
| test_keys_stack_nested_locked | 0.1015ms | 70.4269μs | 14.1991 KOps/s | 13.9533 KOps/s | $\color{#35bf28}+1.76\%$ |
| test_values | 10.4740μs | 1.8258μs | 547.7190 KOps/s | 537.5097 KOps/s | $\color{#35bf28}+1.90\%$ |
| test_values_nested | 58.3540μs | 35.2060μs | 28.4042 KOps/s | 28.5012 KOps/s | $\color{#d91a1a}-0.34\%$ |
| test_values_nested_locked | 56.3040μs | 36.9057μs | 27.0961 KOps/s | 26.9379 KOps/s | $\color{#35bf28}+0.59\%$ |
| test_values_nested_leaf | 51.3930μs | 31.3639μs | 31.8838 KOps/s | 32.0048 KOps/s | $\color{#d91a1a}-0.38\%$ |
| test_values_stack_nested | 63.9240μs | 35.9578μs | 27.8103 KOps/s | 27.7501 KOps/s | $\color{#35bf28}+0.22\%$ |
| test_values_stack_nested_leaf | 55.1540μs | 31.8362μs | 31.4108 KOps/s | 31.2306 KOps/s | $\color{#35bf28}+0.58\%$ |
| test_values_stack_nested_locked | 62.6640μs | 37.4814μs | 26.6799 KOps/s | 26.4866 KOps/s | $\color{#35bf28}+0.73\%$ |
| test_membership | 4.3931μs | 0.7263μs | 1.3768 MOps/s | 1.4216 MOps/s | $\color{#d91a1a}-3.15\%$ |
| test_membership_nested | 19.9210μs | 2.5892μs | 386.2189 KOps/s | 383.5603 KOps/s | $\color{#35bf28}+0.69\%$ |
| test_membership_nested_leaf | 25.1020μs | 2.6154μs | 382.3499 KOps/s | 380.2530 KOps/s | $\color{#35bf28}+0.55\%$ |
| test_membership_stacked_nested | 21.4910μs | 2.6163μs | 382.2127 KOps/s | 386.7757 KOps/s | $\color{#d91a1a}-1.18\%$ |
| test_membership_stacked_nested_leaf | 20.5510μs | 2.5689μs | 389.2777 KOps/s | 387.0145 KOps/s | $\color{#35bf28}+0.58\%$ |
| test_membership_nested_last | 32.6620μs | 3.0998μs | 322.6055 KOps/s | 319.5239 KOps/s | $\color{#35bf28}+0.96\%$ |
| test_membership_nested_leaf_last | 21.6610μs | 3.0826μs | 324.4057 KOps/s | 318.1077 KOps/s | $\color{#35bf28}+1.98\%$ |
| test_membership_stacked_nested_last | 40.3120μs | 9.8155μs | 101.8801 KOps/s | 278.4616 KOps/s | $\textbf{\color{#d91a1a}-63.41\%}$ |
| test_membership_stacked_nested_leaf_last | 24.3610μs | 9.7845μs | 102.2021 KOps/s | 278.7410 KOps/s | $\textbf{\color{#d91a1a}-63.33\%}$ |
| test_nested_getleaf | 34.3320μs | 8.4383μs | 118.5073 KOps/s | 119.4465 KOps/s | $\color{#d91a1a}-0.79\%$ |
| test_nested_get | 34.5630μs | 7.9280μs | 126.1356 KOps/s | 127.3868 KOps/s | $\color{#d91a1a}-0.98\%$ |
| test_stacked_getleaf | 27.7320μs | 8.4314μs | 118.6037 KOps/s | 118.7666 KOps/s | $\color{#d91a1a}-0.14\%$ |
| test_stacked_get | 37.8520μs | 7.9316μs | 126.0776 KOps/s | 126.3014 KOps/s | $\color{#d91a1a}-0.18\%$ |
| test_nested_getitemleaf | 32.6520μs | 8.5825μs | 116.5164 KOps/s | 116.7016 KOps/s | $\color{#d91a1a}-0.16\%$ |
| test_nested_getitem | 23.8110μs | 8.0598μs | 124.0720 KOps/s | 124.9090 KOps/s | $\color{#d91a1a}-0.67\%$ |
| test_stacked_getitemleaf | 35.3820μs | 8.6591μs | 115.4855 KOps/s | 116.0807 KOps/s | $\color{#d91a1a}-0.51\%$ |
| test_stacked_getitem | 29.9120μs | 8.1707μs | 122.3884 KOps/s | 124.3248 KOps/s | $\color{#d91a1a}-1.56\%$ |
| test_lock_nested | 59.9662ms | 0.4000ms | 2.4998 KOps/s | 2.5272 KOps/s | $\color{#d91a1a}-1.09\%$ |
| test_lock_stack_nested | 0.3108ms | 0.2896ms | 3.4532 KOps/s | 3.3663 KOps/s | $\color{#35bf28}+2.58\%$ |
| test_unlock_nested | 61.5180ms | 0.4036ms | 2.4774 KOps/s | 2.4761 KOps/s | $\color{#35bf28}+0.05\%$ |
| test_unlock_stack_nested | 0.3368ms | 0.3003ms | 3.3305 KOps/s | 3.2672 KOps/s | $\color{#35bf28}+1.94\%$ |
| test_flatten_speed | 0.3934ms | 0.1014ms | 9.8666 KOps/s | 9.8981 KOps/s | $\color{#d91a1a}-0.32\%$ |
| test_unflatten_speed | 0.3353ms | 0.2924ms | 3.4201 KOps/s | 3.4369 KOps/s | $\color{#d91a1a}-0.49\%$ |
| test_common_ops | 1.0374ms | 0.5715ms | 1.7497 KOps/s | 1.7599 KOps/s | $\color{#d91a1a}-0.58\%$ |
| test_creation | 37.1420μs | 1.6643μs | 600.8709 KOps/s | 600.5082 KOps/s | $\color{#35bf28}+0.06\%$ |
| test_creation_empty | 23.1420μs | 8.2462μs | 121.2686 KOps/s | 124.7682 KOps/s | $\color{#d91a1a}-2.80\%$ |
| test_creation_nested_1 | 34.3820μs | 10.0159μs | 99.8414 KOps/s | 102.3222 KOps/s | $\color{#d91a1a}-2.42\%$ |
| test_creation_nested_2 | 44.5030μs | 12.2744μs | 81.4703 KOps/s | 83.2697 KOps/s | $\color{#d91a1a}-2.16\%$ |
| test_clone | 0.1059ms | 11.6566μs | 85.7886 KOps/s | 85.7024 KOps/s | $\color{#35bf28}+0.10\%$ |
| test_getitem[int] | 26.1110μs | 10.7838μs | 92.7320 KOps/s | 93.0268 KOps/s | $\color{#d91a1a}-0.32\%$ |
| test_getitem[slice_int] | 57.6030μs | 20.3667μs | 49.0997 KOps/s | 48.2879 KOps/s | $\color{#35bf28}+1.68\%$ |
| test_getitem[range] | 65.2440μs | 47.7970μs | 20.9218 KOps/s | 21.6972 KOps/s | $\color{#d91a1a}-3.57\%$ |
| test_getitem[tuple] | 55.7340μs | 18.5628μs | 53.8711 KOps/s | 53.9451 KOps/s | $\color{#d91a1a}-0.14\%$ |
| test_getitem[list] | 0.1189ms | 34.0958μs | 29.3291 KOps/s | 29.8544 KOps/s | $\color{#d91a1a}-1.76\%$ |
| test_setitem_dim[int] | 50.3440μs | 28.0381μs | 35.6658 KOps/s | 35.9692 KOps/s | $\color{#d91a1a}-0.84\%$ |
| test_setitem_dim[slice_int] | 70.5350μs | 49.4391μs | 20.2269 KOps/s | 20.8307 KOps/s | $\color{#d91a1a}-2.90\%$ |
| test_setitem_dim[range] | 0.1081ms | 66.1750μs | 15.1114 KOps/s | 15.2565 KOps/s | $\color{#d91a1a}-0.95\%$ |
| test_setitem_dim[tuple] | 62.3740μs | 41.5762μs | 24.0522 KOps/s | 23.7802 KOps/s | $\color{#35bf28}+1.14\%$ |
| test_setitem | 50.1730μs | 16.1784μs | 61.8108 KOps/s | 63.5600 KOps/s | $\color{#d91a1a}-2.75\%$ |
| test_set | 0.1377ms | 15.6956μs | 63.7123 KOps/s | 65.7019 KOps/s | $\color{#d91a1a}-3.03\%$ |
| test_set_shared | 1.6249ms | 0.1004ms | 9.9605 KOps/s | 10.0918 KOps/s | $\color{#d91a1a}-1.30\%$ |
| test_update | 91.3960μs | 18.1164μs | 55.1987 KOps/s | 58.3043 KOps/s | $\textbf{\color{#d91a1a}-5.33\%}$ |
| test_update_nested | 58.0240μs | 23.6856μs | 42.2198 KOps/s | 45.0623 KOps/s | $\textbf{\color{#d91a1a}-6.31\%}$ |
| test_update__nested | 69.4150μs | 22.4512μs | 44.5410 KOps/s | 44.8206 KOps/s | $\color{#d91a1a}-0.62\%$ |
| test_set_nested | 52.1130μs | 16.6743μs | 59.9724 KOps/s | 61.2423 KOps/s | $\color{#d91a1a}-2.07\%$ |
| test_set_nested_new | 60.7340μs | 19.3520μs | 51.6743 KOps/s | 52.5873 KOps/s | $\color{#d91a1a}-1.74\%$ |
| test_select | 76.9750μs | 32.4296μs | 30.8360 KOps/s | 30.5816 KOps/s | $\color{#35bf28}+0.83\%$ |
| test_select_nested | 0.8763ms | 55.8618μs | 17.9013 KOps/s | 18.4689 KOps/s | $\color{#d91a1a}-3.07\%$ |
| test_exclude_nested | 0.1331ms | 0.1086ms | 9.2046 KOps/s | 9.2439 KOps/s | $\color{#d91a1a}-0.43\%$ |
| test_empty[True] | 0.3831ms | 0.3484ms | 2.8699 KOps/s | 2.9225 KOps/s | $\color{#d91a1a}-1.80\%$ |
| test_empty[False] | 2.3232μs | 0.9336μs | 1.0712 MOps/s | 1.0706 MOps/s | $\color{#35bf28}+0.05\%$ |
| test_to | 0.1033ms | 77.4261μs | 12.9155 KOps/s | 13.3318 KOps/s | $\color{#d91a1a}-3.12\%$ |
| test_to_nonblocking | 0.2114ms | 62.2499μs | 16.0643 KOps/s | 16.5890 KOps/s | $\color{#d91a1a}-3.16\%$ |
| test_unbind_speed | 0.8515ms | 0.2562ms | 3.9025 KOps/s | 3.8484 KOps/s | $\color{#35bf28}+1.41\%$ |
| test_unbind_speed_stack0 | 0.2956ms | 0.2561ms | 3.9046 KOps/s | 3.8443 KOps/s | $\color{#35bf28}+1.57\%$ |
| test_unbind_speed_stack1 | 76.6163ms | 0.7929ms | 1.2613 KOps/s | 1.2532 KOps/s | $\color{#35bf28}+0.64\%$ |
| test_split | 76.6260ms | 1.6779ms | 596.0008 Ops/s | 595.0581 Ops/s | $\color{#35bf28}+0.16\%$ |
| test_chunk | 76.8836ms | 1.6705ms | 598.6310 Ops/s | 599.8837 Ops/s | $\color{#d91a1a}-0.21\%$ |
| test_creation[device0] | 0.1129ms | 58.4498μs | 17.1087 KOps/s | 17.1486 KOps/s | $\color{#d91a1a}-0.23\%$ |
| test_creation_from_tensor | 0.1621ms | 54.2785μs | 18.4235 KOps/s | 18.5918 KOps/s | $\color{#d91a1a}-0.91\%$ |
| test_add_one[memmap_tensor0] | 87.2060μs | 6.7916μs | 147.2418 KOps/s | 149.0618 KOps/s | $\color{#d91a1a}-1.22\%$ |
| test_contiguous[memmap_tensor0] | 25.3410μs | 0.7085μs | 1.4115 MOps/s | 1.4504 MOps/s | $\color{#d91a1a}-2.68\%$ |
| test_stack[memmap_tensor0] | 33.3920μs | 4.7359μs | 211.1519 KOps/s | 217.9426 KOps/s | $\color{#d91a1a}-3.12\%$ |
| test_memmaptd_index | 1.0840ms | 0.2852ms | 3.5060 KOps/s | 3.4759 KOps/s | $\color{#35bf28}+0.87\%$ |
| test_memmaptd_index_astensor | 0.7045ms | 0.3557ms | 2.8115 KOps/s | 2.8025 KOps/s | $\color{#35bf28}+0.32\%$ |
| test_memmaptd_index_op | 1.0395ms | 0.6408ms | 1.5605 KOps/s | 1.5872 KOps/s | $\color{#d91a1a}-1.68\%$ |
| test_serialize_model | 0.1829s | 0.1099s | 9.0955 Ops/s | 9.5211 Ops/s | $\color{#d91a1a}-4.47\%$ |
| test_serialize_model_pickle | 1.3704s | 1.2387s | 0.8073 Ops/s | 0.8071 Ops/s | $\color{#35bf28}+0.03\%$ |
| test_serialize_weights | 0.1815s | 0.1085s | 9.2152 Ops/s | 8.7908 Ops/s | $\color{#35bf28}+4.83\%$ |
| test_serialize_weights_returnearly | 0.2577s | 0.1001s | 9.9853 Ops/s | 12.4686 Ops/s | $\textbf{\color{#d91a1a}-19.92\%}$ |
| test_serialize_weights_pickle | 1.3507s | 1.2488s | 0.8008 Ops/s | 0.8009 Ops/s | $\color{#d91a1a}-0.01\%$ |
| test_reshape_pytree | 0.1719ms | 26.3667μs | 37.9266 KOps/s | 38.4222 KOps/s | $\color{#d91a1a}-1.29\%$ |
| test_reshape_td | 0.1610ms | 31.4599μs | 31.7865 KOps/s | 32.8143 KOps/s | $\color{#d91a1a}-3.13\%$ |
| test_view_pytree | 0.1574ms | 26.3161μs | 37.9996 KOps/s | 38.7853 KOps/s | $\color{#d91a1a}-2.03\%$ |
| test_view_td | 0.1572ms | 36.3987μs | 27.4735 KOps/s | 27.1546 KOps/s | $\color{#35bf28}+1.17\%$ |
| test_unbind_pytree | 58.6840μs | 31.4727μs | 31.7736 KOps/s | 29.8248 KOps/s | $\textbf{\color{#35bf28}+6.53\%}$ |
| test_unbind_td | 0.4608ms | 39.8834μs | 25.0731 KOps/s | 25.1870 KOps/s | $\color{#d91a1a}-0.45\%$ |
| test_split_pytree | 54.0730μs | 33.9993μs | 29.4124 KOps/s | 27.8065 KOps/s | $\textbf{\color{#35bf28}+5.78\%}$ |
| test_split_td | 0.1046ms | 39.9286μs | 25.0447 KOps/s | 25.5118 KOps/s | $\color{#d91a1a}-1.83\%$ |
| test_add_pytree | 72.5750μs | 37.1987μs | 26.8827 KOps/s | 25.6463 KOps/s | $\color{#35bf28}+4.82\%$ |
| test_add_td | 82.3750μs | 50.5184μs | 19.7948 KOps/s | 20.0095 KOps/s | $\color{#d91a1a}-1.07\%$ |
| test_distributed | 0.1830ms | 70.5772μs | 14.1689 KOps/s | 13.8423 KOps/s | $\color{#35bf28}+2.36\%$ |
| test_tdmodule | 0.1360ms | 14.5357μs | 68.7963 KOps/s | 66.9373 KOps/s | $\color{#35bf28}+2.78\%$ |
| test_tdmodule_dispatch | 43.7620μs | 28.6674μs | 34.8828 KOps/s | 35.2070 KOps/s | $\color{#d91a1a}-0.92\%$ |
| test_tdseq | 32.5420μs | 16.4918μs | 60.6363 KOps/s | 60.2860 KOps/s | $\color{#35bf28}+0.58\%$ |
| test_tdseq_dispatch | 47.8830μs | 31.8433μs | 31.4038 KOps/s | 31.0279 KOps/s | $\color{#35bf28}+1.21\%$ |
| test_instantiation_functorch | 1.6269ms | 1.5167ms | 659.3109 Ops/s | 652.7405 Ops/s | $\color{#35bf28}+1.01\%$ |
| test_instantiation_td | 1.8628ms | 1.0350ms | 966.2251 Ops/s | 937.4874 Ops/s | $\color{#35bf28}+3.07\%$ |
| test_exec_functorch | 0.1858ms | 0.1459ms | 6.8525 KOps/s | 6.6546 KOps/s | $\color{#35bf28}+2.97\%$ |
| test_exec_functional_call | 0.1795ms | 0.1310ms | 7.6342 KOps/s | 7.4389 KOps/s | $\color{#35bf28}+2.63\%$ |
| test_exec_td | 0.1647ms | 0.1301ms | 7.6879 KOps/s | 7.1609 KOps/s | $\textbf{\color{#35bf28}+7.36\%}$ |
| test_exec_td_decorator | 0.5109ms | 0.2008ms | 4.9788 KOps/s | 4.7954 KOps/s | $\color{#35bf28}+3.82\%$ |
| test_vmap_mlp_speed[True-True] | 0.7561ms | 0.5658ms | 1.7673 KOps/s | 1.7208 KOps/s | $\color{#35bf28}+2.70\%$ |
| test_vmap_mlp_speed[True-False] | 0.6578ms | 0.5624ms | 1.7779 KOps/s | 1.7527 KOps/s | $\color{#35bf28}+1.44\%$ |
| test_vmap_mlp_speed[False-True] | 0.5621ms | 0.5035ms | 1.9860 KOps/s | 1.9552 KOps/s | $\color{#35bf28}+1.57\%$ |
| test_vmap_mlp_speed[False-False] | 0.5683ms | 0.5127ms | 1.9506 KOps/s | 1.9567 KOps/s | $\color{#d91a1a}-0.32\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 1.1656ms | 0.6385ms | 1.5662 KOps/s | 1.5527 KOps/s | $\color{#35bf28}+0.87\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 0.7920ms | 0.6364ms | 1.5713 KOps/s | 1.5686 KOps/s | $\color{#35bf28}+0.18\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 0.7044ms | 0.5647ms | 1.7708 KOps/s | 1.5819 KOps/s | $\textbf{\color{#35bf28}+11.94\%}$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.7598ms | 0.5837ms | 1.7131 KOps/s | 1.7466 KOps/s | $\color{#d91a1a}-1.92\%$ |
| test_vmap_transformer_speed[True-True] | 8.0190ms | 7.5788ms | 131.9463 Ops/s | 136.6080 Ops/s | $\color{#d91a1a}-3.41\%$ |
| test_vmap_transformer_speed[True-False] | 8.0025ms | 7.5023ms | 133.2925 Ops/s | 137.0938 Ops/s | $\color{#d91a1a}-2.77\%$ |
| test_vmap_transformer_speed[False-True] | 8.2188ms | 7.4683ms | 133.8988 Ops/s | 136.6549 Ops/s | $\color{#d91a1a}-2.02\%$ |
| test_vmap_transformer_speed[False-False] | 7.7921ms | 7.4334ms | 134.5273 Ops/s | 138.0213 Ops/s | $\color{#d91a1a}-2.53\%$ |
| test_vmap_transformer_speed_decorator[True-True] | 19.0319ms | 18.2517ms | 54.7894 Ops/s | 56.1436 Ops/s | $\color{#d91a1a}-2.41\%$ |
| test_vmap_transformer_speed_decorator[True-False] | 18.7769ms | 18.2600ms | 54.7645 Ops/s | 55.0494 Ops/s | $\color{#d91a1a}-0.52\%$ |
| test_vmap_transformer_speed_decorator[False-True] | 18.4784ms | 18.1192ms | 55.1900 Ops/s | 56.7702 Ops/s | $\color{#d91a1a}-2.78\%$ |
| test_vmap_transformer_speed_decorator[False-False] | 18.8025ms | 18.1893ms | 54.9774 Ops/s | 56.7676 Ops/s | $\color{#d91a1a}-3.15\%$ |
| test_to_module_speed[True] | 1.8819ms | 1.5726ms | 635.8944 Ops/s | 650.9032 Ops/s | $\color{#d91a1a}-2.31\%$ |
| test_to_module_speed[False] | 1.8102ms | 1.5475ms | 646.2233 Ops/s | 655.8360 Ops/s | $\color{#d91a1a}-1.47\%$ |
| test_tc_init | 0.1576ms | 24.4753μs | 40.8575 KOps/s | 41.1552 KOps/s | $\color{#d91a1a}-0.72\%$ |
| test_tc_init_nested | 0.1903ms | 53.4223μs | 18.7188 KOps/s | 20.9665 KOps/s | $\textbf{\color{#d91a1a}-10.72\%}$ |
| test_tc_first_layer_tensor | 3.4955μs | 0.3649μs | 2.7402 MOps/s | 2.7547 MOps/s | $\color{#d91a1a}-0.52\%$ |
| test_tc_first_layer_nontensor | 10.4935μs | 0.3992μs | 2.5049 MOps/s | 2.5375 MOps/s | $\color{#d91a1a}-1.29\%$ |
| test_tc_second_layer_tensor | 26.0616μs | 0.9839μs | 1.0163 MOps/s | 931.8189 KOps/s | $\textbf{\color{#35bf28}+9.07\%}$ |
| test_tc_second_layer_nontensor | 21.6780μs | 0.8385μs | 1.1925 MOps/s | 1.2187 MOps/s | $\color{#d91a1a}-2.14\%$ |
| test_unbind | 0.1061s | 6.3835ms | 156.6537 Ops/s | 126.5784 Ops/s | $\textbf{\color{#35bf28}+23.76\%}$ |
| test_full_like | 12.3694ms | 11.8126ms | 84.6552 Ops/s | 75.1922 Ops/s | $\textbf{\color{#35bf28}+12.59\%}$ |
| test_zeros_like | 8.6668ms | 7.9469ms | 125.8351 Ops/s | 126.6928 Ops/s | $\color{#d91a1a}-0.68\%$ |
| test_ones_like | 8.3081ms | 7.8801ms | 126.9020 Ops/s | 125.3878 Ops/s | $\color{#35bf28}+1.21\%$ |
| test_clone | 9.8927ms | 9.4707ms | 105.5885 Ops/s | 104.4127 Ops/s | $\color{#35bf28}+1.13\%$ |
| test_squeeze | 55.5530μs | 10.8906μs | 91.8221 KOps/s | 90.3298 KOps/s | $\color{#35bf28}+1.65\%$ |
| test_unsqueeze | 95.0160μs | 51.0622μs | 19.5840 KOps/s | 18.8186 KOps/s | $\color{#35bf28}+4.07\%$ |
| test_split | 0.1562ms | 98.1719μs | 10.1862 KOps/s | 9.9531 KOps/s | $\color{#35bf28}+2.34\%$ |
| test_permute | 0.1421ms | 0.1095ms | 9.1350 KOps/s | 8.7085 KOps/s | $\color{#35bf28}+4.90\%$ |
| test_stack | 28.3179ms | 27.5490ms | 36.2990 Ops/s | 36.2740 Ops/s | $\color{#35bf28}+0.07\%$ |
| test_cat | 27.6968ms | 27.2959ms | 36.6356 Ops/s | 36.5803 Ops/s | $\color{#35bf28}+0.15\%$ |

@vmoens merged commit 959e46e into main Jun 19, 2024
36 of 38 checks passed
@vmoens deleted the fix-stack-tensorclass branch October 21, 2024 14:02
Labels: bug, CLA Signed