[Feature] flatten and unflatten as decorators #779

Merged: 4 commits merged on May 15, 2024

Conversation

@vmoens (Contributor) commented May 15, 2024

No description provided.

@facebook-github-bot added the "CLA Signed" label May 15, 2024
@vmoens added the "bug" label May 15, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 127. Improved: 27. Worsened: 3.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | ---: | ---: | ---: | ---: | ---: |
| test_plain_set_nested | 30.9570μs | 15.3811μs | 65.0148 KOps/s | 57.4516 KOps/s | **+13.16%** |
| test_plain_set_stack_nested | 42.2680μs | 15.5981μs | 64.1106 KOps/s | 56.7278 KOps/s | **+13.01%** |
| test_plain_set_nested_inplace | 60.6620μs | 17.7648μs | 56.2910 KOps/s | 50.7414 KOps/s | **+10.94%** |
| test_plain_set_stack_nested_inplace | 51.2140μs | 17.6613μs | 56.6211 KOps/s | 50.6293 KOps/s | **+11.83%** |
| test_items | 18.1740μs | 2.8729μs | 348.0790 KOps/s | 377.9296 KOps/s | **-7.90%** |
| test_items_nested | 0.3978ms | 0.2634ms | 3.7968 KOps/s | 3.7877 KOps/s | +0.24% |
| test_items_nested_locked | 1.1324ms | 0.2665ms | 3.7523 KOps/s | 3.7440 KOps/s | +0.22% |
| test_items_nested_leaf | 0.1444ms | 76.8808μs | 13.0071 KOps/s | 12.9882 KOps/s | +0.15% |
| test_items_stack_nested | 0.3770ms | 0.2677ms | 3.7355 KOps/s | 3.7340 KOps/s | +0.04% |
| test_items_stack_nested_leaf | 0.1538ms | 79.8509μs | 12.5233 KOps/s | 12.6213 KOps/s | -0.78% |
| test_items_stack_nested_locked | 0.4535ms | 0.2699ms | 3.7045 KOps/s | 3.7154 KOps/s | -0.29% |
| test_keys | 21.1790μs | 3.8751μs | 258.0579 KOps/s | 259.5593 KOps/s | -0.58% |
| test_keys_nested | 0.1896ms | 0.1376ms | 7.2682 KOps/s | 7.2179 KOps/s | +0.70% |
| test_keys_nested_locked | 0.6650ms | 0.1444ms | 6.9270 KOps/s | 6.9690 KOps/s | -0.60% |
| test_keys_nested_leaf | 0.1727ms | 0.1184ms | 8.4478 KOps/s | 8.5096 KOps/s | -0.73% |
| test_keys_stack_nested | 0.2058ms | 0.1407ms | 7.1090 KOps/s | 7.2118 KOps/s | -1.42% |
| test_keys_stack_nested_leaf | 0.2345ms | 0.1167ms | 8.5654 KOps/s | 8.4748 KOps/s | +1.07% |
| test_keys_stack_nested_locked | 0.2478ms | 0.1435ms | 6.9710 KOps/s | 6.9973 KOps/s | -0.38% |
| test_values | 8.0028μs | 1.1526μs | 867.6405 KOps/s | 832.5121 KOps/s | +4.22% |
| test_values_nested | 95.5660μs | 50.6947μs | 19.7259 KOps/s | 19.4313 KOps/s | +1.52% |
| test_values_nested_locked | 0.1031ms | 51.0620μs | 19.5841 KOps/s | 19.6854 KOps/s | -0.51% |
| test_values_nested_leaf | 89.4760μs | 46.2217μs | 21.6349 KOps/s | 21.6746 KOps/s | -0.18% |
| test_values_stack_nested | 0.1052ms | 52.7954μs | 18.9411 KOps/s | 19.0693 KOps/s | -0.67% |
| test_values_stack_nested_leaf | 92.6610μs | 46.4933μs | 21.5085 KOps/s | 21.7362 KOps/s | -1.05% |
| test_values_stack_nested_locked | 84.3160μs | 52.9319μs | 18.8922 KOps/s | 19.2546 KOps/s | -1.88% |
| test_membership | 19.0450μs | 1.3597μs | 735.4559 KOps/s | 745.6313 KOps/s | -1.36% |
| test_membership_nested | 30.6460μs | 3.4529μs | 289.6150 KOps/s | 293.4374 KOps/s | -1.30% |
| test_membership_nested_leaf | 22.7820μs | 3.3980μs | 294.2907 KOps/s | 292.9814 KOps/s | +0.45% |
| test_membership_stacked_nested | 26.9290μs | 3.3837μs | 295.5370 KOps/s | 292.9658 KOps/s | +0.88% |
| test_membership_stacked_nested_leaf | 23.5140μs | 3.3858μs | 295.3501 KOps/s | 291.1571 KOps/s | +1.44% |
| test_membership_nested_last | 20.9890μs | 4.2744μs | 233.9529 KOps/s | 237.4590 KOps/s | -1.48% |
| test_membership_nested_leaf_last | 18.4630μs | 4.2332μs | 236.2275 KOps/s | 238.8010 KOps/s | -1.08% |
| test_membership_stacked_nested_last | 20.4580μs | 4.1884μs | 238.7572 KOps/s | 240.5534 KOps/s | -0.75% |
| test_membership_stacked_nested_leaf_last | 46.0950μs | 4.2036μs | 237.8893 KOps/s | 217.0297 KOps/s | **+9.61%** |
| test_nested_getleaf | 40.2040μs | 10.7663μs | 92.8827 KOps/s | 93.1852 KOps/s | -0.32% |
| test_nested_get | 30.5770μs | 10.1908μs | 98.1276 KOps/s | 99.1755 KOps/s | -1.06% |
| test_stacked_getleaf | 50.9440μs | 10.6524μs | 93.8755 KOps/s | 91.3658 KOps/s | +2.75% |
| test_stacked_get | 43.4200μs | 10.6626μs | 93.7861 KOps/s | 98.2857 KOps/s | -4.58% |
| test_nested_getitemleaf | 37.1690μs | 11.3695μs | 87.9546 KOps/s | 86.8309 KOps/s | +1.29% |
| test_nested_getitem | 50.1930μs | 10.4822μs | 95.4000 KOps/s | 95.8230 KOps/s | -0.44% |
| test_stacked_getitemleaf | 59.0490μs | 11.2126μs | 89.1856 KOps/s | 87.5760 KOps/s | +1.84% |
| test_stacked_getitem | 32.0790μs | 10.4068μs | 96.0908 KOps/s | 94.3961 KOps/s | +1.80% |
| test_lock_nested | 48.4435ms | 0.3952ms | 2.5303 KOps/s | 2.8627 KOps/s | **-11.61%** |
| test_lock_stack_nested | 0.4660ms | 0.3031ms | 3.2988 KOps/s | 3.2423 KOps/s | +1.74% |
| test_unlock_nested | 0.7218ms | 0.3465ms | 2.8856 KOps/s | 2.5421 KOps/s | **+13.51%** |
| test_unlock_stack_nested | 0.4743ms | 0.3145ms | 3.1796 KOps/s | 3.1513 KOps/s | +0.90% |
| test_flatten_speed | 0.2026ms | 95.6341μs | 10.4565 KOps/s | 10.5101 KOps/s | -0.51% |
| test_unflatten_speed | 0.6325ms | 0.4087ms | 2.4470 KOps/s | 2.4748 KOps/s | -1.12% |
| test_common_ops | 1.4983ms | 0.6551ms | 1.5264 KOps/s | 1.3966 KOps/s | **+9.29%** |
| test_creation | 20.0370μs | 1.9096μs | 523.6732 KOps/s | 534.3682 KOps/s | -2.00% |
| test_creation_empty | 38.1510μs | 7.9982μs | 125.0287 KOps/s | 86.2856 KOps/s | **+44.90%** |
| test_creation_nested_1 | 46.9070μs | 10.6830μs | 93.6066 KOps/s | 70.9018 KOps/s | **+32.02%** |
| test_creation_nested_2 | 38.7020μs | 13.9495μs | 71.6873 KOps/s | 56.9646 KOps/s | **+25.85%** |
| test_clone | 83.7250μs | 13.2776μs | 75.3147 KOps/s | 76.0321 KOps/s | -0.94% |
| test_getitem[int] | 33.1820μs | 11.5168μs | 86.8295 KOps/s | 87.6176 KOps/s | -0.90% |
| test_getitem[slice_int] | 83.0930μs | 22.5404μs | 44.3649 KOps/s | 45.2324 KOps/s | -1.92% |
| test_getitem[range] | 0.1040ms | 66.6694μs | 14.9994 KOps/s | 16.7098 KOps/s | **-10.24%** |
| test_getitem[tuple] | 66.5030μs | 19.2371μs | 51.9830 KOps/s | 53.7050 KOps/s | -3.21% |
| test_getitem[list] | 0.1027ms | 41.0413μs | 24.3657 KOps/s | 24.4172 KOps/s | -0.21% |
| test_setitem_dim[int] | 61.9850μs | 31.4652μs | 31.7811 KOps/s | 28.6745 KOps/s | **+10.83%** |
| test_setitem_dim[slice_int] | 0.1391ms | 58.5952μs | 17.0662 KOps/s | 16.4319 KOps/s | +3.86% |
| test_setitem_dim[range] | 0.1482ms | 79.2315μs | 12.6212 KOps/s | 11.6425 KOps/s | **+8.41%** |
| test_setitem_dim[tuple] | 92.4310μs | 45.4091μs | 22.0220 KOps/s | 19.4757 KOps/s | **+13.07%** |
| test_setitem | 68.3260μs | 18.3846μs | 54.3935 KOps/s | 49.6682 KOps/s | **+9.51%** |
| test_set | 67.6850μs | 17.8785μs | 55.9330 KOps/s | 50.8254 KOps/s | **+10.05%** |
| test_set_shared | 1.5249ms | 0.1399ms | 7.1455 KOps/s | 7.2627 KOps/s | -1.61% |
| test_update | 91.7700μs | 18.8367μs | 53.0877 KOps/s | 44.4686 KOps/s | **+19.38%** |
| test_update_nested | 0.1299ms | 26.8537μs | 37.2388 KOps/s | 32.8016 KOps/s | **+13.53%** |
| test_update__nested | 0.1050ms | 25.3379μs | 39.4665 KOps/s | 41.2963 KOps/s | -4.43% |
| test_set_nested | 0.1363ms | 21.8848μs | 45.6939 KOps/s | 46.4137 KOps/s | -1.55% |
| test_set_nested_new | 69.6690μs | 23.3962μs | 42.7419 KOps/s | 38.9871 KOps/s | **+9.63%** |
| test_select | 98.0310μs | 38.6283μs | 25.8878 KOps/s | 24.4490 KOps/s | **+5.88%** |
| test_select_nested | 0.1132ms | 60.4208μs | 16.5506 KOps/s | 16.6087 KOps/s | -0.35% |
| test_exclude_nested | 0.2668ms | 0.1232ms | 8.1185 KOps/s | 8.4307 KOps/s | -3.70% |
| test_empty[True] | 0.9348ms | 0.3939ms | 2.5390 KOps/s | 2.5199 KOps/s | +0.76% |
| test_empty[False] | 6.9490μs | 1.0909μs | 916.7021 KOps/s | 925.3064 KOps/s | -0.93% |
| test_unbind_speed | 0.4067ms | 0.2544ms | 3.9314 KOps/s | 3.9073 KOps/s | +0.62% |
| test_unbind_speed_stack0 | 0.7161ms | 0.2486ms | 4.0233 KOps/s | 3.8322 KOps/s | +4.99% |
| test_unbind_speed_stack1 | 66.9616ms | 0.7531ms | 1.3278 KOps/s | 1.2966 KOps/s | +2.40% |
| test_split | 1.7072ms | 1.5106ms | 661.9717 Ops/s | 617.5772 Ops/s | **+7.19%** |
| test_chunk | 61.5259ms | 1.5999ms | 625.0367 Ops/s | 628.7018 Ops/s | -0.58% |
| test_creation[device0] | 3.6889ms | 85.8560μs | 11.6474 KOps/s | 11.8953 KOps/s | -2.08% |
| test_creation_from_tensor | 0.2064ms | 85.4966μs | 11.6964 KOps/s | 11.5758 KOps/s | +1.04% |
| test_add_one[memmap_tensor0] | 31.0080μs | 5.6208μs | 177.9096 KOps/s | 182.3025 KOps/s | -2.41% |
| test_contiguous[memmap_tensor0] | 10.7500μs | 0.6408μs | 1.5605 MOps/s | 1.5358 MOps/s | +1.61% |
| test_stack[memmap_tensor0] | 20.9790μs | 3.6236μs | 275.9686 KOps/s | 284.5402 KOps/s | -3.01% |
| test_memmaptd_index | 1.0871ms | 0.2541ms | 3.9356 KOps/s | 3.9816 KOps/s | -1.16% |
| test_memmaptd_index_astensor | 0.6599ms | 0.3276ms | 3.0527 KOps/s | 3.0862 KOps/s | -1.08% |
| test_memmaptd_index_op | 63.1779ms | 0.6111ms | 1.6364 KOps/s | 1.6030 KOps/s | +2.08% |
| test_serialize_model | 0.1632s | 0.1083s | 9.2345 Ops/s | 8.8453 Ops/s | +4.40% |
| test_serialize_model_pickle | 0.4506s | 0.3780s | 2.6455 Ops/s | 2.6070 Ops/s | +1.47% |
| test_serialize_weights | 0.1054s | 97.4419ms | 10.2625 Ops/s | 9.1239 Ops/s | **+12.48%** |
| test_serialize_weights_returnearly | 0.1286s | 0.1224s | 8.1670 Ops/s | 7.2053 Ops/s | **+13.35%** |
| test_serialize_weights_pickle | 0.8491s | 0.5024s | 1.9903 Ops/s | 1.5549 Ops/s | **+28.00%** |
| test_serialize_weights_filesystem | 0.1013s | 91.4444ms | 10.9356 Ops/s | 10.8195 Ops/s | +1.07% |
| test_serialize_model_filesystem | 0.1540s | 97.5286ms | 10.2534 Ops/s | 10.2914 Ops/s | -0.37% |
| test_reshape_pytree | 79.5670μs | 25.1511μs | 39.7597 KOps/s | 40.2495 KOps/s | -1.22% |
| test_reshape_td | 68.8970μs | 32.7039μs | 30.5774 KOps/s | 30.7102 KOps/s | -0.43% |
| test_view_pytree | 68.3990μs | 25.2790μs | 39.5585 KOps/s | 37.6809 KOps/s | +4.98% |
| test_view_td | 98.4800μs | 36.0494μs | 27.7397 KOps/s | 27.6060 KOps/s | +0.48% |
| test_unbind_pytree | 67.8050μs | 28.8707μs | 34.6372 KOps/s | 34.6168 KOps/s | +0.06% |
| test_unbind_td | 0.4390ms | 37.5385μs | 26.6393 KOps/s | 26.7523 KOps/s | -0.42% |
| test_split_pytree | 87.9830μs | 29.2001μs | 34.2464 KOps/s | 34.4668 KOps/s | -0.64% |
| test_split_td | 0.5334ms | 40.9482μs | 24.4211 KOps/s | 24.7440 KOps/s | -1.30% |
| test_add_pytree | 72.4940μs | 34.8353μs | 28.7065 KOps/s | 29.0178 KOps/s | -1.07% |
| test_add_td | 0.1380ms | 51.9610μs | 19.2452 KOps/s | 18.3456 KOps/s | +4.90% |
| test_distributed | 0.1762ms | 99.5641μs | 10.0438 KOps/s | 9.8090 KOps/s | +2.39% |
| test_tdmodule | 28.2230μs | 15.5143μs | 64.4565 KOps/s | 55.8425 KOps/s | **+15.43%** |
| test_tdmodule_dispatch | 59.0290μs | 30.2408μs | 33.0679 KOps/s | 28.1602 KOps/s | **+17.43%** |
| test_tdseq | 34.6640μs | 17.8207μs | 56.1144 KOps/s | 46.9878 KOps/s | **+19.42%** |
| test_tdseq_dispatch | 58.8290μs | 35.3490μs | 28.2894 KOps/s | 23.7746 KOps/s | **+18.99%** |
| test_instantiation_functorch | 1.5387ms | 1.3166ms | 759.5402 Ops/s | 766.6765 Ops/s | -0.93% |
| test_instantiation_td | 1.6381ms | 1.0151ms | 985.1207 Ops/s | 989.7415 Ops/s | -0.47% |
| test_exec_functorch | 0.2904ms | 0.1603ms | 6.2389 KOps/s | 6.1846 KOps/s | +0.88% |
| test_exec_functional_call | 0.2753ms | 0.1502ms | 6.6567 KOps/s | 6.6327 KOps/s | +0.36% |
| test_exec_td | 0.2650ms | 0.1452ms | 6.8869 KOps/s | 6.6988 KOps/s | +2.81% |
| test_exec_td_decorator | 0.6987ms | 0.2200ms | 4.5446 KOps/s | 4.4738 KOps/s | +1.58% |
| test_vmap_mlp_speed[True-True] | 0.7042ms | 0.4906ms | 2.0383 KOps/s | 2.0107 KOps/s | +1.37% |
| test_vmap_mlp_speed[True-False] | 0.7703ms | 0.4703ms | 2.1261 KOps/s | 2.0280 KOps/s | +4.84% |
| test_vmap_mlp_speed[False-True] | 0.6317ms | 0.3896ms | 2.5665 KOps/s | 2.4726 KOps/s | +3.80% |
| test_vmap_mlp_speed[False-False] | 0.4830ms | 0.3904ms | 2.5613 KOps/s | 2.4776 KOps/s | +3.38% |
| test_vmap_mlp_speed_decorator[True-True] | 1.0416ms | 0.5421ms | 1.8447 KOps/s | 1.7691 KOps/s | +4.27% |
| test_vmap_mlp_speed_decorator[True-False] | 0.8325ms | 0.5418ms | 1.8458 KOps/s | 1.7629 KOps/s | +4.70% |
| test_vmap_mlp_speed_decorator[False-True] | 0.9150ms | 0.4563ms | 2.1917 KOps/s | 2.1515 KOps/s | +1.87% |
| test_vmap_mlp_speed_decorator[False-False] | 0.9195ms | 0.4687ms | 2.1333 KOps/s | 2.1610 KOps/s | -1.28% |
| test_to_module_speed[True] | 2.2731ms | 1.6839ms | 593.8717 Ops/s | 588.0856 Ops/s | +0.98% |
| test_to_module_speed[False] | 2.1415ms | 1.6470ms | 607.1516 Ops/s | 602.6152 Ops/s | +0.75% |
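
For readers checking the numbers, the Change column is consistent with the relative difference between the Ops and Ops on Repo HEAD columns. The sketch below reproduces a few CPU rows from the table above. The function names are hypothetical, and the 5% cutoff for counting a row as improved or worsened is an assumption inferred from which rows the report highlights; the benchmark bot does not state it.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percentage change in throughput of the PR relative to repo HEAD.

    Both arguments must use the same unit (e.g. both KOps/s).
    """
    return (ops_pr / ops_head - 1.0) * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Bucket a change; the 5% threshold is an assumed cutoff, not a documented one."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "neutral"


# (Ops, Ops on Repo HEAD) pairs copied from the CPU results, in KOps/s.
rows = {
    "test_plain_set_nested": (65.0148, 57.4516),
    "test_items": (348.0790, 377.9296),
    "test_items_nested": (3.7968, 3.7877),
    "test_lock_nested": (2.5303, 2.8627),
}

for name, (ops_pr, ops_head) in rows.items():
    change = relative_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
# test_plain_set_nested: +13.16% (improved)
# test_items: -7.90% (worsened)
# test_items_nested: +0.24% (neutral)
# test_lock_nested: -11.61% (worsened)
```

Because the ratio is unit-free, the same calculation applies to rows reported in Ops/s or MOps/s, as long as both columns of a row share a unit.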


⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 135. Improved: 16. Worsened: 5.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | ---: | ---: | ---: | ---: | ---: |
| test_plain_set_nested | 69.3410μs | 13.7360μs | 72.8014 KOps/s | 71.1641 KOps/s | +2.30% |
| test_plain_set_stack_nested | 38.9910μs | 13.8916μs | 71.9860 KOps/s | 70.5678 KOps/s | +2.01% |
| test_plain_set_nested_inplace | 36.2200μs | 15.0786μs | 66.3189 KOps/s | 65.6072 KOps/s | +1.08% |
| test_plain_set_stack_nested_inplace | 36.4010μs | 14.9789μs | 66.7605 KOps/s | 65.0642 KOps/s | +2.61% |
| test_items | 17.8300μs | 4.5885μs | 217.9380 KOps/s | 216.0232 KOps/s | +0.89% |
| test_items_nested | 0.3818ms | 0.3351ms | 2.9839 KOps/s | 2.9968 KOps/s | -0.43% |
| test_items_nested_locked | 0.3953ms | 0.3429ms | 2.9160 KOps/s | 2.8645 KOps/s | +1.80% |
| test_items_nested_leaf | 0.1118ms | 82.8816μs | 12.0654 KOps/s | 12.0340 KOps/s | +0.26% |
| test_items_stack_nested | 0.3778ms | 0.3365ms | 2.9720 KOps/s | 2.9896 KOps/s | -0.59% |
| test_items_stack_nested_leaf | 0.1135ms | 83.4401μs | 11.9846 KOps/s | 11.8131 KOps/s | +1.45% |
| test_items_stack_nested_locked | 0.3950ms | 0.3401ms | 2.9407 KOps/s | 2.9690 KOps/s | -0.95% |
| test_keys | 30.6800μs | 4.6901μs | 213.2152 KOps/s | 211.6754 KOps/s | +0.73% |
| test_keys_nested | 0.1207ms | 68.6514μs | 14.5663 KOps/s | 14.4237 KOps/s | +0.99% |
| test_keys_nested_locked | 0.7422ms | 73.9107μs | 13.5298 KOps/s | 13.2601 KOps/s | +2.03% |
| test_keys_nested_leaf | 83.0920μs | 59.4081μs | 16.8327 KOps/s | 16.6747 KOps/s | +0.95% |
| test_keys_stack_nested | 94.5620μs | 68.6556μs | 14.5655 KOps/s | 14.4234 KOps/s | +0.98% |
| test_keys_stack_nested_leaf | 83.0020μs | 59.5883μs | 16.7818 KOps/s | 16.6481 KOps/s | +0.80% |
| test_keys_stack_nested_locked | 93.6920μs | 73.1569μs | 13.6692 KOps/s | 13.4178 KOps/s | +1.87% |
| test_values | 7.8870μs | 1.8137μs | 551.3724 KOps/s | 552.4252 KOps/s | -0.19% |
| test_values_nested | 59.6220μs | 36.1756μs | 27.6429 KOps/s | 27.7432 KOps/s | -0.36% |
| test_values_nested_locked | 57.6810μs | 38.6396μs | 25.8802 KOps/s | 26.3461 KOps/s | -1.77% |
| test_values_nested_leaf | 53.5210μs | 32.2708μs | 30.9877 KOps/s | 31.2604 KOps/s | -0.87% |
| test_values_stack_nested | 62.5910μs | 37.3616μs | 26.7654 KOps/s | 27.3705 KOps/s | -2.21% |
| test_values_stack_nested_leaf | 53.5800μs | 33.3528μs | 29.9825 KOps/s | 30.8255 KOps/s | -2.73% |
| test_values_stack_nested_locked | 61.0010μs | 39.4059μs | 25.3769 KOps/s | 26.0867 KOps/s | -2.72% |
| test_membership | 15.0010μs | 0.8139μs | 1.2286 MOps/s | 1.3927 MOps/s | **-11.78%** |
| test_membership_nested | 29.5800μs | 2.5093μs | 398.5193 KOps/s | 397.2967 KOps/s | +0.31% |
| test_membership_nested_leaf | 26.0000μs | 2.5070μs | 398.8775 KOps/s | 400.8458 KOps/s | -0.49% |
| test_membership_stacked_nested | 19.3510μs | 2.5627μs | 390.2101 KOps/s | 389.9820 KOps/s | +0.06% |
| test_membership_stacked_nested_leaf | 32.8410μs | 2.5320μs | 394.9413 KOps/s | 402.2846 KOps/s | -1.83% |
| test_membership_nested_last | 22.2000μs | 3.0614μs | 326.6457 KOps/s | 332.5967 KOps/s | -1.79% |
| test_membership_nested_leaf_last | 16.3700μs | 3.0626μs | 326.5152 KOps/s | 334.3480 KOps/s | -2.34% |
| test_membership_stacked_nested_last | 20.9910μs | 3.0425μs | 328.6807 KOps/s | 289.1748 KOps/s | **+13.66%** |
| test_membership_stacked_nested_leaf_last | 22.0500μs | 3.0546μs | 327.3713 KOps/s | 289.4160 KOps/s | **+13.11%** |
| test_nested_getleaf | 25.3810μs | 8.5521μs | 116.9301 KOps/s | 117.3985 KOps/s | -0.40% |
| test_nested_get | 34.0410μs | 8.0168μs | 124.7374 KOps/s | 125.2460 KOps/s | -0.41% |
| test_stacked_getleaf | 28.0100μs | 8.5427μs | 117.0593 KOps/s | 117.1886 KOps/s | -0.11% |
| test_stacked_get | 37.1000μs | 8.0166μs | 124.7419 KOps/s | 125.1696 KOps/s | -0.34% |
| test_nested_getitemleaf | 18.8610μs | 8.7262μs | 114.5972 KOps/s | 115.3924 KOps/s | -0.69% |
| test_nested_getitem | 32.2610μs | 8.1685μs | 122.4212 KOps/s | 122.6516 KOps/s | -0.19% |
| test_stacked_getitemleaf | 34.7900μs | 8.6963μs | 114.9909 KOps/s | 115.2727 KOps/s | -0.24% |
| test_stacked_getitem | 23.1300μs | 8.1505μs | 122.6923 KOps/s | 123.0950 KOps/s | -0.33% |
| test_lock_nested | 56.3918ms | 0.4206ms | 2.3776 KOps/s | 2.4191 KOps/s | -1.72% |
| test_lock_stack_nested | 0.3594ms | 0.3112ms | 3.2131 KOps/s | 3.2096 KOps/s | +0.11% |
| test_unlock_nested | 0.7297ms | 0.3580ms | 2.7935 KOps/s | 2.7753 KOps/s | +0.66% |
| test_unlock_stack_nested | 0.3609ms | 0.3198ms | 3.1271 KOps/s | 3.1238 KOps/s | +0.10% |
| test_flatten_speed | 0.3227ms | 0.1024ms | 9.7620 KOps/s | 9.8644 KOps/s | -1.04% |
| test_unflatten_speed | 0.3500ms | 0.2965ms | 3.3729 KOps/s | 3.3631 KOps/s | +0.29% |
| test_common_ops | 1.0606ms | 0.5982ms | 1.6718 KOps/s | 1.6038 KOps/s | +4.24% |
| test_creation | 26.7710μs | 1.6445μs | 608.1048 KOps/s | 615.1557 KOps/s | -1.15% |
| test_creation_empty | 23.4400μs | 9.6696μs | 103.4173 KOps/s | 98.5221 KOps/s | +4.97% |
| test_creation_nested_1 | 35.2500μs | 11.4081μs | 87.6567 KOps/s | 83.9234 KOps/s | +4.45% |
| test_creation_nested_2 | 31.6110μs | 13.7753μs | 72.5938 KOps/s | 70.8216 KOps/s | +2.50% |
| test_clone | 77.0710μs | 11.6074μs | 86.1516 KOps/s | 79.6538 KOps/s | **+8.16%** |
| test_getitem[int] | 24.9900μs | 10.9928μs | 90.9683 KOps/s | 89.8978 KOps/s | +1.19% |
| test_getitem[slice_int] | 50.2310μs | 21.3802μs | 46.7723 KOps/s | 47.0194 KOps/s | -0.53% |
| test_getitem[range] | 73.1220μs | 54.0448μs | 18.5032 KOps/s | 19.3600 KOps/s | -4.43% |
| test_getitem[tuple] | 47.6110μs | 19.6605μs | 50.8633 KOps/s | 50.4089 KOps/s | +0.90% |
| test_getitem[list] | 0.1118ms | 34.8270μs | 28.7134 KOps/s | 27.2643 KOps/s | **+5.31%** |
| test_setitem_dim[int] | 48.7300μs | 29.7078μs | 33.6612 KOps/s | 31.1659 KOps/s | **+8.01%** |
| test_setitem_dim[slice_int] | 72.3710μs | 50.4359μs | 19.8271 KOps/s | 18.9855 KOps/s | +4.43% |
| test_setitem_dim[range] | 93.1420μs | 68.7296μs | 14.5498 KOps/s | 13.8793 KOps/s | +4.83% |
| test_setitem_dim[tuple] | 73.4410μs | 44.2064μs | 22.6212 KOps/s | 20.9376 KOps/s | **+8.04%** |
| test_setitem | 51.0910μs | 16.6595μs | 60.0258 KOps/s | 54.1396 KOps/s | **+10.87%** |
| test_set | 48.2410μs | 16.1376μs | 61.9671 KOps/s | 56.4568 KOps/s | **+9.76%** |
| test_set_shared | 71.5788ms | 0.1136ms | 8.7990 KOps/s | 9.9625 KOps/s | **-11.68%** |
| test_update | 84.9120μs | 19.2573μs | 51.9283 KOps/s | 48.2712 KOps/s | **+7.58%** |
| test_update_nested | 60.3010μs | 24.1134μs | 41.4707 KOps/s | 38.6097 KOps/s | **+7.41%** |
| test_update__nested | 54.3110μs | 22.3116μs | 44.8197 KOps/s | 42.8198 KOps/s | +4.67% |
| test_set_nested | 60.3710μs | 17.3686μs | 57.5750 KOps/s | 53.2552 KOps/s | **+8.11%** |
| test_set_nested_new | 60.4910μs | 19.9200μs | 50.2008 KOps/s | 46.6573 KOps/s | **+7.59%** |
| test_select | 65.3110μs | 34.2880μs | 29.1647 KOps/s | 29.4697 KOps/s | -1.03% |
| test_select_nested | 0.9110ms | 55.7505μs | 17.9371 KOps/s | 17.6723 KOps/s | +1.50% |
| test_exclude_nested | 0.1460ms | 0.1107ms | 9.0371 KOps/s | 9.0918 KOps/s | -0.60% |
| test_empty[True] | 0.4078ms | 0.3438ms | 2.9083 KOps/s | 2.8930 KOps/s | +0.53% |
| test_empty[False] | 2.6980μs | 0.8871μs | 1.1273 MOps/s | 1.1195 MOps/s | +0.70% |
| test_to | 0.1052ms | 77.6402μs | 12.8799 KOps/s | 12.5376 KOps/s | +2.73% |
| test_to_nonblocking | 96.1110μs | 63.5775μs | 15.7288 KOps/s | 15.6371 KOps/s | +0.59% |
| test_unbind_speed | 0.3162ms | 0.2751ms | 3.6351 KOps/s | 3.6204 KOps/s | +0.41% |
| test_unbind_speed_stack0 | 0.3239ms | 0.2764ms | 3.6176 KOps/s | 3.6588 KOps/s | -1.13% |
| test_unbind_speed_stack1 | 72.5263ms | 0.8213ms | 1.2177 KOps/s | 1.2187 KOps/s | -0.09% |
| test_split | 1.5912ms | 1.5322ms | 652.6582 Ops/s | 648.5266 Ops/s | +0.64% |
| test_chunk | 72.8209ms | 1.6455ms | 607.7167 Ops/s | 603.2099 Ops/s | +0.75% |
| test_creation[device0] | 0.1309ms | 56.9847μs | 17.5486 KOps/s | 17.1142 KOps/s | +2.54% |
| test_creation_from_tensor | 0.1338ms | 54.8992μs | 18.2152 KOps/s | 18.3789 KOps/s | -0.89% |
| test_add_one[memmap_tensor0] | 0.1015ms | 7.0279μs | 142.2909 KOps/s | 133.3631 KOps/s | **+6.69%** |
| test_contiguous[memmap_tensor0] | 27.5210μs | 0.6908μs | 1.4476 MOps/s | 1.4839 MOps/s | -2.44% |
| test_stack[memmap_tensor0] | 27.2400μs | 4.8023μs | 208.2338 KOps/s | 204.2887 KOps/s | +1.93% |
| test_memmaptd_index | 1.0727ms | 0.2871ms | 3.4826 KOps/s | 3.4785 KOps/s | +0.12% |
| test_memmaptd_index_astensor | 0.6184ms | 0.3630ms | 2.7548 KOps/s | 2.7841 KOps/s | -1.05% |
| test_memmaptd_index_op | 1.1047ms | 0.6743ms | 1.4829 KOps/s | 1.4059 KOps/s | **+5.48%** |
| test_serialize_model | 0.1834s | 0.1090s | 9.1736 Ops/s | 8.9260 Ops/s | +2.77% |
| test_serialize_model_pickle | 1.3482s | 1.2362s | 0.8089 Ops/s | 0.8087 Ops/s | +0.03% |
| test_serialize_weights | 0.1765s | 0.1070s | 9.3486 Ops/s | 9.0407 Ops/s | +3.41% |
| test_serialize_weights_returnearly | 0.2903s | 97.8926ms | 10.2153 Ops/s | 11.3954 Ops/s | **-10.36%** |
| test_serialize_weights_pickle | 1.3564s | 1.2490s | 0.8006 Ops/s | 0.8036 Ops/s | -0.36% |
| test_reshape_pytree | 46.9610μs | 23.8653μs | 41.9019 KOps/s | 42.7201 KOps/s | -1.92% |
| test_reshape_td | 56.9710μs | 31.3626μs | 31.8851 KOps/s | 31.5171 KOps/s | +1.17% |
| test_view_pytree | 48.3310μs | 23.2931μs | 42.9312 KOps/s | 43.2812 KOps/s | -0.81% |
| test_view_td | 73.9510μs | 35.0608μs | 28.5219 KOps/s | 28.2057 KOps/s | +1.12% |
| test_unbind_pytree | 57.0310μs | 29.8459μs | 33.5054 KOps/s | 33.7266 KOps/s | -0.66% |
| test_unbind_td | 0.4461ms | 41.7675μs | 23.9420 KOps/s | 23.4216 KOps/s | +2.22% |
| test_split_pytree | 52.8410μs | 32.5572μs | 30.7152 KOps/s | 31.2208 KOps/s | -1.62% |
| test_split_td | 0.1048ms | 38.7007μs | 25.8394 KOps/s | 25.2855 KOps/s | +2.19% |
| test_add_pytree | 60.3310μs | 35.8298μs | 27.9097 KOps/s | 27.5070 KOps/s | +1.46% |
| test_add_td | 76.4410μs | 50.5969μs | 19.7641 KOps/s | 16.9509 KOps/s | **+16.60%** |
| test_distributed | 2.9941ms | 73.9037μs | 13.5311 KOps/s | 14.8913 KOps/s | **-9.13%** |
| test_tdmodule | 94.0620μs | 15.6037μs | 64.0874 KOps/s | 62.8675 KOps/s | +1.94% |
| test_tdmodule_dispatch | 46.0710μs | 30.3758μs | 32.9209 KOps/s | 32.5898 KOps/s | +1.02% |
| test_tdseq | 32.0210μs | 17.2781μs | 57.8768 KOps/s | 56.1742 KOps/s | +3.03% |
| test_tdseq_dispatch | 55.2410μs | 34.3700μs | 29.0952 KOps/s | 28.4330 KOps/s | +2.33% |
| test_instantiation_functorch | 1.6668ms | 1.5544ms | 643.3497 Ops/s | 642.0783 Ops/s | +0.20% |
| test_instantiation_td | 1.5338ms | 1.0769ms | 928.6083 Ops/s | 842.5504 Ops/s | **+10.21%** |
| test_exec_functorch | 0.2608ms | 0.1542ms | 6.4858 KOps/s | 6.3397 KOps/s | +2.30% |
| test_exec_functional_call | 0.1867ms | 0.1440ms | 6.9463 KOps/s | 6.8379 KOps/s | +1.59% |
| test_exec_td | 0.1772ms | 0.1431ms | 6.9898 KOps/s | 6.7137 KOps/s | +4.11% |
| test_exec_td_decorator | 0.5864ms | 0.2188ms | 4.5706 KOps/s | 4.5306 KOps/s | +0.88% |
| test_vmap_mlp_speed[True-True] | 0.6822ms | 0.6207ms | 1.6110 KOps/s | 1.5968 KOps/s | +0.89% |
| test_vmap_mlp_speed[True-False] | 0.6733ms | 0.6156ms | 1.6245 KOps/s | 1.6026 KOps/s | +1.37% |
| test_vmap_mlp_speed[False-True] | 0.6149ms | 0.5459ms | 1.8319 KOps/s | 1.8187 KOps/s | +0.73% |
| test_vmap_mlp_speed[False-False] | 0.5994ms | 0.5459ms | 1.8318 KOps/s | 1.8189 KOps/s | +0.71% |
| test_vmap_mlp_speed_decorator[True-True] | 1.0634ms | 0.6857ms | 1.4585 KOps/s | 1.4527 KOps/s | +0.40% |
| test_vmap_mlp_speed_decorator[True-False] | 0.8108ms | 0.6815ms | 1.4673 KOps/s | 1.4368 KOps/s | +2.12% |
| test_vmap_mlp_speed_decorator[False-True] | 0.6714ms | 0.6034ms | 1.6573 KOps/s | 1.6124 KOps/s | +2.78% |
| test_vmap_mlp_speed_decorator[False-False] | 0.7565ms | 0.6054ms | 1.6518 KOps/s | 1.6446 KOps/s | +0.44% |
| test_vmap_transformer_speed[True-True] | 8.4054ms | 8.2462ms | 121.2680 Ops/s | 121.5729 Ops/s | -0.25% |
| test_vmap_transformer_speed[True-False] | 9.1068ms | 8.2489ms | 121.2285 Ops/s | 122.0468 Ops/s | -0.67% |
| test_vmap_transformer_speed[False-True] | 8.2829ms | 8.1208ms | 123.1400 Ops/s | 122.6305 Ops/s | +0.42% |
| test_vmap_transformer_speed[False-False] | 8.2185ms | 8.1177ms | 123.1871 Ops/s | 117.9371 Ops/s | +4.45% |
| test_vmap_transformer_speed_decorator[True-True] | 20.0029ms | 19.8251ms | 50.4410 Ops/s | 48.7553 Ops/s | +3.46% |
| test_vmap_transformer_speed_decorator[True-False] | 0.1114s | 21.6531ms | 46.1827 Ops/s | 48.7441 Ops/s | **-5.25%** |
| test_vmap_transformer_speed_decorator[False-True] | 19.9123ms | 19.7257ms | 50.6952 Ops/s | 50.2007 Ops/s | +0.99% |
| test_vmap_transformer_speed_decorator[False-False] | 19.9858ms | 19.7176ms | 50.7161 Ops/s | 50.3378 Ops/s | +0.75% |
| test_to_module_speed[True] | 1.6908ms | 1.5648ms | 639.0459 Ops/s | 635.3151 Ops/s | +0.59% |
| test_to_module_speed[False] | 1.6619ms | 1.5544ms | 643.3175 Ops/s | 646.0488 Ops/s | -0.42% |

@vmoens merged commit 1df73e7 into main on May 15, 2024
37 of 38 checks passed
@vmoens deleted the flatten-unflatten-decorators branch on May 15, 2024, 12:30