[Performance] Faster params and buffer registration in TensorDictParams #569

Merged: vmoens merged 2 commits into main from faster-params-registration on Nov 23, 2023

@vmoens (Contributor) commented Nov 23, 2023

No description provided.

@facebook-github-bot added the CLA Signed label Nov 23, 2023
@vmoens marked this pull request as ready for review November 23, 2023 10:11
$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 113. Improved: $\large\color{#35bf28}1$. Worsened: $\large\color{#d91a1a}16$. Only changes larger than 5% in magnitude (shown in bold in the table) are counted.

Detailed results:
Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 50.1130μs 15.8994μs 62.8955 KOps/s 64.7885 KOps/s $\color{#d91a1a}-2.92\%$
test_plain_set_stack_nested 0.2062ms 0.1470ms 6.8048 KOps/s 7.0605 KOps/s $\color{#d91a1a}-3.62\%$
test_plain_set_nested_inplace 42.8200μs 18.9600μs 52.7427 KOps/s 54.3493 KOps/s $\color{#d91a1a}-2.96\%$
test_plain_set_stack_nested_inplace 0.2586ms 0.1758ms 5.6886 KOps/s 5.9193 KOps/s $\color{#d91a1a}-3.90\%$
test_items 23.8350μs 2.3424μs 426.9195 KOps/s 414.5356 KOps/s $\color{#35bf28}+2.99\%$
test_items_nested 0.4182ms 0.2671ms 3.7432 KOps/s 3.8469 KOps/s $\color{#d91a1a}-2.69\%$
test_items_nested_locked 0.9954ms 0.2697ms 3.7074 KOps/s 3.8182 KOps/s $\color{#d91a1a}-2.90\%$
test_items_nested_leaf 0.5435ms 0.1646ms 6.0735 KOps/s 6.2890 KOps/s $\color{#d91a1a}-3.43\%$
test_items_stack_nested 1.5859ms 1.4785ms 676.3412 Ops/s 702.4100 Ops/s $\color{#d91a1a}-3.71\%$
test_items_stack_nested_leaf 1.4354ms 1.3469ms 742.4647 Ops/s 769.4412 Ops/s $\color{#d91a1a}-3.51\%$
test_items_stack_nested_locked 0.8632ms 0.7644ms 1.3081 KOps/s 1.3412 KOps/s $\color{#d91a1a}-2.46\%$
test_keys 26.4890μs 3.9648μs 252.2187 KOps/s 267.7743 KOps/s $\textbf{\color{#d91a1a}-5.81\%}$
test_keys_nested 1.3604ms 0.1416ms 7.0619 KOps/s 7.0025 KOps/s $\color{#35bf28}+0.85\%$
test_keys_nested_locked 0.3255ms 0.1400ms 7.1431 KOps/s 7.2131 KOps/s $\color{#d91a1a}-0.97\%$
test_keys_nested_leaf 0.2969ms 0.1406ms 7.1108 KOps/s 7.0771 KOps/s $\color{#35bf28}+0.48\%$
test_keys_stack_nested 1.8425ms 1.4086ms 709.9382 Ops/s 727.3636 Ops/s $\color{#d91a1a}-2.40\%$
test_keys_stack_nested_leaf 1.6858ms 1.4035ms 712.4884 Ops/s 740.0463 Ops/s $\color{#d91a1a}-3.72\%$
test_keys_stack_nested_locked 0.7542ms 0.6711ms 1.4901 KOps/s 1.5573 KOps/s $\color{#d91a1a}-4.32\%$
test_values 6.8605μs 1.1720μs 853.2122 KOps/s 862.3639 KOps/s $\color{#d91a1a}-1.06\%$
test_values_nested 0.1165ms 49.6535μs 20.1396 KOps/s 20.2775 KOps/s $\color{#d91a1a}-0.68\%$
test_values_nested_locked 78.5360μs 49.5049μs 20.2000 KOps/s 20.1332 KOps/s $\color{#35bf28}+0.33\%$
test_values_nested_leaf 62.2960μs 44.5641μs 22.4396 KOps/s 22.5771 KOps/s $\color{#d91a1a}-0.61\%$
test_values_stack_nested 1.5657ms 1.2142ms 823.5757 Ops/s 863.8438 Ops/s $\color{#d91a1a}-4.66\%$
test_values_stack_nested_leaf 1.2936ms 1.2000ms 833.3159 Ops/s 884.7348 Ops/s $\textbf{\color{#d91a1a}-5.81\%}$
test_values_stack_nested_locked 0.8244ms 0.5128ms 1.9502 KOps/s 1.9844 KOps/s $\color{#d91a1a}-1.72\%$
test_membership 11.9530μs 1.3576μs 736.6166 KOps/s 748.7768 KOps/s $\color{#d91a1a}-1.62\%$
test_membership_nested 26.5600μs 2.7992μs 357.2402 KOps/s 358.6144 KOps/s $\color{#d91a1a}-0.38\%$
test_membership_nested_leaf 28.1520μs 2.8096μs 355.9271 KOps/s 374.9869 KOps/s $\textbf{\color{#d91a1a}-5.08\%}$
test_membership_stacked_nested 51.4760μs 11.8029μs 84.7249 KOps/s 87.8429 KOps/s $\color{#d91a1a}-3.55\%$
test_membership_stacked_nested_leaf 34.9150μs 11.7732μs 84.9390 KOps/s 88.7420 KOps/s $\color{#d91a1a}-4.29\%$
test_membership_nested_last 33.5720μs 5.8564μs 170.7547 KOps/s 176.3483 KOps/s $\color{#d91a1a}-3.17\%$
test_membership_nested_leaf_last 19.4160μs 5.9428μs 168.2698 KOps/s 176.4126 KOps/s $\color{#d91a1a}-4.62\%$
test_membership_stacked_nested_last 0.3385ms 0.1698ms 5.8878 KOps/s 6.2331 KOps/s $\textbf{\color{#d91a1a}-5.54\%}$
test_membership_stacked_nested_leaf_last 50.5840μs 13.9243μs 71.8169 KOps/s 73.7655 KOps/s $\color{#d91a1a}-2.64\%$
test_nested_getleaf 33.2220μs 10.8501μs 92.1648 KOps/s 96.1868 KOps/s $\color{#d91a1a}-4.18\%$
test_nested_get 37.2290μs 10.2687μs 97.3835 KOps/s 99.8696 KOps/s $\color{#d91a1a}-2.49\%$
test_stacked_getleaf 1.1769ms 0.6493ms 1.5401 KOps/s 1.6313 KOps/s $\textbf{\color{#d91a1a}-5.59\%}$
test_stacked_get 1.2903ms 0.6117ms 1.6347 KOps/s 1.7031 KOps/s $\color{#d91a1a}-4.02\%$
test_nested_getitemleaf 38.3010μs 10.7315μs 93.1834 KOps/s 94.7444 KOps/s $\color{#d91a1a}-1.65\%$
test_nested_getitem 40.2350μs 10.2817μs 97.2603 KOps/s 102.3342 KOps/s $\color{#d91a1a}-4.96\%$
test_stacked_getitemleaf 1.2384ms 0.6345ms 1.5760 KOps/s 1.6345 KOps/s $\color{#d91a1a}-3.58\%$
test_stacked_getitem 0.8348ms 0.6008ms 1.6643 KOps/s 1.6937 KOps/s $\color{#d91a1a}-1.73\%$
test_lock_nested 53.5346ms 0.5370ms 1.8623 KOps/s 2.0887 KOps/s $\textbf{\color{#d91a1a}-10.84\%}$
test_lock_stack_nested 69.9764ms 7.8091ms 128.0562 Ops/s 134.8617 Ops/s $\textbf{\color{#d91a1a}-5.05\%}$
test_unlock_nested 58.2626ms 0.5094ms 1.9631 KOps/s 2.0236 KOps/s $\color{#d91a1a}-2.99\%$
test_unlock_stack_nested 62.8419ms 7.5608ms 132.2617 Ops/s 206.5750 Ops/s $\textbf{\color{#d91a1a}-35.97\%}$
test_flatten_speed 0.5367ms 0.2668ms 3.7487 KOps/s 3.8225 KOps/s $\color{#d91a1a}-1.93\%$
test_unflatten_speed 0.8683ms 0.4690ms 2.1324 KOps/s 2.2428 KOps/s $\color{#d91a1a}-4.92\%$
test_common_ops 2.8302ms 0.6848ms 1.4603 KOps/s 1.4982 KOps/s $\color{#d91a1a}-2.53\%$
test_creation 49.8230μs 2.4307μs 411.4026 KOps/s 421.1741 KOps/s $\color{#d91a1a}-2.32\%$
test_creation_empty 39.8950μs 8.1685μs 122.4214 KOps/s 122.8991 KOps/s $\color{#d91a1a}-0.39\%$
test_creation_nested_1 45.7950μs 11.5286μs 86.7406 KOps/s 89.4600 KOps/s $\color{#d91a1a}-3.04\%$
test_creation_nested_2 34.4640μs 15.1832μs 65.8624 KOps/s 67.6852 KOps/s $\color{#d91a1a}-2.69\%$
test_clone 90.0780μs 13.5365μs 73.8743 KOps/s 76.4908 KOps/s $\color{#d91a1a}-3.42\%$
test_getitem[int] 55.9040μs 13.1572μs 76.0042 KOps/s 76.8686 KOps/s $\color{#d91a1a}-1.12\%$
test_getitem[slice_int] 64.0690μs 25.1341μs 39.7866 KOps/s 39.4468 KOps/s $\color{#35bf28}+0.86\%$
test_getitem[range] 0.2022ms 42.2052μs 23.6938 KOps/s 22.3820 KOps/s $\textbf{\color{#35bf28}+5.86\%}$
test_getitem[tuple] 63.8790μs 20.8067μs 48.0615 KOps/s 48.5792 KOps/s $\color{#d91a1a}-1.07\%$
test_getitem[list] 0.2558ms 38.3661μs 26.0647 KOps/s 25.5200 KOps/s $\color{#35bf28}+2.13\%$
test_setitem_dim[int] 53.6600μs 28.1646μs 35.5056 KOps/s 36.9804 KOps/s $\color{#d91a1a}-3.99\%$
test_setitem_dim[slice_int] 0.1204ms 53.8913μs 18.5559 KOps/s 20.3673 KOps/s $\textbf{\color{#d91a1a}-8.89\%}$
test_setitem_dim[range] 0.1206ms 71.3533μs 14.0148 KOps/s 13.7529 KOps/s $\color{#35bf28}+1.90\%$
test_setitem_dim[tuple] 68.9880μs 41.0921μs 24.3356 KOps/s 25.1132 KOps/s $\color{#d91a1a}-3.10\%$
test_setitem 82.5240μs 18.5597μs 53.8803 KOps/s 55.2216 KOps/s $\color{#d91a1a}-2.43\%$
test_set 78.8670μs 18.0650μs 55.3555 KOps/s 58.1057 KOps/s $\color{#d91a1a}-4.73\%$
test_set_shared 1.8502ms 0.1379ms 7.2528 KOps/s 7.3408 KOps/s $\color{#d91a1a}-1.20\%$
test_update 95.6380μs 23.5938μs 42.3841 KOps/s 43.3697 KOps/s $\color{#d91a1a}-2.27\%$
test_update_nested 0.1012ms 34.8542μs 28.6909 KOps/s 30.0434 KOps/s $\color{#d91a1a}-4.50\%$
test_set_nested 79.7190μs 19.8445μs 50.3917 KOps/s 52.6650 KOps/s $\color{#d91a1a}-4.32\%$
test_set_nested_new 97.7320μs 26.3600μs 37.9363 KOps/s 41.2183 KOps/s $\textbf{\color{#d91a1a}-7.96\%}$
test_select 0.1084ms 51.3400μs 19.4780 KOps/s 20.7869 KOps/s $\textbf{\color{#d91a1a}-6.30\%}$
test_unbind_speed 0.6965ms 0.3794ms 2.6354 KOps/s 2.6837 KOps/s $\color{#d91a1a}-1.80\%$
test_unbind_speed_stack0 66.7466ms 5.3629ms 186.4669 Ops/s 250.2493 Ops/s $\textbf{\color{#d91a1a}-25.49\%}$
test_unbind_speed_stack1 2.4145μs 0.6338μs 1.5779 MOps/s 1.6030 MOps/s $\color{#d91a1a}-1.57\%$
test_split 55.9680ms 1.7799ms 561.8276 Ops/s 556.5728 Ops/s $\color{#35bf28}+0.94\%$
test_chunk 50.4416ms 1.7431ms 573.6765 Ops/s 568.6560 Ops/s $\color{#35bf28}+0.88\%$
test_creation[device0] 0.3886ms 0.2946ms 3.3943 KOps/s 3.4662 KOps/s $\color{#d91a1a}-2.07\%$
test_creation_from_tensor 3.4794ms 0.3287ms 3.0425 KOps/s 3.0520 KOps/s $\color{#d91a1a}-0.31\%$
test_add_one[memmap_tensor0] 73.8880μs 25.3452μs 39.4552 KOps/s 40.2115 KOps/s $\color{#d91a1a}-1.88\%$
test_contiguous[memmap_tensor0] 46.4170μs 5.6985μs 175.4851 KOps/s 174.1955 KOps/s $\color{#35bf28}+0.74\%$
test_stack[memmap_tensor0] 60.2220μs 19.5464μs 51.1604 KOps/s 51.7114 KOps/s $\color{#d91a1a}-1.07\%$
test_memmaptd_index 0.2595ms 0.1924ms 5.1974 KOps/s 5.1989 KOps/s $\color{#d91a1a}-0.03\%$
test_memmaptd_index_astensor 0.4076ms 0.2480ms 4.0329 KOps/s 4.0138 KOps/s $\color{#35bf28}+0.48\%$
test_memmaptd_index_op 0.6558ms 0.4885ms 2.0469 KOps/s 2.0261 KOps/s $\color{#35bf28}+1.03\%$
test_reshape_pytree 67.5150μs 23.5601μs 42.4446 KOps/s 44.8024 KOps/s $\textbf{\color{#d91a1a}-5.26\%}$
test_reshape_td 66.6840μs 31.2562μs 31.9937 KOps/s 32.8544 KOps/s $\color{#d91a1a}-2.62\%$
test_view_pytree 55.5730μs 23.0130μs 43.4536 KOps/s 44.7210 KOps/s $\color{#d91a1a}-2.83\%$
test_view_td 29.5550μs 4.8362μs 206.7744 KOps/s 209.4473 KOps/s $\color{#d91a1a}-1.28\%$
test_unbind_pytree 0.6363ms 26.1998μs 38.1682 KOps/s 38.6905 KOps/s $\color{#d91a1a}-1.35\%$
test_unbind_td 0.1108ms 60.0244μs 16.6599 KOps/s 16.8776 KOps/s $\color{#d91a1a}-1.29\%$
test_split_pytree 57.9880μs 26.3791μs 37.9088 KOps/s 39.2936 KOps/s $\color{#d91a1a}-3.52\%$
test_split_td 90.0470μs 46.0257μs 21.7270 KOps/s 21.5111 KOps/s $\color{#35bf28}+1.00\%$
test_add_pytree 79.6790μs 32.3212μs 30.9395 KOps/s 31.2846 KOps/s $\color{#d91a1a}-1.10\%$
test_add_td 95.5980μs 44.3351μs 22.5555 KOps/s 22.7875 KOps/s $\color{#d91a1a}-1.02\%$
test_distributed 20.1980μs 5.9936μs 166.8446 KOps/s 172.0924 KOps/s $\color{#d91a1a}-3.05\%$
test_tdmodule 1.5871ms 22.4976μs 44.4493 KOps/s 47.6618 KOps/s $\textbf{\color{#d91a1a}-6.74\%}$
test_tdmodule_dispatch 0.2051ms 38.6877μs 25.8480 KOps/s 26.1992 KOps/s $\color{#d91a1a}-1.34\%$
test_tdseq 42.5890μs 23.4047μs 42.7265 KOps/s 42.3866 KOps/s $\color{#35bf28}+0.80\%$
test_tdseq_dispatch 0.1366ms 42.0017μs 23.8085 KOps/s 23.8229 KOps/s $\color{#d91a1a}-0.06\%$
test_instantiation_functorch 1.4487ms 1.3375ms 747.6533 Ops/s 791.8239 Ops/s $\textbf{\color{#d91a1a}-5.58\%}$
test_instantiation_td 69.5943ms 1.1217ms 891.5276 Ops/s 950.4616 Ops/s $\textbf{\color{#d91a1a}-6.20\%}$
test_exec_functorch 0.2310ms 0.1610ms 6.2098 KOps/s 6.3443 KOps/s $\color{#d91a1a}-2.12\%$
test_exec_functional_call 0.2278ms 0.1501ms 6.6630 KOps/s 6.7826 KOps/s $\color{#d91a1a}-1.76\%$
test_exec_td 0.2248ms 0.1465ms 6.8267 KOps/s 6.8537 KOps/s $\color{#d91a1a}-0.39\%$
test_exec_td_decorator 0.6893ms 0.2234ms 4.4767 KOps/s 4.6542 KOps/s $\color{#d91a1a}-3.82\%$
test_vmap_mlp_speed[True-True] 0.9700ms 0.8885ms 1.1254 KOps/s 1.1369 KOps/s $\color{#d91a1a}-1.01\%$
test_vmap_mlp_speed[True-False] 0.7681ms 0.4653ms 2.1494 KOps/s 2.1761 KOps/s $\color{#d91a1a}-1.23\%$
test_vmap_mlp_speed[False-True] 1.5787ms 0.7809ms 1.2806 KOps/s 1.3111 KOps/s $\color{#d91a1a}-2.32\%$
test_vmap_mlp_speed[False-False] 0.7039ms 0.3826ms 2.6137 KOps/s 2.6167 KOps/s $\color{#d91a1a}-0.11\%$
test_vmap_mlp_speed_decorator[True-True] 2.2493ms 1.5739ms 635.3515 Ops/s 643.6554 Ops/s $\color{#d91a1a}-1.29\%$
test_vmap_mlp_speed_decorator[True-False] 1.0332ms 0.5455ms 1.8333 KOps/s 1.8423 KOps/s $\color{#d91a1a}-0.49\%$
test_vmap_mlp_speed_decorator[False-True] 1.9228ms 1.3604ms 735.1042 Ops/s 749.8734 Ops/s $\color{#d91a1a}-1.97\%$
test_vmap_mlp_speed_decorator[False-False] 0.8358ms 0.4247ms 2.3548 KOps/s 2.3844 KOps/s $\color{#d91a1a}-1.24\%$
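
The columns in the table above appear to be related by simple arithmetic: Ops is the reciprocal of Mean, and Change is the relative difference between Ops and Ops on Repo HEAD. The sketch below checks this against the `test_plain_set_nested` row. The function names and the 5% threshold constant are illustrative assumptions inferred from the table (bold rows all exceed 5% in magnitude), not part of the benchmark tooling.

```python
# Minimal sketch of how the benchmark columns relate to each other.
# Helper names are hypothetical; the 5% threshold is an assumption
# inferred from which rows are rendered in bold.

SIGNIFICANCE_THRESHOLD = 5.0  # percent, assumed


def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean run time given in seconds."""
    return 1.0 / mean_seconds


def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change of the PR throughput relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def is_flagged(change_percent: float) -> bool:
    """True when a change would count as improved or worsened."""
    return abs(change_percent) > SIGNIFICANCE_THRESHOLD


# Row: test_plain_set_nested (Mean 15.8994 us, HEAD 64.7885 KOps/s)
ops_pr = ops_per_second(15.8994e-6)          # ops/s on the PR branch
change = relative_change(ops_pr, 64.7885e3)  # percent vs. HEAD

print(f"{ops_pr / 1e3:.2f} KOps/s, {change:+.2f}%, flagged={is_flagged(change)}")
```

Under this reading, a row such as `test_lock_nested` (-10.84%) counts toward "Worsened", while `test_plain_set_nested` (-2.92%) does not.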

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 127. Improved: $\large\color{#35bf28}1$. Worsened: $\large\color{#d91a1a}13$. Only changes larger than 5% in magnitude (shown in bold in the table) are counted.

Detailed results:
Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.4567ms 12.8286μs 77.9507 KOps/s 78.3521 KOps/s $\color{#d91a1a}-0.51\%$
test_plain_set_stack_nested 0.1961ms 0.1159ms 8.6247 KOps/s 8.3774 KOps/s $\color{#35bf28}+2.95\%$
test_plain_set_nested_inplace 31.6400μs 15.6392μs 63.9417 KOps/s 65.8399 KOps/s $\color{#d91a1a}-2.88\%$
test_plain_set_stack_nested_inplace 0.1759ms 0.1447ms 6.9127 KOps/s 7.0749 KOps/s $\color{#d91a1a}-2.29\%$
test_items 23.6400μs 4.7129μs 212.1835 KOps/s 210.7644 KOps/s $\color{#35bf28}+0.67\%$
test_items_nested 0.3823ms 0.3384ms 2.9549 KOps/s 2.9603 KOps/s $\color{#d91a1a}-0.18\%$
test_items_nested_locked 0.3599ms 0.3373ms 2.9647 KOps/s 2.9468 KOps/s $\color{#35bf28}+0.61\%$
test_items_nested_leaf 0.2235ms 0.1978ms 5.0549 KOps/s 5.0043 KOps/s $\color{#35bf28}+1.01\%$
test_items_stack_nested 1.5352ms 1.4793ms 675.9928 Ops/s 677.6973 Ops/s $\color{#d91a1a}-0.25\%$
test_items_stack_nested_leaf 1.3825ms 1.3186ms 758.3694 Ops/s 764.7870 Ops/s $\color{#d91a1a}-0.84\%$
test_items_stack_nested_locked 0.8913ms 0.8229ms 1.2153 KOps/s 1.2391 KOps/s $\color{#d91a1a}-1.93\%$
test_keys 22.7600μs 4.6217μs 216.3690 KOps/s 212.8548 KOps/s $\color{#35bf28}+1.65\%$
test_keys_nested 0.5788ms 90.9418μs 10.9960 KOps/s 11.0957 KOps/s $\color{#d91a1a}-0.90\%$
test_keys_nested_locked 0.1112ms 89.8591μs 11.1285 KOps/s 11.1918 KOps/s $\color{#d91a1a}-0.57\%$
test_keys_nested_leaf 42.6074ms 86.6209μs 11.5446 KOps/s 12.1860 KOps/s $\textbf{\color{#d91a1a}-5.26\%}$
test_keys_stack_nested 1.3389ms 1.3016ms 768.2778 Ops/s 763.8173 Ops/s $\color{#35bf28}+0.58\%$
test_keys_stack_nested_leaf 1.3866ms 1.2982ms 770.2936 Ops/s 767.4561 Ops/s $\color{#35bf28}+0.37\%$
test_keys_stack_nested_locked 0.7084ms 0.6244ms 1.6015 KOps/s 1.6303 KOps/s $\color{#d91a1a}-1.77\%$
test_values 10.3937μs 1.8829μs 531.0966 KOps/s 526.9657 KOps/s $\color{#35bf28}+0.78\%$
test_values_nested 58.6310μs 43.2600μs 23.1160 KOps/s 23.1413 KOps/s $\color{#d91a1a}-0.11\%$
test_values_nested_locked 73.4110μs 43.0648μs 23.2208 KOps/s 23.1102 KOps/s $\color{#35bf28}+0.48\%$
test_values_nested_leaf 0.1101ms 37.5784μs 26.6110 KOps/s 26.7814 KOps/s $\color{#d91a1a}-0.64\%$
test_values_stack_nested 1.1960ms 1.1407ms 876.6554 Ops/s 890.2080 Ops/s $\color{#d91a1a}-1.52\%$
test_values_stack_nested_leaf 1.1655ms 1.1291ms 885.6979 Ops/s 899.1234 Ops/s $\color{#d91a1a}-1.49\%$
test_values_stack_nested_locked 0.5819ms 0.4978ms 2.0087 KOps/s 2.0517 KOps/s $\color{#d91a1a}-2.09\%$
test_membership 3.9200μs 0.9449μs 1.0583 MOps/s 1.0578 MOps/s $\color{#35bf28}+0.05\%$
test_membership_nested 16.2510μs 2.2105μs 452.3838 KOps/s 445.6173 KOps/s $\color{#35bf28}+1.52\%$
test_membership_nested_leaf 13.3050μs 2.1401μs 467.2646 KOps/s 465.3064 KOps/s $\color{#35bf28}+0.42\%$
test_membership_stacked_nested 45.3010μs 10.8143μs 92.4699 KOps/s 90.9826 KOps/s $\color{#35bf28}+1.63\%$
test_membership_stacked_nested_leaf 30.6410μs 10.8942μs 91.7919 KOps/s 90.6306 KOps/s $\color{#35bf28}+1.28\%$
test_membership_nested_last 38.2710μs 4.6523μs 214.9485 KOps/s 215.5704 KOps/s $\color{#d91a1a}-0.29\%$
test_membership_nested_leaf_last 27.0510μs 4.6591μs 214.6334 KOps/s 216.4922 KOps/s $\color{#d91a1a}-0.86\%$
test_membership_stacked_nested_last 0.2047ms 0.1343ms 7.4478 KOps/s 7.5429 KOps/s $\color{#d91a1a}-1.26\%$
test_membership_stacked_nested_leaf_last 81.8610μs 12.8309μs 77.9369 KOps/s 78.2151 KOps/s $\color{#d91a1a}-0.36\%$
test_nested_getleaf 31.0910μs 8.3599μs 119.6193 KOps/s 119.3469 KOps/s $\color{#35bf28}+0.23\%$
test_nested_get 22.3410μs 7.9436μs 125.8880 KOps/s 125.9552 KOps/s $\color{#d91a1a}-0.05\%$
test_stacked_getleaf 0.6361ms 0.5739ms 1.7425 KOps/s 1.7231 KOps/s $\color{#35bf28}+1.13\%$
test_stacked_get 0.6532ms 0.5352ms 1.8684 KOps/s 1.8468 KOps/s $\color{#35bf28}+1.17\%$
test_nested_getitemleaf 27.8000μs 8.4531μs 118.2997 KOps/s 118.1335 KOps/s $\color{#35bf28}+0.14\%$
test_nested_getitem 30.6100μs 8.0113μs 124.8231 KOps/s 125.1091 KOps/s $\color{#d91a1a}-0.23\%$
test_stacked_getitemleaf 0.6319ms 0.5778ms 1.7306 KOps/s 1.7434 KOps/s $\color{#d91a1a}-0.73\%$
test_stacked_getitem 0.6607ms 0.5362ms 1.8651 KOps/s 1.8465 KOps/s $\color{#35bf28}+1.01\%$
test_lock_nested 4.4657ms 0.4632ms 2.1590 KOps/s 2.1483 KOps/s $\color{#35bf28}+0.50\%$
test_lock_stack_nested 71.5859ms 6.6803ms 149.6929 Ops/s 149.2541 Ops/s $\color{#35bf28}+0.29\%$
test_unlock_nested 1.3050ms 0.4381ms 2.2828 KOps/s 1.9912 KOps/s $\textbf{\color{#35bf28}+14.64\%}$
test_unlock_stack_nested 67.7370ms 7.4377ms 134.4493 Ops/s 135.1676 Ops/s $\color{#d91a1a}-0.53\%$
test_flatten_speed 0.5211ms 0.1877ms 5.3289 KOps/s 5.3870 KOps/s $\color{#d91a1a}-1.08\%$
test_unflatten_speed 0.3925ms 0.3604ms 2.7745 KOps/s 2.7689 KOps/s $\color{#35bf28}+0.20\%$
test_common_ops 1.0912ms 0.6359ms 1.5725 KOps/s 1.6002 KOps/s $\color{#d91a1a}-1.73\%$
test_creation 37.1400μs 1.9575μs 510.8514 KOps/s 508.4584 KOps/s $\color{#35bf28}+0.47\%$
test_creation_empty 33.8310μs 7.1542μs 139.7780 KOps/s 136.7844 KOps/s $\color{#35bf28}+2.19\%$
test_creation_nested_1 32.5010μs 9.5509μs 104.7023 KOps/s 104.0046 KOps/s $\color{#35bf28}+0.67\%$
test_creation_nested_2 71.2920μs 12.1751μs 82.1350 KOps/s 81.9819 KOps/s $\color{#35bf28}+0.19\%$
test_clone 93.9120μs 14.8206μs 67.4735 KOps/s 71.4441 KOps/s $\textbf{\color{#d91a1a}-5.56\%}$
test_getitem[int] 66.4520μs 12.2180μs 81.8466 KOps/s 81.7625 KOps/s $\color{#35bf28}+0.10\%$
test_getitem[slice_int] 48.2710μs 23.8226μs 41.9770 KOps/s 42.4479 KOps/s $\color{#d91a1a}-1.11\%$
test_getitem[range] 68.2810μs 41.5742μs 24.0534 KOps/s 25.4460 KOps/s $\textbf{\color{#d91a1a}-5.47\%}$
test_getitem[tuple] 42.7710μs 20.4790μs 48.8305 KOps/s 48.5173 KOps/s $\color{#35bf28}+0.65\%$
test_getitem[list] 0.2495ms 38.4304μs 26.0211 KOps/s 27.4512 KOps/s $\textbf{\color{#d91a1a}-5.21\%}$
test_setitem_dim[int] 43.3110μs 27.3272μs 36.5935 KOps/s 38.6876 KOps/s $\textbf{\color{#d91a1a}-5.41\%}$
test_setitem_dim[slice_int] 82.0320μs 47.7186μs 20.9562 KOps/s 21.5820 KOps/s $\color{#d91a1a}-2.90\%$
test_setitem_dim[range] 90.2520μs 63.1909μs 15.8251 KOps/s 15.9215 KOps/s $\color{#d91a1a}-0.61\%$
test_setitem_dim[tuple] 60.4110μs 39.9395μs 25.0379 KOps/s 25.7163 KOps/s $\color{#d91a1a}-2.64\%$
test_setitem 0.1034ms 19.1817μs 52.1331 KOps/s 55.3744 KOps/s $\textbf{\color{#d91a1a}-5.85\%}$
test_set 99.4120μs 18.5442μs 53.9252 KOps/s 57.3446 KOps/s $\textbf{\color{#d91a1a}-5.96\%}$
test_set_shared 0.5668ms 0.1027ms 9.7327 KOps/s 9.9260 KOps/s $\color{#d91a1a}-1.95\%$
test_update 0.1135ms 22.9937μs 43.4902 KOps/s 45.8345 KOps/s $\textbf{\color{#d91a1a}-5.11\%}$
test_update_nested 0.1284ms 32.1817μs 31.0735 KOps/s 32.1069 KOps/s $\color{#d91a1a}-3.22\%$
test_set_nested 98.6820μs 19.9733μs 50.0668 KOps/s 52.6531 KOps/s $\color{#d91a1a}-4.91\%$
test_set_nested_new 0.1090ms 23.9336μs 41.7822 KOps/s 43.8990 KOps/s $\color{#d91a1a}-4.82\%$
test_select 76.8920μs 46.3370μs 21.5810 KOps/s 21.2858 KOps/s $\color{#35bf28}+1.39\%$
test_to 73.6220μs 52.3354μs 19.1075 KOps/s 18.8297 KOps/s $\color{#35bf28}+1.48\%$
test_to_nonblocking 70.3810μs 34.8765μs 28.6726 KOps/s 28.4916 KOps/s $\color{#35bf28}+0.64\%$
test_unbind_speed 0.3927ms 0.3565ms 2.8049 KOps/s 2.8222 KOps/s $\color{#d91a1a}-0.61\%$
test_unbind_speed_stack0 63.3412ms 5.2621ms 190.0377 Ops/s 191.1943 Ops/s $\color{#d91a1a}-0.60\%$
test_unbind_speed_stack1 1.2430μs 0.5244μs 1.9068 MOps/s 1.9284 MOps/s $\color{#d91a1a}-1.12\%$
test_split 54.2278ms 1.8369ms 544.4051 Ops/s 558.3058 Ops/s $\color{#d91a1a}-2.49\%$
test_chunk 54.0882ms 1.8191ms 549.7133 Ops/s 564.2835 Ops/s $\color{#d91a1a}-2.58\%$
test_creation[device0] 0.4516ms 0.3097ms 3.2290 KOps/s 3.1815 KOps/s $\color{#35bf28}+1.49\%$
test_creation[device1] 0.7904ms 0.3115ms 3.2103 KOps/s 3.1816 KOps/s $\color{#35bf28}+0.90\%$
test_creation_from_tensor 57.2059ms 0.3640ms 2.7472 KOps/s 2.9490 KOps/s $\textbf{\color{#d91a1a}-6.84\%}$
test_add_one[memmap_tensor0] 0.2546ms 24.5831μs 40.6783 KOps/s 41.9724 KOps/s $\color{#d91a1a}-3.08\%$
test_add_one[memmap_tensor1] 0.1845ms 75.1746μs 13.3024 KOps/s 13.2829 KOps/s $\color{#35bf28}+0.15\%$
test_contiguous[memmap_tensor0] 32.1500μs 6.0829μs 164.3948 KOps/s 169.5108 KOps/s $\color{#d91a1a}-3.02\%$
test_contiguous[memmap_tensor1] 50.8800μs 22.5668μs 44.3129 KOps/s 45.1950 KOps/s $\color{#d91a1a}-1.95\%$
test_stack[memmap_tensor0] 49.3610μs 21.5523μs 46.3988 KOps/s 50.8129 KOps/s $\textbf{\color{#d91a1a}-8.69\%}$
test_stack[memmap_tensor1] 0.1623ms 75.4035μs 13.2620 KOps/s 13.4455 KOps/s $\color{#d91a1a}-1.36\%$
test_memmaptd_index 0.2619ms 0.2262ms 4.4212 KOps/s 4.4292 KOps/s $\color{#d91a1a}-0.18\%$
test_memmaptd_index_astensor 0.3879ms 0.2811ms 3.5579 KOps/s 3.5861 KOps/s $\color{#d91a1a}-0.79\%$
test_memmaptd_index_op 0.6509ms 0.5683ms 1.7596 KOps/s 1.8521 KOps/s $\color{#d91a1a}-4.99\%$
test_reshape_pytree 0.2609ms 21.4457μs 46.6293 KOps/s 47.8269 KOps/s $\color{#d91a1a}-2.50\%$
test_reshape_td 60.2110μs 30.8421μs 32.4232 KOps/s 33.4734 KOps/s $\color{#d91a1a}-3.14\%$
test_view_pytree 43.8300μs 21.2342μs 47.0939 KOps/s 48.6227 KOps/s $\color{#d91a1a}-3.14\%$
test_view_td 19.3600μs 4.1562μs 240.6035 KOps/s 245.4158 KOps/s $\color{#d91a1a}-1.96\%$
test_unbind_pytree 44.2310μs 26.6925μs 37.4637 KOps/s 38.5549 KOps/s $\color{#d91a1a}-2.83\%$
test_unbind_td 83.8310μs 57.3718μs 17.4302 KOps/s 17.6359 KOps/s $\color{#d91a1a}-1.17\%$
test_split_pytree 94.8720μs 25.7152μs 38.8875 KOps/s 42.1588 KOps/s $\textbf{\color{#d91a1a}-7.76\%}$
test_split_td 71.3520μs 44.9081μs 22.2677 KOps/s 22.7556 KOps/s $\color{#d91a1a}-2.14\%$
test_add_pytree 56.6310μs 33.6811μs 29.6902 KOps/s 32.1449 KOps/s $\textbf{\color{#d91a1a}-7.64\%}$
test_add_td 76.2010μs 46.8318μs 21.3530 KOps/s 22.7685 KOps/s $\textbf{\color{#d91a1a}-6.22\%}$
test_distributed 20.8110μs 5.6930μs 175.6552 KOps/s 176.3318 KOps/s $\color{#d91a1a}-0.38\%$
test_tdmodule 89.4720μs 16.9690μs 58.9311 KOps/s 58.8565 KOps/s $\color{#35bf28}+0.13\%$
test_tdmodule_dispatch 0.2298ms 33.2376μs 30.0865 KOps/s 30.1235 KOps/s $\color{#d91a1a}-0.12\%$
test_tdseq 39.7800μs 20.2016μs 49.5011 KOps/s 48.4943 KOps/s $\color{#35bf28}+2.08\%$
test_tdseq_dispatch 0.1357ms 36.3542μs 27.5071 KOps/s 27.1658 KOps/s $\color{#35bf28}+1.26\%$
test_instantiation_functorch 1.7490ms 1.7130ms 583.7809 Ops/s 595.4224 Ops/s $\color{#d91a1a}-1.96\%$
test_instantiation_td 1.8487ms 1.1924ms 838.6647 Ops/s 847.1615 Ops/s $\color{#d91a1a}-1.00\%$
test_exec_functorch 0.2055ms 0.1625ms 6.1541 KOps/s 6.2629 KOps/s $\color{#d91a1a}-1.74\%$
test_exec_functional_call 0.2208ms 0.1643ms 6.0867 KOps/s 6.2422 KOps/s $\color{#d91a1a}-2.49\%$
test_exec_td 0.1883ms 0.1542ms 6.4845 KOps/s 6.6362 KOps/s $\color{#d91a1a}-2.29\%$
test_exec_td_decorator 1.0352ms 0.2261ms 4.4232 KOps/s 4.4516 KOps/s $\color{#d91a1a}-0.64\%$
test_vmap_mlp_speed[True-True] 1.1796ms 1.0949ms 913.3494 Ops/s 905.0262 Ops/s $\color{#35bf28}+0.92\%$
test_vmap_mlp_speed[True-False] 0.7440ms 0.6384ms 1.5665 KOps/s 1.5885 KOps/s $\color{#d91a1a}-1.39\%$
test_vmap_mlp_speed[False-True] 1.0706ms 1.0082ms 991.8382 Ops/s 991.0404 Ops/s $\color{#35bf28}+0.08\%$
test_vmap_mlp_speed[False-False] 0.6086ms 0.5549ms 1.8021 KOps/s 1.7826 KOps/s $\color{#35bf28}+1.09\%$
test_vmap_mlp_speed_decorator[True-True] 2.6210ms 1.8237ms 548.3348 Ops/s 544.1566 Ops/s $\color{#35bf28}+0.77\%$
test_vmap_mlp_speed_decorator[True-False] 1.1735ms 0.7077ms 1.4131 KOps/s 1.4225 KOps/s $\color{#d91a1a}-0.66\%$
test_vmap_mlp_speed_decorator[False-True] 2.1100ms 1.6404ms 609.6198 Ops/s 605.7532 Ops/s $\color{#35bf28}+0.64\%$
test_vmap_mlp_speed_decorator[False-False] 1.0398ms 0.5955ms 1.6793 KOps/s 1.6746 KOps/s $\color{#35bf28}+0.28\%$
test_vmap_transformer_speed[True-True] 12.9590ms 12.8752ms 77.6687 Ops/s 77.5838 Ops/s $\color{#35bf28}+0.11\%$
test_vmap_transformer_speed[True-False] 10.7422ms 8.4431ms 118.4392 Ops/s 118.5362 Ops/s $\color{#d91a1a}-0.08\%$
test_vmap_transformer_speed[False-True] 12.8261ms 12.7340ms 78.5302 Ops/s 78.2186 Ops/s $\color{#35bf28}+0.40\%$
test_vmap_transformer_speed[False-False] 8.6754ms 8.3677ms 119.5065 Ops/s 119.4152 Ops/s $\color{#35bf28}+0.08\%$
test_vmap_transformer_speed_decorator[True-True] 46.4936ms 44.8865ms 22.2784 Ops/s 22.8169 Ops/s $\color{#d91a1a}-2.36\%$
test_vmap_transformer_speed_decorator[True-False] 99.3917ms 22.2757ms 44.8919 Ops/s 44.8315 Ops/s $\color{#35bf28}+0.13\%$
test_vmap_transformer_speed_decorator[False-True] 44.4971ms 43.3295ms 23.0790 Ops/s 23.0208 Ops/s $\color{#35bf28}+0.25\%$
test_vmap_transformer_speed_decorator[False-False] 0.1012s 21.9076ms 45.6463 Ops/s 45.4709 Ops/s $\color{#35bf28}+0.39\%$

@vmoens merged commit 1a7f43a into main Nov 23, 2023
45 checks passed
@vmoens deleted the faster-params-registration branch November 23, 2023 11:07
Labels: CLA Signed, Performance