[Performance] Faster set #619

Merged
merged 1 commit into from
Jan 15, 2024

vmoens (Contributor) commented Jan 15, 2024

cc @dubuqa

facebook-github-bot added the CLA Signed label Jan 15, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 120. Improved: $\large\color{#35bf28}13$. Worsened: $\large\color{#d91a1a}5$.

Expand to view detailed results
Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.1474ms 17.4049μs 57.4551 KOps/s 55.0453 KOps/s $\color{#35bf28}+4.38\%$
test_plain_set_stack_nested 0.1987ms 0.1456ms 6.8668 KOps/s 6.6853 KOps/s $\color{#35bf28}+2.72\%$
test_plain_set_nested_inplace 66.8750μs 19.6244μs 50.9571 KOps/s 49.4518 KOps/s $\color{#35bf28}+3.04\%$
test_plain_set_stack_nested_inplace 0.3188ms 0.1799ms 5.5599 KOps/s 5.4782 KOps/s $\color{#35bf28}+1.49\%$
test_items 18.5340μs 2.4148μs 414.1109 KOps/s 403.4641 KOps/s $\color{#35bf28}+2.64\%$
test_items_nested 0.4429ms 0.2654ms 3.7676 KOps/s 3.6988 KOps/s $\color{#35bf28}+1.86\%$
test_items_nested_locked 0.6331ms 0.2798ms 3.5737 KOps/s 3.6817 KOps/s $\color{#d91a1a}-2.93\%$
test_items_nested_leaf 0.2909ms 0.1644ms 6.0812 KOps/s 5.9162 KOps/s $\color{#35bf28}+2.79\%$
test_items_stack_nested 1.7300ms 1.3195ms 757.8714 Ops/s 705.8108 Ops/s $\textbf{\color{#35bf28}+7.38\%}$
test_items_stack_nested_leaf 1.3097ms 1.1929ms 838.2958 Ops/s 828.1432 Ops/s $\color{#35bf28}+1.23\%$
test_items_stack_nested_locked 1.1261ms 0.8722ms 1.1466 KOps/s 1.1441 KOps/s $\color{#35bf28}+0.22\%$
test_keys 20.1780μs 3.8007μs 263.1101 KOps/s 261.8893 KOps/s $\color{#35bf28}+0.47\%$
test_keys_nested 59.6221ms 0.1581ms 6.3252 KOps/s 6.8204 KOps/s $\textbf{\color{#d91a1a}-7.26\%}$
test_keys_nested_locked 0.2666ms 0.1479ms 6.7625 KOps/s 6.8350 KOps/s $\color{#d91a1a}-1.06\%$
test_keys_nested_leaf 0.2277ms 0.1304ms 7.6696 KOps/s 7.8440 KOps/s $\color{#d91a1a}-2.22\%$
test_keys_stack_nested 1.7626ms 1.2749ms 784.3967 Ops/s 764.7685 Ops/s $\color{#35bf28}+2.57\%$
test_keys_stack_nested_leaf 1.6094ms 1.2599ms 793.7245 Ops/s 763.9428 Ops/s $\color{#35bf28}+3.90\%$
test_keys_stack_nested_locked 1.0252ms 0.7982ms 1.2528 KOps/s 1.2363 KOps/s $\color{#35bf28}+1.33\%$
test_values 7.3716μs 1.0551μs 947.7851 KOps/s 880.9837 KOps/s $\textbf{\color{#35bf28}+7.58\%}$
test_values_nested 99.4550μs 51.9548μs 19.2475 KOps/s 19.1889 KOps/s $\color{#35bf28}+0.31\%$
test_values_nested_locked 97.4020μs 52.7308μs 18.9643 KOps/s 19.0777 KOps/s $\color{#d91a1a}-0.59\%$
test_values_nested_leaf 0.1029ms 47.6777μs 20.9741 KOps/s 21.5859 KOps/s $\color{#d91a1a}-2.83\%$
test_values_stack_nested 1.3010ms 1.0509ms 951.5260 Ops/s 942.7298 Ops/s $\color{#35bf28}+0.93\%$
test_values_stack_nested_leaf 1.2065ms 1.0385ms 962.9303 Ops/s 957.4539 Ops/s $\color{#35bf28}+0.57\%$
test_values_stack_nested_locked 3.4413ms 0.6157ms 1.6241 KOps/s 1.6339 KOps/s $\color{#d91a1a}-0.60\%$
test_membership 37.5200μs 1.3521μs 739.5755 KOps/s 733.5604 KOps/s $\color{#35bf28}+0.82\%$
test_membership_nested 35.4060μs 2.8890μs 346.1413 KOps/s 344.0494 KOps/s $\color{#35bf28}+0.61\%$
test_membership_nested_leaf 39.3230μs 2.9555μs 338.3569 KOps/s 345.0321 KOps/s $\color{#d91a1a}-1.93\%$
test_membership_stacked_nested 30.0860μs 12.0192μs 83.2001 KOps/s 83.4028 KOps/s $\color{#d91a1a}-0.24\%$
test_membership_stacked_nested_leaf 61.7650μs 11.9799μs 83.4735 KOps/s 83.2538 KOps/s $\color{#35bf28}+0.26\%$
test_membership_nested_last 37.0980μs 6.1050μs 163.7989 KOps/s 167.1081 KOps/s $\color{#d91a1a}-1.98\%$
test_membership_nested_leaf_last 39.3030μs 6.1398μs 162.8730 KOps/s 166.3374 KOps/s $\color{#d91a1a}-2.08\%$
test_membership_stacked_nested_last 0.3059ms 0.1689ms 5.9212 KOps/s 5.9165 KOps/s $\color{#35bf28}+0.08\%$
test_membership_stacked_nested_leaf_last 50.2740μs 14.0437μs 71.2063 KOps/s 72.1471 KOps/s $\color{#d91a1a}-1.30\%$
test_nested_getleaf 55.2530μs 10.7147μs 93.3297 KOps/s 94.8429 KOps/s $\color{#d91a1a}-1.60\%$
test_nested_get 42.1290μs 10.0794μs 99.2118 KOps/s 97.9070 KOps/s $\color{#35bf28}+1.33\%$
test_stacked_getleaf 0.5442ms 0.4033ms 2.4795 KOps/s 2.4401 KOps/s $\color{#35bf28}+1.61\%$
test_stacked_get 0.6758ms 0.3682ms 2.7159 KOps/s 2.6773 KOps/s $\color{#35bf28}+1.44\%$
test_nested_getitemleaf 39.9640μs 10.6326μs 94.0500 KOps/s 94.5793 KOps/s $\color{#d91a1a}-0.56\%$
test_nested_getitem 43.4310μs 10.0771μs 99.2351 KOps/s 100.0480 KOps/s $\color{#d91a1a}-0.81\%$
test_stacked_getitemleaf 0.6179ms 0.4062ms 2.4616 KOps/s 2.4236 KOps/s $\color{#35bf28}+1.57\%$
test_stacked_getitem 0.6182ms 0.3680ms 2.7177 KOps/s 2.6582 KOps/s $\color{#35bf28}+2.24\%$
test_lock_nested 1.3040ms 0.4175ms 2.3951 KOps/s 2.4080 KOps/s $\color{#d91a1a}-0.54\%$
test_lock_stack_nested 88.6757ms 6.9351ms 144.1935 Ops/s 144.7434 Ops/s $\color{#d91a1a}-0.38\%$
test_unlock_nested 69.0728ms 0.4903ms 2.0395 KOps/s 2.3658 KOps/s $\textbf{\color{#d91a1a}-13.79\%}$
test_unlock_stack_nested 82.2344ms 6.4046ms 156.1384 Ops/s 155.5763 Ops/s $\color{#35bf28}+0.36\%$
test_flatten_speed 0.7273ms 0.3727ms 2.6833 KOps/s 2.6550 KOps/s $\color{#35bf28}+1.07\%$
test_unflatten_speed 0.6560ms 0.4584ms 2.1815 KOps/s 2.1728 KOps/s $\color{#35bf28}+0.40\%$
test_common_ops 1.4277ms 0.7007ms 1.4271 KOps/s 1.3917 KOps/s $\color{#35bf28}+2.54\%$
test_creation 85.8590μs 2.0069μs 498.2833 KOps/s 494.4449 KOps/s $\color{#35bf28}+0.78\%$
test_creation_empty 46.9070μs 11.0634μs 90.3885 KOps/s 85.2331 KOps/s $\textbf{\color{#35bf28}+6.05\%}$
test_creation_nested_1 56.8650μs 14.1503μs 70.6698 KOps/s 67.9512 KOps/s $\color{#35bf28}+4.00\%$
test_creation_nested_2 53.8000μs 17.3278μs 57.7107 KOps/s 50.7208 KOps/s $\textbf{\color{#35bf28}+13.78\%}$
test_clone 0.2299ms 12.3308μs 81.0978 KOps/s 82.2104 KOps/s $\color{#d91a1a}-1.35\%$
test_getitem[int] 43.3710μs 11.7848μs 84.8553 KOps/s 84.4453 KOps/s $\color{#35bf28}+0.49\%$
test_getitem[slice_int] 55.8440μs 22.8609μs 43.7427 KOps/s 42.6405 KOps/s $\color{#35bf28}+2.59\%$
test_getitem[range] 99.8860μs 44.4289μs 22.5079 KOps/s 23.9788 KOps/s $\textbf{\color{#d91a1a}-6.13\%}$
test_getitem[tuple] 71.4930μs 18.8429μs 53.0705 KOps/s 52.6004 KOps/s $\color{#35bf28}+0.89\%$
test_getitem[list] 0.4974ms 38.0287μs 26.2960 KOps/s 26.5253 KOps/s $\color{#d91a1a}-0.86\%$
test_setitem_dim[int] 82.3630μs 30.4845μs 32.8035 KOps/s 29.8512 KOps/s $\textbf{\color{#35bf28}+9.89\%}$
test_setitem_dim[slice_int] 91.1290μs 56.9866μs 17.5480 KOps/s 16.9031 KOps/s $\color{#35bf28}+3.82\%$
test_setitem_dim[range] 0.1229ms 78.5496μs 12.7308 KOps/s 12.9518 KOps/s $\color{#d91a1a}-1.71\%$
test_setitem_dim[tuple] 98.2130μs 45.3928μs 22.0299 KOps/s 21.2960 KOps/s $\color{#35bf28}+3.45\%$
test_setitem 0.2509ms 18.9518μs 52.7654 KOps/s 50.9169 KOps/s $\color{#35bf28}+3.63\%$
test_set 0.2126ms 18.1843μs 54.9925 KOps/s 52.2237 KOps/s $\textbf{\color{#35bf28}+5.30\%}$
test_set_shared 2.1551ms 0.1390ms 7.1919 KOps/s 7.0258 KOps/s $\color{#35bf28}+2.36\%$
test_update 0.2128ms 21.8317μs 45.8051 KOps/s 44.5790 KOps/s $\color{#35bf28}+2.75\%$
test_update_nested 0.2282ms 28.9964μs 34.4871 KOps/s 33.7546 KOps/s $\color{#35bf28}+2.17\%$
test_set_nested 0.2172ms 19.9508μs 50.1233 KOps/s 48.1916 KOps/s $\color{#35bf28}+4.01\%$
test_set_nested_new 0.2175ms 25.0202μs 39.9676 KOps/s 40.3067 KOps/s $\color{#d91a1a}-0.84\%$
test_select 0.1202ms 47.9843μs 20.8402 KOps/s 20.4530 KOps/s $\color{#35bf28}+1.89\%$
test_unbind_speed 0.5067ms 0.3423ms 2.9216 KOps/s 2.9574 KOps/s $\color{#d91a1a}-1.21\%$
test_unbind_speed_stack0 69.9919ms 4.5807ms 218.3086 Ops/s 234.2013 Ops/s $\textbf{\color{#d91a1a}-6.79\%}$
test_unbind_speed_stack1 2.5757μs 0.6200μs 1.6129 MOps/s 1.5195 MOps/s $\textbf{\color{#35bf28}+6.15\%}$
test_split 2.3586ms 1.5571ms 642.2103 Ops/s 589.0819 Ops/s $\textbf{\color{#35bf28}+9.02\%}$
test_chunk 69.4639ms 1.6630ms 601.3186 Ops/s 598.0948 Ops/s $\color{#35bf28}+0.54\%$
test_creation[device0] 0.2261ms 0.1007ms 9.9288 KOps/s 9.9772 KOps/s $\color{#d91a1a}-0.49\%$
test_creation_from_tensor 3.1541ms 80.7516μs 12.3837 KOps/s 12.2576 KOps/s $\color{#35bf28}+1.03\%$
test_add_one[memmap_tensor0] 0.4102ms 5.1425μs 194.4593 KOps/s 194.2104 KOps/s $\color{#35bf28}+0.13\%$
test_contiguous[memmap_tensor0] 13.0740μs 0.6407μs 1.5607 MOps/s 1.5950 MOps/s $\color{#d91a1a}-2.15\%$
test_stack[memmap_tensor0] 71.1420μs 3.4960μs 286.0436 KOps/s 293.1548 KOps/s $\color{#d91a1a}-2.43\%$
test_memmaptd_index 0.4047ms 0.1997ms 5.0064 KOps/s 5.0950 KOps/s $\color{#d91a1a}-1.74\%$
test_memmaptd_index_astensor 0.9717ms 0.2614ms 3.8259 KOps/s 3.8652 KOps/s $\color{#d91a1a}-1.02\%$
test_memmaptd_index_op 0.8038ms 0.5451ms 1.8346 KOps/s 1.7924 KOps/s $\color{#35bf28}+2.36\%$
test_serialize_model 0.1021s 98.9067ms 10.1105 Ops/s 8.9330 Ops/s $\textbf{\color{#35bf28}+13.18\%}$
test_serialize_model_pickle 0.4650s 0.3799s 2.6320 Ops/s 2.5846 Ops/s $\color{#35bf28}+1.83\%$
test_serialize_weights 0.1731s 0.1036s 9.6551 Ops/s 9.3534 Ops/s $\color{#35bf28}+3.22\%$
test_serialize_weights_returnearly 0.1798s 0.1288s 7.7658 Ops/s 7.3567 Ops/s $\textbf{\color{#35bf28}+5.56\%}$
test_serialize_weights_pickle 1.0858s 0.6132s 1.6307 Ops/s 1.5945 Ops/s $\color{#35bf28}+2.27\%$
test_serialize_weights_filesystem 0.1588s 95.6355ms 10.4564 Ops/s 10.6125 Ops/s $\color{#d91a1a}-1.47\%$
test_serialize_model_filesystem 98.1990ms 91.4000ms 10.9409 Ops/s 10.1290 Ops/s $\textbf{\color{#35bf28}+8.02\%}$
test_reshape_pytree 66.3530μs 23.1842μs 43.1328 KOps/s 42.4549 KOps/s $\color{#35bf28}+1.60\%$
test_reshape_td 62.5070μs 31.0382μs 32.2184 KOps/s 31.7562 KOps/s $\color{#35bf28}+1.46\%$
test_view_pytree 55.7640μs 23.0998μs 43.2904 KOps/s 43.8069 KOps/s $\color{#d91a1a}-1.18\%$
test_view_td 34.8550μs 4.8768μs 205.0533 KOps/s 201.5123 KOps/s $\color{#35bf28}+1.76\%$
test_unbind_pytree 60.6330μs 26.2771μs 38.0560 KOps/s 37.6472 KOps/s $\color{#35bf28}+1.09\%$
test_unbind_td 0.1181ms 54.8436μs 18.2337 KOps/s 18.1446 KOps/s $\color{#35bf28}+0.49\%$
test_split_pytree 74.1380μs 26.6978μs 37.4563 KOps/s 38.2040 KOps/s $\color{#d91a1a}-1.96\%$
test_split_td 0.5223ms 43.4233μs 23.0291 KOps/s 22.8524 KOps/s $\color{#35bf28}+0.77\%$
test_add_pytree 93.4240μs 31.7574μs 31.4887 KOps/s 31.3653 KOps/s $\color{#35bf28}+0.39\%$
test_add_td 0.1524ms 51.7880μs 19.3095 KOps/s 19.5074 KOps/s $\color{#d91a1a}-1.01\%$
test_distributed 0.2183ms 97.4719μs 10.2594 KOps/s 9.6886 KOps/s $\textbf{\color{#35bf28}+5.89\%}$
test_tdmodule 0.1122ms 23.1429μs 43.2099 KOps/s 41.8353 KOps/s $\color{#35bf28}+3.29\%$
test_tdmodule_dispatch 0.2270ms 42.4961μs 23.5316 KOps/s 23.1495 KOps/s $\color{#35bf28}+1.65\%$
test_tdseq 54.5520μs 26.4907μs 37.7491 KOps/s 36.7227 KOps/s $\color{#35bf28}+2.79\%$
test_tdseq_dispatch 0.1443ms 46.8900μs 21.3265 KOps/s 20.8761 KOps/s $\color{#35bf28}+2.16\%$
test_instantiation_functorch 1.5165ms 1.2998ms 769.3472 Ops/s 760.7691 Ops/s $\color{#35bf28}+1.13\%$
test_instantiation_td 1.6187ms 1.0275ms 973.2514 Ops/s 977.2524 Ops/s $\color{#d91a1a}-0.41\%$
test_exec_functorch 0.2939ms 0.1605ms 6.2293 KOps/s 6.3072 KOps/s $\color{#d91a1a}-1.24\%$
test_exec_functional_call 0.2941ms 0.1487ms 6.7242 KOps/s 6.8808 KOps/s $\color{#d91a1a}-2.28\%$
test_exec_td 0.2658ms 0.1433ms 6.9759 KOps/s 7.1467 KOps/s $\color{#d91a1a}-2.39\%$
test_exec_td_decorator 0.7582ms 0.1818ms 5.4997 KOps/s 5.6534 KOps/s $\color{#d91a1a}-2.72\%$
test_vmap_mlp_speed[True-True] 1.1650ms 0.8804ms 1.1359 KOps/s 1.1381 KOps/s $\color{#d91a1a}-0.20\%$
test_vmap_mlp_speed[True-False] 0.6894ms 0.4711ms 2.1229 KOps/s 2.1433 KOps/s $\color{#d91a1a}-0.95\%$
test_vmap_mlp_speed[False-True] 1.2347ms 0.7765ms 1.2878 KOps/s 1.3189 KOps/s $\color{#d91a1a}-2.36\%$
test_vmap_mlp_speed[False-False] 0.5563ms 0.3863ms 2.5886 KOps/s 2.6602 KOps/s $\color{#d91a1a}-2.69\%$
test_vmap_mlp_speed_decorator[True-True] 3.2518ms 2.4571ms 406.9840 Ops/s 377.5929 Ops/s $\textbf{\color{#35bf28}+7.78\%}$
test_vmap_mlp_speed_decorator[True-False] 0.8686ms 0.5242ms 1.9078 KOps/s 1.9273 KOps/s $\color{#d91a1a}-1.01\%$
test_vmap_mlp_speed_decorator[False-True] 79.1764ms 2.1393ms 467.4368 Ops/s 501.9978 Ops/s $\textbf{\color{#d91a1a}-6.88\%}$
test_vmap_mlp_speed_decorator[False-False] 0.6989ms 0.3994ms 2.5040 KOps/s 2.5507 KOps/s $\color{#d91a1a}-1.83\%$
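
The Change column is consistent with the relative difference between Ops and Ops on Repo HEAD, and the Improved/Worsened totals match the number of bolded rows, all of which have a magnitude of at least 5%. A minimal sketch of that arithmetic follows. The function names and the 5% threshold are assumptions inferred from the table, not taken from the benchmark bot's source.

```python
def relative_change(ops: float, ops_head: float) -> float:
    """Percent change of this PR's throughput relative to repo HEAD."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a row. The 5% threshold is an assumption inferred from the bolded rows."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) pairs copied from the CPU results, both in KOps/s.
rows = {
    "test_plain_set_nested": (57.4551, 55.0453),
    "test_unlock_nested": (2.0395, 2.3658),
    "test_creation_nested_2": (57.7107, 50.7208),
}

for name, (ops, ops_head) in rows.items():
    change = relative_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Under this reading, a row such as test_plain_set_nested (+4.38%) moves in the right direction but does not count toward the "Improved" total.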

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 128. Improved: $\large\color{#35bf28}5$. Worsened: $\large\color{#d91a1a}19$.

Expand to view detailed results
Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.1331ms 14.1728μs 70.5576 KOps/s 75.6005 KOps/s $\textbf{\color{#d91a1a}-6.67\%}$
test_plain_set_stack_nested 0.1400ms 0.1182ms 8.4604 KOps/s 8.6574 KOps/s $\color{#d91a1a}-2.28\%$
test_plain_set_nested_inplace 32.6000μs 15.3839μs 65.0032 KOps/s 68.8844 KOps/s $\textbf{\color{#d91a1a}-5.63\%}$
test_plain_set_stack_nested_inplace 0.1862ms 0.1446ms 6.9146 KOps/s 6.9789 KOps/s $\color{#d91a1a}-0.92\%$
test_items 24.0200μs 4.7008μs 212.7310 KOps/s 211.7866 KOps/s $\color{#35bf28}+0.45\%$
test_items_nested 0.3785ms 0.3396ms 2.9450 KOps/s 2.9682 KOps/s $\color{#d91a1a}-0.78\%$
test_items_nested_locked 0.3927ms 0.3398ms 2.9426 KOps/s 2.9510 KOps/s $\color{#d91a1a}-0.29\%$
test_items_nested_leaf 0.2209ms 0.1984ms 5.0400 KOps/s 5.0811 KOps/s $\color{#d91a1a}-0.81\%$
test_items_stack_nested 1.3855ms 1.2672ms 789.1410 Ops/s 768.6245 Ops/s $\color{#35bf28}+2.67\%$
test_items_stack_nested_leaf 1.2710ms 1.1086ms 902.0275 Ops/s 886.6226 Ops/s $\color{#35bf28}+1.74\%$
test_items_stack_nested_locked 1.0286ms 0.8871ms 1.1273 KOps/s 1.1050 KOps/s $\color{#35bf28}+2.02\%$
test_keys 24.4110μs 4.5835μs 218.1762 KOps/s 216.9873 KOps/s $\color{#35bf28}+0.55\%$
test_keys_nested 0.8489ms 95.2553μs 10.4981 KOps/s 10.6618 KOps/s $\color{#d91a1a}-1.53\%$
test_keys_nested_locked 0.1416ms 94.1783μs 10.6182 KOps/s 10.6983 KOps/s $\color{#d91a1a}-0.75\%$
test_keys_nested_leaf 0.1964ms 77.9798μs 12.8238 KOps/s 12.9729 KOps/s $\color{#d91a1a}-1.15\%$
test_keys_stack_nested 1.1761ms 1.1141ms 897.6200 Ops/s 886.7466 Ops/s $\color{#35bf28}+1.23\%$
test_keys_stack_nested_leaf 1.1421ms 1.0791ms 926.6861 Ops/s 903.3344 Ops/s $\color{#35bf28}+2.59\%$
test_keys_stack_nested_locked 0.7951ms 0.7035ms 1.4215 KOps/s 1.3932 KOps/s $\color{#35bf28}+2.03\%$
test_values 8.5003μs 1.8893μs 529.2931 KOps/s 529.8359 KOps/s $\color{#d91a1a}-0.10\%$
test_values_nested 78.7910μs 45.1076μs 22.1692 KOps/s 22.0144 KOps/s $\color{#35bf28}+0.70\%$
test_values_nested_locked 61.6910μs 47.4534μs 21.0733 KOps/s 20.9445 KOps/s $\color{#35bf28}+0.61\%$
test_values_nested_leaf 53.2510μs 39.4725μs 25.3341 KOps/s 25.2484 KOps/s $\color{#35bf28}+0.34\%$
test_values_stack_nested 0.9868ms 0.9263ms 1.0796 KOps/s 1.0493 KOps/s $\color{#35bf28}+2.88\%$
test_values_stack_nested_leaf 1.0227ms 0.9212ms 1.0856 KOps/s 1.0592 KOps/s $\color{#35bf28}+2.49\%$
test_values_stack_nested_locked 0.6559ms 0.5653ms 1.7691 KOps/s 1.7342 KOps/s $\color{#35bf28}+2.01\%$
test_membership 3.7060μs 0.9419μs 1.0617 MOps/s 919.2465 KOps/s $\textbf{\color{#35bf28}+15.49\%}$
test_membership_nested 11.1505μs 2.2367μs 447.0778 KOps/s 438.6079 KOps/s $\color{#35bf28}+1.93\%$
test_membership_nested_leaf 13.1105μs 2.1953μs 455.5275 KOps/s 457.5116 KOps/s $\color{#d91a1a}-0.43\%$
test_membership_stacked_nested 31.7210μs 11.0444μs 90.5434 KOps/s 92.2149 KOps/s $\color{#d91a1a}-1.81\%$
test_membership_stacked_nested_leaf 30.1810μs 11.2039μs 89.2547 KOps/s 91.8081 KOps/s $\color{#d91a1a}-2.78\%$
test_membership_nested_last 17.9490μs 4.7848μs 208.9971 KOps/s 211.5276 KOps/s $\color{#d91a1a}-1.20\%$
test_membership_nested_leaf_last 23.2500μs 4.8503μs 206.1742 KOps/s 209.9464 KOps/s $\color{#d91a1a}-1.80\%$
test_membership_stacked_nested_last 0.1677ms 0.1350ms 7.4059 KOps/s 7.3920 KOps/s $\color{#35bf28}+0.19\%$
test_membership_stacked_nested_leaf_last 36.6600μs 13.0503μs 76.6269 KOps/s 78.4232 KOps/s $\color{#d91a1a}-2.29\%$
test_nested_getleaf 49.6410μs 8.4721μs 118.0346 KOps/s 120.3244 KOps/s $\color{#d91a1a}-1.90\%$
test_nested_get 0.1950ms 7.9575μs 125.6674 KOps/s 127.2676 KOps/s $\color{#d91a1a}-1.26\%$
test_stacked_getleaf 0.5176ms 0.3151ms 3.1741 KOps/s 3.1507 KOps/s $\color{#35bf28}+0.74\%$
test_stacked_get 0.4920ms 0.2860ms 3.4960 KOps/s 3.4682 KOps/s $\color{#35bf28}+0.80\%$
test_nested_getitemleaf 0.2056ms 8.4662μs 118.1171 KOps/s 119.3899 KOps/s $\color{#d91a1a}-1.07\%$
test_nested_getitem 47.3920μs 7.9774μs 125.3547 KOps/s 126.2123 KOps/s $\color{#d91a1a}-0.68\%$
test_stacked_getitemleaf 0.5251ms 0.3137ms 3.1873 KOps/s 3.1395 KOps/s $\color{#35bf28}+1.52\%$
test_stacked_getitem 0.4905ms 0.2843ms 3.5171 KOps/s 3.5312 KOps/s $\color{#d91a1a}-0.40\%$
test_lock_nested 4.6507ms 0.4211ms 2.3745 KOps/s 2.4201 KOps/s $\color{#d91a1a}-1.88\%$
test_lock_stack_nested 84.1382ms 6.4732ms 154.4829 Ops/s 154.1364 Ops/s $\color{#35bf28}+0.22\%$
test_unlock_nested 0.8045ms 0.4096ms 2.4413 KOps/s 2.4389 KOps/s $\color{#35bf28}+0.10\%$
test_unlock_stack_nested 83.0853ms 6.8593ms 145.7884 Ops/s 145.4958 Ops/s $\color{#35bf28}+0.20\%$
test_flatten_speed 0.4963ms 0.2631ms 3.8006 KOps/s 3.7987 KOps/s $\color{#35bf28}+0.05\%$
test_unflatten_speed 0.5391ms 0.3516ms 2.8442 KOps/s 2.8206 KOps/s $\color{#35bf28}+0.84\%$
test_common_ops 1.0392ms 0.5922ms 1.6886 KOps/s 1.6935 KOps/s $\color{#d91a1a}-0.29\%$
test_creation 19.6600μs 1.5656μs 638.7213 KOps/s 629.4467 KOps/s $\color{#35bf28}+1.47\%$
test_creation_empty 36.9810μs 9.2121μs 108.5528 KOps/s 131.9425 KOps/s $\textbf{\color{#d91a1a}-17.73\%}$
test_creation_nested_1 0.2147ms 11.1678μs 89.5428 KOps/s 105.1924 KOps/s $\textbf{\color{#d91a1a}-14.88\%}$
test_creation_nested_2 38.9500μs 13.6286μs 73.3753 KOps/s 70.6287 KOps/s $\color{#35bf28}+3.89\%$
test_clone 0.1355ms 12.8384μs 77.8911 KOps/s 74.9827 KOps/s $\color{#35bf28}+3.88\%$
test_getitem[int] 0.2224ms 11.1672μs 89.5483 KOps/s 89.2292 KOps/s $\color{#35bf28}+0.36\%$
test_getitem[slice_int] 46.4210μs 20.7799μs 48.1234 KOps/s 47.9290 KOps/s $\color{#35bf28}+0.41\%$
test_getitem[range] 61.5510μs 35.3104μs 28.3203 KOps/s 28.1072 KOps/s $\color{#35bf28}+0.76\%$
test_getitem[tuple] 0.2290ms 18.6283μs 53.6818 KOps/s 53.5024 KOps/s $\color{#35bf28}+0.34\%$
test_getitem[list] 57.5210μs 31.8430μs 31.4040 KOps/s 30.3369 KOps/s $\color{#35bf28}+3.52\%$
test_setitem_dim[int] 43.7610μs 27.6357μs 36.1851 KOps/s 38.5296 KOps/s $\textbf{\color{#d91a1a}-6.09\%}$
test_setitem_dim[slice_int] 73.7710μs 48.6710μs 20.5461 KOps/s 21.7423 KOps/s $\textbf{\color{#d91a1a}-5.50\%}$
test_setitem_dim[range] 95.4320μs 61.7136μs 16.2039 KOps/s 16.7766 KOps/s $\color{#d91a1a}-3.41\%$
test_setitem_dim[tuple] 61.5510μs 42.8583μs 23.3327 KOps/s 25.2647 KOps/s $\textbf{\color{#d91a1a}-7.65\%}$
test_setitem 0.2142ms 18.2271μs 54.8633 KOps/s 58.3249 KOps/s $\textbf{\color{#d91a1a}-5.94\%}$
test_set 0.1280ms 17.5148μs 57.0947 KOps/s 60.4818 KOps/s $\textbf{\color{#d91a1a}-5.60\%}$
test_set_shared 2.6933ms 0.1001ms 9.9948 KOps/s 9.9486 KOps/s $\color{#35bf28}+0.46\%$
test_update 0.2447ms 20.1657μs 49.5891 KOps/s 52.6379 KOps/s $\textbf{\color{#d91a1a}-5.79\%}$
test_update_nested 0.1453ms 26.7801μs 37.3412 KOps/s 40.3925 KOps/s $\textbf{\color{#d91a1a}-7.55\%}$
test_set_nested 0.2367ms 18.6114μs 53.7306 KOps/s 55.5180 KOps/s $\color{#d91a1a}-3.22\%$
test_set_nested_new 0.1254ms 21.9051μs 45.6516 KOps/s 48.0688 KOps/s $\textbf{\color{#d91a1a}-5.03\%}$
test_select 0.2500ms 42.6711μs 23.4351 KOps/s 24.5786 KOps/s $\color{#d91a1a}-4.65\%$
test_to 93.6010μs 56.5308μs 17.6895 KOps/s 18.7841 KOps/s $\textbf{\color{#d91a1a}-5.83\%}$
test_to_nonblocking 67.7810μs 32.2609μs 30.9973 KOps/s 30.4862 KOps/s $\color{#35bf28}+1.68\%$
test_unbind_speed 0.3777ms 0.3240ms 3.0860 KOps/s 3.0472 KOps/s $\color{#35bf28}+1.27\%$
test_unbind_speed_stack0 79.5677ms 3.7558ms 266.2564 Ops/s 263.0733 Ops/s $\color{#35bf28}+1.21\%$
test_unbind_speed_stack1 1.4930μs 0.5300μs 1.8869 MOps/s 1.8728 MOps/s $\color{#35bf28}+0.75\%$
test_split 74.6590ms 1.7087ms 585.2569 Ops/s 576.9215 Ops/s $\color{#35bf28}+1.44\%$
test_chunk 1.6166ms 1.5644ms 639.2424 Ops/s 587.3949 Ops/s $\textbf{\color{#35bf28}+8.83\%}$
test_creation[device0] 0.1438ms 71.5275μs 13.9806 KOps/s 14.5282 KOps/s $\color{#d91a1a}-3.77\%$
test_creation_from_tensor 0.1303ms 55.6750μs 17.9614 KOps/s 19.3336 KOps/s $\textbf{\color{#d91a1a}-7.10\%}$
test_add_one[memmap_tensor0] 0.1202ms 6.6279μs 150.8783 KOps/s 149.2507 KOps/s $\color{#35bf28}+1.09\%$
test_contiguous[memmap_tensor0] 25.3810μs 0.6302μs 1.5867 MOps/s 1.5950 MOps/s $\color{#d91a1a}-0.52\%$
test_stack[memmap_tensor0] 37.8720μs 4.3904μs 227.7697 KOps/s 234.7277 KOps/s $\color{#d91a1a}-2.96\%$
test_memmaptd_index 0.2632ms 0.2301ms 4.3464 KOps/s 4.3325 KOps/s $\color{#35bf28}+0.32\%$
test_memmaptd_index_astensor 0.3197ms 0.2880ms 3.4722 KOps/s 3.4627 KOps/s $\color{#35bf28}+0.27\%$
test_memmaptd_index_op 0.7643ms 0.5802ms 1.7235 KOps/s 1.7946 KOps/s $\color{#d91a1a}-3.96\%$
test_serialize_model 0.1713s 96.2412ms 10.3906 Ops/s 9.6232 Ops/s $\textbf{\color{#35bf28}+7.97\%}$
test_serialize_model_pickle 1.3488s 1.2368s 0.8086 Ops/s 0.8065 Ops/s $\color{#35bf28}+0.26\%$
test_serialize_weights 88.9779ms 86.1719ms 11.6047 Ops/s 9.8866 Ops/s $\textbf{\color{#35bf28}+17.38\%}$
test_serialize_weights_returnearly 0.2500s 77.6090ms 12.8851 Ops/s 14.8368 Ops/s $\textbf{\color{#d91a1a}-13.15\%}$
test_serialize_weights_pickle 1.3519s 1.2451s 0.8032 Ops/s 0.8088 Ops/s $\color{#d91a1a}-0.69\%$
test_reshape_pytree 50.7810μs 23.9955μs 41.6746 KOps/s 41.9168 KOps/s $\color{#d91a1a}-0.58\%$
test_reshape_td 46.2310μs 27.7701μs 36.0099 KOps/s 35.1486 KOps/s $\color{#35bf28}+2.45\%$
test_view_pytree 45.7910μs 23.4807μs 42.5881 KOps/s 42.6964 KOps/s $\color{#d91a1a}-0.25\%$
test_view_td 21.5800μs 4.0309μs 248.0866 KOps/s 242.5299 KOps/s $\color{#35bf28}+2.29\%$
test_unbind_pytree 58.2110μs 29.7987μs 33.5585 KOps/s 33.2560 KOps/s $\color{#35bf28}+0.91\%$
test_unbind_td 76.1600μs 50.5159μs 19.7957 KOps/s 19.1499 KOps/s $\color{#35bf28}+3.37\%$
test_split_pytree 42.6610μs 27.3592μs 36.5507 KOps/s 36.0397 KOps/s $\color{#35bf28}+1.42\%$
test_split_td 0.6566ms 38.9585μs 25.6683 KOps/s 25.0382 KOps/s $\color{#35bf28}+2.52\%$
test_add_pytree 59.7820μs 34.9105μs 28.6447 KOps/s 28.3057 KOps/s $\color{#35bf28}+1.20\%$
test_add_td 67.7920μs 49.1326μs 20.3531 KOps/s 23.2013 KOps/s $\textbf{\color{#d91a1a}-12.28\%}$
test_distributed 0.2641ms 70.6544μs 14.1534 KOps/s 14.6141 KOps/s $\color{#d91a1a}-3.15\%$
test_tdmodule 0.1077ms 18.3946μs 54.3639 KOps/s 57.8624 KOps/s $\textbf{\color{#d91a1a}-6.05\%}$
test_tdmodule_dispatch 0.2680ms 35.5234μs 28.1505 KOps/s 30.1569 KOps/s $\textbf{\color{#d91a1a}-6.65\%}$
test_tdseq 0.1815ms 21.4833μs 46.5479 KOps/s 49.5362 KOps/s $\textbf{\color{#d91a1a}-6.03\%}$
test_tdseq_dispatch 0.1623ms 37.2535μs 26.8431 KOps/s 27.4810 KOps/s $\color{#d91a1a}-2.32\%$
test_instantiation_functorch 1.6961ms 1.6353ms 611.5133 Ops/s 609.3928 Ops/s $\color{#35bf28}+0.35\%$
test_instantiation_td 1.7098ms 1.1524ms 867.7469 Ops/s 865.4654 Ops/s $\color{#35bf28}+0.26\%$
test_exec_functorch 0.1864ms 0.1550ms 6.4504 KOps/s 6.4701 KOps/s $\color{#d91a1a}-0.30\%$
test_exec_functional_call 0.1803ms 0.1541ms 6.4880 KOps/s 6.4784 KOps/s $\color{#35bf28}+0.15\%$
test_exec_td 0.1767ms 0.1447ms 6.9127 KOps/s 6.7361 KOps/s $\color{#35bf28}+2.62\%$
test_exec_td_decorator 0.8928ms 0.1864ms 5.3640 KOps/s 5.3831 KOps/s $\color{#d91a1a}-0.36\%$
test_vmap_mlp_speed[True-True] 1.1581ms 1.0714ms 933.3856 Ops/s 925.8897 Ops/s $\color{#35bf28}+0.81\%$
test_vmap_mlp_speed[True-False] 0.7640ms 0.6435ms 1.5540 KOps/s 1.5340 KOps/s $\color{#35bf28}+1.30\%$
test_vmap_mlp_speed[False-True] 1.0979ms 0.9925ms 1.0076 KOps/s 993.7144 Ops/s $\color{#35bf28}+1.40\%$
test_vmap_mlp_speed[False-False] 0.6189ms 0.5748ms 1.7396 KOps/s 1.6832 KOps/s $\color{#35bf28}+3.35\%$
test_vmap_mlp_speed_decorator[True-True] 3.1181ms 2.4689ms 405.0393 Ops/s 406.2466 Ops/s $\color{#d91a1a}-0.30\%$
test_vmap_mlp_speed_decorator[True-False] 1.0199ms 0.6933ms 1.4423 KOps/s 1.4300 KOps/s $\color{#35bf28}+0.86\%$
test_vmap_mlp_speed_decorator[False-True] 2.5513ms 2.1267ms 470.2117 Ops/s 485.8885 Ops/s $\color{#d91a1a}-3.23\%$
test_vmap_mlp_speed_decorator[False-False] 0.9752ms 0.6150ms 1.6261 KOps/s 1.6670 KOps/s $\color{#d91a1a}-2.45\%$
test_vmap_transformer_speed[True-True] 12.7093ms 12.2856ms 81.3960 Ops/s 82.3253 Ops/s $\color{#d91a1a}-1.13\%$
test_vmap_transformer_speed[True-False] 8.4520ms 8.1461ms 122.7579 Ops/s 123.7864 Ops/s $\color{#d91a1a}-0.83\%$
test_vmap_transformer_speed[False-True] 13.3532ms 12.3299ms 81.1034 Ops/s 83.0202 Ops/s $\color{#d91a1a}-2.31\%$
test_vmap_transformer_speed[False-False] 8.4429ms 8.1203ms 123.1486 Ops/s 125.2735 Ops/s $\color{#d91a1a}-1.70\%$
test_vmap_transformer_speed_decorator[True-True] 0.1654s 82.4073ms 12.1348 Ops/s 12.4329 Ops/s $\color{#d91a1a}-2.40\%$
test_vmap_transformer_speed_decorator[True-False] 21.4881ms 19.7849ms 50.5437 Ops/s 51.4946 Ops/s $\color{#d91a1a}-1.85\%$
test_vmap_transformer_speed_decorator[False-True] 70.8819ms 68.6744ms 14.5615 Ops/s 14.9704 Ops/s $\color{#d91a1a}-2.73\%$
test_vmap_transformer_speed_decorator[False-False] 21.0174ms 19.3594ms 51.6546 Ops/s 47.9706 Ops/s $\textbf{\color{#35bf28}+7.68\%}$
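
The Ops column is consistent with the reciprocal of the Mean column: in the GPU results, test_membership has a mean of 0.9419μs and is reported at 1.0617 MOps/s. A small sketch of that conversion, assuming the displayed units are s, ms, and μs as they appear in the tables. The helper name `ops_per_second` is hypothetical.

```python
# Seconds per unit, for the unit suffixes that appear in the Mean column.
UNIT_SECONDS = {"s": 1.0, "ms": 1e-3, "μs": 1e-6}


def ops_per_second(mean: str) -> float:
    """Convert a Mean cell such as '0.9419μs' into operations per second."""
    # Check longer suffixes first so 'ms' and 'μs' are not mistaken for 's'.
    for unit in sorted(UNIT_SECONDS, key=len, reverse=True):
        if mean.endswith(unit):
            value = float(mean[: -len(unit)])
            return 1.0 / (value * UNIT_SECONDS[unit])
    raise ValueError(f"unrecognised unit in {mean!r}")


# Mean values copied from the GPU results.
for cell in ("0.9419μs", "1.2672ms"):
    print(cell, "->", f"{ops_per_second(cell):.1f} ops/s")
```

Small mismatches against the table are expected, because the Mean values shown there are already rounded.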

vmoens merged commit c85acfb into main Jan 15, 2024
44 of 45 checks passed
vmoens deleted the faster_set branch January 15, 2024 21:15
Labels: CLA Signed, Performance
2 participants