[Refactor] Remove remaining MemmapTensor references #617

Merged: 1 commit merged on Jan 11, 2024

@vmoens (Contributor) commented Jan 11, 2024

No description provided.

@facebook-github-bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Jan 11, 2024
⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 120. Improved: $\large\color{#35bf28}27$. Worsened: $\large\color{#d91a1a}21$.

Detailed results (columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change):
test_plain_set_nested 35.0550μs 17.4358μs 57.3531 KOps/s 51.2410 KOps/s $\textbf{\color{#35bf28}+11.93\%}$
test_plain_set_stack_nested 0.1786ms 0.1429ms 6.9978 KOps/s 5.8852 KOps/s $\textbf{\color{#35bf28}+18.91\%}$
test_plain_set_nested_inplace 70.6320μs 19.6088μs 50.9975 KOps/s 46.1499 KOps/s $\textbf{\color{#35bf28}+10.50\%}$
test_plain_set_stack_nested_inplace 0.3034ms 0.1781ms 5.6138 KOps/s 4.8964 KOps/s $\textbf{\color{#35bf28}+14.65\%}$
test_items 24.8170μs 2.5237μs 396.2438 KOps/s 403.8718 KOps/s $\color{#d91a1a}-1.89\%$
test_items_nested 0.4814ms 0.2716ms 3.6819 KOps/s 3.6164 KOps/s $\color{#35bf28}+1.81\%$
test_items_nested_locked 0.8371ms 0.2731ms 3.6613 KOps/s 3.6299 KOps/s $\color{#35bf28}+0.87\%$
test_items_nested_leaf 0.3004ms 0.1686ms 5.9321 KOps/s 5.6741 KOps/s $\color{#35bf28}+4.55\%$
test_items_stack_nested 1.5938ms 1.2971ms 770.9252 Ops/s 711.6564 Ops/s $\textbf{\color{#35bf28}+8.33\%}$
test_items_stack_nested_leaf 2.0675ms 1.1760ms 850.3254 Ops/s 781.4290 Ops/s $\textbf{\color{#35bf28}+8.82\%}$
test_items_stack_nested_locked 1.0133ms 0.7594ms 1.3168 KOps/s 1.2711 KOps/s $\color{#35bf28}+3.60\%$
test_keys 22.3420μs 3.9182μs 255.2168 KOps/s 230.9560 KOps/s $\textbf{\color{#35bf28}+10.50\%}$
test_keys_nested 49.1959ms 0.1549ms 6.4549 KOps/s 6.6356 KOps/s $\color{#d91a1a}-2.72\%$
test_keys_nested_locked 0.2698ms 0.1461ms 6.8435 KOps/s 6.6706 KOps/s $\color{#35bf28}+2.59\%$
test_keys_nested_leaf 0.3727ms 0.1296ms 7.7184 KOps/s 7.5606 KOps/s $\color{#35bf28}+2.09\%$
test_keys_stack_nested 1.9862ms 1.2816ms 780.2524 Ops/s 751.2071 Ops/s $\color{#35bf28}+3.87\%$
test_keys_stack_nested_leaf 2.4695ms 1.2785ms 782.1875 Ops/s 761.8828 Ops/s $\color{#35bf28}+2.67\%$
test_keys_stack_nested_locked 1.2264ms 0.7048ms 1.4189 KOps/s 1.3850 KOps/s $\color{#35bf28}+2.44\%$
test_values 9.0920μs 1.2186μs 820.6195 KOps/s 860.6233 KOps/s $\color{#d91a1a}-4.65\%$
test_values_nested 0.1002ms 52.0154μs 19.2251 KOps/s 17.5543 KOps/s $\textbf{\color{#35bf28}+9.52\%}$
test_values_nested_locked 99.3860μs 52.2437μs 19.1411 KOps/s 17.5721 KOps/s $\textbf{\color{#35bf28}+8.93\%}$
test_values_nested_leaf 94.0160μs 46.2761μs 21.6094 KOps/s 19.3702 KOps/s $\textbf{\color{#35bf28}+11.56\%}$
test_values_stack_nested 1.3096ms 1.0276ms 973.1730 Ops/s 916.3124 Ops/s $\textbf{\color{#35bf28}+6.21\%}$
test_values_stack_nested_leaf 1.8553ms 1.0438ms 958.0144 Ops/s 935.3696 Ops/s $\color{#35bf28}+2.42\%$
test_values_stack_nested_locked 0.9149ms 0.5049ms 1.9808 KOps/s 1.8751 KOps/s $\textbf{\color{#35bf28}+5.63\%}$
test_membership 14.2970μs 1.3837μs 722.6846 KOps/s 725.7866 KOps/s $\color{#d91a1a}-0.43\%$
test_membership_nested 33.8130μs 2.9078μs 343.9012 KOps/s 345.3888 KOps/s $\color{#d91a1a}-0.43\%$
test_membership_nested_leaf 27.0010μs 2.9091μs 343.7482 KOps/s 335.4862 KOps/s $\color{#35bf28}+2.46\%$
test_membership_stacked_nested 32.3300μs 12.2784μs 81.4436 KOps/s 85.3958 KOps/s $\color{#d91a1a}-4.63\%$
test_membership_stacked_nested_leaf 34.1140μs 12.2804μs 81.4308 KOps/s 84.7407 KOps/s $\color{#d91a1a}-3.91\%$
test_membership_nested_last 26.1390μs 6.0635μs 164.9218 KOps/s 158.1602 KOps/s $\color{#35bf28}+4.28\%$
test_membership_nested_leaf_last 41.3480μs 6.0970μs 164.0163 KOps/s 156.7274 KOps/s $\color{#35bf28}+4.65\%$
test_membership_stacked_nested_last 0.3593ms 0.1664ms 6.0086 KOps/s 5.1690 KOps/s $\textbf{\color{#35bf28}+16.24\%}$
test_membership_stacked_nested_leaf_last 48.7010μs 14.3600μs 69.6381 KOps/s 70.3213 KOps/s $\color{#d91a1a}-0.97\%$
test_nested_getleaf 42.3190μs 10.5971μs 94.3658 KOps/s 78.0961 KOps/s $\textbf{\color{#35bf28}+20.83\%}$
test_nested_get 44.0030μs 9.9284μs 100.7216 KOps/s 82.2386 KOps/s $\textbf{\color{#35bf28}+22.47\%}$
test_stacked_getleaf 0.8344ms 0.4620ms 2.1643 KOps/s 2.0329 KOps/s $\textbf{\color{#35bf28}+6.46\%}$
test_stacked_get 1.1265ms 0.4411ms 2.2670 KOps/s 2.1852 KOps/s $\color{#35bf28}+3.74\%$
test_nested_getitemleaf 36.4180μs 10.5549μs 94.7427 KOps/s 76.3426 KOps/s $\textbf{\color{#35bf28}+24.10\%}$
test_nested_getitem 41.9080μs 10.1239μs 98.7765 KOps/s 80.5141 KOps/s $\textbf{\color{#35bf28}+22.68\%}$
test_stacked_getitemleaf 1.0522ms 0.4663ms 2.1446 KOps/s 2.0501 KOps/s $\color{#35bf28}+4.61\%$
test_stacked_getitem 0.7677ms 0.4314ms 2.3178 KOps/s 2.1785 KOps/s $\textbf{\color{#35bf28}+6.39\%}$
test_lock_nested 1.2670ms 0.4133ms 2.4198 KOps/s 2.4052 KOps/s $\color{#35bf28}+0.61\%$
test_lock_stack_nested 81.6694ms 6.7161ms 148.8961 Ops/s 149.9854 Ops/s $\color{#d91a1a}-0.73\%$
test_unlock_nested 66.1321ms 0.4839ms 2.0668 KOps/s 2.3803 KOps/s $\textbf{\color{#d91a1a}-13.17\%}$
test_unlock_stack_nested 82.0057ms 6.2800ms 159.2365 Ops/s 159.1441 Ops/s $\color{#35bf28}+0.06\%$
test_flatten_speed 0.5782ms 0.3614ms 2.7668 KOps/s 2.5496 KOps/s $\textbf{\color{#35bf28}+8.52\%}$
test_unflatten_speed 0.5070ms 0.4547ms 2.1990 KOps/s 1.9923 KOps/s $\textbf{\color{#35bf28}+10.37\%}$
test_common_ops 1.3865ms 0.7313ms 1.3673 KOps/s 1.4366 KOps/s $\color{#d91a1a}-4.82\%$
test_creation 66.1840μs 2.0549μs 486.6345 KOps/s 504.3582 KOps/s $\color{#d91a1a}-3.51\%$
test_creation_empty 37.3800μs 11.5425μs 86.6364 KOps/s 106.5179 KOps/s $\textbf{\color{#d91a1a}-18.66\%}$
test_creation_nested_1 48.9730μs 14.3605μs 69.6353 KOps/s 80.3076 KOps/s $\textbf{\color{#d91a1a}-13.29\%}$
test_creation_nested_2 50.4350μs 19.7360μs 50.6689 KOps/s 57.4011 KOps/s $\textbf{\color{#d91a1a}-11.73\%}$
test_clone 0.1205ms 12.4475μs 80.3372 KOps/s 80.3885 KOps/s $\color{#d91a1a}-0.06\%$
test_getitem[int] 42.4300μs 11.8422μs 84.4436 KOps/s 83.6606 KOps/s $\color{#35bf28}+0.94\%$
test_getitem[slice_int] 75.8030μs 23.2819μs 42.9519 KOps/s 42.2946 KOps/s $\color{#35bf28}+1.55\%$
test_getitem[range] 0.1633ms 43.4690μs 23.0049 KOps/s 24.0774 KOps/s $\color{#d91a1a}-4.45\%$
test_getitem[tuple] 54.4020μs 18.9063μs 52.8925 KOps/s 52.6201 KOps/s $\color{#35bf28}+0.52\%$
test_getitem[list] 0.3826ms 37.4729μs 26.6860 KOps/s 26.7831 KOps/s $\color{#d91a1a}-0.36\%$
test_setitem_dim[int] 51.8680μs 33.1519μs 30.1642 KOps/s 33.1389 KOps/s $\textbf{\color{#d91a1a}-8.98\%}$
test_setitem_dim[slice_int] 0.1066ms 59.5109μs 16.8036 KOps/s 17.6191 KOps/s $\color{#d91a1a}-4.63\%$
test_setitem_dim[range] 0.1549ms 77.0533μs 12.9780 KOps/s 13.3954 KOps/s $\color{#d91a1a}-3.12\%$
test_setitem_dim[tuple] 76.9750μs 48.0551μs 20.8095 KOps/s 22.2382 KOps/s $\textbf{\color{#d91a1a}-6.42\%}$
test_setitem 0.2329ms 19.7253μs 50.6964 KOps/s 55.3121 KOps/s $\textbf{\color{#d91a1a}-8.34\%}$
test_set 0.1851ms 19.1790μs 52.1405 KOps/s 57.8107 KOps/s $\textbf{\color{#d91a1a}-9.81\%}$
test_set_shared 1.8844ms 0.1385ms 7.2190 KOps/s 7.3578 KOps/s $\color{#d91a1a}-1.89\%$
test_update 0.1436ms 23.1414μs 43.2126 KOps/s 48.7675 KOps/s $\textbf{\color{#d91a1a}-11.39\%}$
test_update_nested 0.1562ms 30.2992μs 33.0042 KOps/s 35.9256 KOps/s $\textbf{\color{#d91a1a}-8.13\%}$
test_set_nested 0.1652ms 21.0269μs 47.5582 KOps/s 51.4328 KOps/s $\textbf{\color{#d91a1a}-7.53\%}$
test_set_nested_new 0.1628ms 25.2016μs 39.6800 KOps/s 41.8945 KOps/s $\textbf{\color{#d91a1a}-5.29\%}$
test_select 0.1476ms 49.0685μs 20.3797 KOps/s 20.6415 KOps/s $\color{#d91a1a}-1.27\%$
test_unbind_speed 0.4116ms 0.3417ms 2.9268 KOps/s 2.9514 KOps/s $\color{#d91a1a}-0.84\%$
test_unbind_speed_stack0 72.0301ms 4.4387ms 225.2900 Ops/s 254.4428 Ops/s $\textbf{\color{#d91a1a}-11.46\%}$
test_unbind_speed_stack1 2.6119μs 0.6363μs 1.5717 MOps/s 1.5378 MOps/s $\color{#35bf28}+2.21\%$
test_split 71.2054ms 1.7020ms 587.5309 Ops/s 586.0562 Ops/s $\color{#35bf28}+0.25\%$
test_chunk 67.4394ms 1.6659ms 600.2718 Ops/s 599.6722 Ops/s $\color{#35bf28}+0.10\%$
test_creation[device0] 0.2347ms 0.1014ms 9.8591 KOps/s 3.3984 KOps/s $\textbf{\color{#35bf28}+190.11\%}$
test_creation_from_tensor 3.1021ms 82.2527μs 12.1577 KOps/s 3.0016 KOps/s $\textbf{\color{#35bf28}+305.03\%}$
test_add_one[memmap_tensor0] 0.2231ms 5.2967μs 188.7977 KOps/s 40.1514 KOps/s $\textbf{\color{#35bf28}+370.21\%}$
test_contiguous[memmap_tensor0] 22.2020μs 0.6580μs 1.5198 MOps/s 175.3154 KOps/s $\textbf{\color{#35bf28}+766.92\%}$
test_stack[memmap_tensor0] 53.1500μs 3.5774μs 279.5329 KOps/s 51.8741 KOps/s $\textbf{\color{#35bf28}+438.87\%}$
test_memmaptd_index 0.3644ms 0.1966ms 5.0871 KOps/s 4.9347 KOps/s $\color{#35bf28}+3.09\%$
test_memmaptd_index_astensor 0.3360ms 0.2549ms 3.9231 KOps/s 3.8337 KOps/s $\color{#35bf28}+2.33\%$
test_memmaptd_index_op 0.9697ms 0.5585ms 1.7906 KOps/s 1.9109 KOps/s $\textbf{\color{#d91a1a}-6.30\%}$
test_serialize_model 0.1654s 0.1071s 9.3345 Ops/s 9.8982 Ops/s $\textbf{\color{#d91a1a}-5.70\%}$
test_serialize_model_pickle 0.4533s 0.3776s 2.6484 Ops/s 2.5934 Ops/s $\color{#35bf28}+2.12\%$
test_serialize_weights 0.1564s 0.1031s 9.6973 Ops/s 9.2685 Ops/s $\color{#35bf28}+4.63\%$
test_serialize_weights_returnearly 0.1960s 0.1279s 7.8215 Ops/s 7.7319 Ops/s $\color{#35bf28}+1.16\%$
test_serialize_weights_pickle 1.0390s 0.6126s 1.6324 Ops/s 2.3810 Ops/s $\textbf{\color{#d91a1a}-31.44\%}$
test_serialize_weights_filesystem 0.1034s 90.6308ms 11.0338 Ops/s 10.7314 Ops/s $\color{#35bf28}+2.82\%$
test_serialize_model_filesystem 96.8395ms 90.4040ms 11.0615 Ops/s 10.0091 Ops/s $\textbf{\color{#35bf28}+10.51\%}$
test_reshape_pytree 52.7190μs 22.9046μs 43.6593 KOps/s 42.5669 KOps/s $\color{#35bf28}+2.57\%$
test_reshape_td 76.2740μs 30.1857μs 33.1283 KOps/s 33.5828 KOps/s $\color{#d91a1a}-1.35\%$
test_view_pytree 90.9490μs 22.9793μs 43.5174 KOps/s 43.1048 KOps/s $\color{#35bf28}+0.96\%$
test_view_td 31.7300μs 4.9839μs 200.6458 KOps/s 207.3855 KOps/s $\color{#d91a1a}-3.25\%$
test_unbind_pytree 64.3320μs 26.1802μs 38.1968 KOps/s 37.6745 KOps/s $\color{#35bf28}+1.39\%$
test_unbind_td 0.1282ms 55.7641μs 17.9327 KOps/s 18.1435 KOps/s $\color{#d91a1a}-1.16\%$
test_split_pytree 63.5500μs 25.9565μs 38.5260 KOps/s 37.9279 KOps/s $\color{#35bf28}+1.58\%$
test_split_td 0.5260ms 43.3606μs 23.0624 KOps/s 22.8717 KOps/s $\color{#35bf28}+0.83\%$
test_add_pytree 76.1930μs 32.0685μs 31.1832 KOps/s 31.2103 KOps/s $\color{#d91a1a}-0.09\%$
test_add_td 0.1175ms 50.0411μs 19.9836 KOps/s 21.9120 KOps/s $\textbf{\color{#d91a1a}-8.80\%}$
test_distributed 0.1766ms 0.1006ms 9.9377 KOps/s 161.3555 KOps/s $\textbf{\color{#d91a1a}-93.84\%}$
test_tdmodule 0.1819ms 24.4152μs 40.9580 KOps/s 46.1842 KOps/s $\textbf{\color{#d91a1a}-11.32\%}$
test_tdmodule_dispatch 0.2289ms 43.6435μs 22.9129 KOps/s 24.6396 KOps/s $\textbf{\color{#d91a1a}-7.01\%}$
test_tdseq 0.1221ms 26.3129μs 38.0041 KOps/s 38.8152 KOps/s $\color{#d91a1a}-2.09\%$
test_tdseq_dispatch 0.1415ms 47.7083μs 20.9607 KOps/s 22.1688 KOps/s $\textbf{\color{#d91a1a}-5.45\%}$
test_instantiation_functorch 1.5476ms 1.2807ms 780.8343 Ops/s 774.2048 Ops/s $\color{#35bf28}+0.86\%$
test_instantiation_td 1.5405ms 0.9958ms 1.0043 KOps/s 1.0062 KOps/s $\color{#d91a1a}-0.19\%$
test_exec_functorch 0.8170ms 0.1585ms 6.3104 KOps/s 6.2715 KOps/s $\color{#35bf28}+0.62\%$
test_exec_functional_call 0.2701ms 0.1474ms 6.7828 KOps/s 6.9190 KOps/s $\color{#d91a1a}-1.97\%$
test_exec_td 0.3380ms 0.1435ms 6.9672 KOps/s 7.1054 KOps/s $\color{#d91a1a}-1.94\%$
test_exec_td_decorator 0.8698ms 0.1813ms 5.5147 KOps/s 5.5925 KOps/s $\color{#d91a1a}-1.39\%$
test_vmap_mlp_speed[True-True] 1.8065ms 0.9204ms 1.0865 KOps/s 1.1110 KOps/s $\color{#d91a1a}-2.21\%$
test_vmap_mlp_speed[True-False] 0.6623ms 0.4775ms 2.0942 KOps/s 2.0746 KOps/s $\color{#35bf28}+0.94\%$
test_vmap_mlp_speed[False-True] 1.3731ms 0.7863ms 1.2718 KOps/s 1.2137 KOps/s $\color{#35bf28}+4.78\%$
test_vmap_mlp_speed[False-False] 0.4828ms 0.3875ms 2.5807 KOps/s 2.5292 KOps/s $\color{#35bf28}+2.04\%$
test_vmap_mlp_speed_decorator[True-True] 2.4573ms 1.8077ms 553.2006 Ops/s 558.7527 Ops/s $\color{#d91a1a}-0.99\%$
test_vmap_mlp_speed_decorator[True-False] 0.9065ms 0.5276ms 1.8952 KOps/s 1.8829 KOps/s $\color{#35bf28}+0.65\%$
test_vmap_mlp_speed_decorator[False-True] 1.8582ms 1.5036ms 665.0845 Ops/s 668.2032 Ops/s $\color{#d91a1a}-0.47\%$
test_vmap_mlp_speed_decorator[False-False] 0.7958ms 0.4051ms 2.4687 KOps/s 2.4592 KOps/s $\color{#35bf28}+0.39\%$

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 128. Improved: $\large\color{#35bf28}8$. Worsened: $\large\color{#d91a1a}17$.

Detailed results (columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change):
test_plain_set_nested 0.1242ms 14.2178μs 70.3343 KOps/s 74.3991 KOps/s $\textbf{\color{#d91a1a}-5.46\%}$
test_plain_set_stack_nested 0.1521ms 0.1188ms 8.4177 KOps/s 8.5280 KOps/s $\color{#d91a1a}-1.29\%$
test_plain_set_nested_inplace 44.9000μs 15.4718μs 64.6336 KOps/s 67.7587 KOps/s $\color{#d91a1a}-4.61\%$
test_plain_set_stack_nested_inplace 0.1712ms 0.1456ms 6.8673 KOps/s 6.9564 KOps/s $\color{#d91a1a}-1.28\%$
test_items 31.5600μs 4.7141μs 212.1274 KOps/s 212.0289 KOps/s $\color{#35bf28}+0.05\%$
test_items_nested 0.3951ms 0.3410ms 2.9322 KOps/s 2.9376 KOps/s $\color{#d91a1a}-0.18\%$
test_items_nested_locked 0.3968ms 0.3425ms 2.9200 KOps/s 2.9208 KOps/s $\color{#d91a1a}-0.03\%$
test_items_nested_leaf 0.2488ms 0.1995ms 5.0129 KOps/s 4.9613 KOps/s $\color{#35bf28}+1.04\%$
test_items_stack_nested 1.4401ms 1.3340ms 749.6371 Ops/s 755.3319 Ops/s $\color{#d91a1a}-0.75\%$
test_items_stack_nested_leaf 1.3076ms 1.1625ms 860.2323 Ops/s 866.4579 Ops/s $\color{#d91a1a}-0.72\%$
test_items_stack_nested_locked 0.9426ms 0.8312ms 1.2031 KOps/s 1.2058 KOps/s $\color{#d91a1a}-0.22\%$
test_keys 27.3100μs 4.5605μs 219.2761 KOps/s 220.2920 KOps/s $\color{#d91a1a}-0.46\%$
test_keys_nested 0.9003ms 95.4449μs 10.4772 KOps/s 10.6398 KOps/s $\color{#d91a1a}-1.53\%$
test_keys_nested_locked 0.1228ms 94.9329μs 10.5338 KOps/s 10.6940 KOps/s $\color{#d91a1a}-1.50\%$
test_keys_nested_leaf 0.1827ms 77.9650μs 12.8263 KOps/s 12.9551 KOps/s $\color{#d91a1a}-0.99\%$
test_keys_stack_nested 1.2805ms 1.1688ms 855.5734 Ops/s 870.3791 Ops/s $\color{#d91a1a}-1.70\%$
test_keys_stack_nested_leaf 1.3096ms 1.1468ms 871.9839 Ops/s 883.4590 Ops/s $\color{#d91a1a}-1.30\%$
test_keys_stack_nested_locked 0.7632ms 0.6518ms 1.5341 KOps/s 1.5530 KOps/s $\color{#d91a1a}-1.22\%$
test_values 9.0900μs 1.8876μs 529.7757 KOps/s 522.1870 KOps/s $\color{#35bf28}+1.45\%$
test_values_nested 66.7710μs 45.0868μs 22.1794 KOps/s 22.2326 KOps/s $\color{#d91a1a}-0.24\%$
test_values_nested_locked 80.2010μs 47.2368μs 21.1699 KOps/s 21.2002 KOps/s $\color{#d91a1a}-0.14\%$
test_values_nested_leaf 58.6910μs 39.4823μs 25.3278 KOps/s 25.5406 KOps/s $\color{#d91a1a}-0.83\%$
test_values_stack_nested 1.1607ms 0.9762ms 1.0243 KOps/s 1.0309 KOps/s $\color{#d91a1a}-0.64\%$
test_values_stack_nested_leaf 1.1127ms 0.9735ms 1.0272 KOps/s 1.0413 KOps/s $\color{#d91a1a}-1.35\%$
test_values_stack_nested_locked 0.6062ms 0.5093ms 1.9636 KOps/s 1.9753 KOps/s $\color{#d91a1a}-0.60\%$
test_membership 24.9500μs 1.0487μs 953.5669 KOps/s 937.5107 KOps/s $\color{#35bf28}+1.71\%$
test_membership_nested 31.1200μs 2.2818μs 438.2455 KOps/s 432.8743 KOps/s $\color{#35bf28}+1.24\%$
test_membership_nested_leaf 15.1400μs 2.2060μs 453.3166 KOps/s 452.4595 KOps/s $\color{#35bf28}+0.19\%$
test_membership_stacked_nested 40.6210μs 11.3106μs 88.4123 KOps/s 91.7716 KOps/s $\color{#d91a1a}-3.66\%$
test_membership_stacked_nested_leaf 51.1610μs 11.3261μs 88.2917 KOps/s 91.5352 KOps/s $\color{#d91a1a}-3.54\%$
test_membership_nested_last 23.5700μs 4.6227μs 216.3231 KOps/s 215.1896 KOps/s $\color{#35bf28}+0.53\%$
test_membership_nested_leaf_last 37.1610μs 4.6687μs 214.1939 KOps/s 215.1911 KOps/s $\color{#d91a1a}-0.46\%$
test_membership_stacked_nested_last 0.1841ms 0.1365ms 7.3258 KOps/s 7.3781 KOps/s $\color{#d91a1a}-0.71\%$
test_membership_stacked_nested_leaf_last 30.7110μs 13.0459μs 76.6527 KOps/s 78.6713 KOps/s $\color{#d91a1a}-2.57\%$
test_nested_getleaf 32.3200μs 8.3789μs 119.3468 KOps/s 118.9959 KOps/s $\color{#35bf28}+0.29\%$
test_nested_get 30.9000μs 7.8852μs 126.8196 KOps/s 126.5225 KOps/s $\color{#35bf28}+0.23\%$
test_stacked_getleaf 0.4902ms 0.3925ms 2.5478 KOps/s 2.6053 KOps/s $\color{#d91a1a}-2.21\%$
test_stacked_get 0.4825ms 0.3637ms 2.7492 KOps/s 2.8251 KOps/s $\color{#d91a1a}-2.69\%$
test_nested_getitemleaf 24.3500μs 8.4402μs 118.4803 KOps/s 118.9812 KOps/s $\color{#d91a1a}-0.42\%$
test_nested_getitem 28.2510μs 7.9705μs 125.4634 KOps/s 126.1935 KOps/s $\color{#d91a1a}-0.58\%$
test_stacked_getitemleaf 0.4564ms 0.3879ms 2.5780 KOps/s 2.5935 KOps/s $\color{#d91a1a}-0.60\%$
test_stacked_getitem 0.4256ms 0.3561ms 2.8082 KOps/s 2.8199 KOps/s $\color{#d91a1a}-0.42\%$
test_lock_nested 7.3258ms 0.4271ms 2.3414 KOps/s 2.4208 KOps/s $\color{#d91a1a}-3.28\%$
test_lock_stack_nested 84.2158ms 6.5897ms 151.7514 Ops/s 154.1498 Ops/s $\color{#d91a1a}-1.56\%$
test_unlock_nested 0.8285ms 0.4156ms 2.4064 KOps/s 2.4476 KOps/s $\color{#d91a1a}-1.68\%$
test_unlock_stack_nested 84.1000ms 6.8993ms 144.9425 Ops/s 145.4550 Ops/s $\color{#d91a1a}-0.35\%$
test_flatten_speed 76.3223ms 0.2853ms 3.5049 KOps/s 3.8248 KOps/s $\textbf{\color{#d91a1a}-8.37\%}$
test_unflatten_speed 0.4191ms 0.3506ms 2.8525 KOps/s 2.8373 KOps/s $\color{#35bf28}+0.54\%$
test_common_ops 1.1714ms 0.6334ms 1.5788 KOps/s 1.6474 KOps/s $\color{#d91a1a}-4.16\%$
test_creation 32.3910μs 1.6109μs 620.7758 KOps/s 616.6727 KOps/s $\color{#35bf28}+0.67\%$
test_creation_empty 40.6810μs 9.4620μs 105.6862 KOps/s 123.2467 KOps/s $\textbf{\color{#d91a1a}-14.25\%}$
test_creation_nested_1 27.4400μs 11.3014μs 88.4848 KOps/s 99.5669 KOps/s $\textbf{\color{#d91a1a}-11.13\%}$
test_creation_nested_2 41.3300μs 15.9888μs 62.5436 KOps/s 68.7730 KOps/s $\textbf{\color{#d91a1a}-9.06\%}$
test_clone 0.1119ms 13.1249μs 76.1910 KOps/s 77.6837 KOps/s $\color{#d91a1a}-1.92\%$
test_getitem[int] 29.0710μs 11.1683μs 89.5393 KOps/s 88.7663 KOps/s $\color{#35bf28}+0.87\%$
test_getitem[slice_int] 44.1610μs 21.5163μs 46.4765 KOps/s 46.8514 KOps/s $\color{#d91a1a}-0.80\%$
test_getitem[range] 66.3910μs 37.1012μs 26.9533 KOps/s 26.5737 KOps/s $\color{#35bf28}+1.43\%$
test_getitem[tuple] 45.8910μs 18.8592μs 53.0244 KOps/s 53.1224 KOps/s $\color{#d91a1a}-0.18\%$
test_getitem[list] 0.3157ms 33.5392μs 29.8158 KOps/s 29.7581 KOps/s $\color{#35bf28}+0.19\%$
test_setitem_dim[int] 48.0910μs 28.9391μs 34.5553 KOps/s 37.3090 KOps/s $\textbf{\color{#d91a1a}-7.38\%}$
test_setitem_dim[slice_int] 66.9910μs 50.2512μs 19.9000 KOps/s 20.5221 KOps/s $\color{#d91a1a}-3.03\%$
test_setitem_dim[range] 85.8320μs 64.6772μs 15.4614 KOps/s 15.8814 KOps/s $\color{#d91a1a}-2.64\%$
test_setitem_dim[tuple] 69.1010μs 43.2889μs 23.1006 KOps/s 23.5468 KOps/s $\color{#d91a1a}-1.89\%$
test_setitem 0.1180ms 18.4045μs 54.3346 KOps/s 57.4688 KOps/s $\textbf{\color{#d91a1a}-5.45\%}$
test_set 0.1258ms 17.6807μs 56.5589 KOps/s 59.5017 KOps/s $\color{#d91a1a}-4.95\%$
test_set_shared 2.8026ms 0.1024ms 9.7629 KOps/s 9.9588 KOps/s $\color{#d91a1a}-1.97\%$
test_update 0.1167ms 21.7653μs 45.9448 KOps/s 50.9639 KOps/s $\textbf{\color{#d91a1a}-9.85\%}$
test_update_nested 0.1374ms 27.6824μs 36.1241 KOps/s 39.4553 KOps/s $\textbf{\color{#d91a1a}-8.44\%}$
test_set_nested 0.1141ms 19.2148μs 52.0431 KOps/s 54.4017 KOps/s $\color{#d91a1a}-4.34\%$
test_set_nested_new 0.1140ms 22.1012μs 45.2464 KOps/s 47.6790 KOps/s $\textbf{\color{#d91a1a}-5.10\%}$
test_select 73.6910μs 43.1306μs 23.1854 KOps/s 23.0652 KOps/s $\color{#35bf28}+0.52\%$
test_to 75.8210μs 54.9408μs 18.2014 KOps/s 18.4261 KOps/s $\color{#d91a1a}-1.22\%$
test_to_nonblocking 61.4310μs 35.2853μs 28.3404 KOps/s 29.5428 KOps/s $\color{#d91a1a}-4.07\%$
test_unbind_speed 0.3793ms 0.3267ms 3.0610 KOps/s 3.0592 KOps/s $\color{#35bf28}+0.06\%$
test_unbind_speed_stack0 78.2384ms 4.0453ms 247.2016 Ops/s 264.5349 Ops/s $\textbf{\color{#d91a1a}-6.55\%}$
test_unbind_speed_stack1 1.3825μs 0.5336μs 1.8740 MOps/s 1.8980 MOps/s $\color{#d91a1a}-1.27\%$
test_split 1.8486ms 1.5612ms 640.5229 Ops/s 582.8217 Ops/s $\textbf{\color{#35bf28}+9.90\%}$
test_chunk 74.4035ms 1.6704ms 598.6769 Ops/s 591.5384 Ops/s $\color{#35bf28}+1.21\%$
test_creation[device0] 0.1214ms 71.1102μs 14.0627 KOps/s 3.2200 KOps/s $\textbf{\color{#35bf28}+336.73\%}$
test_creation_from_tensor 0.1412ms 54.9767μs 18.1895 KOps/s 2.9461 KOps/s $\textbf{\color{#35bf28}+517.41\%}$
test_add_one[memmap_tensor0] 92.0710μs 7.2951μs 137.0785 KOps/s 42.2787 KOps/s $\textbf{\color{#35bf28}+224.23\%}$
test_contiguous[memmap_tensor0] 23.5800μs 0.6399μs 1.5628 MOps/s 167.7868 KOps/s $\textbf{\color{#35bf28}+831.40\%}$
test_stack[memmap_tensor0] 28.5000μs 4.4611μs 224.1605 KOps/s 51.9950 KOps/s $\textbf{\color{#35bf28}+331.12\%}$
test_memmaptd_index 0.2905ms 0.2385ms 4.1926 KOps/s 4.1267 KOps/s $\color{#35bf28}+1.60\%$
test_memmaptd_index_astensor 0.3197ms 0.2960ms 3.3782 KOps/s 3.3180 KOps/s $\color{#35bf28}+1.81\%$
test_memmaptd_index_op 0.7300ms 0.6131ms 1.6310 KOps/s 1.6903 KOps/s $\color{#d91a1a}-3.51\%$
test_serialize_model 0.1659s 96.6157ms 10.3503 Ops/s 10.8645 Ops/s $\color{#d91a1a}-4.73\%$
test_serialize_model_pickle 1.3609s 1.2383s 0.8076 Ops/s 0.8079 Ops/s $\color{#d91a1a}-0.04\%$
test_serialize_weights 0.1652s 94.8602ms 10.5418 Ops/s 10.0161 Ops/s $\textbf{\color{#35bf28}+5.25\%}$
test_serialize_weights_returnearly 0.2785s 80.3234ms 12.4497 Ops/s 14.6748 Ops/s $\textbf{\color{#d91a1a}-15.16\%}$
test_serialize_weights_pickle 1.4134s 1.2441s 0.8038 Ops/s 0.8040 Ops/s $\color{#d91a1a}-0.03\%$
test_reshape_pytree 44.0110μs 24.8492μs 40.2428 KOps/s 41.1709 KOps/s $\color{#d91a1a}-2.25\%$
test_reshape_td 50.9110μs 29.3174μs 34.1094 KOps/s 34.4755 KOps/s $\color{#d91a1a}-1.06\%$
test_view_pytree 0.2468ms 24.4783μs 40.8526 KOps/s 41.8544 KOps/s $\color{#d91a1a}-2.39\%$
test_view_td 18.1400μs 4.0787μs 245.1789 KOps/s 234.5332 KOps/s $\color{#35bf28}+4.54\%$
test_unbind_pytree 48.4700μs 30.1246μs 33.1955 KOps/s 32.0341 KOps/s $\color{#35bf28}+3.63\%$
test_unbind_td 0.2849ms 52.3718μs 19.0943 KOps/s 18.9902 KOps/s $\color{#35bf28}+0.55\%$
test_split_pytree 0.1584ms 28.8209μs 34.6971 KOps/s 35.3450 KOps/s $\color{#d91a1a}-1.83\%$
test_split_td 0.7237ms 41.3713μs 24.1714 KOps/s 25.5693 KOps/s $\textbf{\color{#d91a1a}-5.47\%}$
test_add_pytree 71.2820μs 36.7088μs 27.2414 KOps/s 28.0968 KOps/s $\color{#d91a1a}-3.04\%$
test_add_td 0.2772ms 48.4766μs 20.6285 KOps/s 22.5911 KOps/s $\textbf{\color{#d91a1a}-8.69\%}$
test_distributed 0.2465ms 70.4120μs 14.2021 KOps/s 179.3096 KOps/s $\textbf{\color{#d91a1a}-92.08\%}$
test_tdmodule 33.9600μs 18.3575μs 54.4737 KOps/s 57.3514 KOps/s $\textbf{\color{#d91a1a}-5.02\%}$
test_tdmodule_dispatch 0.1262ms 36.0654μs 27.7274 KOps/s 28.9203 KOps/s $\color{#d91a1a}-4.12\%$
test_tdseq 37.2410μs 21.4944μs 46.5237 KOps/s 47.9923 KOps/s $\color{#d91a1a}-3.06\%$
test_tdseq_dispatch 55.2210μs 38.8016μs 25.7721 KOps/s 27.0535 KOps/s $\color{#d91a1a}-4.74\%$
test_instantiation_functorch 1.9058ms 1.6975ms 589.0915 Ops/s 600.0370 Ops/s $\color{#d91a1a}-1.82\%$
test_instantiation_td 1.7535ms 1.1784ms 848.5921 Ops/s 856.5132 Ops/s $\color{#d91a1a}-0.92\%$
test_exec_functorch 0.2001ms 0.1627ms 6.1460 KOps/s 6.3948 KOps/s $\color{#d91a1a}-3.89\%$
test_exec_functional_call 0.3701ms 0.1635ms 6.1156 KOps/s 6.2846 KOps/s $\color{#d91a1a}-2.69\%$
test_exec_td 0.1793ms 0.1530ms 6.5368 KOps/s 6.6901 KOps/s $\color{#d91a1a}-2.29\%$
test_exec_td_decorator 0.9854ms 0.1929ms 5.1835 KOps/s 5.2796 KOps/s $\color{#d91a1a}-1.82\%$
test_vmap_mlp_speed[True-True] 1.3473ms 1.1204ms 892.5029 Ops/s 902.0118 Ops/s $\color{#d91a1a}-1.05\%$
test_vmap_mlp_speed[True-False] 0.9422ms 0.6663ms 1.5007 KOps/s 1.5183 KOps/s $\color{#d91a1a}-1.16\%$
test_vmap_mlp_speed[False-True] 1.3361ms 1.0741ms 931.0048 Ops/s 987.1367 Ops/s $\textbf{\color{#d91a1a}-5.69\%}$
test_vmap_mlp_speed[False-False] 0.8188ms 0.5947ms 1.6816 KOps/s 1.6917 KOps/s $\color{#d91a1a}-0.60\%$
test_vmap_mlp_speed_decorator[True-True] 2.9166ms 2.1375ms 467.8312 Ops/s 485.6969 Ops/s $\color{#d91a1a}-3.68\%$
test_vmap_mlp_speed_decorator[True-False] 1.5609ms 0.7170ms 1.3947 KOps/s 1.4057 KOps/s $\color{#d91a1a}-0.79\%$
test_vmap_mlp_speed_decorator[False-True] 2.1561ms 1.8096ms 552.6027 Ops/s 558.7127 Ops/s $\color{#d91a1a}-1.09\%$
test_vmap_mlp_speed_decorator[False-False] 0.9986ms 0.6135ms 1.6301 KOps/s 1.6494 KOps/s $\color{#d91a1a}-1.17\%$
test_vmap_transformer_speed[True-True] 12.4771ms 12.3035ms 81.2777 Ops/s 81.9607 Ops/s $\color{#d91a1a}-0.83\%$
test_vmap_transformer_speed[True-False] 8.2639ms 8.2044ms 121.8857 Ops/s 118.6552 Ops/s $\color{#35bf28}+2.72\%$
test_vmap_transformer_speed[False-True] 12.4277ms 12.1832ms 82.0802 Ops/s 79.0313 Ops/s $\color{#35bf28}+3.86\%$
test_vmap_transformer_speed[False-False] 8.2003ms 8.1382ms 122.8770 Ops/s 121.0406 Ops/s $\color{#35bf28}+1.52\%$
test_vmap_transformer_speed_decorator[True-True] 63.9766ms 63.1153ms 15.8440 Ops/s 15.4040 Ops/s $\color{#35bf28}+2.86\%$
test_vmap_transformer_speed_decorator[True-False] 21.6184ms 19.8688ms 50.3302 Ops/s 49.6639 Ops/s $\color{#35bf28}+1.34\%$
test_vmap_transformer_speed_decorator[False-True] 60.3447ms 57.6972ms 17.3319 Ops/s 15.4291 Ops/s $\textbf{\color{#35bf28}+12.33\%}$
test_vmap_transformer_speed_decorator[False-False] 21.8250ms 19.4765ms 51.3440 Ops/s 50.6056 Ops/s $\color{#35bf28}+1.46\%$
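
The Change column in the two benchmark comments can be reproduced from the Ops and Ops on Repo HEAD columns as a relative difference in throughput. The following is a minimal Python sketch, not the benchmark bot's actual code: the helper names (`parse_ops`, `percent_change`, `classify`) are hypothetical, and the 5% threshold is an assumption inferred from which rows are bolded (on the GPU run, the 8 bold green rows and 17 bold red rows match the reported Improved and Worsened counts).

```python
# Multipliers for the throughput units that appear in the tables.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(text: str) -> float:
    """Convert a string such as '57.3531 KOps/s' to operations per second."""
    value, unit = text.split()
    return float(value) * UNIT_SCALE[unit]


def percent_change(ops_pr: str, ops_head: str) -> float:
    """Relative throughput change of the PR versus repo HEAD, in percent."""
    pr = parse_ops(ops_pr)
    head = parse_ops(ops_head)
    return (pr - head) / head * 100.0


def classify(change: float, threshold: float = 5.0) -> str:
    """Label a change; the 5% threshold is an assumption inferred from the tables."""
    if change > threshold:
        return "improved"
    if change < -threshold:
        return "worsened"
    return "unchanged"


if __name__ == "__main__":
    # CPU row test_plain_set_nested: 57.3531 KOps/s vs 51.2410 KOps/s on HEAD.
    print(round(percent_change("57.3531 KOps/s", "51.2410 KOps/s"), 2))  # 11.93
    # CPU row test_distributed: 9.9377 KOps/s vs 161.3555 KOps/s on HEAD.
    print(round(percent_change("9.9377 KOps/s", "161.3555 KOps/s"), 2))  # -93.84
```

Rows that mix units (for example `test_contiguous[memmap_tensor0]`, reported in MOps/s against KOps/s on HEAD) need the unit conversion before the ratio is taken; because the table values are already rounded, recomputed percentages for such rows can differ from the reported figure in the second decimal place.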

@vmoens merged commit 3f977c6 into main Jan 11, 2024
47 checks passed
@vmoens deleted the remove-memmap-refs branch January 11, 2024 15:46
@vmoens added the Refactor label (Refactoring code - not a new feature) Jan 11, 2024
vmoens referenced this pull request Jan 11, 2024