
[Feature] online edition of memory mapped tensordicts #775

Merged (4 commits), May 14, 2024


@vmoens (Contributor) commented May 13, 2024

No description provided.

@facebook-github-bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) May 13, 2024
@vmoens added the enhancement (New feature or request) label May 13, 2024

github-actions bot commented May 13, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 127. Improved: 14. Worsened: 15 (changes of 5% or more, shown in bold).

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---:|---:|---:|---:|---:|
| test_plain_set_nested | 39.3530μs | 16.3947μs | 60.9953 KOps/s | 64.6024 KOps/s | **-5.58%** |
| test_plain_set_stack_nested | 45.2840μs | 16.5792μs | 60.3165 KOps/s | 63.1214 KOps/s | -4.44% |
| test_plain_set_nested_inplace | 53.8910μs | 18.9777μs | 52.6935 KOps/s | 55.4861 KOps/s | **-5.03%** |
| test_plain_set_stack_nested_inplace | 65.6630μs | 19.1321μs | 52.2681 KOps/s | 56.1041 KOps/s | **-6.84%** |
| test_items | 25.7180μs | 2.6195μs | 381.7515 KOps/s | 400.3726 KOps/s | -4.65% |
| test_items_nested | 0.4889ms | 0.2669ms | 3.7463 KOps/s | 3.7028 KOps/s | +1.17% |
| test_items_nested_locked | 0.9491ms | 0.2685ms | 3.7249 KOps/s | 3.6292 KOps/s | +2.64% |
| test_items_nested_leaf | 0.1792ms | 76.7521μs | 13.0290 KOps/s | 12.5946 KOps/s | +3.45% |
| test_items_stack_nested | 1.2978ms | 0.2723ms | 3.6721 KOps/s | 3.6531 KOps/s | +0.52% |
| test_items_stack_nested_leaf | 0.2306ms | 79.7923μs | 12.5325 KOps/s | 12.5942 KOps/s | -0.49% |
| test_items_stack_nested_locked | 0.4790ms | 0.2703ms | 3.7000 KOps/s | 3.6882 KOps/s | +0.32% |
| test_keys | 22.8030μs | 3.7457μs | 266.9759 KOps/s | 262.8781 KOps/s | +1.56% |
| test_keys_nested | 0.1860ms | 0.1361ms | 7.3456 KOps/s | 7.3227 KOps/s | +0.31% |
| test_keys_nested_locked | 0.7872ms | 0.1417ms | 7.0594 KOps/s | 7.0717 KOps/s | -0.17% |
| test_keys_nested_leaf | 0.2207ms | 0.1163ms | 8.5961 KOps/s | 8.6235 KOps/s | -0.32% |
| test_keys_stack_nested | 0.1964ms | 0.1396ms | 7.1645 KOps/s | 7.2890 KOps/s | -1.71% |
| test_keys_stack_nested_leaf | 0.2066ms | 0.1173ms | 8.5245 KOps/s | 8.5947 KOps/s | -0.82% |
| test_keys_stack_nested_locked | 0.2242ms | 0.1415ms | 7.0684 KOps/s | 7.0222 KOps/s | +0.66% |
| test_values | 9.8710μs | 1.1930μs | 838.2044 KOps/s | 820.4269 KOps/s | +2.17% |
| test_values_nested | 83.2550μs | 51.0575μs | 19.5858 KOps/s | 19.8418 KOps/s | -1.29% |
| test_values_nested_locked | 0.1026ms | 50.6756μs | 19.7334 KOps/s | 19.7384 KOps/s | -0.03% |
| test_values_nested_leaf | 95.5380μs | 46.2682μs | 21.6131 KOps/s | 21.8869 KOps/s | -1.25% |
| test_values_stack_nested | 0.1075ms | 51.9039μs | 19.2664 KOps/s | 19.4302 KOps/s | -0.84% |
| test_values_stack_nested_leaf | 99.3350μs | 46.1833μs | 21.6528 KOps/s | 21.8471 KOps/s | -0.89% |
| test_values_stack_nested_locked | 93.7650μs | 52.0759μs | 19.2027 KOps/s | 19.4761 KOps/s | -1.40% |
| test_membership | 14.8470μs | 1.3421μs | 745.1227 KOps/s | 741.4805 KOps/s | +0.49% |
| test_membership_nested | 40.3750μs | 3.4159μs | 292.7490 KOps/s | 284.6461 KOps/s | +2.85% |
| test_membership_nested_leaf | 18.0440μs | 3.4295μs | 291.5886 KOps/s | 285.5678 KOps/s | +2.11% |
| test_membership_stacked_nested | 22.4520μs | 3.3886μs | 295.1044 KOps/s | 285.5265 KOps/s | +3.35% |
| test_membership_stacked_nested_leaf | 18.1540μs | 3.4037μs | 293.8011 KOps/s | 260.8316 KOps/s | **+12.64%** |
| test_membership_nested_last | 54.1850μs | 4.1216μs | 242.6239 KOps/s | 230.8572 KOps/s | **+5.10%** |
| test_membership_nested_leaf_last | 18.8250μs | 4.1458μs | 241.2089 KOps/s | 228.4087 KOps/s | **+5.60%** |
| test_membership_stacked_nested_last | 22.5620μs | 4.7246μs | 211.6593 KOps/s | 233.6317 KOps/s | **-9.40%** |
| test_membership_stacked_nested_leaf_last | 24.6060μs | 4.8119μs | 207.8184 KOps/s | 230.1010 KOps/s | **-9.68%** |
| test_nested_getleaf | 52.9680μs | 10.8784μs | 91.9254 KOps/s | 94.4192 KOps/s | -2.64% |
| test_nested_get | 29.6260μs | 10.2418μs | 97.6394 KOps/s | 99.5114 KOps/s | -1.88% |
| test_stacked_getleaf | 33.3830μs | 10.7047μs | 93.4173 KOps/s | 95.1510 KOps/s | -1.82% |
| test_stacked_get | 50.2540μs | 10.0025μs | 99.9747 KOps/s | 101.1245 KOps/s | -1.14% |
| test_nested_getitemleaf | 33.1420μs | 11.2263μs | 89.0769 KOps/s | 89.8781 KOps/s | -0.89% |
| test_nested_getitem | 51.5460μs | 10.3418μs | 96.6954 KOps/s | 96.6889 KOps/s | +0.01% |
| test_stacked_getitemleaf | 29.8860μs | 11.1204μs | 89.9251 KOps/s | 90.2875 KOps/s | -0.40% |
| test_stacked_getitem | 44.7740μs | 10.3088μs | 97.0047 KOps/s | 97.0011 KOps/s | +0.00% |
| test_lock_nested | 48.1965ms | 0.3894ms | 2.5680 KOps/s | 2.8396 KOps/s | **-9.56%** |
| test_lock_stack_nested | 0.5799ms | 0.3053ms | 3.2751 KOps/s | 3.2170 KOps/s | +1.81% |
| test_unlock_nested | 0.7113ms | 0.3423ms | 2.9215 KOps/s | 2.4973 KOps/s | **+16.99%** |
| test_unlock_stack_nested | 0.4715ms | 0.3133ms | 3.1916 KOps/s | 3.1260 KOps/s | +2.10% |
| test_flatten_speed | 0.2021ms | 95.9821μs | 10.4186 KOps/s | 10.4156 KOps/s | +0.03% |
| test_unflatten_speed | 0.6141ms | 0.4041ms | 2.4747 KOps/s | 2.4290 KOps/s | +1.88% |
| test_common_ops | 1.5217ms | 0.7043ms | 1.4198 KOps/s | 1.4946 KOps/s | **-5.00%** |
| test_creation | 76.5730μs | 1.8918μs | 528.5884 KOps/s | 517.7838 KOps/s | +2.09% |
| test_creation_empty | 28.3030μs | 10.2089μs | 97.9533 KOps/s | 118.9205 KOps/s | **-17.63%** |
| test_creation_nested_1 | 28.3430μs | 12.9077μs | 77.4729 KOps/s | 91.7826 KOps/s | **-15.59%** |
| test_creation_nested_2 | 43.9920μs | 16.1917μs | 61.7600 KOps/s | 69.6339 KOps/s | **-11.31%** |
| test_clone | 0.1742ms | 13.1980μs | 75.7689 KOps/s | 73.2375 KOps/s | +3.46% |
| test_getitem[int] | 42.6700μs | 11.2134μs | 89.1791 KOps/s | 87.1886 KOps/s | +2.28% |
| test_getitem[slice_int] | 63.8690μs | 21.6764μs | 46.1332 KOps/s | 43.2230 KOps/s | **+6.73%** |
| test_getitem[range] | 81.7930μs | 59.8398μs | 16.7113 KOps/s | 16.2346 KOps/s | +2.94% |
| test_getitem[tuple] | 81.2020μs | 18.3741μs | 54.4243 KOps/s | 52.2222 KOps/s | +4.22% |
| test_getitem[list] | 94.8470μs | 39.7302μs | 25.1698 KOps/s | 23.7559 KOps/s | **+5.95%** |
| test_setitem_dim[int] | 53.9410μs | 32.7530μs | 30.5316 KOps/s | 30.1029 KOps/s | +1.42% |
| test_setitem_dim[slice_int] | 0.1031ms | 58.1033μs | 17.2107 KOps/s | 16.7861 KOps/s | +2.53% |
| test_setitem_dim[range] | 0.1722ms | 81.4941μs | 12.2708 KOps/s | 12.1326 KOps/s | +1.14% |
| test_setitem_dim[tuple] | 84.0870μs | 47.0502μs | 21.2539 KOps/s | 20.3315 KOps/s | +4.54% |
| test_setitem | 52.7380μs | 19.9249μs | 50.1885 KOps/s | 51.6018 KOps/s | -2.74% |
| test_set | 55.8650μs | 19.3661μs | 51.6367 KOps/s | 54.1379 KOps/s | -4.62% |
| test_set_shared | 3.0561ms | 0.1409ms | 7.0985 KOps/s | 7.0232 KOps/s | +1.07% |
| test_update | 0.1578ms | 21.3595μs | 46.8176 KOps/s | 51.7171 KOps/s | **-9.47%** |
| test_update_nested | 97.0510μs | 28.8757μs | 34.6312 KOps/s | 35.5400 KOps/s | -2.56% |
| test_update__nested | 67.1960μs | 24.6606μs | 40.5505 KOps/s | 38.4874 KOps/s | **+5.36%** |
| test_set_nested | 77.0440μs | 21.2684μs | 47.0182 KOps/s | 48.7935 KOps/s | -3.64% |
| test_set_nested_new | 56.9560μs | 25.0785μs | 39.8748 KOps/s | 40.5036 KOps/s | -1.55% |
| test_select | 0.1088ms | 40.1929μs | 24.8800 KOps/s | 25.7334 KOps/s | -3.32% |
| test_select_nested | 0.1324ms | 60.0836μs | 16.6435 KOps/s | 16.4268 KOps/s | +1.32% |
| test_exclude_nested | 0.2416ms | 0.1214ms | 8.2364 KOps/s | 8.3250 KOps/s | -1.06% |
| test_empty[True] | 0.4774ms | 0.3919ms | 2.5519 KOps/s | 2.5547 KOps/s | -0.11% |
| test_empty[False] | 5.3960μs | 1.0803μs | 925.6911 KOps/s | 906.4724 KOps/s | +2.12% |
| test_unbind_speed | 1.6298ms | 0.2565ms | 3.8982 KOps/s | 3.8247 KOps/s | +1.92% |
| test_unbind_speed_stack0 | 0.4317ms | 0.2538ms | 3.9405 KOps/s | 3.9074 KOps/s | +0.85% |
| test_unbind_speed_stack1 | 64.6952ms | 0.7177ms | 1.3934 KOps/s | 1.2890 KOps/s | **+8.10%** |
| test_split | 65.2078ms | 1.5702ms | 636.8632 Ops/s | 626.7359 Ops/s | +1.62% |
| test_chunk | 64.7207ms | 1.5694ms | 637.2018 Ops/s | 623.5961 Ops/s | +2.18% |
| test_creation[device0] | 4.7175ms | 83.8592μs | 11.9248 KOps/s | 9.3777 KOps/s | **+27.16%** |
| test_creation_from_tensor | 0.1632ms | 82.1871μs | 12.1674 KOps/s | 11.8226 KOps/s | +2.92% |
| test_add_one[memmap_tensor0] | 49.9930μs | 5.4021μs | 185.1129 KOps/s | 181.7224 KOps/s | +1.87% |
| test_contiguous[memmap_tensor0] | 13.4150μs | 0.6482μs | 1.5428 MOps/s | 1.5390 MOps/s | +0.25% |
| test_stack[memmap_tensor0] | 24.9070μs | 3.5963μs | 278.0661 KOps/s | 274.3025 KOps/s | +1.37% |
| test_memmaptd_index | 0.9979ms | 0.2507ms | 3.9885 KOps/s | 3.8403 KOps/s | +3.86% |
| test_memmaptd_index_astensor | 0.7020ms | 0.3204ms | 3.1212 KOps/s | 3.0238 KOps/s | +3.22% |
| test_memmaptd_index_op | 1.8860ms | 0.6126ms | 1.6324 KOps/s | 1.7380 KOps/s | **-6.08%** |
| test_serialize_model | 0.1097s | 0.1024s | 9.7686 Ops/s | 9.0125 Ops/s | **+8.39%** |
| test_serialize_model_pickle | 0.4636s | 0.3744s | 2.6708 Ops/s | 2.6361 Ops/s | +1.32% |
| test_serialize_weights | 0.1557s | 0.1067s | 9.3687 Ops/s | 8.9210 Ops/s | **+5.02%** |
| test_serialize_weights_returnearly | 0.1337s | 0.1243s | 8.0440 Ops/s | 7.5714 Ops/s | **+6.24%** |
| test_serialize_weights_pickle | 0.9590s | 0.5608s | 1.7832 Ops/s | 1.5601 Ops/s | **+14.30%** |
| test_serialize_weights_filesystem | 94.8880ms | 91.0722ms | 10.9803 Ops/s | 10.9681 Ops/s | +0.11% |
| test_serialize_model_filesystem | 0.1579s | 97.9300ms | 10.2114 Ops/s | 10.2698 Ops/s | -0.57% |
| test_reshape_pytree | 63.2080μs | 24.7901μs | 40.3387 KOps/s | 39.4602 KOps/s | +2.23% |
| test_reshape_td | 96.2860μs | 32.4057μs | 30.8588 KOps/s | 29.8247 KOps/s | +3.47% |
| test_view_pytree | 60.5330μs | 24.6801μs | 40.5185 KOps/s | 39.8484 KOps/s | +1.68% |
| test_view_td | 89.9680μs | 36.4580μs | 27.4288 KOps/s | 26.1626 KOps/s | +4.84% |
| test_unbind_pytree | 79.0570μs | 28.8808μs | 34.6251 KOps/s | 34.7840 KOps/s | -0.46% |
| test_unbind_td | 0.4215ms | 37.8335μs | 26.4316 KOps/s | 26.1860 KOps/s | +0.94% |
| test_split_pytree | 81.5160μs | 28.3711μs | 35.2471 KOps/s | 34.1248 KOps/s | +3.29% |
| test_split_td | 0.1217ms | 39.5930μs | 25.2570 KOps/s | 24.2955 KOps/s | +3.96% |
| test_add_pytree | 81.2710μs | 34.3842μs | 29.0831 KOps/s | 28.5854 KOps/s | +1.74% |
| test_add_td | 0.1522ms | 54.3418μs | 18.4020 KOps/s | 18.6662 KOps/s | -1.42% |
| test_distributed | 0.1927ms | 99.0155μs | 10.0994 KOps/s | 9.7521 KOps/s | +3.56% |
| test_tdmodule | 66.1930μs | 17.1832μs | 58.1964 KOps/s | 61.8143 KOps/s | **-5.85%** |
| test_tdmodule_dispatch | 69.2190μs | 34.2889μs | 29.1640 KOps/s | 30.8425 KOps/s | **-5.44%** |
| test_tdseq | 36.3170μs | 19.9323μs | 50.1697 KOps/s | 53.6648 KOps/s | **-6.51%** |
| test_tdseq_dispatch | 66.8150μs | 38.7993μs | 25.7737 KOps/s | 27.0019 KOps/s | -4.55% |
| test_instantiation_functorch | 1.5825ms | 1.3054ms | 766.0335 Ops/s | 767.5220 Ops/s | -0.19% |
| test_instantiation_td | 1.4702ms | 1.0073ms | 992.8017 Ops/s | 985.4844 Ops/s | +0.74% |
| test_exec_functorch | 0.3178ms | 0.1593ms | 6.2758 KOps/s | 6.1655 KOps/s | +1.79% |
| test_exec_functional_call | 0.2891ms | 0.1473ms | 6.7898 KOps/s | 6.4765 KOps/s | +4.84% |
| test_exec_td | 0.2797ms | 0.1450ms | 6.8951 KOps/s | 6.7701 KOps/s | +1.85% |
| test_exec_td_decorator | 0.5920ms | 0.2181ms | 4.5842 KOps/s | 4.4086 KOps/s | +3.98% |
| test_vmap_mlp_speed[True-True] | 0.7586ms | 0.4756ms | 2.1028 KOps/s | 2.0367 KOps/s | +3.25% |
| test_vmap_mlp_speed[True-False] | 0.6082ms | 0.4722ms | 2.1177 KOps/s | 2.0542 KOps/s | +3.09% |
| test_vmap_mlp_speed[False-True] | 0.6327ms | 0.3868ms | 2.5852 KOps/s | 2.4290 KOps/s | **+6.43%** |
| test_vmap_mlp_speed[False-False] | 0.5639ms | 0.3871ms | 2.5831 KOps/s | 2.5031 KOps/s | +3.20% |
| test_vmap_mlp_speed_decorator[True-True] | 1.1039ms | 0.5429ms | 1.8419 KOps/s | 1.7826 KOps/s | +3.33% |
| test_vmap_mlp_speed_decorator[True-False] | 0.8584ms | 0.5427ms | 1.8425 KOps/s | 1.7858 KOps/s | +3.18% |
| test_vmap_mlp_speed_decorator[False-True] | 0.8580ms | 0.4464ms | 2.2402 KOps/s | 2.1508 KOps/s | +4.16% |
| test_vmap_mlp_speed_decorator[False-False] | 0.6638ms | 0.4457ms | 2.2434 KOps/s | 2.1471 KOps/s | +4.48% |
| test_to_module_speed[True] | 1.7767ms | 1.6789ms | 595.6302 Ops/s | 591.7187 Ops/s | +0.66% |
| test_to_module_speed[False] | 2.6566ms | 1.6508ms | 605.7771 Ops/s | 602.5541 Ops/s | +0.53% |
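The derived columns can be recomputed from the raw ones. Ops matches the reciprocal of Mean (1 / 16.3947μs ≈ 60.9953 KOps/s for `test_plain_set_nested`), and Change matches the relative difference between Ops and Ops on Repo HEAD. The Improved/Worsened totals also equal the number of bold entries, which suggests only changes of at least 5% are counted. The minimal Python sketch below recomputes these columns under those assumptions, which are inferred from the numbers in the table, not taken from the benchmark bot's source; the helper names `ops_per_second`, `percent_change` and `classify` are hypothetical.

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean time per call (assumption: Ops = 1 / Mean)."""
    return 1.0 / mean_seconds


def percent_change(ops: float, ops_head: float) -> float:
    """Relative change of this branch's Ops versus Ops on Repo HEAD, in percent."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change: float, threshold: float = 5.0) -> str:
    """Label a change; the 5% threshold is inferred from which entries are bold."""
    if change >= threshold:
        return "improved"
    if change <= -threshold:
        return "worsened"
    return "within noise"


# Rows copied from the CPU table: (name, Mean in seconds, Ops/s, Ops/s on HEAD)
rows = [
    ("test_plain_set_nested", 16.3947e-6, 60.9953e3, 64.6024e3),
    ("test_creation_empty", 10.2089e-6, 97.9533e3, 118.9205e3),
    ("test_keys", 3.7457e-6, 266.9759e3, 262.8781e3),
]

for name, mean_s, ops, ops_head in rows:
    change = percent_change(ops, ops_head)
    implied = ops_per_second(mean_s) / 1e3  # convert Ops/s to KOps/s
    print(f"{name}: {implied:.4f} KOps/s, {change:+.2f}% ({classify(change)})")
```

If the inferred formula holds, feeding in any other row of either table should reproduce its Change value to two decimals; a mismatch would indicate that the bot computes it differently.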


github-actions bot commented May 14, 2024

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 135. Improved: 10. Worsened: 5 (changes of 5% or more, shown in bold).

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---:|---:|---:|---:|---:|
| test_plain_set_nested | 70.3840μs | 13.2122μs | 75.6873 KOps/s | 77.6066 KOps/s | -2.47% |
| test_plain_set_stack_nested | 29.1110μs | 13.3635μs | 74.8307 KOps/s | 76.8150 KOps/s | -2.58% |
| test_plain_set_nested_inplace | 39.2320μs | 14.4765μs | 69.0776 KOps/s | 70.4996 KOps/s | -2.02% |
| test_plain_set_stack_nested_inplace | 47.3020μs | 14.5978μs | 68.5034 KOps/s | 69.9624 KOps/s | -2.09% |
| test_items | 25.4610μs | 4.6441μs | 215.3292 KOps/s | 211.2992 KOps/s | +1.91% |
| test_items_nested | 0.3992ms | 0.3365ms | 2.9716 KOps/s | 2.9637 KOps/s | +0.27% |
| test_items_nested_locked | 0.3894ms | 0.3372ms | 2.9658 KOps/s | 2.9535 KOps/s | +0.41% |
| test_items_nested_leaf | 0.1035ms | 82.7793μs | 12.0803 KOps/s | 12.2293 KOps/s | -1.22% |
| test_items_stack_nested | 0.3902ms | 0.3395ms | 2.9459 KOps/s | 2.9606 KOps/s | -0.50% |
| test_items_stack_nested_leaf | 0.1115ms | 83.4145μs | 11.9883 KOps/s | 11.8071 KOps/s | +1.53% |
| test_items_stack_nested_locked | 0.3974ms | 0.3391ms | 2.9490 KOps/s | 2.9345 KOps/s | +0.49% |
| test_keys | 28.2520μs | 4.7356μs | 211.1658 KOps/s | 212.1154 KOps/s | -0.45% |
| test_keys_nested | 90.1440μs | 68.3463μs | 14.6314 KOps/s | 14.7014 KOps/s | -0.48% |
| test_keys_nested_locked | 0.6215ms | 73.1187μs | 13.6764 KOps/s | 13.6949 KOps/s | -0.14% |
| test_keys_nested_leaf | 82.1940μs | 58.8757μs | 16.9849 KOps/s | 17.1176 KOps/s | -0.77% |
| test_keys_stack_nested | 86.2740μs | 68.0011μs | 14.7056 KOps/s | 14.7223 KOps/s | -0.11% |
| test_keys_stack_nested_leaf | 87.8140μs | 58.3361μs | 17.1420 KOps/s | 17.0310 KOps/s | +0.65% |
| test_keys_stack_nested_locked | 0.1024ms | 72.9483μs | 13.7083 KOps/s | 13.6742 KOps/s | +0.25% |
| test_values | 12.4860μs | 1.8569μs | 538.5313 KOps/s | 546.5663 KOps/s | -1.47% |
| test_values_nested | 57.2730μs | 35.5284μs | 28.1465 KOps/s | 28.0575 KOps/s | +0.32% |
| test_values_nested_locked | 56.9930μs | 37.6269μs | 26.5767 KOps/s | 26.6299 KOps/s | -0.20% |
| test_values_nested_leaf | 50.7030μs | 31.6524μs | 31.5932 KOps/s | 31.4731 KOps/s | +0.38% |
| test_values_stack_nested | 62.4030μs | 36.0572μs | 27.7337 KOps/s | 27.4215 KOps/s | +1.14% |
| test_values_stack_nested_leaf | 55.0730μs | 32.0177μs | 31.2327 KOps/s | 30.6960 KOps/s | +1.75% |
| test_values_stack_nested_locked | 65.5830μs | 38.1816μs | 26.1906 KOps/s | 26.0556 KOps/s | +0.52% |
| test_membership | 13.9710μs | 0.8978μs | 1.1138 MOps/s | 1.4163 MOps/s | **-21.36%** |
| test_membership_nested | 17.5410μs | 2.5988μs | 384.7913 KOps/s | 403.1854 KOps/s | -4.56% |
| test_membership_nested_leaf | 32.2610μs | 2.5648μs | 389.9005 KOps/s | 405.5155 KOps/s | -3.85% |
| test_membership_stacked_nested | 21.9510μs | 2.5589μs | 390.7941 KOps/s | 400.7625 KOps/s | -2.49% |
| test_membership_stacked_nested_leaf | 32.7610μs | 2.5228μs | 396.3859 KOps/s | 403.7137 KOps/s | -1.82% |
| test_membership_nested_last | 21.1110μs | 3.0505μs | 327.8150 KOps/s | 332.0716 KOps/s | -1.28% |
| test_membership_nested_leaf_last | 33.6520μs | 3.0904μs | 323.5792 KOps/s | 329.5743 KOps/s | -1.82% |
| test_membership_stacked_nested_last | 22.0510μs | 3.0916μs | 323.4565 KOps/s | 328.1507 KOps/s | -1.43% |
| test_membership_stacked_nested_leaf_last | 18.5910μs | 3.1115μs | 321.3840 KOps/s | 329.3031 KOps/s | -2.40% |
| test_nested_getleaf | 38.9720μs | 8.4318μs | 118.5985 KOps/s | 119.1146 KOps/s | -0.43% |
| test_nested_get | 30.2420μs | 7.8910μs | 126.7271 KOps/s | 126.4638 KOps/s | +0.21% |
| test_stacked_getleaf | 27.1410μs | 8.3326μs | 120.0100 KOps/s | 119.6734 KOps/s | +0.28% |
| test_stacked_get | 36.8920μs | 7.8202μs | 127.8733 KOps/s | 126.7934 KOps/s | +0.85% |
| test_nested_getitemleaf | 41.0520μs | 8.5511μs | 116.9444 KOps/s | 117.6879 KOps/s | -0.63% |
| test_nested_getitem | 25.0210μs | 8.0349μs | 124.4574 KOps/s | 124.1479 KOps/s | +0.25% |
| test_stacked_getitemleaf | 37.9320μs | 8.4970μs | 117.6884 KOps/s | 116.9972 KOps/s | +0.59% |
| test_stacked_getitem | 20.0800μs | 7.9905μs | 125.1491 KOps/s | 124.3879 KOps/s | +0.61% |
| test_lock_nested | 55.6136ms | 0.4077ms | 2.4527 KOps/s | 2.4706 KOps/s | -0.73% |
| test_lock_stack_nested | 0.3500ms | 0.3013ms | 3.3192 KOps/s | 3.2559 KOps/s | +1.94% |
| test_unlock_nested | 0.7142ms | 0.3462ms | 2.8887 KOps/s | 2.8514 KOps/s | +1.31% |
| test_unlock_stack_nested | 0.3567ms | 0.3101ms | 3.2248 KOps/s | 3.1802 KOps/s | +1.40% |
| test_flatten_speed | 0.3149ms | 0.1023ms | 9.7747 KOps/s | 9.7823 KOps/s | -0.08% |
| test_unflatten_speed | 0.3835ms | 0.2873ms | 3.4809 KOps/s | 3.4463 KOps/s | +1.00% |
| test_common_ops | 1.0538ms | 0.5782ms | 1.7295 KOps/s | 1.7244 KOps/s | +0.30% |
| test_creation | 36.8420μs | 1.7202μs | 581.3412 KOps/s | 626.5415 KOps/s | **-7.21%** |
| test_creation_empty | 25.8110μs | 9.1410μs | 109.3974 KOps/s | 114.3171 KOps/s | -4.30% |
| test_creation_nested_1 | 34.6220μs | 10.8909μs | 91.8195 KOps/s | 95.3014 KOps/s | -3.65% |
| test_creation_nested_2 | 37.7520μs | 13.0454μs | 76.6552 KOps/s | 78.5404 KOps/s | -2.40% |
| test_clone | 87.0840μs | 11.2163μs | 89.1556 KOps/s | 87.6238 KOps/s | +1.75% |
| test_getitem[int] | 24.8320μs | 10.7851μs | 92.7209 KOps/s | 91.9870 KOps/s | +0.80% |
| test_getitem[slice_int] | 48.6830μs | 20.7108μs | 48.2840 KOps/s | 47.7353 KOps/s | +1.15% |
| test_getitem[range] | 66.6130μs | 47.2262μs | 21.1747 KOps/s | 21.4462 KOps/s | -1.27% |
| test_getitem[tuple] | 40.5220μs | 18.9649μs | 52.7289 KOps/s | 52.1070 KOps/s | +1.19% |
| test_getitem[list] | 0.1264ms | 33.9955μs | 29.4156 KOps/s | 28.8728 KOps/s | +1.88% |
| test_setitem_dim[int] | 45.7420μs | 29.5144μs | 33.8818 KOps/s | 34.7976 KOps/s | -2.63% |
| test_setitem_dim[slice_int] | 69.8530μs | 49.0644μs | 20.3814 KOps/s | 20.5700 KOps/s | -0.92% |
| test_setitem_dim[range] | 97.3150μs | 66.4410μs | 15.0510 KOps/s | 14.4245 KOps/s | +4.34% |
| test_setitem_dim[tuple] | 63.3930μs | 43.3927μs | 23.0454 KOps/s | 21.8909 KOps/s | **+5.27%** |
| test_setitem | 41.2120μs | 16.2269μs | 61.6260 KOps/s | 60.0770 KOps/s | +2.58% |
| test_set | 45.8330μs | 15.3367μs | 65.2032 KOps/s | 60.7352 KOps/s | **+7.36%** |
| test_set_shared | 71.1640ms | 0.1106ms | 9.0380 KOps/s | 10.2747 KOps/s | **-12.04%** |
| test_update | 84.4540μs | 18.0962μs | 55.2601 KOps/s | 56.2223 KOps/s | -1.71% |
| test_update_nested | 80.9040μs | 23.0406μs | 43.4016 KOps/s | 43.2115 KOps/s | +0.44% |
| test_update__nested | 59.1820μs | 21.5202μs | 46.4679 KOps/s | 45.2300 KOps/s | +2.74% |
| test_set_nested | 61.1230μs | 16.3934μs | 61.0002 KOps/s | 58.2446 KOps/s | +4.73% |
| test_set_nested_new | 70.4530μs | 19.0920μs | 52.3779 KOps/s | 52.7430 KOps/s | -0.69% |
| test_select | 68.3830μs | 30.9458μs | 32.3146 KOps/s | 29.8998 KOps/s | **+8.08%** |
| test_select_nested | 90.1750μs | 56.1312μs | 17.8154 KOps/s | 18.4870 KOps/s | -3.63% |
| test_exclude_nested | 0.1528ms | 0.1098ms | 9.1092 KOps/s | 9.1992 KOps/s | -0.98% |
| test_empty[True] | 0.4843ms | 0.3438ms | 2.9084 KOps/s | 2.8820 KOps/s | +0.92% |
| test_empty[False] | 2.7671μs | 0.9493μs | 1.0534 MOps/s | 1.1529 MOps/s | **-8.63%** |
| test_to | 0.1022ms | 75.8012μs | 13.1924 KOps/s | 13.2696 KOps/s | -0.58% |
| test_to_nonblocking | 95.7840μs | 61.4618μs | 16.2703 KOps/s | 15.7391 KOps/s | +3.37% |
| test_unbind_speed | 0.3189ms | 0.2659ms | 3.7601 KOps/s | 3.7297 KOps/s | +0.82% |
| test_unbind_speed_stack0 | 0.3205ms | 0.2696ms | 3.7098 KOps/s | 3.7006 KOps/s | +0.25% |
| test_unbind_speed_stack1 | 72.5490ms | 0.8074ms | 1.2386 KOps/s | 1.2426 KOps/s | -0.33% |
| test_split | 1.5907ms | 1.5257ms | 655.4479 Ops/s | 648.9874 Ops/s | +1.00% |
| test_chunk | 73.0102ms | 1.6422ms | 608.9462 Ops/s | 604.6837 Ops/s | +0.70% |
| test_creation[device0] | 0.1315ms | 55.1320μs | 18.1383 KOps/s | 13.9200 KOps/s | **+30.30%** |
| test_creation_from_tensor | 0.1298ms | 52.9221μs | 18.8957 KOps/s | 18.1850 KOps/s | +3.91% |
| test_add_one[memmap_tensor0] | 77.5140μs | 6.4633μs | 154.7189 KOps/s | 150.0868 KOps/s | +3.09% |
| test_contiguous[memmap_tensor0] | 23.2710μs | 0.6630μs | 1.5084 MOps/s | 1.5298 MOps/s | -1.40% |
| test_stack[memmap_tensor0] | 28.2010μs | 4.4620μs | 224.1147 KOps/s | 223.7800 KOps/s | +0.15% |
| test_memmaptd_index | 1.2385ms | 0.2768ms | 3.6127 KOps/s | 3.5445 KOps/s | +1.92% |
| test_memmaptd_index_astensor | 0.6242ms | 0.3489ms | 2.8661 KOps/s | 2.7834 KOps/s | +2.97% |
| test_memmaptd_index_op | 1.1757ms | 0.6468ms | 1.5462 KOps/s | 1.5298 KOps/s | +1.07% |
| test_serialize_model | 0.1768s | 0.1066s | 9.3797 Ops/s | 9.7282 Ops/s | -3.58% |
| test_serialize_model_pickle | 1.3706s | 1.2386s | 0.8074 Ops/s | 0.8084 Ops/s | -0.13% |
| test_serialize_weights | 0.1786s | 0.1052s | 9.5076 Ops/s | 9.0706 Ops/s | +4.82% |
| test_serialize_weights_returnearly | 85.2354ms | 76.5356ms | 13.0658 Ops/s | 11.4208 Ops/s | **+14.40%** |
| test_serialize_weights_pickle | 1.3497s | 1.2476s | 0.8015 Ops/s | 0.7981 Ops/s | +0.43% |
| test_reshape_pytree | 47.3130μs | 23.0153μs | 43.4493 KOps/s | 43.5287 KOps/s | -0.18% |
| test_reshape_td | 0.2347ms | 30.4919μs | 32.7956 KOps/s | 31.8084 KOps/s | +3.10% |
| test_view_pytree | 39.8420μs | 22.6297μs | 44.1898 KOps/s | 42.9186 KOps/s | +2.96% |
| test_view_td | 0.2344ms | 34.4482μs | 29.0291 KOps/s | 29.7985 KOps/s | -2.58% |
| test_unbind_pytree | 58.4830μs | 28.9206μs | 34.5775 KOps/s | 34.2371 KOps/s | +0.99% |
| test_unbind_td | 0.5047ms | 44.1251μs | 22.6628 KOps/s | 24.6713 KOps/s | **-8.14%** |
| test_split_pytree | 54.0530μs | 33.1541μs | 30.1622 KOps/s | 31.5509 KOps/s | -4.40% |
| test_split_td | 0.2336ms | 39.6262μs | 25.2358 KOps/s | 25.9639 KOps/s | -2.80% |
| test_add_pytree | 0.1777ms | 35.5402μs | 28.1371 KOps/s | 26.1069 KOps/s | **+7.78%** |
| test_add_td | 0.2096ms | 56.0471μs | 17.8421 KOps/s | 18.2388 KOps/s | -2.17% |
| test_distributed | 1.6764ms | 68.3109μs | 14.6390 KOps/s | 11.6374 KOps/s | **+25.79%** |
| test_tdmodule | 81.1440μs | 15.0777μs | 66.3231 KOps/s | 64.0229 KOps/s | +3.59% |
| test_tdmodule_dispatch | 47.5620μs | 29.5511μs | 33.8397 KOps/s | 32.9686 KOps/s | +2.64% |
| test_tdseq | 33.1920μs | 16.9100μs | 59.1367 KOps/s | 56.8975 KOps/s | +3.94% |
| test_tdseq_dispatch | 49.8730μs | 33.1766μs | 30.1417 KOps/s | 28.9293 KOps/s | +4.19% |
| test_instantiation_functorch | 1.6523ms | 1.5251ms | 655.6819 Ops/s | 660.2316 Ops/s | -0.69% |
| test_instantiation_td | 75.8758ms | 1.1416ms | 875.9863 Ops/s | 870.8308 Ops/s | +0.59% |
| test_exec_functorch | 0.1908ms | 0.1454ms | 6.8764 KOps/s | 6.7341 KOps/s | +2.11% |
| test_exec_functional_call | 0.1738ms | 0.1325ms | 7.5483 KOps/s | 7.2288 KOps/s | +4.42% |
| test_exec_td | 0.1633ms | 0.1308ms | 7.6482 KOps/s | 7.1430 KOps/s | **+7.07%** |
| test_exec_td_decorator | 0.7233ms | 0.2089ms | 4.7869 KOps/s | 4.7659 KOps/s | +0.44% |
| test_vmap_mlp_speed[True-True] | 0.6441ms | 0.5814ms | 1.7201 KOps/s | 1.6699 KOps/s | +3.00% |
| test_vmap_mlp_speed[True-False] | 0.8346ms | 0.5825ms | 1.7167 KOps/s | 1.6346 KOps/s | **+5.02%** |
| test_vmap_mlp_speed[False-True] | 0.5671ms | 0.5092ms | 1.9640 KOps/s | 1.9042 KOps/s | +3.14% |
| test_vmap_mlp_speed[False-False] | 0.5693ms | 0.5088ms | 1.9653 KOps/s | 1.8430 KOps/s | **+6.64%** |
| test_vmap_mlp_speed_decorator[True-True] | 0.7310ms | 0.6473ms | 1.5448 KOps/s | 1.5290 KOps/s | +1.03% |
| test_vmap_mlp_speed_decorator[True-False] | 0.7834ms | 0.6439ms | 1.5530 KOps/s | 1.5167 KOps/s | +2.40% |
| test_vmap_mlp_speed_decorator[False-True] | 0.6850ms | 0.5697ms | 1.7553 KOps/s | 1.7278 KOps/s | +1.59% |
| test_vmap_mlp_speed_decorator[False-False] | 0.6883ms | 0.5714ms | 1.7500 KOps/s | 1.7292 KOps/s | +1.20% |
| test_vmap_transformer_speed[True-True] | 7.7773ms | 7.7176ms | 129.5735 Ops/s | 127.1636 Ops/s | +1.90% |
| test_vmap_transformer_speed[True-False] | 7.7779ms | 7.7222ms | 129.4972 Ops/s | 126.9184 Ops/s | +2.03% |
| test_vmap_transformer_speed[False-True] | 7.6998ms | 7.6531ms | 130.6656 Ops/s | 128.5335 Ops/s | +1.66% |
| test_vmap_transformer_speed[False-False] | 7.7038ms | 7.6424ms | 130.8495 Ops/s | 128.4402 Ops/s | +1.88% |
| test_vmap_transformer_speed_decorator[True-True] | 18.7993ms | 18.7208ms | 53.4166 Ops/s | 52.3278 Ops/s | +2.08% |
| test_vmap_transformer_speed_decorator[True-False] | 18.8437ms | 18.7423ms | 53.3552 Ops/s | 51.3354 Ops/s | +3.93% |
| test_vmap_transformer_speed_decorator[False-True] | 18.7177ms | 18.6072ms | 53.7427 Ops/s | 52.5583 Ops/s | +2.25% |
| test_vmap_transformer_speed_decorator[False-False] | 18.6844ms | 18.6234ms | 53.6960 Ops/s | 52.5797 Ops/s | +2.12% |
| test_to_module_speed[True] | 2.2092ms | 1.5575ms | 642.0511 Ops/s | 647.6511 Ops/s | -0.86% |
| test_to_module_speed[False] | 1.6609ms | 1.5425ms | 648.2815 Ops/s | 657.7368 Ops/s | -1.44% |

@vmoens merged commit 5fef538 into main May 14, 2024
37 of 38 checks passed
@vmoens deleted the memmap-td-improvements branch May 14, 2024 17:03