[Feature] mp_start_method in tensordict map #695

Merged
merged 3 commits into from
Mar 4, 2024

Conversation

vmoens
Contributor

@vmoens vmoens commented Mar 4, 2024

No description provided.
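
For readers unfamiliar with what a multiprocessing start method controls, here is a minimal sketch using only the Python standard library. It is not the code from this PR: `parallel_map`, `square`, and the `num_workers=0` serial fallback are hypothetical names and behaviour chosen for illustration, and the assumption that a `mp_start_method` argument would be forwarded to `multiprocessing.get_context` in this way is mine, not something the PR text confirms.

```python
import multiprocessing as mp


def square(x):
    # Defined at module top level so worker processes can import it by name,
    # which the "spawn" and "forkserver" start methods require.
    return x * x


def parallel_map(fn, items, num_workers=2, mp_start_method=None):
    """Map ``fn`` over ``items`` using a pool built with a chosen start method.

    ``mp_start_method`` may be "spawn", "fork", "forkserver", or None
    (None keeps the platform default). Hypothetical helper, for illustration.
    """
    if num_workers == 0:
        # Serial fallback: no processes are created at all.
        return [fn(item) for item in items]
    # get_context returns a context bound to one start method and leaves
    # the process-wide default untouched.
    ctx = mp.get_context(mp_start_method)
    with ctx.Pool(num_workers) as pool:
        # Pool.map preserves the order of the inputs in its result list.
        return pool.map(fn, items)
```

When a pool is actually created with `"spawn"`, the call should sit under an `if __name__ == "__main__":` guard, since each worker re-imports the main module. Using a context object rather than `multiprocessing.set_start_method` keeps the choice local to one call instead of changing it for the whole process.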

@facebook-github-bot facebook-github-bot added the CLA Signed label Mar 4, 2024

github-actions bot commented Mar 4, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 126. Improved: $\large\color{#35bf28}22$. Worsened: $\large\color{#d91a1a}5$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 31.5180μs | 16.5234μs | 60.5201 KOps/s | 55.6228 KOps/s | $\textbf{\color{#35bf28}+8.80\%}$ |
| test_plain_set_stack_nested | 59.3910μs | 16.7071μs | 59.8547 KOps/s | 54.1341 KOps/s | $\textbf{\color{#35bf28}+10.57\%}$ |
| test_plain_set_nested_inplace | 47.8400μs | 19.1366μs | 52.2560 KOps/s | 48.0243 KOps/s | $\textbf{\color{#35bf28}+8.81\%}$ |
| test_plain_set_stack_nested_inplace | 71.0720μs | 19.1678μs | 52.1709 KOps/s | 48.0344 KOps/s | $\textbf{\color{#35bf28}+8.61\%}$ |
| test_items | 30.0960μs | 2.3984μs | 416.9413 KOps/s | 397.8734 KOps/s | $\color{#35bf28}+4.79\%$ |
| test_items_nested | 1.0754ms | 0.2711ms | 3.6891 KOps/s | 3.6671 KOps/s | $\color{#35bf28}+0.60\%$ |
| test_items_nested_locked | 0.5039ms | 0.2707ms | 3.6943 KOps/s | 3.6603 KOps/s | $\color{#35bf28}+0.93\%$ |
| test_items_nested_leaf | 0.5893ms | 0.1695ms | 5.8987 KOps/s | 5.9355 KOps/s | $\color{#d91a1a}-0.62\%$ |
| test_items_stack_nested | 1.0569ms | 0.2801ms | 3.5704 KOps/s | 3.6513 KOps/s | $\color{#d91a1a}-2.22\%$ |
| test_items_stack_nested_leaf | 0.3029ms | 0.1674ms | 5.9734 KOps/s | 5.9030 KOps/s | $\color{#35bf28}+1.19\%$ |
| test_items_stack_nested_locked | 0.4848ms | 0.2758ms | 3.6252 KOps/s | 3.6159 KOps/s | $\color{#35bf28}+0.26\%$ |
| test_keys | 24.9860μs | 3.8410μs | 260.3508 KOps/s | 258.0278 KOps/s | $\color{#35bf28}+0.90\%$ |
| test_keys_nested | 1.9844ms | 0.1548ms | 6.4611 KOps/s | 6.7199 KOps/s | $\color{#d91a1a}-3.85\%$ |
| test_keys_nested_locked | 0.3790ms | 0.1604ms | 6.2342 KOps/s | 6.6123 KOps/s | $\textbf{\color{#d91a1a}-5.72\%}$ |
| test_keys_nested_leaf | 36.8807ms | 0.1416ms | 7.0613 KOps/s | 7.7635 KOps/s | $\textbf{\color{#d91a1a}-9.04\%}$ |
| test_keys_stack_nested | 4.6532ms | 0.1586ms | 6.3070 KOps/s | 6.6059 KOps/s | $\color{#d91a1a}-4.53\%$ |
| test_keys_stack_nested_leaf | 0.2354ms | 0.1371ms | 7.2965 KOps/s | 7.5536 KOps/s | $\color{#d91a1a}-3.40\%$ |
| test_keys_stack_nested_locked | 0.3123ms | 0.1635ms | 6.1161 KOps/s | 6.4312 KOps/s | $\color{#d91a1a}-4.90\%$ |
| test_values | 4.6226μs | 1.1486μs | 870.6087 KOps/s | 830.8099 KOps/s | $\color{#35bf28}+4.79\%$ |
| test_values_nested | 0.1030ms | 51.7922μs | 19.3079 KOps/s | 19.0034 KOps/s | $\color{#35bf28}+1.60\%$ |
| test_values_nested_locked | 0.1037ms | 52.0533μs | 19.2111 KOps/s | 19.1386 KOps/s | $\color{#35bf28}+0.38\%$ |
| test_values_nested_leaf | 85.0390μs | 46.2607μs | 21.6166 KOps/s | 21.3661 KOps/s | $\color{#35bf28}+1.17\%$ |
| test_values_stack_nested | 0.1046ms | 52.3832μs | 19.0901 KOps/s | 18.7877 KOps/s | $\color{#35bf28}+1.61\%$ |
| test_values_stack_nested_leaf | 97.8930μs | 45.6724μs | 21.8951 KOps/s | 21.1358 KOps/s | $\color{#35bf28}+3.59\%$ |
| test_values_stack_nested_locked | 94.0350μs | 52.1064μs | 19.1915 KOps/s | 18.7014 KOps/s | $\color{#35bf28}+2.62\%$ |
| test_membership | 22.7320μs | 1.3432μs | 744.4882 KOps/s | 721.9831 KOps/s | $\color{#35bf28}+3.12\%$ |
| test_membership_nested | 20.6490μs | 3.4810μs | 287.2714 KOps/s | 281.5802 KOps/s | $\color{#35bf28}+2.02\%$ |
| test_membership_nested_leaf | 29.9860μs | 3.5095μs | 284.9416 KOps/s | 283.5394 KOps/s | $\color{#35bf28}+0.49\%$ |
| test_membership_stacked_nested | 27.6120μs | 3.4615μs | 288.8881 KOps/s | 278.3162 KOps/s | $\color{#35bf28}+3.80\%$ |
| test_membership_stacked_nested_leaf | 30.5870μs | 3.5258μs | 283.6227 KOps/s | 285.5346 KOps/s | $\color{#d91a1a}-0.67\%$ |
| test_membership_nested_last | 32.8020μs | 4.3898μs | 227.7992 KOps/s | 226.6127 KOps/s | $\color{#35bf28}+0.52\%$ |
| test_membership_nested_leaf_last | 38.5720μs | 4.3767μs | 228.4801 KOps/s | 229.3751 KOps/s | $\color{#d91a1a}-0.39\%$ |
| test_membership_stacked_nested_last | 34.0630μs | 5.5151μs | 181.3202 KOps/s | 183.5225 KOps/s | $\color{#d91a1a}-1.20\%$ |
| test_membership_stacked_nested_leaf_last | 31.7900μs | 5.4885μs | 182.2000 KOps/s | 181.0664 KOps/s | $\color{#35bf28}+0.63\%$ |
| test_nested_getleaf | 34.3540μs | 10.6758μs | 93.6696 KOps/s | 91.8200 KOps/s | $\color{#35bf28}+2.01\%$ |
| test_nested_get | 36.6190μs | 10.0529μs | 99.4742 KOps/s | 97.7943 KOps/s | $\color{#35bf28}+1.72\%$ |
| test_stacked_getleaf | 36.1970μs | 10.6257μs | 94.1111 KOps/s | 94.2796 KOps/s | $\color{#d91a1a}-0.18\%$ |
| test_stacked_get | 32.4810μs | 10.0610μs | 99.3933 KOps/s | 98.6351 KOps/s | $\color{#35bf28}+0.77\%$ |
| test_nested_getitemleaf | 36.5880μs | 11.2143μs | 89.1720 KOps/s | 88.9495 KOps/s | $\color{#35bf28}+0.25\%$ |
| test_nested_getitem | 37.5400μs | 10.7244μs | 93.2454 KOps/s | 94.3268 KOps/s | $\color{#d91a1a}-1.15\%$ |
| test_stacked_getitemleaf | 35.4760μs | 11.1583μs | 89.6196 KOps/s | 88.1091 KOps/s | $\color{#35bf28}+1.71\%$ |
| test_stacked_getitem | 35.7070μs | 10.6272μs | 94.0978 KOps/s | 93.3976 KOps/s | $\color{#35bf28}+0.75\%$ |
| test_lock_nested | 0.6450ms | 0.3315ms | 3.0163 KOps/s | 2.9442 KOps/s | $\color{#35bf28}+2.45\%$ |
| test_lock_stack_nested | 0.4818ms | 0.2950ms | 3.3894 KOps/s | 3.3146 KOps/s | $\color{#35bf28}+2.26\%$ |
| test_unlock_nested | 78.0156ms | 0.4118ms | 2.4284 KOps/s | 2.3664 KOps/s | $\color{#35bf28}+2.62\%$ |
| test_unlock_stack_nested | 0.4586ms | 0.3041ms | 3.2885 KOps/s | 3.2250 KOps/s | $\color{#35bf28}+1.97\%$ |
| test_flatten_speed | 0.5928ms | 0.2900ms | 3.4484 KOps/s | 3.5214 KOps/s | $\color{#d91a1a}-2.07\%$ |
| test_unflatten_speed | 0.7126ms | 0.4110ms | 2.4332 KOps/s | 2.4297 KOps/s | $\color{#35bf28}+0.14\%$ |
| test_common_ops | 4.7398ms | 0.6704ms | 1.4918 KOps/s | 1.4078 KOps/s | $\textbf{\color{#35bf28}+5.96\%}$ |
| test_creation | 41.0670μs | 1.8061μs | 553.6923 KOps/s | 549.8564 KOps/s | $\color{#35bf28}+0.70\%$ |
| test_creation_empty | 42.4190μs | 9.1114μs | 109.7522 KOps/s | 85.2304 KOps/s | $\textbf{\color{#35bf28}+28.77\%}$ |
| test_creation_nested_1 | 44.7030μs | 11.4178μs | 87.5827 KOps/s | 68.8809 KOps/s | $\textbf{\color{#35bf28}+27.15\%}$ |
| test_creation_nested_2 | 40.5260μs | 14.8024μs | 67.5567 KOps/s | 56.9502 KOps/s | $\textbf{\color{#35bf28}+18.62\%}$ |
| test_clone | 70.5420μs | 13.1369μs | 76.1215 KOps/s | 76.6878 KOps/s | $\color{#d91a1a}-0.74\%$ |
| test_getitem[int] | 26.1590μs | 10.9273μs | 91.5142 KOps/s | 88.4782 KOps/s | $\color{#35bf28}+3.43\%$ |
| test_getitem[slice_int] | 58.0780μs | 22.7275μs | 43.9995 KOps/s | 45.3470 KOps/s | $\color{#d91a1a}-2.97\%$ |
| test_getitem[range] | 0.1687ms | 43.6530μs | 22.9079 KOps/s | 24.5617 KOps/s | $\textbf{\color{#d91a1a}-6.73\%}$ |
| test_getitem[tuple] | 50.3740μs | 18.2485μs | 54.7990 KOps/s | 54.1291 KOps/s | $\color{#35bf28}+1.24\%$ |
| test_getitem[list] | 0.1672ms | 37.9810μs | 26.3289 KOps/s | 27.5523 KOps/s | $\color{#d91a1a}-4.44\%$ |
| test_setitem_dim[int] | 73.3170μs | 31.5507μs | 31.6950 KOps/s | 28.4634 KOps/s | $\textbf{\color{#35bf28}+11.35\%}$ |
| test_setitem_dim[slice_int] | 0.1034ms | 60.3982μs | 16.5568 KOps/s | 15.7066 KOps/s | $\textbf{\color{#35bf28}+5.41\%}$ |
| test_setitem_dim[range] | 0.1483ms | 80.5802μs | 12.4100 KOps/s | 12.4192 KOps/s | $\color{#d91a1a}-0.07\%$ |
| test_setitem_dim[tuple] | 0.1001ms | 47.8924μs | 20.8802 KOps/s | 19.4161 KOps/s | $\textbf{\color{#35bf28}+7.54\%}$ |
| test_setitem | 81.0310μs | 19.1527μs | 52.2120 KOps/s | 49.0597 KOps/s | $\textbf{\color{#35bf28}+6.43\%}$ |
| test_set | 81.9230μs | 18.7870μs | 53.2283 KOps/s | 50.5710 KOps/s | $\textbf{\color{#35bf28}+5.25\%}$ |
| test_set_shared | 4.1193ms | 0.1427ms | 7.0063 KOps/s | 7.0236 KOps/s | $\color{#d91a1a}-0.25\%$ |
| test_update | 81.8330μs | 20.9948μs | 47.6308 KOps/s | 43.3316 KOps/s | $\textbf{\color{#35bf28}+9.92\%}$ |
| test_update_nested | 83.4460μs | 28.4473μs | 35.1527 KOps/s | 32.1687 KOps/s | $\textbf{\color{#35bf28}+9.28\%}$ |
| test_set_nested | 87.8240μs | 20.3660μs | 49.1014 KOps/s | 46.1516 KOps/s | $\textbf{\color{#35bf28}+6.39\%}$ |
| test_set_nested_new | 84.9290μs | 24.1823μs | 41.3526 KOps/s | 38.4963 KOps/s | $\textbf{\color{#35bf28}+7.42\%}$ |
| test_select | 96.8810μs | 39.0486μs | 25.6091 KOps/s | 25.1967 KOps/s | $\color{#35bf28}+1.64\%$ |
| test_select_nested | 0.1463ms | 59.5413μs | 16.7951 KOps/s | 17.5330 KOps/s | $\color{#d91a1a}-4.21\%$ |
| test_exclude_nested | 0.2253ms | 0.1202ms | 8.3172 KOps/s | 8.5669 KOps/s | $\color{#d91a1a}-2.91\%$ |
| test_empty[True] | 0.5868ms | 0.4190ms | 2.3868 KOps/s | 2.4476 KOps/s | $\color{#d91a1a}-2.48\%$ |
| test_empty[False] | 6.0754μs | 1.0267μs | 973.9701 KOps/s | 960.7206 KOps/s | $\color{#35bf28}+1.38\%$ |
| test_unbind_speed | 0.4328ms | 0.2424ms | 4.1251 KOps/s | 4.0187 KOps/s | $\color{#35bf28}+2.65\%$ |
| test_unbind_speed_stack0 | 0.4962ms | 0.2366ms | 4.2258 KOps/s | 4.1417 KOps/s | $\color{#35bf28}+2.03\%$ |
| test_unbind_speed_stack1 | 0.1190s | 0.6590ms | 1.5173 KOps/s | 1.4723 KOps/s | $\color{#35bf28}+3.06\%$ |
| test_split | 0.1098s | 1.6154ms | 619.0292 Ops/s | 618.2644 Ops/s | $\color{#35bf28}+0.12\%$ |
| test_chunk | 1.6978ms | 1.4577ms | 686.0199 Ops/s | 691.7793 Ops/s | $\color{#d91a1a}-0.83\%$ |
| test_creation[device0] | 0.1968ms | 0.1025ms | 9.7599 KOps/s | 9.7333 KOps/s | $\color{#35bf28}+0.27\%$ |
| test_creation_from_tensor | 4.5460ms | 83.3744μs | 11.9941 KOps/s | 11.9920 KOps/s | $\color{#35bf28}+0.02\%$ |
| test_add_one[memmap_tensor0] | 0.1002ms | 5.4868μs | 182.2570 KOps/s | 181.7124 KOps/s | $\color{#35bf28}+0.30\%$ |
| test_contiguous[memmap_tensor0] | 15.7190μs | 0.6600μs | 1.5151 MOps/s | 1.5701 MOps/s | $\color{#d91a1a}-3.50\%$ |
| test_stack[memmap_tensor0] | 28.3020μs | 3.5887μs | 278.6494 KOps/s | 280.4425 KOps/s | $\color{#d91a1a}-0.64\%$ |
| test_memmaptd_index | 1.1119ms | 0.2439ms | 4.0997 KOps/s | 4.2411 KOps/s | $\color{#d91a1a}-3.33\%$ |
| test_memmaptd_index_astensor | 0.5142ms | 0.3035ms | 3.2951 KOps/s | 3.3591 KOps/s | $\color{#d91a1a}-1.91\%$ |
| test_memmaptd_index_op | 1.3333ms | 0.5911ms | 1.6918 KOps/s | 1.6152 KOps/s | $\color{#35bf28}+4.74\%$ |
| test_serialize_model | 0.2124s | 0.1115s | 8.9649 Ops/s | 8.6555 Ops/s | $\color{#35bf28}+3.58\%$ |
| test_serialize_model_pickle | 0.4635s | 0.3750s | 2.6666 Ops/s | 2.6413 Ops/s | $\color{#35bf28}+0.96\%$ |
| test_serialize_weights | 0.1058s | 98.2029ms | 10.1830 Ops/s | 9.9626 Ops/s | $\color{#35bf28}+2.21\%$ |
| test_serialize_weights_returnearly | 0.2284s | 0.1408s | 7.1008 Ops/s | 8.2873 Ops/s | $\textbf{\color{#d91a1a}-14.32\%}$ |
| test_serialize_weights_pickle | 0.7801s | 0.5075s | 1.9705 Ops/s | 2.4860 Ops/s | $\textbf{\color{#d91a1a}-20.73\%}$ |
| test_serialize_weights_filesystem | 99.4180ms | 93.0347ms | 10.7487 Ops/s | 11.1327 Ops/s | $\color{#d91a1a}-3.45\%$ |
| test_serialize_model_filesystem | 96.5170ms | 92.9051ms | 10.7637 Ops/s | 10.5384 Ops/s | $\color{#35bf28}+2.14\%$ |
| test_reshape_pytree | 83.5110μs | 21.2487μs | 47.0617 KOps/s | 46.9317 KOps/s | $\color{#35bf28}+0.28\%$ |
| test_reshape_td | 76.9330μs | 31.0843μs | 32.1706 KOps/s | 31.8998 KOps/s | $\color{#35bf28}+0.85\%$ |
| test_view_pytree | 58.6800μs | 20.8890μs | 47.8721 KOps/s | 47.0893 KOps/s | $\color{#35bf28}+1.66\%$ |
| test_view_td | 0.1295s | 61.9699μs | 16.1369 KOps/s | 16.4252 KOps/s | $\color{#d91a1a}-1.76\%$ |
| test_unbind_pytree | 47.8090μs | 24.9350μs | 40.1043 KOps/s | 39.7499 KOps/s | $\color{#35bf28}+0.89\%$ |
| test_unbind_td | 0.1277ms | 36.2432μs | 27.5914 KOps/s | 26.7708 KOps/s | $\color{#35bf28}+3.07\%$ |
| test_split_pytree | 59.7420μs | 23.9797μs | 41.7019 KOps/s | 41.4869 KOps/s | $\color{#35bf28}+0.52\%$ |
| test_split_td | 0.1157ms | 39.2525μs | 25.4761 KOps/s | 25.2066 KOps/s | $\color{#35bf28}+1.07\%$ |
| test_add_pytree | 80.3210μs | 29.6660μs | 33.7086 KOps/s | 33.3460 KOps/s | $\color{#35bf28}+1.09\%$ |
| test_add_td | 0.1029ms | 51.9869μs | 19.2356 KOps/s | 18.1968 KOps/s | $\textbf{\color{#35bf28}+5.71\%}$ |
| test_distributed | 0.1928ms | 0.1002ms | 9.9765 KOps/s | 9.8159 KOps/s | $\color{#35bf28}+1.64\%$ |
| test_tdmodule | 37.4900μs | 17.0724μs | 58.5740 KOps/s | 53.1164 KOps/s | $\textbf{\color{#35bf28}+10.27\%}$ |
| test_tdmodule_dispatch | 71.6150μs | 32.5461μs | 30.7256 KOps/s | 27.6385 KOps/s | $\textbf{\color{#35bf28}+11.17\%}$ |
| test_tdseq | 34.1840μs | 19.5681μs | 51.1037 KOps/s | 46.4729 KOps/s | $\textbf{\color{#35bf28}+9.96\%}$ |
| test_tdseq_dispatch | 62.8480μs | 36.6860μs | 27.2584 KOps/s | 24.5460 KOps/s | $\textbf{\color{#35bf28}+11.05\%}$ |
| test_instantiation_functorch | 2.0046ms | 1.3238ms | 755.4205 Ops/s | 761.9084 Ops/s | $\color{#d91a1a}-0.85\%$ |
| test_instantiation_td | 1.4985ms | 1.0090ms | 991.0590 Ops/s | 1.0062 KOps/s | $\color{#d91a1a}-1.50\%$ |
| test_exec_functorch | 0.4072ms | 0.1621ms | 6.1674 KOps/s | 6.3758 KOps/s | $\color{#d91a1a}-3.27\%$ |
| test_exec_functional_call | 0.3582ms | 0.1544ms | 6.4764 KOps/s | 6.6405 KOps/s | $\color{#d91a1a}-2.47\%$ |
| test_exec_td | 0.3111ms | 0.1491ms | 6.7052 KOps/s | 6.9429 KOps/s | $\color{#d91a1a}-3.42\%$ |
| test_exec_td_decorator | 0.6875ms | 0.1999ms | 5.0013 KOps/s | 5.1445 KOps/s | $\color{#d91a1a}-2.78\%$ |
| test_vmap_mlp_speed[True-True] | 0.7465ms | 0.4714ms | 2.1212 KOps/s | 2.0851 KOps/s | $\color{#35bf28}+1.74\%$ |
| test_vmap_mlp_speed[True-False] | 0.8433ms | 0.4685ms | 2.1343 KOps/s | 2.0968 KOps/s | $\color{#35bf28}+1.79\%$ |
| test_vmap_mlp_speed[False-True] | 0.4846ms | 0.3833ms | 2.6088 KOps/s | 2.5821 KOps/s | $\color{#35bf28}+1.04\%$ |
| test_vmap_mlp_speed[False-False] | 0.6615ms | 0.3842ms | 2.6028 KOps/s | 2.5999 KOps/s | $\color{#35bf28}+0.11\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 1.0796ms | 0.4922ms | 2.0317 KOps/s | 2.0337 KOps/s | $\color{#d91a1a}-0.10\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 0.6847ms | 0.4950ms | 2.0203 KOps/s | 2.0485 KOps/s | $\color{#d91a1a}-1.37\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 0.6350ms | 0.4035ms | 2.4782 KOps/s | 2.5382 KOps/s | $\color{#d91a1a}-2.36\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.6755ms | 0.4022ms | 2.4863 KOps/s | 2.5421 KOps/s | $\color{#d91a1a}-2.19\%$ |
| test_to_module_speed[True] | 1.9542ms | 1.3838ms | 722.6613 Ops/s | 727.3774 Ops/s | $\color{#d91a1a}-0.65\%$ |
| test_to_module_speed[False] | 2.1545ms | 1.3680ms | 730.9965 Ops/s | 732.5269 Ops/s | $\color{#d91a1a}-0.21\%$ |

@vmoens vmoens marked this pull request as ready for review March 4, 2024 11:54
@vmoens vmoens added the enhancement label Mar 4, 2024
@vmoens vmoens merged commit e0ccfcf into main Mar 4, 2024
45 of 48 checks passed
@vmoens vmoens deleted the mp-start-map branch March 4, 2024 13:22