[Feature] Best intention stack #605

Merged
merged 4 commits into from
Jan 4, 2024

Conversation

vmoens
Contributor

@vmoens vmoens commented Jan 3, 2024

Description

Allows torch.stack to return a TensorDict whenever possible, and a LazyStackedTensorDict otherwise.
This simplifies stacking tensordicts together while preserving the features of LazyStackedTensorDict.
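
The dispatch rule described above can be pictured with a toy model. This is only a sketch, not tensordict's implementation: `DenseStack`, `LazyStack` and `best_intention_stack` are hypothetical names, and each "tensordict" is reduced to a plain mapping from key to tensor shape, stacked along a new leading dimension.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Shape = Tuple[int, ...]
Spec = Dict[str, Shape]  # toy stand-in for a tensordict: key -> tensor shape


@dataclass
class DenseStack:
    """Toy stand-in for a regular TensorDict produced by a dense stack."""
    shapes: Spec


@dataclass
class LazyStack:
    """Toy stand-in for a LazyStackedTensorDict: the items are kept separate."""
    items: List[Spec]


def best_intention_stack(items: List[Spec]) -> Union[DenseStack, LazyStack]:
    """Stack densely when every item has the same keys and shapes, else lazily."""
    if not items:
        raise ValueError("expected at least one item to stack")
    first = items[0]
    if all(item == first for item in items[1:]):
        n = len(items)
        # Matching keys and shapes: prepend the new stack dimension to each entry.
        return DenseStack({key: (n, *shape) for key, shape in first.items()})
    # Mismatching keys or shapes: keep the items side by side.
    return LazyStack(list(items))


same = best_intention_stack([{"obs": (3, 4)}, {"obs": (3, 4)}])
mixed = best_intention_stack([{"obs": (3, 4)}, {"obs": (3, 5)}])
print(type(same).__name__, same.shapes)
print(type(mixed).__name__, len(mixed.items))
```

The point of the toy is the return type: callers get the dense container when the inputs allow it and only fall back to the lazy one when they do not.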

Currently, this behaviour is enabled only after set_lazy_legacy(False) is called. In the future, lazy_legacy() will be False by default, making this the default behaviour in tensordict.

In the PyTorch PR, this will be the only accepted behaviour of torch.stack.

cc @shagunsodhani @matteobettini

@facebook-github-bot added the "CLA Signed" label Jan 3, 2024
@vmoens added the "enhancement" label Jan 3, 2024

github-actions bot commented Jan 3, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 120. Improved: $\large\color{#35bf28}15$. Worsened: $\large\color{#d91a1a}6$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 38.0710μs | 15.3500μs | 65.1465 KOps/s | 64.9928 KOps/s | $\color{#35bf28}+0.24\%$ |
| test_plain_set_stack_nested | 0.1900ms | 0.1409ms | 7.0992 KOps/s | 7.0443 KOps/s | $\color{#35bf28}+0.78\%$ |
| test_plain_set_nested_inplace | 66.5540μs | 17.7125μs | 56.4574 KOps/s | 55.9344 KOps/s | $\color{#35bf28}+0.94\%$ |
| test_plain_set_stack_nested_inplace | 0.2342ms | 0.1737ms | 5.7571 KOps/s | 5.6624 KOps/s | $\color{#35bf28}+1.67\%$ |
| test_items | 42.4500μs | 2.4554μs | 407.2601 KOps/s | 403.5771 KOps/s | $\color{#35bf28}+0.91\%$ |
| test_items_nested | 0.9264ms | 0.2774ms | 3.6043 KOps/s | 3.7259 KOps/s | $\color{#d91a1a}-3.26\%$ |
| test_items_nested_locked | 0.3360ms | 0.2814ms | 3.5539 KOps/s | 3.7123 KOps/s | $\color{#d91a1a}-4.27\%$ |
| test_items_nested_leaf | 0.5507ms | 0.1697ms | 5.8918 KOps/s | 6.0207 KOps/s | $\color{#d91a1a}-2.14\%$ |
| test_items_stack_nested | 1.4115ms | 1.3118ms | 762.2952 Ops/s | 653.8100 Ops/s | $\textbf{\color{#35bf28}+16.59\%}$ |
| test_items_stack_nested_leaf | 1.2932ms | 1.1731ms | 852.4091 Ops/s | 718.9009 Ops/s | $\textbf{\color{#35bf28}+18.57\%}$ |
| test_items_stack_nested_locked | 0.9303ms | 0.7626ms | 1.3112 KOps/s | 1.3094 KOps/s | $\color{#35bf28}+0.14\%$ |
| test_keys | 15.3990μs | 4.1254μs | 242.3988 KOps/s | 260.4388 KOps/s | $\textbf{\color{#d91a1a}-6.93\%}$ |
| test_keys_nested | 58.7519ms | 0.1548ms | 6.4617 KOps/s | 6.8383 KOps/s | $\textbf{\color{#d91a1a}-5.51\%}$ |
| test_keys_nested_locked | 0.2108ms | 0.1433ms | 6.9797 KOps/s | 6.8435 KOps/s | $\color{#35bf28}+1.99\%$ |
| test_keys_nested_leaf | 0.2073ms | 0.1264ms | 7.9138 KOps/s | 7.7185 KOps/s | $\color{#35bf28}+2.53\%$ |
| test_keys_stack_nested | 1.4105ms | 1.2833ms | 779.2535 Ops/s | 667.1620 Ops/s | $\textbf{\color{#35bf28}+16.80\%}$ |
| test_keys_stack_nested_leaf | 2.9077ms | 1.2916ms | 774.2608 Ops/s | 676.5470 Ops/s | $\textbf{\color{#35bf28}+14.44\%}$ |
| test_keys_stack_nested_locked | 1.0680ms | 0.6853ms | 1.4592 KOps/s | 1.4426 KOps/s | $\color{#35bf28}+1.15\%$ |
| test_values | 7.8245μs | 1.1579μs | 863.6371 KOps/s | 841.2603 KOps/s | $\color{#35bf28}+2.66\%$ |
| test_values_nested | 0.1139ms | 52.1995μs | 19.1573 KOps/s | 19.2276 KOps/s | $\color{#d91a1a}-0.37\%$ |
| test_values_nested_locked | 0.1054ms | 52.4423μs | 19.0686 KOps/s | 19.1069 KOps/s | $\color{#d91a1a}-0.20\%$ |
| test_values_nested_leaf | 0.1067ms | 46.1768μs | 21.6559 KOps/s | 21.5934 KOps/s | $\color{#35bf28}+0.29\%$ |
| test_values_stack_nested | 1.7035ms | 1.0647ms | 939.2343 Ops/s | 803.2234 Ops/s | $\textbf{\color{#35bf28}+16.93\%}$ |
| test_values_stack_nested_leaf | 1.2275ms | 1.0217ms | 978.7632 Ops/s | 794.1083 Ops/s | $\textbf{\color{#35bf28}+23.25\%}$ |
| test_values_stack_nested_locked | 0.9218ms | 0.5086ms | 1.9663 KOps/s | 1.9690 KOps/s | $\color{#d91a1a}-0.14\%$ |
| test_membership | 17.9930μs | 1.3634μs | 733.4550 KOps/s | 704.0183 KOps/s | $\color{#35bf28}+4.18\%$ |
| test_membership_nested | 26.9610μs | 2.8686μs | 348.6003 KOps/s | 351.4403 KOps/s | $\color{#d91a1a}-0.81\%$ |
| test_membership_nested_leaf | 36.5590μs | 2.9332μs | 340.9296 KOps/s | 322.9385 KOps/s | $\textbf{\color{#35bf28}+5.57\%}$ |
| test_membership_stacked_nested | 36.7090μs | 12.0287μs | 83.1345 KOps/s | 86.8252 KOps/s | $\color{#d91a1a}-4.25\%$ |
| test_membership_stacked_nested_leaf | 50.9360μs | 11.8878μs | 84.1199 KOps/s | 86.3282 KOps/s | $\color{#d91a1a}-2.56\%$ |
| test_membership_nested_last | 34.4140μs | 6.2125μs | 160.9655 KOps/s | 168.2317 KOps/s | $\color{#d91a1a}-4.32\%$ |
| test_membership_nested_leaf_last | 30.2170μs | 6.1145μs | 163.5457 KOps/s | 167.4071 KOps/s | $\color{#d91a1a}-2.31\%$ |
| test_membership_stacked_nested_last | 0.3199ms | 0.1676ms | 5.9676 KOps/s | 6.0002 KOps/s | $\color{#d91a1a}-0.54\%$ |
| test_membership_stacked_nested_leaf_last | 45.0040μs | 13.6947μs | 73.0208 KOps/s | 74.2207 KOps/s | $\color{#d91a1a}-1.62\%$ |
| test_nested_getleaf | 42.4700μs | 10.5256μs | 95.0066 KOps/s | 94.9275 KOps/s | $\color{#35bf28}+0.08\%$ |
| test_nested_get | 30.5270μs | 9.9618μs | 100.3834 KOps/s | 99.4502 KOps/s | $\color{#35bf28}+0.94\%$ |
| test_stacked_getleaf | 1.0050ms | 0.4746ms | 2.1068 KOps/s | 1.4741 KOps/s | $\textbf{\color{#35bf28}+42.93\%}$ |
| test_stacked_get | 0.5319ms | 0.4406ms | 2.2697 KOps/s | 1.5468 KOps/s | $\textbf{\color{#35bf28}+46.73\%}$ |
| test_nested_getitemleaf | 35.6670μs | 10.6005μs | 94.3356 KOps/s | 94.1845 KOps/s | $\color{#35bf28}+0.16\%$ |
| test_nested_getitem | 38.2320μs | 10.0927μs | 99.0812 KOps/s | 99.3653 KOps/s | $\color{#d91a1a}-0.29\%$ |
| test_stacked_getitemleaf | 0.5644ms | 0.4762ms | 2.0998 KOps/s | 1.4807 KOps/s | $\textbf{\color{#35bf28}+41.81\%}$ |
| test_stacked_getitem | 0.6791ms | 0.4418ms | 2.2636 KOps/s | 1.5441 KOps/s | $\textbf{\color{#35bf28}+46.59\%}$ |
| test_lock_nested | 2.0547ms | 0.4211ms | 2.3746 KOps/s | 2.4171 KOps/s | $\color{#d91a1a}-1.76\%$ |
| test_lock_stack_nested | 83.3618ms | 6.9231ms | 144.4446 Ops/s | 148.6991 Ops/s | $\color{#d91a1a}-2.86\%$ |
| test_unlock_nested | 87.9306ms | 0.5156ms | 1.9394 KOps/s | 2.3686 KOps/s | $\textbf{\color{#d91a1a}-18.12\%}$ |
| test_unlock_stack_nested | 89.3231ms | 6.6747ms | 149.8191 Ops/s | 158.7116 Ops/s | $\textbf{\color{#d91a1a}-5.60\%}$ |
| test_flatten_speed | 0.7433ms | 0.3643ms | 2.7452 KOps/s | 2.6951 KOps/s | $\color{#35bf28}+1.86\%$ |
| test_unflatten_speed | 0.5608ms | 0.4579ms | 2.1839 KOps/s | 2.2152 KOps/s | $\color{#d91a1a}-1.41\%$ |
| test_common_ops | 3.7087ms | 0.6749ms | 1.4817 KOps/s | 1.5469 KOps/s | $\color{#d91a1a}-4.22\%$ |
| test_creation | 19.5970μs | 2.0215μs | 494.6839 KOps/s | 482.4042 KOps/s | $\color{#35bf28}+2.55\%$ |
| test_creation_empty | 36.5290μs | 7.9083μs | 126.4501 KOps/s | 129.2805 KOps/s | $\color{#d91a1a}-2.19\%$ |
| test_creation_nested_1 | 37.0290μs | 10.7329μs | 93.1715 KOps/s | 94.8619 KOps/s | $\color{#d91a1a}-1.78\%$ |
| test_creation_nested_2 | 61.5960μs | 16.0558μs | 62.2827 KOps/s | 62.7703 KOps/s | $\color{#d91a1a}-0.78\%$ |
| test_clone | 0.3205ms | 12.3811μs | 80.7682 KOps/s | 79.4243 KOps/s | $\color{#35bf28}+1.69\%$ |
| test_getitem[int] | 41.0370μs | 12.2847μs | 81.4021 KOps/s | 84.1615 KOps/s | $\color{#d91a1a}-3.28\%$ |
| test_getitem[slice_int] | 79.9390μs | 24.1394μs | 41.4261 KOps/s | 40.6568 KOps/s | $\color{#35bf28}+1.89\%$ |
| test_getitem[range] | 0.1656ms | 42.6411μs | 23.4516 KOps/s | 23.9761 KOps/s | $\color{#d91a1a}-2.19\%$ |
| test_getitem[tuple] | 51.4670μs | 19.5400μs | 51.1770 KOps/s | 52.2095 KOps/s | $\color{#d91a1a}-1.98\%$ |
| test_getitem[list] | 97.3930μs | 37.5515μs | 26.6301 KOps/s | 27.3438 KOps/s | $\color{#d91a1a}-2.61\%$ |
| test_setitem_dim[int] | 50.9450μs | 27.9881μs | 35.7295 KOps/s | 36.4471 KOps/s | $\color{#d91a1a}-1.97\%$ |
| test_setitem_dim[slice_int] | 0.1484ms | 53.1931μs | 18.7994 KOps/s | 19.0685 KOps/s | $\color{#d91a1a}-1.41\%$ |
| test_setitem_dim[range] | 0.1415ms | 71.1973μs | 14.0455 KOps/s | 14.2008 KOps/s | $\color{#d91a1a}-1.09\%$ |
| test_setitem_dim[tuple] | 71.8650μs | 41.9820μs | 23.8197 KOps/s | 24.4175 KOps/s | $\color{#d91a1a}-2.45\%$ |
| test_setitem | 0.2401ms | 17.6607μs | 56.6228 KOps/s | 57.9751 KOps/s | $\color{#d91a1a}-2.33\%$ |
| test_set | 0.2551ms | 17.1282μs | 58.3831 KOps/s | 59.0782 KOps/s | $\color{#d91a1a}-1.18\%$ |
| test_set_shared | 2.1774ms | 0.1382ms | 7.2336 KOps/s | 7.3638 KOps/s | $\color{#d91a1a}-1.77\%$ |
| test_update | 0.1823ms | 18.8046μs | 53.1784 KOps/s | 54.1414 KOps/s | $\color{#d91a1a}-1.78\%$ |
| test_update_nested | 60.3530μs | 25.6207μs | 39.0309 KOps/s | 38.8301 KOps/s | $\color{#35bf28}+0.52\%$ |
| test_set_nested | 0.2256ms | 18.7088μs | 53.4509 KOps/s | 53.3464 KOps/s | $\color{#35bf28}+0.20\%$ |
| test_set_nested_new | 0.1943ms | 22.7785μs | 43.9010 KOps/s | 44.5875 KOps/s | $\color{#d91a1a}-1.54\%$ |
| test_select | 0.1054ms | 46.0508μs | 21.7152 KOps/s | 21.5638 KOps/s | $\color{#35bf28}+0.70\%$ |
| test_unbind_speed | 0.4902ms | 0.3430ms | 2.9152 KOps/s | 2.9452 KOps/s | $\color{#d91a1a}-1.02\%$ |
| test_unbind_speed_stack0 | 72.9809ms | 4.5104ms | 221.7121 Ops/s | 250.8482 Ops/s | $\textbf{\color{#d91a1a}-11.62\%}$ |
| test_unbind_speed_stack1 | 2.7151μs | 0.6338μs | 1.5778 MOps/s | 1.5590 MOps/s | $\color{#35bf28}+1.21\%$ |
| test_split | 3.2736ms | 1.5504ms | 644.9797 Ops/s | 581.2285 Ops/s | $\textbf{\color{#35bf28}+10.97\%}$ |
| test_chunk | 64.3080ms | 1.6447ms | 608.0242 Ops/s | 590.6739 Ops/s | $\color{#35bf28}+2.94\%$ |
| test_creation[device0] | 3.4212ms | 0.2929ms | 3.4144 KOps/s | 3.4191 KOps/s | $\color{#d91a1a}-0.14\%$ |
| test_creation_from_tensor | 2.9263ms | 0.3294ms | 3.0357 KOps/s | 3.0491 KOps/s | $\color{#d91a1a}-0.44\%$ |
| test_add_one[memmap_tensor0] | 0.3874ms | 25.4119μs | 39.3517 KOps/s | 38.6460 KOps/s | $\color{#35bf28}+1.83\%$ |
| test_contiguous[memmap_tensor0] | 46.3760μs | 5.9241μs | 168.8026 KOps/s | 167.4780 KOps/s | $\color{#35bf28}+0.79\%$ |
| test_stack[memmap_tensor0] | 72.5360μs | 19.6211μs | 50.9656 KOps/s | 50.8983 KOps/s | $\color{#35bf28}+0.13\%$ |
| test_memmaptd_index | 0.2810ms | 0.1999ms | 5.0028 KOps/s | 4.9361 KOps/s | $\color{#35bf28}+1.35\%$ |
| test_memmaptd_index_astensor | 0.4211ms | 0.2583ms | 3.8722 KOps/s | 3.8219 KOps/s | $\color{#35bf28}+1.32\%$ |
| test_memmaptd_index_op | 1.1640ms | 0.5085ms | 1.9667 KOps/s | 1.9260 KOps/s | $\color{#35bf28}+2.11\%$ |
| test_serialize_model | 0.1655s | 0.1047s | 9.5495 Ops/s | 9.9410 Ops/s | $\color{#d91a1a}-3.94\%$ |
| test_serialize_model_filesystem | 98.6964ms | 91.5896ms | 10.9183 Ops/s | 10.7360 Ops/s | $\color{#35bf28}+1.70\%$ |
| test_serialize_model_pickle | 0.4520s | 0.3816s | 2.6203 Ops/s | 2.5937 Ops/s | $\color{#35bf28}+1.03\%$ |
| test_serialize_weights | 0.1039s | 95.9982ms | 10.4169 Ops/s | 9.4540 Ops/s | $\textbf{\color{#35bf28}+10.18\%}$ |
| test_serialize_weights_filesystem | 0.1627s | 96.3534ms | 10.3785 Ops/s | 10.8085 Ops/s | $\color{#d91a1a}-3.98\%$ |
| test_serialize_weights_returnearly | 0.1265s | 0.1203s | 8.3105 Ops/s | 7.5560 Ops/s | $\textbf{\color{#35bf28}+9.99\%}$ |
| test_serialize_weights_pickle | 1.1563s | 0.6685s | 1.4959 Ops/s | 1.3274 Ops/s | $\textbf{\color{#35bf28}+12.69\%}$ |
| test_reshape_pytree | 75.9520μs | 23.4755μs | 42.5975 KOps/s | 42.6020 KOps/s | $\color{#d91a1a}-0.01\%$ |
| test_reshape_td | 73.3670μs | 30.4465μs | 32.8445 KOps/s | 32.4895 KOps/s | $\color{#35bf28}+1.09\%$ |
| test_view_pytree | 69.0900μs | 23.3036μs | 42.9118 KOps/s | 43.0073 KOps/s | $\color{#d91a1a}-0.22\%$ |
| test_view_td | 27.1910μs | 4.8877μs | 204.5950 KOps/s | 201.1965 KOps/s | $\color{#35bf28}+1.69\%$ |
| test_unbind_pytree | 58.9800μs | 26.6190μs | 37.5672 KOps/s | 37.2769 KOps/s | $\color{#35bf28}+0.78\%$ |
| test_unbind_td | 87.4840μs | 55.4662μs | 18.0290 KOps/s | 18.0139 KOps/s | $\color{#35bf28}+0.08\%$ |
| test_split_pytree | 54.5820μs | 26.2997μs | 38.0233 KOps/s | 38.0000 KOps/s | $\color{#35bf28}+0.06\%$ |
| test_split_td | 0.5898ms | 43.7534μs | 22.8554 KOps/s | 22.9920 KOps/s | $\color{#d91a1a}-0.59\%$ |
| test_add_pytree | 0.1217ms | 32.2738μs | 30.9849 KOps/s | 31.2496 KOps/s | $\color{#d91a1a}-0.85\%$ |
| test_add_td | 94.3070μs | 44.7293μs | 22.3567 KOps/s | 22.7119 KOps/s | $\color{#d91a1a}-1.56\%$ |
| test_distributed | 24.2250μs | 6.0745μs | 164.6221 KOps/s | 163.0276 KOps/s | $\color{#35bf28}+0.98\%$ |
| test_tdmodule | 0.3909ms | 21.3623μs | 46.8113 KOps/s | 46.5173 KOps/s | $\color{#35bf28}+0.63\%$ |
| test_tdmodule_dispatch | 0.1953ms | 37.8466μs | 26.4225 KOps/s | 26.2346 KOps/s | $\color{#35bf28}+0.72\%$ |
| test_tdseq | 42.6200μs | 24.2654μs | 41.2109 KOps/s | 41.4538 KOps/s | $\color{#d91a1a}-0.59\%$ |
| test_tdseq_dispatch | 0.4367ms | 43.5119μs | 22.9822 KOps/s | 23.6098 KOps/s | $\color{#d91a1a}-2.66\%$ |
| test_instantiation_functorch | 1.4698ms | 1.3096ms | 763.6160 Ops/s | 760.5597 Ops/s | $\color{#35bf28}+0.40\%$ |
| test_instantiation_td | 74.1761ms | 1.0829ms | 923.4041 Ops/s | 982.1427 Ops/s | $\textbf{\color{#d91a1a}-5.98\%}$ |
| test_exec_functorch | 0.3533ms | 0.1574ms | 6.3537 KOps/s | 6.1597 KOps/s | $\color{#35bf28}+3.15\%$ |
| test_exec_functional_call | 0.3191ms | 0.1449ms | 6.9033 KOps/s | 6.6778 KOps/s | $\color{#35bf28}+3.38\%$ |
| test_exec_td | 0.3552ms | 0.1425ms | 7.0167 KOps/s | 6.9092 KOps/s | $\color{#35bf28}+1.56\%$ |
| test_exec_td_decorator | 1.0449ms | 0.1737ms | 5.7580 KOps/s | 5.6540 KOps/s | $\color{#35bf28}+1.84\%$ |
| test_vmap_mlp_speed[True-True] | 1.4223ms | 0.8992ms | 1.1121 KOps/s | 1.0957 KOps/s | $\color{#35bf28}+1.50\%$ |
| test_vmap_mlp_speed[True-False] | 0.7336ms | 0.4754ms | 2.1033 KOps/s | 2.1246 KOps/s | $\color{#d91a1a}-1.00\%$ |
| test_vmap_mlp_speed[False-True] | 1.0163ms | 0.7803ms | 1.2816 KOps/s | 1.2542 KOps/s | $\color{#35bf28}+2.18\%$ |
| test_vmap_mlp_speed[False-False] | 0.6017ms | 0.3925ms | 2.5476 KOps/s | 2.5976 KOps/s | $\color{#d91a1a}-1.93\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 3.6907ms | 1.7821ms | 561.1316 Ops/s | 550.9478 Ops/s | $\color{#35bf28}+1.85\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 0.8341ms | 0.5145ms | 1.9438 KOps/s | 1.9581 KOps/s | $\color{#d91a1a}-0.73\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 2.4271ms | 1.4927ms | 669.9204 Ops/s | 659.8723 Ops/s | $\color{#35bf28}+1.52\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.7060ms | 0.4002ms | 2.4985 KOps/s | 2.5354 KOps/s | $\color{#d91a1a}-1.46\%$ |

@vmoens vmoens merged commit 4151a83 into main Jan 4, 2024
43 of 45 checks passed
@vmoens vmoens deleted the best-intention-stack branch January 4, 2024 08:11