[Feature] inplace to_module #610

Merged
merged 3 commits into main from inplace-to_module
Jan 5, 2024
@vmoens vmoens commented Jan 5, 2024

Allows to_module to write tensors in-place.
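
As a rough illustration of what "in-place" means here, the sketch below uses plain Python lists as stand-ins for tensors. It is an analogy, not tensordict's implementation: `ToyModule` and `write_params` are hypothetical names. An in-place write overwrites the contents of the existing buffer, so any other reference to that buffer (for example, one held by an optimizer) sees the new values. An out-of-place write rebinds the name to a new object and hands back the old one.

```python
class ToyModule:
    """Hypothetical stand-in for a module; lists play the role of tensors."""

    def __init__(self, **params):
        self.params = {name: list(values) for name, values in params.items()}


def write_params(module, new_values, inplace):
    """Write new_values into module.params.

    inplace=True: overwrite each existing buffer, keeping object identity.
    inplace=False: rebind each name to the new object and return the old buffers.
    """
    swapped = {}
    for name, value in new_values.items():
        if inplace:
            if len(value) != len(module.params[name]):
                raise ValueError(f"shape mismatch for {name!r}")
            module.params[name][:] = value  # contents change, object stays
        else:
            swapped[name] = module.params[name]
            module.params[name] = value  # name now points at a new object
    return swapped


module = ToyModule(weight=[0.0, 0.0])
alias = module.params["weight"]  # e.g. a reference held elsewhere

write_params(module, {"weight": [1.0, 2.0]}, inplace=True)
inplace_alias_updated = alias == [1.0, 2.0]  # alias sees the in-place write

old = write_params(module, {"weight": [3.0, 4.0]}, inplace=False)
# After rebinding, the alias still holds the old buffer, which is returned.
rebinding_leaves_alias = alias == [1.0, 2.0] and old["weight"] is alias
```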

@facebook-github-bot added the CLA Signed label Jan 5, 2024
@vmoens added the enhancement label Jan 5, 2024

github-actions bot commented Jan 5, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 120. Improved: 13. Worsened: 4.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 31.3390μs | 15.5836μs | 64.1700 KOps/s | 63.8748 KOps/s | +0.46% |
| test_plain_set_stack_nested | 0.1948ms | 0.1426ms | 7.0130 KOps/s | 7.0588 KOps/s | -0.65% |
| test_plain_set_nested_inplace | 63.2390μs | 17.9156μs | 55.8173 KOps/s | 55.4594 KOps/s | +0.65% |
| test_plain_set_stack_nested_inplace | 0.2320ms | 0.1741ms | 5.7449 KOps/s | 5.6606 KOps/s | +1.49% |
| test_items | 17.5230μs | 2.5462μs | 392.7437 KOps/s | 398.5276 KOps/s | -1.45% |
| test_items_nested | 0.3926ms | 0.2659ms | 3.7613 KOps/s | 3.7150 KOps/s | +1.25% |
| test_items_nested_locked | 0.5567ms | 0.2672ms | 3.7431 KOps/s | 3.6776 KOps/s | +1.78% |
| test_items_nested_leaf | 1.1396ms | 0.1762ms | 5.6750 KOps/s | 5.9643 KOps/s | -4.85% |
| test_items_stack_nested | 1.4327ms | 1.3073ms | 764.9628 Ops/s | 752.5044 Ops/s | +1.66% |
| test_items_stack_nested_leaf | 1.2740ms | 1.1788ms | 848.3528 Ops/s | 841.3675 Ops/s | +0.83% |
| test_items_stack_nested_locked | 0.8622ms | 0.7567ms | 1.3215 KOps/s | 1.2934 KOps/s | +2.17% |
| test_keys | 18.6050μs | 3.9125μs | 255.5913 KOps/s | 255.3199 KOps/s | +0.11% |
| test_keys_nested | 50.3692ms | 0.1589ms | 6.2914 KOps/s | 6.7272 KOps/s | **-6.48%** |
| test_keys_nested_locked | 0.1992ms | 0.1486ms | 6.7290 KOps/s | 6.6958 KOps/s | +0.49% |
| test_keys_nested_leaf | 0.1997ms | 0.1295ms | 7.7238 KOps/s | 7.6947 KOps/s | +0.38% |
| test_keys_stack_nested | 2.4466ms | 1.2746ms | 784.5386 Ops/s | 781.1785 Ops/s | +0.43% |
| test_keys_stack_nested_leaf | 1.8233ms | 1.2637ms | 791.3312 Ops/s | 785.5006 Ops/s | +0.74% |
| test_keys_stack_nested_locked | 5.2063ms | 0.7059ms | 1.4166 KOps/s | 1.4264 KOps/s | -0.69% |
| test_values | 8.0232μs | 1.1358μs | 880.4555 KOps/s | 833.2648 KOps/s | **+5.66%** |
| test_values_nested | 0.1031ms | 52.0042μs | 19.2292 KOps/s | 19.3325 KOps/s | -0.53% |
| test_values_nested_locked | 99.8880μs | 52.3951μs | 19.0857 KOps/s | 19.1451 KOps/s | -0.31% |
| test_values_nested_leaf | 0.1201ms | 46.8142μs | 21.3611 KOps/s | 21.8429 KOps/s | -2.21% |
| test_values_stack_nested | 1.6950ms | 1.0459ms | 956.0710 Ops/s | 946.6361 Ops/s | +1.00% |
| test_values_stack_nested_leaf | 1.1340ms | 1.0271ms | 973.5986 Ops/s | 961.9214 Ops/s | +1.21% |
| test_values_stack_nested_locked | 0.8717ms | 0.5052ms | 1.9796 KOps/s | 1.9165 KOps/s | +3.29% |
| test_membership | 35.9780μs | 1.3743μs | 727.6463 KOps/s | 757.3635 KOps/s | -3.92% |
| test_membership_nested | 19.9480μs | 2.8709μs | 348.3212 KOps/s | 347.3336 KOps/s | +0.28% |
| test_membership_nested_leaf | 37.4900μs | 2.8765μs | 347.6480 KOps/s | 342.3955 KOps/s | +1.53% |
| test_membership_stacked_nested | 28.8040μs | 11.9950μs | 83.3681 KOps/s | 82.6833 KOps/s | +0.83% |
| test_membership_stacked_nested_leaf | 45.2950μs | 12.0510μs | 82.9804 KOps/s | 82.1650 KOps/s | +0.99% |
| test_membership_nested_last | 46.9180μs | 6.0310μs | 165.8105 KOps/s | 167.4666 KOps/s | -0.99% |
| test_membership_nested_leaf_last | 26.6400μs | 6.0588μs | 165.0481 KOps/s | 167.6926 KOps/s | -1.58% |
| test_membership_stacked_nested_last | 0.2239ms | 0.1673ms | 5.9759 KOps/s | 5.9813 KOps/s | -0.09% |
| test_membership_stacked_nested_leaf_last | 58.9910μs | 14.0671μs | 71.0878 KOps/s | 70.5914 KOps/s | +0.70% |
| test_nested_getleaf | 35.2070μs | 10.6989μs | 93.4673 KOps/s | 94.2008 KOps/s | -0.78% |
| test_nested_get | 30.8780μs | 10.0835μs | 99.1715 KOps/s | 99.0649 KOps/s | +0.11% |
| test_stacked_getleaf | 0.5962ms | 0.4660ms | 2.1459 KOps/s | 2.1388 KOps/s | +0.33% |
| test_stacked_get | 0.7765ms | 0.4371ms | 2.2879 KOps/s | 2.2742 KOps/s | +0.60% |
| test_nested_getitemleaf | 54.1720μs | 10.5839μs | 94.4833 KOps/s | 93.2539 KOps/s | +1.32% |
| test_nested_getitem | 48.1810μs | 10.0355μs | 99.6466 KOps/s | 99.2503 KOps/s | +0.40% |
| test_stacked_getitemleaf | 0.8406ms | 0.4685ms | 2.1345 KOps/s | 2.1220 KOps/s | +0.59% |
| test_stacked_getitem | 0.7172ms | 0.4381ms | 2.2828 KOps/s | 2.2686 KOps/s | +0.63% |
| test_lock_nested | 1.2448ms | 0.4128ms | 2.4222 KOps/s | 2.4009 KOps/s | +0.89% |
| test_lock_stack_nested | 77.0164ms | 6.3670ms | 157.0588 Ops/s | 155.4492 Ops/s | +1.04% |
| test_unlock_nested | 66.7179ms | 0.4840ms | 2.0663 KOps/s | 2.3482 KOps/s | **-12.01%** |
| test_unlock_stack_nested | 73.7984ms | 6.0421ms | 165.5061 Ops/s | 163.8035 Ops/s | +1.04% |
| test_flatten_speed | 0.6382ms | 0.3686ms | 2.7129 KOps/s | 2.7017 KOps/s | +0.42% |
| test_unflatten_speed | 0.9484ms | 0.4523ms | 2.2108 KOps/s | 2.2313 KOps/s | -0.92% |
| test_common_ops | 4.7481ms | 0.6565ms | 1.5231 KOps/s | 1.4978 KOps/s | +1.69% |
| test_creation | 16.7710μs | 1.9427μs | 514.7372 KOps/s | 501.6324 KOps/s | +2.61% |
| test_creation_empty | 26.3400μs | 7.7181μs | 129.5650 KOps/s | 122.0734 KOps/s | **+6.14%** |
| test_creation_nested_1 | 33.7240μs | 10.6103μs | 94.2476 KOps/s | 91.7586 KOps/s | +2.71% |
| test_creation_nested_2 | 69.5810μs | 15.7877μs | 63.3406 KOps/s | 58.6848 KOps/s | **+7.93%** |
| test_clone | 0.1023ms | 12.5682μs | 79.5656 KOps/s | 78.7840 KOps/s | +0.99% |
| test_getitem[int] | 34.3250μs | 11.8912μs | 84.0955 KOps/s | 79.9204 KOps/s | **+5.22%** |
| test_getitem[slice_int] | 84.4290μs | 23.4317μs | 42.6772 KOps/s | 41.8593 KOps/s | +1.95% |
| test_getitem[range] | 0.1176ms | 44.2870μs | 22.5800 KOps/s | 23.2440 KOps/s | -2.86% |
| test_getitem[tuple] | 44.6840μs | 18.9664μs | 52.7249 KOps/s | 51.5002 KOps/s | +2.38% |
| test_getitem[list] | 99.7070μs | 37.3930μs | 26.7430 KOps/s | 26.2794 KOps/s | +1.76% |
| test_setitem_dim[int] | 51.5470μs | 26.5222μs | 37.7043 KOps/s | 34.9769 KOps/s | **+7.80%** |
| test_setitem_dim[slice_int] | 0.1203ms | 52.8620μs | 18.9172 KOps/s | 18.3631 KOps/s | +3.02% |
| test_setitem_dim[range] | 0.1161ms | 70.5367μs | 14.1770 KOps/s | 13.8018 KOps/s | +2.72% |
| test_setitem_dim[tuple] | 78.0270μs | 40.6446μs | 24.6035 KOps/s | 22.4314 KOps/s | **+9.68%** |
| test_setitem | 0.1409ms | 17.0645μs | 58.6011 KOps/s | 54.8712 KOps/s | **+6.80%** |
| test_set | 0.1172ms | 16.5621μs | 60.3790 KOps/s | 56.9595 KOps/s | **+6.00%** |
| test_set_shared | 6.9972ms | 0.1382ms | 7.2359 KOps/s | 7.1438 KOps/s | +1.29% |
| test_update | 0.1266ms | 18.4173μs | 54.2968 KOps/s | 51.4774 KOps/s | **+5.48%** |
| test_update_nested | 0.1413ms | 25.7088μs | 38.8972 KOps/s | 37.6592 KOps/s | +3.29% |
| test_set_nested | 0.1235ms | 18.5587μs | 53.8830 KOps/s | 51.5273 KOps/s | +4.57% |
| test_set_nested_new | 0.1336ms | 22.3316μs | 44.7796 KOps/s | 42.7529 KOps/s | +4.74% |
| test_select | 0.1207ms | 46.7060μs | 21.4105 KOps/s | 21.1261 KOps/s | +1.35% |
| test_unbind_speed | 0.4996ms | 0.3387ms | 2.9524 KOps/s | 2.8890 KOps/s | +2.19% |
| test_unbind_speed_stack0 | 71.1752ms | 4.2798ms | 233.6541 Ops/s | 238.8738 Ops/s | -2.19% |
| test_unbind_speed_stack1 | 2.7728μs | 0.6456μs | 1.5488 MOps/s | 1.5853 MOps/s | -2.30% |
| test_split | 1.6159ms | 1.5430ms | 648.0863 Ops/s | 588.2220 Ops/s | **+10.18%** |
| test_chunk | 66.5630ms | 1.6356ms | 611.4123 Ops/s | 597.7710 Ops/s | +2.28% |
| test_creation[device0] | 3.3799ms | 0.2960ms | 3.3780 KOps/s | 3.4331 KOps/s | -1.60% |
| test_creation_from_tensor | 66.2213ms | 0.3642ms | 2.7458 KOps/s | 3.0539 KOps/s | **-10.09%** |
| test_add_one[memmap_tensor0] | 0.2822ms | 25.0403μs | 39.9356 KOps/s | 29.3441 KOps/s | **+36.09%** |
| test_contiguous[memmap_tensor0] | 26.7200μs | 5.8796μs | 170.0804 KOps/s | 174.2072 KOps/s | -2.37% |
| test_stack[memmap_tensor0] | 51.0760μs | 19.5590μs | 51.1274 KOps/s | 51.0145 KOps/s | +0.22% |
| test_memmaptd_index | 0.3013ms | 0.1925ms | 5.1958 KOps/s | 5.0513 KOps/s | +2.86% |
| test_memmaptd_index_astensor | 0.4967ms | 0.2508ms | 3.9867 KOps/s | 3.9123 KOps/s | +1.90% |
| test_memmaptd_index_op | 0.9956ms | 0.4850ms | 2.0620 KOps/s | 1.9807 KOps/s | +4.10% |
| test_serialize_model | 0.1653s | 0.1047s | 9.5510 Ops/s | 9.9819 Ops/s | -4.32% |
| test_serialize_model_filesystem | 98.1557ms | 91.8300ms | 10.8897 Ops/s | 10.6636 Ops/s | +2.12% |
| test_serialize_model_pickle | 0.4479s | 0.3817s | 2.6198 Ops/s | 2.6130 Ops/s | +0.26% |
| test_serialize_weights | 96.2699ms | 93.5229ms | 10.6926 Ops/s | 9.4470 Ops/s | **+13.18%** |
| test_serialize_weights_filesystem | 0.1589s | 97.0804ms | 10.3007 Ops/s | 10.2041 Ops/s | +0.95% |
| test_serialize_weights_returnearly | 0.1783s | 0.1276s | 7.8341 Ops/s | 8.2044 Ops/s | -4.51% |
| test_serialize_weights_pickle | 1.1604s | 0.6491s | 1.5406 Ops/s | 2.0662 Ops/s | **-25.44%** |
| test_reshape_pytree | 54.9740μs | 23.5764μs | 42.4153 KOps/s | 42.4681 KOps/s | -0.12% |
| test_reshape_td | 58.8810μs | 30.6858μs | 32.5884 KOps/s | 33.5099 KOps/s | -2.75% |
| test_view_pytree | 80.5820μs | 23.4322μs | 42.6763 KOps/s | 42.9318 KOps/s | -0.60% |
| test_view_td | 63.0370μs | 4.8700μs | 205.3386 KOps/s | 206.4646 KOps/s | -0.55% |
| test_unbind_pytree | 56.7870μs | 26.0398μs | 38.4027 KOps/s | 37.6495 KOps/s | +2.00% |
| test_unbind_td | 0.1203ms | 55.4112μs | 18.0469 KOps/s | 18.1243 KOps/s | -0.43% |
| test_split_pytree | 62.8780μs | 26.5273μs | 37.6970 KOps/s | 38.2006 KOps/s | -1.32% |
| test_split_td | 0.5552ms | 43.2915μs | 23.0992 KOps/s | 22.6622 KOps/s | +1.93% |
| test_add_pytree | 0.1040ms | 32.0347μs | 31.2162 KOps/s | 30.5618 KOps/s | +2.14% |
| test_add_td | 0.1072ms | 42.8259μs | 23.3504 KOps/s | 22.0961 KOps/s | **+5.68%** |
| test_distributed | 19.5970μs | 5.9364μs | 168.4531 KOps/s | 166.4235 KOps/s | +1.22% |
| test_tdmodule | 0.9021ms | 21.9435μs | 45.5715 KOps/s | 46.7917 KOps/s | -2.61% |
| test_tdmodule_dispatch | 0.1748ms | 37.9904μs | 26.3225 KOps/s | 25.1699 KOps/s | +4.58% |
| test_tdseq | 56.5060μs | 24.2363μs | 41.2604 KOps/s | 39.5901 KOps/s | +4.22% |
| test_tdseq_dispatch | 0.1357ms | 42.1431μs | 23.7287 KOps/s | 23.0886 KOps/s | +2.77% |
| test_instantiation_functorch | 2.0637ms | 1.2886ms | 776.0438 Ops/s | 777.4379 Ops/s | -0.18% |
| test_instantiation_td | 1.4324ms | 0.9895ms | 1.0106 KOps/s | 994.3935 Ops/s | +1.63% |
| test_exec_functorch | 0.2777ms | 0.1565ms | 6.3885 KOps/s | 6.3230 KOps/s | +1.04% |
| test_exec_functional_call | 0.2311ms | 0.1481ms | 6.7500 KOps/s | 6.8421 KOps/s | -1.35% |
| test_exec_td | 0.2151ms | 0.1418ms | 7.0535 KOps/s | 6.9617 KOps/s | +1.32% |
| test_exec_td_decorator | 0.6322ms | 0.1748ms | 5.7218 KOps/s | 5.7667 KOps/s | -0.78% |
| test_vmap_mlp_speed[True-True] | 0.9490ms | 0.8609ms | 1.1616 KOps/s | 1.1326 KOps/s | +2.56% |
| test_vmap_mlp_speed[True-False] | 0.8443ms | 0.4595ms | 2.1763 KOps/s | 2.1200 KOps/s | +2.66% |
| test_vmap_mlp_speed[False-True] | 1.2431ms | 0.7597ms | 1.3163 KOps/s | 1.2980 KOps/s | +1.41% |
| test_vmap_mlp_speed[False-False] | 0.8438ms | 0.3846ms | 2.6000 KOps/s | 2.5339 KOps/s | +2.61% |
| test_vmap_mlp_speed_decorator[True-True] | 2.5624ms | 1.7151ms | 583.0654 Ops/s | 573.6739 Ops/s | +1.64% |
| test_vmap_mlp_speed_decorator[True-False] | 0.9792ms | 0.5086ms | 1.9660 KOps/s | 1.9299 KOps/s | +1.87% |
| test_vmap_mlp_speed_decorator[False-True] | 1.9490ms | 1.4468ms | 691.1698 Ops/s | 686.8760 Ops/s | +0.63% |
| test_vmap_mlp_speed_decorator[False-False] | 0.6833ms | 0.3981ms | 2.5121 KOps/s | 2.4951 KOps/s | +0.68% |
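
The Change column appears to be the throughput of this PR relative to repo HEAD, expressed as a percentage. The sketch below reproduces two rows under that reading. `relative_change` is a hypothetical helper, not part of the benchmark tooling, and both inputs must be in the same unit before comparing (the test_instantiation_td row mixes KOps/s and Ops/s, for instance).

```python
def relative_change(ops_pr, ops_head):
    """Percentage change in throughput of the PR relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


# Values copied from the test_plain_set_nested and test_split rows.
plain_set_nested = round(relative_change(64.1700, 63.8748), 2)
split = round(relative_change(648.0863, 588.2220), 2)
print(f"{plain_set_nested:+.2f}% {split:+.2f}%")
```

The highlighted entries in the table all exceed 5% in magnitude and match the 13 improved and 4 worsened counts, which suggests a 5% significance threshold. That threshold is an inference from the numbers, not something the bot states.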

@vmoens merged commit 80c7ef9 into main Jan 5, 2024
44 of 47 checks passed
@vmoens deleted the inplace-to_module branch January 5, 2024 15:20
@@ -308,6 +308,8 @@ def is_empty(self):
    def to_module(
        self,
        module,
        *,
        inplace: bool | None = None,
Contributor

Is None supported for backward compatibility?

Contributor Author

We want to catch inplace=False when use_state_dict=True, since this is not implemented.

With a None default, and given that writing through the state dict is always in-place, we can tell an unset argument apart from an explicit one and tell users: you explicitly asked for False, but we can't make that happen.

Contributor

Thanks for explaining!
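
The role of the `None` default in the thread above can be sketched in plain Python: it lets a function tell "argument not passed" apart from an explicit `False`, so only the explicit request is rejected. This is a hypothetical illustration, not tensordict's code. `resolve_inplace`, the error type, and the out-of-place fallback when no state dict is used are all assumptions.

```python
from __future__ import annotations


def resolve_inplace(inplace: bool | None, use_state_dict: bool) -> bool:
    """Return the effective in-place setting, rejecting unsupported requests."""
    if use_state_dict:
        # Writing through a state dict is always in-place.
        if inplace is False:
            # Only an explicit False reaches here; the None default does not.
            raise ValueError(
                "inplace=False is not supported with use_state_dict=True"
            )
        return True
    # Assumption: without a state dict, an unset argument means out-of-place.
    return bool(inplace)


unset_with_state_dict = resolve_inplace(None, use_state_dict=True)
unset_without_state_dict = resolve_inplace(None, use_state_dict=False)

try:
    resolve_inplace(False, use_state_dict=True)
    explicit_false_rejected = False
except ValueError:
    explicit_false_rejected = True
```

With a plain `inplace: bool = False` default, the function could not distinguish the first call from the rejected one, and would have to either raise for every state-dict call or silently ignore the user's request.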

3 participants