[BugFix] Better tensor allocation in memmap_like #543

Merged: vmoens merged 2 commits into main from fix_memmap_like, Oct 11, 2023

@vmoens commented Oct 11, 2023

The `memmap_like` function does not go through `_set_str`, so `to(...)` can end up being called on the memmap tensors, which turns them into regular tensors.
This PR fixes the issue.
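
To illustrate the failure mode described above, here is a minimal toy sketch in plain Python. It is not the tensordict implementation: `PlainTensor`, `ToyMemmap`, `ToyDict` and both `memmap_like_*` helpers are hypothetical names, and the idea that the public setter casts values with `to(...)` while `_set_str` stores an already-validated value untouched is an assumption made for the example.

```python
# Toy model only: none of these classes are the tensordict API.

class PlainTensor:
    """Stand-in for a regular in-memory tensor."""

    def __init__(self, data, device="cpu"):
        self.data = list(data)
        self.device = device

    def to(self, device):
        return PlainTensor(self.data, device)


class ToyMemmap:
    """Stand-in for a disk-backed tensor."""

    def __init__(self, data):
        self.data = list(data)

    def to(self, device):
        # Assumption: casting materializes the data and drops the memmap type.
        return PlainTensor(self.data, device)


class ToyDict:
    """Stand-in container with a public setter and a low-level setter."""

    def __init__(self, device="cpu"):
        self.device = device
        self._store = {}

    def set(self, key, value):
        # Public path (assumed): validation casts the value to the device.
        self._set_str(key, value.to(self.device), validated=True)

    def _set_str(self, key, value, validated):
        # Low-level path (assumed): a validated value is stored as-is.
        if not validated:
            value = value.to(self.device)
        self._store[key] = value

    def get(self, key):
        return self._store[key]


def memmap_like_via_set(source):
    """Allocates through the public setter: the memmap type is lost."""
    out = ToyDict(source.device)
    for key, value in source._store.items():
        out.set(key, ToyMemmap([0] * len(value.data)))
    return out


def memmap_like_via_set_str(source):
    """Allocates through the low-level setter: the memmap type is kept."""
    out = ToyDict(source.device)
    for key, value in source._store.items():
        out._set_str(key, ToyMemmap([0] * len(value.data)), validated=True)
    return out


src = ToyDict()
src.set("a", PlainTensor([1, 2, 3]))
lost = memmap_like_via_set(src)
kept = memmap_like_via_set_str(src)
print(type(lost.get("a")).__name__)
print(type(kept.get("a")).__name__)
```

In this toy model the first helper ends up storing a `PlainTensor` and the second keeps the `ToyMemmap`, which is the kind of difference the fix is after.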

@facebook-github-bot added the CLA Signed label Oct 11, 2023
@github-actions

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 105. Improved: $\large\color{#35bf28}13$. Worsened: $\large\color{#d91a1a}29$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 0.1253ms | 20.3075μs | 49.2428 KOps/s | 45.9230 KOps/s | $\textbf{\color{#35bf28}+7.23\%}$ |
| test_plain_set_stack_nested | 1.5014ms | 0.2062ms | 4.8497 KOps/s | 4.3702 KOps/s | $\textbf{\color{#35bf28}+10.97\%}$ |
| test_plain_set_nested_inplace | 2.0826ms | 29.1581μs | 34.2958 KOps/s | 33.7351 KOps/s | $\color{#35bf28}+1.66\%$ |
| test_plain_set_stack_nested_inplace | 5.1725ms | 0.2842ms | 3.5192 KOps/s | 3.8326 KOps/s | $\textbf{\color{#d91a1a}-8.18\%}$ |
| test_items | 2.1288ms | 3.9267μs | 254.6648 KOps/s | 256.3579 KOps/s | $\color{#d91a1a}-0.66\%$ |
| test_items_nested | 2.6436ms | 0.3958ms | 2.5268 KOps/s | 2.5080 KOps/s | $\color{#35bf28}+0.75\%$ |
| test_items_nested_locked | 6.5170ms | 0.3962ms | 2.5241 KOps/s | 2.4876 KOps/s | $\color{#35bf28}+1.47\%$ |
| test_items_nested_leaf | 3.4541ms | 0.2445ms | 4.0900 KOps/s | 4.0189 KOps/s | $\color{#35bf28}+1.77\%$ |
| test_items_stack_nested | 4.8019ms | 2.4640ms | 405.8515 Ops/s | 392.1884 Ops/s | $\color{#35bf28}+3.48\%$ |
| test_items_stack_nested_leaf | 7.4461ms | 2.3652ms | 422.7905 Ops/s | 447.0910 Ops/s | $\textbf{\color{#d91a1a}-5.44\%}$ |
| test_items_stack_nested_locked | 6.9820ms | 1.3643ms | 732.9740 Ops/s | 782.3738 Ops/s | $\textbf{\color{#d91a1a}-6.31\%}$ |
| test_keys | 1.3312ms | 5.3127μs | 188.2290 KOps/s | 192.7224 KOps/s | $\color{#d91a1a}-2.33\%$ |
| test_keys_nested | 2.3374ms | 0.1892ms | 5.2854 KOps/s | 4.9317 KOps/s | $\textbf{\color{#35bf28}+7.17\%}$ |
| test_keys_nested_locked | 9.1093ms | 0.1867ms | 5.3563 KOps/s | 5.2806 KOps/s | $\color{#35bf28}+1.43\%$ |
| test_keys_nested_leaf | 2.8880ms | 0.1853ms | 5.3960 KOps/s | 5.3787 KOps/s | $\color{#35bf28}+0.32\%$ |
| test_keys_stack_nested | 7.6089ms | 2.3127ms | 432.3881 Ops/s | 417.1650 Ops/s | $\color{#35bf28}+3.65\%$ |
| test_keys_stack_nested_leaf | 5.3414ms | 2.4231ms | 412.6955 Ops/s | 417.2773 Ops/s | $\color{#d91a1a}-1.10\%$ |
| test_keys_stack_nested_locked | 2.6028ms | 1.1112ms | 899.8917 Ops/s | 883.0447 Ops/s | $\color{#35bf28}+1.91\%$ |
| test_values | 0.3191ms | 1.4503μs | 689.5179 KOps/s | 694.7675 KOps/s | $\color{#d91a1a}-0.76\%$ |
| test_values_nested | 2.4649ms | 69.4485μs | 14.3992 KOps/s | 13.9199 KOps/s | $\color{#35bf28}+3.44\%$ |
| test_values_nested_locked | 1.2254ms | 66.6422μs | 15.0055 KOps/s | 14.1805 KOps/s | $\textbf{\color{#35bf28}+5.82\%}$ |
| test_values_nested_leaf | 4.9540ms | 60.7980μs | 16.4479 KOps/s | 15.6836 KOps/s | $\color{#35bf28}+4.87\%$ |
| test_values_stack_nested | 6.2717ms | 1.9920ms | 502.0035 Ops/s | 508.0448 Ops/s | $\color{#d91a1a}-1.19\%$ |
| test_values_stack_nested_leaf | 3.9445ms | 1.9755ms | 506.2131 Ops/s | 494.4853 Ops/s | $\color{#35bf28}+2.37\%$ |
| test_values_stack_nested_locked | 2.5155ms | 0.9026ms | 1.1079 KOps/s | 1.1423 KOps/s | $\color{#d91a1a}-3.02\%$ |
| test_membership | 1.4753ms | 2.1855μs | 457.5635 KOps/s | 558.9156 KOps/s | $\textbf{\color{#d91a1a}-18.13\%}$ |
| test_membership_nested | 1.2034ms | 4.3104μs | 231.9959 KOps/s | 248.5675 KOps/s | $\textbf{\color{#d91a1a}-6.67\%}$ |
| test_membership_nested_leaf | 0.9206ms | 4.3520μs | 229.7783 KOps/s | 235.5521 KOps/s | $\color{#d91a1a}-2.45\%$ |
| test_membership_stacked_nested | 1.1331ms | 16.2057μs | 61.7066 KOps/s | 61.8999 KOps/s | $\color{#d91a1a}-0.31\%$ |
| test_membership_stacked_nested_leaf | 0.4260ms | 15.9529μs | 62.6846 KOps/s | 58.8257 KOps/s | $\textbf{\color{#35bf28}+6.56\%}$ |
| test_membership_nested_last | 4.5696ms | 8.7793μs | 113.9049 KOps/s | 118.6792 KOps/s | $\color{#d91a1a}-4.02\%$ |
| test_membership_nested_leaf_last | 1.2039ms | 8.7112μs | 114.7946 KOps/s | 114.0584 KOps/s | $\color{#35bf28}+0.65\%$ |
| test_membership_stacked_nested_last | 2.9628ms | 0.2751ms | 3.6356 KOps/s | 3.6095 KOps/s | $\color{#35bf28}+0.72\%$ |
| test_membership_stacked_nested_leaf_last | 1.2531ms | 20.0917μs | 49.7717 KOps/s | 49.3476 KOps/s | $\color{#35bf28}+0.86\%$ |
| test_nested_getleaf | 1.4224ms | 17.0221μs | 58.7472 KOps/s | 55.0062 KOps/s | $\textbf{\color{#35bf28}+6.80\%}$ |
| test_nested_get | 0.3420ms | 16.0486μs | 62.3105 KOps/s | 59.8196 KOps/s | $\color{#35bf28}+4.16\%$ |
| test_stacked_getleaf | 3.2808ms | 1.0902ms | 917.2474 Ops/s | 924.4855 Ops/s | $\color{#d91a1a}-0.78\%$ |
| test_stacked_get | 6.9611ms | 1.0563ms | 946.6659 Ops/s | 973.6335 Ops/s | $\color{#d91a1a}-2.77\%$ |
| test_nested_getitemleaf | 1.4626ms | 17.1725μs | 58.2326 KOps/s | 56.3858 KOps/s | $\color{#35bf28}+3.28\%$ |
| test_nested_getitem | 1.1629ms | 16.3148μs | 61.2942 KOps/s | 60.5595 KOps/s | $\color{#35bf28}+1.21\%$ |
| test_stacked_getitemleaf | 6.8877ms | 1.0980ms | 910.7560 Ops/s | 893.2261 Ops/s | $\color{#35bf28}+1.96\%$ |
| test_stacked_getitem | 3.1988ms | 1.0213ms | 979.1509 Ops/s | 1.0193 KOps/s | $\color{#d91a1a}-3.94\%$ |
| test_lock_nested | 83.5075ms | 1.9299ms | 518.1509 Ops/s | 541.1430 Ops/s | $\color{#d91a1a}-4.25\%$ |
| test_lock_stack_nested | 0.1116s | 24.4534ms | 40.8941 Ops/s | 39.1593 Ops/s | $\color{#35bf28}+4.43\%$ |
| test_unlock_nested | 79.7144ms | 1.8756ms | 533.1647 Ops/s | 511.8164 Ops/s | $\color{#35bf28}+4.17\%$ |
| test_unlock_stack_nested | 0.1217s | 25.2719ms | 39.5696 Ops/s | 38.7493 Ops/s | $\color{#35bf28}+2.12\%$ |
| test_flatten_speed | 6.2838ms | 1.1203ms | 892.6516 Ops/s | 849.1616 Ops/s | $\textbf{\color{#35bf28}+5.12\%}$ |
| test_unflatten_speed | 4.8611ms | 2.0570ms | 486.1485 Ops/s | 477.8297 Ops/s | $\color{#35bf28}+1.74\%$ |
| test_common_ops | 3.7408ms | 1.6600ms | 602.4168 Ops/s | 627.6735 Ops/s | $\color{#d91a1a}-4.02\%$ |
| test_creation | 0.8265ms | 6.5975μs | 151.5727 KOps/s | 141.8989 KOps/s | $\textbf{\color{#35bf28}+6.82\%}$ |
| test_creation_empty | 1.2935ms | 15.5652μs | 64.2458 KOps/s | 60.5570 KOps/s | $\textbf{\color{#35bf28}+6.09\%}$ |
| test_creation_nested_1 | 1.5741ms | 28.8176μs | 34.7010 KOps/s | 30.7995 KOps/s | $\textbf{\color{#35bf28}+12.67\%}$ |
| test_creation_nested_2 | 1.0136ms | 32.2518μs | 31.0060 KOps/s | 28.4411 KOps/s | $\textbf{\color{#35bf28}+9.02\%}$ |
| test_clone | 0.9406ms | 33.1811μs | 30.1376 KOps/s | 30.4864 KOps/s | $\color{#d91a1a}-1.14\%$ |
| test_getitem[int] | 4.9104ms | 38.1325μs | 26.2243 KOps/s | 26.6930 KOps/s | $\color{#d91a1a}-1.76\%$ |
| test_getitem[slice_int] | 1.6649ms | 79.1019μs | 12.6419 KOps/s | 12.8225 KOps/s | $\color{#d91a1a}-1.41\%$ |
| test_getitem[range] | 1.2555ms | 0.1380ms | 7.2479 KOps/s | 8.0378 KOps/s | $\textbf{\color{#d91a1a}-9.83\%}$ |
| test_getitem[tuple] | 2.5602ms | 70.2189μs | 14.2412 KOps/s | 15.8705 KOps/s | $\textbf{\color{#d91a1a}-10.27\%}$ |
| test_getitem[list] | 1.4524ms | 0.1341ms | 7.4592 KOps/s | 7.7725 KOps/s | $\color{#d91a1a}-4.03\%$ |
| test_setitem_dim[int] | 0.8447ms | 61.4634μs | 16.2698 KOps/s | 16.5410 KOps/s | $\color{#d91a1a}-1.64\%$ |
| test_setitem_dim[slice_int] | 2.6030ms | 0.1170ms | 8.5487 KOps/s | 10.4188 KOps/s | $\textbf{\color{#d91a1a}-17.95\%}$ |
| test_setitem_dim[range] | 1.7557ms | 0.1504ms | 6.6478 KOps/s | 7.1310 KOps/s | $\textbf{\color{#d91a1a}-6.78\%}$ |
| test_setitem_dim[tuple] | 2.7720ms | 83.3356μs | 11.9997 KOps/s | 12.1379 KOps/s | $\color{#d91a1a}-1.14\%$ |
| test_setitem | 2.8571ms | 49.2693μs | 20.2966 KOps/s | 21.8481 KOps/s | $\textbf{\color{#d91a1a}-7.10\%}$ |
| test_set | 1.9870ms | 47.0715μs | 21.2443 KOps/s | 22.4526 KOps/s | $\textbf{\color{#d91a1a}-5.38\%}$ |
| test_set_shared | 2.9542ms | 0.3275ms | 3.0532 KOps/s | 2.9161 KOps/s | $\color{#35bf28}+4.70\%$ |
| test_update | 4.8988ms | 55.7000μs | 17.9533 KOps/s | 19.2858 KOps/s | $\textbf{\color{#d91a1a}-6.91\%}$ |
| test_update_nested | 3.0663ms | 73.0738μs | 13.6848 KOps/s | 12.8469 KOps/s | $\textbf{\color{#35bf28}+6.52\%}$ |
| test_set_nested | 8.3263ms | 53.5936μs | 18.6590 KOps/s | 20.1380 KOps/s | $\textbf{\color{#d91a1a}-7.34\%}$ |
| test_set_nested_new | 3.2019ms | 76.6758μs | 13.0419 KOps/s | 13.8633 KOps/s | $\textbf{\color{#d91a1a}-5.92\%}$ |
| test_select | 4.1147ms | 0.1399ms | 7.1479 KOps/s | 7.7206 KOps/s | $\textbf{\color{#d91a1a}-7.42\%}$ |
| test_unbind_speed | 3.7550ms | 0.7821ms | 1.2786 KOps/s | 1.3348 KOps/s | $\color{#d91a1a}-4.21\%$ |
| test_unbind_speed_stack0 | 88.3994ms | 10.4818ms | 95.4036 Ops/s | 96.8713 Ops/s | $\color{#d91a1a}-1.52\%$ |
| test_unbind_speed_stack1 | 4.6152ms | 1.2389μs | 807.1704 KOps/s | 1.0472 MOps/s | $\textbf{\color{#d91a1a}-22.92\%}$ |
| test_creation[device0] | 6.0431ms | 0.6073ms | 1.6467 KOps/s | 1.6446 KOps/s | $\color{#35bf28}+0.13\%$ |
| test_creation_from_tensor | 7.9095ms | 0.6778ms | 1.4755 KOps/s | 1.6545 KOps/s | $\textbf{\color{#d91a1a}-10.82\%}$ |
| test_add_one[memmap_tensor0] | 3.9124ms | 72.7928μs | 13.7376 KOps/s | 15.8867 KOps/s | $\textbf{\color{#d91a1a}-13.53\%}$ |
| test_contiguous[memmap_tensor0] | 1.0459ms | 13.5892μs | 73.5880 KOps/s | 80.8446 KOps/s | $\textbf{\color{#d91a1a}-8.98\%}$ |
| test_stack[memmap_tensor0] | 2.2030ms | 47.8967μs | 20.8783 KOps/s | 21.9940 KOps/s | $\textbf{\color{#d91a1a}-5.07\%}$ |
| test_memmaptd_index | 5.1818ms | 0.4187ms | 2.3881 KOps/s | 2.5861 KOps/s | $\textbf{\color{#d91a1a}-7.66\%}$ |
| test_memmaptd_index_astensor | 7.4053ms | 1.9253ms | 519.3883 Ops/s | 488.3396 Ops/s | $\textbf{\color{#35bf28}+6.36\%}$ |
| test_memmaptd_index_op | 7.4785ms | 5.1341ms | 194.7777 Ops/s | 190.9614 Ops/s | $\color{#35bf28}+2.00\%$ |
| test_reshape_pytree | 2.6013ms | 41.5962μs | 24.0407 KOps/s | 25.0614 KOps/s | $\color{#d91a1a}-4.07\%$ |
| test_reshape_td | 1.4945ms | 52.9956μs | 18.8695 KOps/s | 18.6939 KOps/s | $\color{#35bf28}+0.94\%$ |
| test_view_pytree | 2.6258ms | 38.8592μs | 25.7339 KOps/s | 26.5494 KOps/s | $\color{#d91a1a}-3.07\%$ |
| test_view_td | 0.8918ms | 10.0651μs | 99.3529 KOps/s | 101.9492 KOps/s | $\color{#d91a1a}-2.55\%$ |
| test_unbind_pytree | 1.0609ms | 48.0864μs | 20.7959 KOps/s | 21.6465 KOps/s | $\color{#d91a1a}-3.93\%$ |
| test_unbind_td | 2.3601ms | 0.1310ms | 7.6343 KOps/s | 8.3943 KOps/s | $\textbf{\color{#d91a1a}-9.05\%}$ |
| test_split_pytree | 2.3407ms | 51.1323μs | 19.5571 KOps/s | 22.8530 KOps/s | $\textbf{\color{#d91a1a}-14.42\%}$ |
| test_split_td | 3.0771ms | 0.1546ms | 6.4679 KOps/s | 6.8749 KOps/s | $\textbf{\color{#d91a1a}-5.92\%}$ |
| test_add_pytree | 2.8924ms | 76.7669μs | 13.0264 KOps/s | 14.6333 KOps/s | $\textbf{\color{#d91a1a}-10.98\%}$ |
| test_add_td | 2.6833ms | 0.1383ms | 7.2291 KOps/s | 7.2776 KOps/s | $\color{#d91a1a}-0.67\%$ |
| test_distributed | 0.7993ms | 9.4978μs | 105.2871 KOps/s | 111.5351 KOps/s | $\textbf{\color{#d91a1a}-5.60\%}$ |
| test_tdmodule | 0.5962ms | 40.5425μs | 24.6655 KOps/s | 25.6911 KOps/s | $\color{#d91a1a}-3.99\%$ |
| test_tdmodule_dispatch | 0.8099ms | 78.0050μs | 12.8197 KOps/s | 13.0100 KOps/s | $\color{#d91a1a}-1.46\%$ |
| test_tdseq | 0.4077ms | 48.3397μs | 20.6869 KOps/s | 22.2486 KOps/s | $\textbf{\color{#d91a1a}-7.02\%}$ |
| test_tdseq_dispatch | 0.3341ms | 95.1994μs | 10.5043 KOps/s | 10.0374 KOps/s | $\color{#35bf28}+4.65\%$ |
| test_instantiation_functorch | 6.3699ms | 2.2143ms | 451.6078 Ops/s | 495.9035 Ops/s | $\textbf{\color{#d91a1a}-8.93\%}$ |
| test_instantiation_td | 4.9764ms | 1.6871ms | 592.7490 Ops/s | 612.6192 Ops/s | $\color{#d91a1a}-3.24\%$ |
| test_exec_functorch | 1.5201ms | 0.3077ms | 3.2501 KOps/s | 3.3403 KOps/s | $\color{#d91a1a}-2.70\%$ |
| test_exec_td | 2.7371ms | 0.3219ms | 3.1065 KOps/s | 3.2950 KOps/s | $\textbf{\color{#d91a1a}-5.72\%}$ |
| test_vmap_mlp_speed[True-True] | 16.3034ms | 2.0686ms | 483.4082 Ops/s | 495.9900 Ops/s | $\color{#d91a1a}-2.54\%$ |
| test_vmap_mlp_speed[True-False] | 5.8715ms | 1.1130ms | 898.5097 Ops/s | 911.0957 Ops/s | $\color{#d91a1a}-1.38\%$ |
| test_vmap_mlp_speed[False-True] | 13.7816ms | 1.8334ms | 545.4367 Ops/s | 553.5525 Ops/s | $\color{#d91a1a}-1.47\%$ |
| test_vmap_mlp_speed[False-False] | 12.0079ms | 0.8848ms | 1.1302 KOps/s | 1.1749 KOps/s | $\color{#d91a1a}-3.80\%$ |

@vmoens added the bug label Oct 11, 2023

@matteobettini left a comment

LGTM

@vmoens merged commit 6a9f8e3 into main Oct 11, 2023
1 check passed
@vmoens deleted the fix_memmap_like branch October 11, 2023 09:44