[BugFix] Track sub-tds in memmap #719

Merged
merged 1 commit into from
Mar 25, 2024

@vmoens (Contributor) commented Mar 25, 2024

No description provided.

@facebook-github-bot added the CLA Signed label Mar 25, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 127. Improved: $\large\color{#35bf28}3$. Worsened: $\large\color{#d91a1a}5$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 39.2730μs | 17.1262μs | 58.3901 KOps/s | 58.2302 KOps/s | $\color{#35bf28}+0.27\%$ |
| test_plain_set_stack_nested | 50.5630μs | 17.2756μs | 57.8850 KOps/s | 57.6229 KOps/s | $\color{#35bf28}+0.45\%$ |
| test_plain_set_nested_inplace | 42.7490μs | 19.9094μs | 50.2275 KOps/s | 50.0990 KOps/s | $\color{#35bf28}+0.26\%$ |
| test_plain_set_stack_nested_inplace | 54.0200μs | 19.8798μs | 50.3023 KOps/s | 51.1738 KOps/s | $\color{#d91a1a}-1.70\%$ |
| test_items | 17.7330μs | 2.3806μs | 420.0694 KOps/s | 322.7786 KOps/s | $\textbf{\color{#35bf28}+30.14\%}$ |
| test_items_nested | 0.4939ms | 0.2765ms | 3.6164 KOps/s | 3.7492 KOps/s | $\color{#d91a1a}-3.54\%$ |
| test_items_nested_locked | 0.4953ms | 0.2778ms | 3.5993 KOps/s | 3.7376 KOps/s | $\color{#d91a1a}-3.70\%$ |
| test_items_nested_leaf | 0.6085ms | 0.1699ms | 5.8874 KOps/s | 6.0323 KOps/s | $\color{#d91a1a}-2.40\%$ |
| test_items_stack_nested | 0.4780ms | 0.2757ms | 3.6272 KOps/s | 3.7815 KOps/s | $\color{#d91a1a}-4.08\%$ |
| test_items_stack_nested_leaf | 0.2810ms | 0.1707ms | 5.8597 KOps/s | 6.0765 KOps/s | $\color{#d91a1a}-3.57\%$ |
| test_items_stack_nested_locked | 0.5127ms | 0.2758ms | 3.6262 KOps/s | 3.7519 KOps/s | $\color{#d91a1a}-3.35\%$ |
| test_keys | 21.1590μs | 3.8886μs | 257.1649 KOps/s | 250.8934 KOps/s | $\color{#35bf28}+2.50\%$ |
| test_keys_nested | 2.1357ms | 0.1469ms | 6.8051 KOps/s | 6.8537 KOps/s | $\color{#d91a1a}-0.71\%$ |
| test_keys_nested_locked | 0.2779ms | 0.1503ms | 6.6521 KOps/s | 6.6540 KOps/s | $\color{#d91a1a}-0.03\%$ |
| test_keys_nested_leaf | 40.9393ms | 0.1332ms | 7.5069 KOps/s | 7.9324 KOps/s | $\textbf{\color{#d91a1a}-5.36\%}$ |
| test_keys_stack_nested | 0.2906ms | 0.1494ms | 6.6924 KOps/s | 6.8568 KOps/s | $\color{#d91a1a}-2.40\%$ |
| test_keys_stack_nested_leaf | 0.2437ms | 0.1310ms | 7.6325 KOps/s | 7.8457 KOps/s | $\color{#d91a1a}-2.72\%$ |
| test_keys_stack_nested_locked | 0.2993ms | 0.1545ms | 6.4714 KOps/s | 6.6387 KOps/s | $\color{#d91a1a}-2.52\%$ |
| test_values | 10.6022μs | 1.1705μs | 854.3204 KOps/s | 867.7216 KOps/s | $\color{#d91a1a}-1.54\%$ |
| test_values_nested | 0.1064ms | 51.4142μs | 19.4499 KOps/s | 19.7135 KOps/s | $\color{#d91a1a}-1.34\%$ |
| test_values_nested_locked | 0.1025ms | 51.4268μs | 19.4451 KOps/s | 19.7392 KOps/s | $\color{#d91a1a}-1.49\%$ |
| test_values_nested_leaf | 0.1112ms | 46.2826μs | 21.6064 KOps/s | 21.7329 KOps/s | $\color{#d91a1a}-0.58\%$ |
| test_values_stack_nested | 0.1045ms | 52.0511μs | 19.2119 KOps/s | 19.6588 KOps/s | $\color{#d91a1a}-2.27\%$ |
| test_values_stack_nested_leaf | 96.3490μs | 46.2287μs | 21.6316 KOps/s | 21.9474 KOps/s | $\color{#d91a1a}-1.44\%$ |
| test_values_stack_nested_locked | 0.1035ms | 51.9016μs | 19.2672 KOps/s | 19.6873 KOps/s | $\color{#d91a1a}-2.13\%$ |
| test_membership | 15.6590μs | 1.3840μs | 722.5409 KOps/s | 744.6148 KOps/s | $\color{#d91a1a}-2.96\%$ |
| test_membership_nested | 29.3050μs | 3.5482μs | 281.8352 KOps/s | 287.3479 KOps/s | $\color{#d91a1a}-1.92\%$ |
| test_membership_nested_leaf | 28.3920μs | 3.5929μs | 278.3297 KOps/s | 265.5025 KOps/s | $\color{#35bf28}+4.83\%$ |
| test_membership_stacked_nested | 25.2470μs | 3.5580μs | 281.0587 KOps/s | 284.2156 KOps/s | $\color{#d91a1a}-1.11\%$ |
| test_membership_stacked_nested_leaf | 26.4100μs | 3.5229μs | 283.8596 KOps/s | 285.0852 KOps/s | $\color{#d91a1a}-0.43\%$ |
| test_membership_nested_last | 40.3850μs | 4.3932μs | 227.6246 KOps/s | 232.8467 KOps/s | $\color{#d91a1a}-2.24\%$ |
| test_membership_nested_leaf_last | 35.7460μs | 4.4323μs | 225.6164 KOps/s | 234.1556 KOps/s | $\color{#d91a1a}-3.65\%$ |
| test_membership_stacked_nested_last | 33.2620μs | 5.0519μs | 197.9467 KOps/s | 235.5444 KOps/s | $\textbf{\color{#d91a1a}-15.96\%}$ |
| test_membership_stacked_nested_leaf_last | 31.0970μs | 5.0536μs | 197.8792 KOps/s | 236.2262 KOps/s | $\textbf{\color{#d91a1a}-16.23\%}$ |
| test_nested_getleaf | 44.5530μs | 10.8081μs | 92.5229 KOps/s | 94.0960 KOps/s | $\color{#d91a1a}-1.67\%$ |
| test_nested_get | 35.5960μs | 10.2256μs | 97.7937 KOps/s | 100.3971 KOps/s | $\color{#d91a1a}-2.59\%$ |
| test_stacked_getleaf | 44.6120μs | 10.8280μs | 92.3529 KOps/s | 95.4735 KOps/s | $\color{#d91a1a}-3.27\%$ |
| test_stacked_get | 41.1560μs | 10.1333μs | 98.6849 KOps/s | 101.4230 KOps/s | $\color{#d91a1a}-2.70\%$ |
| test_nested_getitemleaf | 34.3330μs | 11.3789μs | 87.8823 KOps/s | 89.6024 KOps/s | $\color{#d91a1a}-1.92\%$ |
| test_nested_getitem | 36.2970μs | 10.5561μs | 94.7321 KOps/s | 98.4506 KOps/s | $\color{#d91a1a}-3.78\%$ |
| test_stacked_getitemleaf | 45.3640μs | 11.3270μs | 88.2844 KOps/s | 90.6913 KOps/s | $\color{#d91a1a}-2.65\%$ |
| test_stacked_getitem | 35.1660μs | 10.3858μs | 96.2855 KOps/s | 98.1392 KOps/s | $\color{#d91a1a}-1.89\%$ |
| test_lock_nested | 1.2722ms | 0.3484ms | 2.8706 KOps/s | 2.9065 KOps/s | $\color{#d91a1a}-1.23\%$ |
| test_lock_stack_nested | 0.4972ms | 0.3050ms | 3.2792 KOps/s | 3.2407 KOps/s | $\color{#35bf28}+1.19\%$ |
| test_unlock_nested | 91.6536ms | 0.4420ms | 2.2625 KOps/s | 2.3205 KOps/s | $\color{#d91a1a}-2.50\%$ |
| test_unlock_stack_nested | 0.6956ms | 0.3162ms | 3.1629 KOps/s | 3.1215 KOps/s | $\color{#35bf28}+1.33\%$ |
| test_flatten_speed | 0.6970ms | 0.2662ms | 3.7568 KOps/s | 3.7808 KOps/s | $\color{#d91a1a}-0.64\%$ |
| test_unflatten_speed | 0.7176ms | 0.4155ms | 2.4066 KOps/s | 2.4927 KOps/s | $\color{#d91a1a}-3.45\%$ |
| test_common_ops | 4.2233ms | 0.7032ms | 1.4221 KOps/s | 1.4080 KOps/s | $\color{#35bf28}+1.00\%$ |
| test_creation | 14.3070μs | 1.9009μs | 526.0597 KOps/s | 542.3143 KOps/s | $\color{#d91a1a}-3.00\%$ |
| test_creation_empty | 31.4290μs | 10.0349μs | 99.6525 KOps/s | 97.9621 KOps/s | $\color{#35bf28}+1.73\%$ |
| test_creation_nested_1 | 59.1300μs | 12.9516μs | 77.2105 KOps/s | 77.8655 KOps/s | $\color{#d91a1a}-0.84\%$ |
| test_creation_nested_2 | 43.1300μs | 16.1389μs | 61.9620 KOps/s | 61.5548 KOps/s | $\color{#35bf28}+0.66\%$ |
| test_clone | 77.5340μs | 13.5382μs | 73.8653 KOps/s | 75.3049 KOps/s | $\color{#d91a1a}-1.91\%$ |
| test_getitem[int] | 34.4940μs | 11.6443μs | 85.8791 KOps/s | 85.2352 KOps/s | $\color{#35bf28}+0.76\%$ |
| test_getitem[slice_int] | 0.3733ms | 22.8819μs | 43.7027 KOps/s | 44.4690 KOps/s | $\color{#d91a1a}-1.72\%$ |
| test_getitem[range] | 0.2010ms | 42.8901μs | 23.3154 KOps/s | 23.1431 KOps/s | $\color{#35bf28}+0.74\%$ |
| test_getitem[tuple] | 53.2700μs | 19.0023μs | 52.6251 KOps/s | 53.9312 KOps/s | $\color{#d91a1a}-2.42\%$ |
| test_getitem[list] | 0.1090ms | 39.0958μs | 25.5782 KOps/s | 25.8427 KOps/s | $\color{#d91a1a}-1.02\%$ |
| test_setitem_dim[int] | 67.8460μs | 36.4307μs | 27.4494 KOps/s | 29.4000 KOps/s | $\textbf{\color{#d91a1a}-6.63\%}$ |
| test_setitem_dim[slice_int] | 0.1070ms | 62.1418μs | 16.0922 KOps/s | 16.3921 KOps/s | $\color{#d91a1a}-1.83\%$ |
| test_setitem_dim[range] | 0.1401ms | 80.9795μs | 12.3488 KOps/s | 12.5331 KOps/s | $\color{#d91a1a}-1.47\%$ |
| test_setitem_dim[tuple] | 0.1143ms | 50.7164μs | 19.7175 KOps/s | 20.2155 KOps/s | $\color{#d91a1a}-2.46\%$ |
| test_setitem | 0.4701ms | 20.1684μs | 49.5825 KOps/s | 49.6277 KOps/s | $\color{#d91a1a}-0.09\%$ |
| test_set | 62.7260μs | 19.4808μs | 51.3327 KOps/s | 50.7532 KOps/s | $\color{#35bf28}+1.14\%$ |
| test_set_shared | 3.0435ms | 0.1426ms | 7.0106 KOps/s | 6.8977 KOps/s | $\color{#35bf28}+1.64\%$ |
| test_update | 0.1167ms | 21.1364μs | 47.3117 KOps/s | 46.6649 KOps/s | $\color{#35bf28}+1.39\%$ |
| test_update_nested | 0.1072ms | 28.6149μs | 34.9468 KOps/s | 33.8142 KOps/s | $\color{#35bf28}+3.35\%$ |
| test_update__nested | 0.1010ms | 24.4916μs | 40.8303 KOps/s | 40.7231 KOps/s | $\color{#35bf28}+0.26\%$ |
| test_set_nested | 83.8360μs | 21.4207μs | 46.6837 KOps/s | 46.5156 KOps/s | $\color{#35bf28}+0.36\%$ |
| test_set_nested_new | 0.1414ms | 24.8572μs | 40.2297 KOps/s | 38.8690 KOps/s | $\color{#35bf28}+3.50\%$ |
| test_select | 0.1196ms | 41.3834μs | 24.1643 KOps/s | 25.0740 KOps/s | $\color{#d91a1a}-3.63\%$ |
| test_select_nested | 3.6570ms | 59.9768μs | 16.6731 KOps/s | 16.6628 KOps/s | $\color{#35bf28}+0.06\%$ |
| test_exclude_nested | 0.2414ms | 0.1205ms | 8.2976 KOps/s | 8.2477 KOps/s | $\color{#35bf28}+0.61\%$ |
| test_empty[True] | 0.6447ms | 0.4062ms | 2.4620 KOps/s | 2.3816 KOps/s | $\color{#35bf28}+3.38\%$ |
| test_empty[False] | 7.9708μs | 1.0726μs | 932.2748 KOps/s | 910.8305 KOps/s | $\color{#35bf28}+2.35\%$ |
| test_unbind_speed | 0.4078ms | 0.2525ms | 3.9610 KOps/s | 3.9276 KOps/s | $\color{#35bf28}+0.85\%$ |
| test_unbind_speed_stack0 | 0.4875ms | 0.2462ms | 4.0614 KOps/s | 4.0235 KOps/s | $\color{#35bf28}+0.94\%$ |
| test_unbind_speed_stack1 | 0.1271s | 0.6945ms | 1.4399 KOps/s | 1.4251 KOps/s | $\color{#35bf28}+1.04\%$ |
| test_split | 0.1209s | 1.6664ms | 600.0780 Ops/s | 607.8614 Ops/s | $\color{#d91a1a}-1.28\%$ |
| test_chunk | 3.8562ms | 1.5496ms | 645.3399 Ops/s | 688.7204 Ops/s | $\textbf{\color{#d91a1a}-6.30\%}$ |
| test_creation[device0] | 0.2121ms | 0.1061ms | 9.4258 KOps/s | 9.5935 KOps/s | $\color{#d91a1a}-1.75\%$ |
| test_creation_from_tensor | 4.2205ms | 84.8659μs | 11.7833 KOps/s | 11.7398 KOps/s | $\color{#35bf28}+0.37\%$ |
| test_add_one[memmap_tensor0] | 77.8340μs | 5.7166μs | 174.9302 KOps/s | 182.9143 KOps/s | $\color{#d91a1a}-4.36\%$ |
| test_contiguous[memmap_tensor0] | 21.9100μs | 0.6555μs | 1.5256 MOps/s | 1.5630 MOps/s | $\color{#d91a1a}-2.39\%$ |
| test_stack[memmap_tensor0] | 21.6600μs | 3.7209μs | 268.7501 KOps/s | 275.8831 KOps/s | $\color{#d91a1a}-2.59\%$ |
| test_memmaptd_index | 0.9783ms | 0.2449ms | 4.0825 KOps/s | 4.1475 KOps/s | $\color{#d91a1a}-1.57\%$ |
| test_memmaptd_index_astensor | 0.7246ms | 0.3095ms | 3.2309 KOps/s | 3.2726 KOps/s | $\color{#d91a1a}-1.27\%$ |
| test_memmaptd_index_op | 0.8466ms | 0.5983ms | 1.6713 KOps/s | 1.6677 KOps/s | $\color{#35bf28}+0.22\%$ |
| test_serialize_model | 0.2393s | 0.1193s | 8.3797 Ops/s | 8.4528 Ops/s | $\color{#d91a1a}-0.86\%$ |
| test_serialize_model_pickle | 0.4444s | 0.3745s | 2.6701 Ops/s | 2.6063 Ops/s | $\color{#35bf28}+2.45\%$ |
| test_serialize_weights | 0.1077s | 0.1004s | 9.9592 Ops/s | 10.1001 Ops/s | $\color{#d91a1a}-1.40\%$ |
| test_serialize_weights_returnearly | 0.1287s | 0.1237s | 8.0826 Ops/s | 7.8977 Ops/s | $\color{#35bf28}+2.34\%$ |
| test_serialize_weights_pickle | 0.5611s | 0.4425s | 2.2597 Ops/s | 2.3039 Ops/s | $\color{#d91a1a}-1.92\%$ |
| test_serialize_weights_filesystem | 0.1090s | 97.3046ms | 10.2770 Ops/s | 8.9002 Ops/s | $\textbf{\color{#35bf28}+15.47\%}$ |
| test_serialize_model_filesystem | 98.1725ms | 93.4821ms | 10.6972 Ops/s | 10.2735 Ops/s | $\color{#35bf28}+4.12\%$ |
| test_reshape_pytree | 54.0910μs | 21.0781μs | 47.4426 KOps/s | 48.3370 KOps/s | $\color{#d91a1a}-1.85\%$ |
| test_reshape_td | 65.1210μs | 32.5310μs | 30.7399 KOps/s | 31.3007 KOps/s | $\color{#d91a1a}-1.79\%$ |
| test_view_pytree | 84.2370μs | 20.9970μs | 47.6257 KOps/s | 48.0818 KOps/s | $\color{#d91a1a}-0.95\%$ |
| test_view_td | 0.1298s | 63.2413μs | 15.8125 KOps/s | 15.4409 KOps/s | $\color{#35bf28}+2.41\%$ |
| test_unbind_pytree | 55.3020μs | 24.5075μs | 40.8038 KOps/s | 40.3549 KOps/s | $\color{#35bf28}+1.11\%$ |
| test_unbind_td | 0.1231ms | 37.1059μs | 26.9499 KOps/s | 26.6159 KOps/s | $\color{#35bf28}+1.26\%$ |
| test_split_pytree | 67.3350μs | 24.5533μs | 40.7277 KOps/s | 41.1769 KOps/s | $\color{#d91a1a}-1.09\%$ |
| test_split_td | 0.1262ms | 39.5973μs | 25.2542 KOps/s | 25.2460 KOps/s | $\color{#35bf28}+0.03\%$ |
| test_add_pytree | 76.8420μs | 30.3350μs | 32.9652 KOps/s | 33.0218 KOps/s | $\color{#d91a1a}-0.17\%$ |
| test_add_td | 0.1383ms | 57.2205μs | 17.4763 KOps/s | 18.0327 KOps/s | $\color{#d91a1a}-3.09\%$ |
| test_distributed | 0.2045ms | 0.1008ms | 9.9206 KOps/s | 9.7665 KOps/s | $\color{#35bf28}+1.58\%$ |
| test_tdmodule | 97.7820μs | 17.2399μs | 58.0051 KOps/s | 56.4327 KOps/s | $\color{#35bf28}+2.79\%$ |
| test_tdmodule_dispatch | 58.3180μs | 33.7538μs | 29.6263 KOps/s | 28.7204 KOps/s | $\color{#35bf28}+3.15\%$ |
| test_tdseq | 51.3760μs | 20.3421μs | 49.1592 KOps/s | 47.9519 KOps/s | $\color{#35bf28}+2.52\%$ |
| test_tdseq_dispatch | 75.8110μs | 40.2573μs | 24.8402 KOps/s | 22.1494 KOps/s | $\textbf{\color{#35bf28}+12.15\%}$ |
| test_instantiation_functorch | 1.7397ms | 1.3212ms | 756.9140 Ops/s | 750.9721 Ops/s | $\color{#35bf28}+0.79\%$ |
| test_instantiation_td | 1.5101ms | 1.0144ms | 985.7641 Ops/s | 990.1187 Ops/s | $\color{#d91a1a}-0.44\%$ |
| test_exec_functorch | 0.3234ms | 0.1615ms | 6.1932 KOps/s | 6.1185 KOps/s | $\color{#35bf28}+1.22\%$ |
| test_exec_functional_call | 0.4652ms | 0.1525ms | 6.5554 KOps/s | 6.7710 KOps/s | $\color{#d91a1a}-3.18\%$ |
| test_exec_td | 0.2299ms | 0.1460ms | 6.8485 KOps/s | 6.9721 KOps/s | $\color{#d91a1a}-1.77\%$ |
| test_exec_td_decorator | 0.7465ms | 0.2058ms | 4.8581 KOps/s | 5.0189 KOps/s | $\color{#d91a1a}-3.20\%$ |
| test_vmap_mlp_speed[True-True] | 1.5696ms | 0.4796ms | 2.0852 KOps/s | 2.0491 KOps/s | $\color{#35bf28}+1.76\%$ |
| test_vmap_mlp_speed[True-False] | 1.1645ms | 0.4806ms | 2.0809 KOps/s | 2.1042 KOps/s | $\color{#d91a1a}-1.11\%$ |
| test_vmap_mlp_speed[False-True] | 0.7180ms | 0.3955ms | 2.5286 KOps/s | 2.5663 KOps/s | $\color{#d91a1a}-1.47\%$ |
| test_vmap_mlp_speed[False-False] | 0.5820ms | 0.3878ms | 2.5790 KOps/s | 2.5872 KOps/s | $\color{#d91a1a}-0.32\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 0.9980ms | 0.4911ms | 2.0364 KOps/s | 1.9997 KOps/s | $\color{#35bf28}+1.84\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 0.7321ms | 0.4930ms | 2.0283 KOps/s | 2.0139 KOps/s | $\color{#35bf28}+0.72\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 0.9053ms | 0.4064ms | 2.4609 KOps/s | 2.4710 KOps/s | $\color{#d91a1a}-0.41\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.6711ms | 0.4043ms | 2.4733 KOps/s | 2.4365 KOps/s | $\color{#35bf28}+1.51\%$ |
| test_to_module_speed[True] | 2.0106ms | 1.3724ms | 728.6510 Ops/s | 731.0090 Ops/s | $\color{#d91a1a}-0.32\%$ |
| test_to_module_speed[False] | 1.9590ms | 1.3649ms | 732.6480 Ops/s | 739.2116 Ops/s | $\color{#d91a1a}-0.89\%$ |
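
For readers checking the numbers: the Change column appears to be the relative difference between the PR's Ops and the Ops on Repo HEAD, and the rows set in bold match the 3 improved and 5 worsened benchmarks in the summary. A minimal sketch of that arithmetic follows. The 5% cutoff is an assumption inferred from which rows are bold, not something the bot states, and `relative_change` and `classify` are hypothetical helper names rather than part of the benchmark tooling.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change in throughput of the PR relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


# Assumption: the cutoff is inferred from the bold rows in the table,
# it is not documented anywhere in this PR.
THRESHOLD_PCT = 5.0


def classify(change_pct: float, threshold: float = THRESHOLD_PCT) -> str:
    """Label a benchmark as improved, worsened, or unchanged."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) copied from three rows of the table; each pair
# shares a unit (KOps/s), so the ratio needs no conversion.
rows = {
    "test_items": (420.0694, 322.7786),
    "test_membership_stacked_nested_leaf_last": (197.8792, 236.2262),
    "test_plain_set_nested": (58.3901, 58.2302),
}

for name, (ops_pr, ops_head) in rows.items():
    change = relative_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Under this reading, most of the small negative changes in the table fall below the assumed cutoff and are not counted as regressions.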

@vmoens added the bug label Mar 25, 2024
@vmoens merged commit 62348af into main Mar 25, 2024
45 of 48 checks passed
@vmoens deleted the fix-edited-td-folders branch March 25, 2024 20:51