[Refactor] Refactor contiguous #716

Merged: vmoens merged 1 commit into main from refactor-contiguous on Mar 20, 2024

@vmoens (Contributor) commented Mar 20, 2024

No description provided.

@facebook-github-bot added the "CLA Signed" label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Mar 20, 2024
@vmoens added the "Refactor" label (refactoring code, not a new feature) on Mar 20, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 127. Improved: $\large\color{#35bf28}12$. Worsened: $\large\color{#d91a1a}7$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 64.9210μs 16.1859μs 61.7820 KOps/s 63.1402 KOps/s $\color{#d91a1a}-2.15\%$
test_plain_set_stack_nested 38.5930μs 16.6011μs 60.2370 KOps/s 61.2714 KOps/s $\color{#d91a1a}-1.69\%$
test_plain_set_nested_inplace 68.1570μs 18.2622μs 54.7581 KOps/s 55.9874 KOps/s $\color{#d91a1a}-2.20\%$
test_plain_set_stack_nested_inplace 71.3740μs 18.6387μs 53.6518 KOps/s 55.2234 KOps/s $\color{#d91a1a}-2.85\%$
test_items 17.3520μs 2.4442μs 409.1296 KOps/s 403.0683 KOps/s $\color{#35bf28}+1.50\%$
test_items_nested 1.2286ms 0.2663ms 3.7552 KOps/s 3.6273 KOps/s $\color{#35bf28}+3.53\%$
test_items_nested_locked 0.4189ms 0.2700ms 3.7042 KOps/s 3.6046 KOps/s $\color{#35bf28}+2.76\%$
test_items_nested_leaf 0.6566ms 0.1648ms 6.0672 KOps/s 5.9104 KOps/s $\color{#35bf28}+2.65\%$
test_items_stack_nested 1.2546ms 0.2728ms 3.6661 KOps/s 3.5608 KOps/s $\color{#35bf28}+2.96\%$
test_items_stack_nested_leaf 0.2958ms 0.1635ms 6.1154 KOps/s 5.8628 KOps/s $\color{#35bf28}+4.31\%$
test_items_stack_nested_locked 0.5616ms 0.2756ms 3.6282 KOps/s 3.5520 KOps/s $\color{#35bf28}+2.14\%$
test_keys 40.3250μs 3.9114μs 255.6597 KOps/s 259.9640 KOps/s $\color{#d91a1a}-1.66\%$
test_keys_nested 2.0288ms 0.1443ms 6.9313 KOps/s 6.9012 KOps/s $\color{#35bf28}+0.44\%$
test_keys_nested_locked 0.2428ms 0.1477ms 6.7698 KOps/s 6.4997 KOps/s $\color{#35bf28}+4.16\%$
test_keys_nested_leaf 39.5806ms 0.1329ms 7.5254 KOps/s 7.9597 KOps/s $\textbf{\color{#d91a1a}-5.46\%}$
test_keys_stack_nested 0.2558ms 0.1468ms 6.8133 KOps/s 6.7974 KOps/s $\color{#35bf28}+0.23\%$
test_keys_stack_nested_leaf 0.2518ms 0.1272ms 7.8592 KOps/s 7.8184 KOps/s $\color{#35bf28}+0.52\%$
test_keys_stack_nested_locked 0.3353ms 0.1516ms 6.5952 KOps/s 6.4466 KOps/s $\color{#35bf28}+2.30\%$
test_values 25.4050μs 1.1358μs 880.4022 KOps/s 821.4200 KOps/s $\textbf{\color{#35bf28}+7.18\%}$
test_values_nested 0.1026ms 50.2182μs 19.9131 KOps/s 19.5422 KOps/s $\color{#35bf28}+1.90\%$
test_values_nested_locked 0.1018ms 50.5406μs 19.7861 KOps/s 19.5794 KOps/s $\color{#35bf28}+1.06\%$
test_values_nested_leaf 0.1203ms 45.2027μs 22.1226 KOps/s 21.6993 KOps/s $\color{#35bf28}+1.95\%$
test_values_stack_nested 0.1029ms 51.2744μs 19.5029 KOps/s 19.2954 KOps/s $\color{#35bf28}+1.08\%$
test_values_stack_nested_leaf 92.5330μs 44.3057μs 22.5705 KOps/s 21.7922 KOps/s $\color{#35bf28}+3.57\%$
test_values_stack_nested_locked 0.1062ms 51.5604μs 19.3947 KOps/s 19.5152 KOps/s $\color{#d91a1a}-0.62\%$
test_membership 39.9750μs 1.3356μs 748.7274 KOps/s 715.7267 KOps/s $\color{#35bf28}+4.61\%$
test_membership_nested 22.7930μs 3.5061μs 285.2154 KOps/s 287.9751 KOps/s $\color{#d91a1a}-0.96\%$
test_membership_nested_leaf 18.3940μs 3.4429μs 290.4524 KOps/s 284.3188 KOps/s $\color{#35bf28}+2.16\%$
test_membership_stacked_nested 25.9180μs 3.4396μs 290.7294 KOps/s 284.0984 KOps/s $\color{#35bf28}+2.33\%$
test_membership_stacked_nested_leaf 22.7820μs 3.4393μs 290.7565 KOps/s 285.1610 KOps/s $\color{#35bf28}+1.96\%$
test_membership_nested_last 45.6960μs 4.2562μs 234.9515 KOps/s 230.0480 KOps/s $\color{#35bf28}+2.13\%$
test_membership_nested_leaf_last 41.0870μs 4.2086μs 237.6097 KOps/s 228.8375 KOps/s $\color{#35bf28}+3.83\%$
test_membership_stacked_nested_last 63.7990μs 13.4046μs 74.6014 KOps/s 232.3188 KOps/s $\textbf{\color{#d91a1a}-67.89\%}$
test_membership_stacked_nested_leaf_last 63.9090μs 13.2983μs 75.1974 KOps/s 230.8474 KOps/s $\textbf{\color{#d91a1a}-67.43\%}$
test_nested_getleaf 41.8280μs 10.6564μs 93.8403 KOps/s 95.7122 KOps/s $\color{#d91a1a}-1.96\%$
test_nested_get 52.5990μs 9.9871μs 100.1293 KOps/s 99.5558 KOps/s $\color{#35bf28}+0.58\%$
test_stacked_getleaf 43.5710μs 10.6913μs 93.5337 KOps/s 94.5019 KOps/s $\color{#d91a1a}-1.02\%$
test_stacked_get 53.1890μs 9.9890μs 100.1099 KOps/s 102.0167 KOps/s $\color{#d91a1a}-1.87\%$
test_nested_getitemleaf 36.0780μs 11.1653μs 89.5631 KOps/s 89.7023 KOps/s $\color{#d91a1a}-0.16\%$
test_nested_getitem 53.0290μs 10.3022μs 97.0663 KOps/s 91.6216 KOps/s $\textbf{\color{#35bf28}+5.94\%}$
test_stacked_getitemleaf 39.7540μs 11.0231μs 90.7185 KOps/s 89.7779 KOps/s $\color{#35bf28}+1.05\%$
test_stacked_getitem 54.9020μs 10.1959μs 98.0783 KOps/s 97.6375 KOps/s $\color{#35bf28}+0.45\%$
test_lock_nested 0.9171ms 0.3326ms 3.0067 KOps/s 2.9059 KOps/s $\color{#35bf28}+3.47\%$
test_lock_stack_nested 0.3533ms 0.2856ms 3.5016 KOps/s 3.2964 KOps/s $\textbf{\color{#35bf28}+6.22\%}$
test_unlock_nested 81.0547ms 0.4134ms 2.4191 KOps/s 2.3411 KOps/s $\color{#35bf28}+3.33\%$
test_unlock_stack_nested 0.5806ms 0.2992ms 3.3423 KOps/s 3.2075 KOps/s $\color{#35bf28}+4.20\%$
test_flatten_speed 2.9276ms 0.2628ms 3.8052 KOps/s 3.8049 KOps/s $+0.01\%$
test_unflatten_speed 0.6942ms 0.4055ms 2.4658 KOps/s 2.4719 KOps/s $\color{#d91a1a}-0.25\%$
test_common_ops 4.7958ms 0.6650ms 1.5037 KOps/s 1.5134 KOps/s $\color{#d91a1a}-0.65\%$
test_creation 34.5940μs 1.8301μs 546.4162 KOps/s 539.7387 KOps/s $\color{#35bf28}+1.24\%$
test_creation_empty 33.0020μs 9.3064μs 107.4534 KOps/s 119.9827 KOps/s $\textbf{\color{#d91a1a}-10.44\%}$
test_creation_nested_1 55.0920μs 11.8307μs 84.5258 KOps/s 91.7241 KOps/s $\textbf{\color{#d91a1a}-7.85\%}$
test_creation_nested_2 48.6310μs 14.9851μs 66.7330 KOps/s 70.5597 KOps/s $\textbf{\color{#d91a1a}-5.42\%}$
test_clone 1.8920ms 13.2945μs 75.2190 KOps/s 75.3313 KOps/s $\color{#d91a1a}-0.15\%$
test_getitem[int] 31.4080μs 10.9195μs 91.5794 KOps/s 86.7278 KOps/s $\textbf{\color{#35bf28}+5.59\%}$
test_getitem[slice_int] 67.0050μs 21.8141μs 45.8420 KOps/s 44.8507 KOps/s $\color{#35bf28}+2.21\%$
test_getitem[range] 0.1075ms 41.7417μs 23.9569 KOps/s 23.5611 KOps/s $\color{#35bf28}+1.68\%$
test_getitem[tuple] 51.8770μs 17.9415μs 55.7367 KOps/s 53.5217 KOps/s $\color{#35bf28}+4.14\%$
test_getitem[list] 0.1543ms 35.2880μs 28.3382 KOps/s 26.3112 KOps/s $\textbf{\color{#35bf28}+7.70\%}$
test_setitem_dim[int] 47.5890μs 30.8408μs 32.4246 KOps/s 32.4849 KOps/s $\color{#d91a1a}-0.19\%$
test_setitem_dim[slice_int] 0.1237ms 57.0693μs 17.5226 KOps/s 17.4465 KOps/s $\color{#35bf28}+0.44\%$
test_setitem_dim[range] 0.1200ms 74.7901μs 13.3707 KOps/s 13.3221 KOps/s $\color{#35bf28}+0.37\%$
test_setitem_dim[tuple] 68.3970μs 45.5334μs 21.9619 KOps/s 21.1331 KOps/s $\color{#35bf28}+3.92\%$
test_setitem 60.3830μs 18.8581μs 53.0276 KOps/s 54.3679 KOps/s $\color{#d91a1a}-2.47\%$
test_set 64.6710μs 18.2398μs 54.8253 KOps/s 56.4332 KOps/s $\color{#d91a1a}-2.85\%$
test_set_shared 3.7339ms 0.1384ms 7.2237 KOps/s 7.2318 KOps/s $\color{#d91a1a}-0.11\%$
test_update 87.8740μs 20.6812μs 48.3532 KOps/s 50.8044 KOps/s $\color{#d91a1a}-4.82\%$
test_update_nested 0.1169ms 28.0073μs 35.7049 KOps/s 36.8437 KOps/s $\color{#d91a1a}-3.09\%$
test_update__nested 78.5570μs 23.7928μs 42.0296 KOps/s 40.6727 KOps/s $\color{#35bf28}+3.34\%$
test_set_nested 64.8710μs 20.1446μs 49.6410 KOps/s 50.9485 KOps/s $\color{#d91a1a}-2.57\%$
test_set_nested_new 0.8422ms 23.9484μs 41.7564 KOps/s 43.2108 KOps/s $\color{#d91a1a}-3.37\%$
test_select 0.1008ms 38.2428μs 26.1487 KOps/s 25.9619 KOps/s $\color{#35bf28}+0.72\%$
test_select_nested 0.1222ms 59.0838μs 16.9251 KOps/s 17.0906 KOps/s $\color{#d91a1a}-0.97\%$
test_exclude_nested 0.2533ms 0.1194ms 8.3770 KOps/s 8.4686 KOps/s $\color{#d91a1a}-1.08\%$
test_empty[True] 0.7873ms 0.4074ms 2.4548 KOps/s 2.4582 KOps/s $\color{#d91a1a}-0.14\%$
test_empty[False] 5.5404μs 1.0112μs 988.9705 KOps/s 963.8968 KOps/s $\color{#35bf28}+2.60\%$
test_unbind_speed 0.3165ms 0.2439ms 4.1007 KOps/s 3.7274 KOps/s $\textbf{\color{#35bf28}+10.01\%}$
test_unbind_speed_stack0 0.4970ms 0.2301ms 4.3464 KOps/s 4.0903 KOps/s $\textbf{\color{#35bf28}+6.26\%}$
test_unbind_speed_stack1 0.1192s 0.6389ms 1.5652 KOps/s 1.4858 KOps/s $\textbf{\color{#35bf28}+5.34\%}$
test_split 0.1077s 1.6009ms 624.6418 Ops/s 613.7865 Ops/s $\color{#35bf28}+1.77\%$
test_chunk 1.6459ms 1.4434ms 692.8318 Ops/s 686.5285 Ops/s $\color{#35bf28}+0.92\%$
test_creation[device0] 0.1613ms 99.4992μs 10.0503 KOps/s 9.7997 KOps/s $\color{#35bf28}+2.56\%$
test_creation_from_tensor 5.1440ms 80.8983μs 12.3612 KOps/s 12.1656 KOps/s $\color{#35bf28}+1.61\%$
test_add_one[memmap_tensor0] 0.1107ms 5.3822μs 185.7990 KOps/s 180.1963 KOps/s $\color{#35bf28}+3.11\%$
test_contiguous[memmap_tensor0] 16.2400μs 0.6268μs 1.5955 MOps/s 1.4744 MOps/s $\textbf{\color{#35bf28}+8.21\%}$
test_stack[memmap_tensor0] 23.4740μs 3.6991μs 270.3362 KOps/s 281.4251 KOps/s $\color{#d91a1a}-3.94\%$
test_memmaptd_index 0.9278ms 0.2335ms 4.2834 KOps/s 4.2210 KOps/s $\color{#35bf28}+1.48\%$
test_memmaptd_index_astensor 0.7043ms 0.2979ms 3.3571 KOps/s 3.3346 KOps/s $\color{#35bf28}+0.67\%$
test_memmaptd_index_op 0.8436ms 0.5720ms 1.7483 KOps/s 1.7676 KOps/s $\color{#d91a1a}-1.09\%$
test_serialize_model 0.2153s 0.1138s 8.7893 Ops/s 8.4281 Ops/s $\color{#35bf28}+4.29\%$
test_serialize_model_pickle 0.4482s 0.3805s 2.6278 Ops/s 2.6309 Ops/s $\color{#d91a1a}-0.12\%$
test_serialize_weights 0.1028s 94.6484ms 10.5654 Ops/s 10.0262 Ops/s $\textbf{\color{#35bf28}+5.38\%}$
test_serialize_weights_returnearly 0.2404s 0.1385s 7.2180 Ops/s 6.8196 Ops/s $\textbf{\color{#35bf28}+5.84\%}$
test_serialize_weights_pickle 0.5906s 0.4449s 2.2477 Ops/s 2.5242 Ops/s $\textbf{\color{#d91a1a}-10.95\%}$
test_serialize_weights_filesystem 0.1071s 94.6851ms 10.5613 Ops/s 10.2089 Ops/s $\color{#35bf28}+3.45\%$
test_serialize_model_filesystem 98.9722ms 93.3090ms 10.7171 Ops/s 10.4962 Ops/s $\color{#35bf28}+2.10\%$
test_reshape_pytree 52.9390μs 21.2148μs 47.1369 KOps/s 47.5442 KOps/s $\color{#d91a1a}-0.86\%$
test_reshape_td 90.5510μs 31.3400μs 31.9081 KOps/s 31.5260 KOps/s $\color{#35bf28}+1.21\%$
test_view_pytree 61.5650μs 20.6140μs 48.5108 KOps/s 47.4314 KOps/s $\color{#35bf28}+2.28\%$
test_view_td 0.1291s 62.1961μs 16.0782 KOps/s 16.2449 KOps/s $\color{#d91a1a}-1.03\%$
test_unbind_pytree 57.5870μs 24.0748μs 41.5372 KOps/s 40.7938 KOps/s $\color{#35bf28}+1.82\%$
test_unbind_td 0.1161ms 35.5655μs 28.1171 KOps/s 27.4241 KOps/s $\color{#35bf28}+2.53\%$
test_split_pytree 53.3200μs 23.7737μs 42.0633 KOps/s 41.8069 KOps/s $\color{#35bf28}+0.61\%$
test_split_td 0.1120ms 39.0425μs 25.6131 KOps/s 25.0186 KOps/s $\color{#35bf28}+2.38\%$
test_add_pytree 89.6070μs 29.7123μs 33.6561 KOps/s 32.7360 KOps/s $\color{#35bf28}+2.81\%$
test_add_td 94.0960μs 50.0388μs 19.9845 KOps/s 19.5048 KOps/s $\color{#35bf28}+2.46\%$
test_distributed 0.1787ms 0.1003ms 9.9652 KOps/s 9.7384 KOps/s $\color{#35bf28}+2.33\%$
test_tdmodule 33.9730μs 16.2882μs 61.3940 KOps/s 60.0819 KOps/s $\color{#35bf28}+2.18\%$
test_tdmodule_dispatch 54.8020μs 31.8511μs 31.3961 KOps/s 30.8371 KOps/s $\color{#35bf28}+1.81\%$
test_tdseq 36.0170μs 19.2106μs 52.0547 KOps/s 52.1816 KOps/s $\color{#d91a1a}-0.24\%$
test_tdseq_dispatch 57.0370μs 37.3217μs 26.7941 KOps/s 27.1163 KOps/s $\color{#d91a1a}-1.19\%$
test_instantiation_functorch 2.1949ms 1.3181ms 758.6606 Ops/s 758.9810 Ops/s $\color{#d91a1a}-0.04\%$
test_instantiation_td 1.8178ms 1.0225ms 977.9562 Ops/s 979.1843 Ops/s $\color{#d91a1a}-0.13\%$
test_exec_functorch 0.2875ms 0.1592ms 6.2817 KOps/s 6.3201 KOps/s $\color{#d91a1a}-0.61\%$
test_exec_functional_call 0.3531ms 0.1483ms 6.7411 KOps/s 6.6363 KOps/s $\color{#35bf28}+1.58\%$
test_exec_td 0.2712ms 0.1441ms 6.9412 KOps/s 6.8160 KOps/s $\color{#35bf28}+1.84\%$
test_exec_td_decorator 0.3853ms 0.1935ms 5.1682 KOps/s 5.0492 KOps/s $\color{#35bf28}+2.36\%$
test_vmap_mlp_speed[True-True] 0.6562ms 0.4608ms 2.1700 KOps/s 2.1197 KOps/s $\color{#35bf28}+2.37\%$
test_vmap_mlp_speed[True-False] 0.7117ms 0.4581ms 2.1827 KOps/s 2.0639 KOps/s $\textbf{\color{#35bf28}+5.76\%}$
test_vmap_mlp_speed[False-True] 0.4794ms 0.3768ms 2.6538 KOps/s 2.5710 KOps/s $\color{#35bf28}+3.22\%$
test_vmap_mlp_speed[False-False] 0.6603ms 0.3799ms 2.6323 KOps/s 2.5873 KOps/s $\color{#35bf28}+1.74\%$
test_vmap_mlp_speed_decorator[True-True] 0.5723ms 0.4797ms 2.0846 KOps/s 2.0399 KOps/s $\color{#35bf28}+2.19\%$
test_vmap_mlp_speed_decorator[True-False] 0.8053ms 0.4824ms 2.0731 KOps/s 2.0385 KOps/s $\color{#35bf28}+1.70\%$
test_vmap_mlp_speed_decorator[False-True] 0.5235ms 0.3935ms 2.5416 KOps/s 2.4670 KOps/s $\color{#35bf28}+3.02\%$
test_vmap_mlp_speed_decorator[False-False] 0.5971ms 0.3932ms 2.5433 KOps/s 2.4732 KOps/s $\color{#35bf28}+2.83\%$
test_to_module_speed[True] 1.5018ms 1.3903ms 719.2833 Ops/s 724.4820 Ops/s $\color{#d91a1a}-0.72\%$
test_to_module_speed[False] 2.1246ms 1.3711ms 729.3179 Ops/s 739.7335 Ops/s $\color{#d91a1a}-1.41\%$
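
For anyone reading the table above: the Change column appears to be the relative difference between Ops (this PR) and Ops on Repo HEAD. The bold rows also line up with the "Improved: 12 / Worsened: 7" summary if a row only counts once it moves by more than 5% in either direction. That threshold is inferred from the table, not stated by the bot, so treat the following as a rough sketch rather than the bot's actual logic. The function names are made up for illustration.

```python
# Sketch only: reproduces the "Change" column from two throughput figures.
# The 5% threshold is an assumption inferred from which rows are bold in the
# table; the benchmark bot's real rule is not shown in this thread.

def percent_change(ops: float, head_ops: float) -> float:
    """Relative change of the PR's throughput versus repo HEAD, in percent."""
    return (ops - head_ops) / head_ops * 100.0


def classify(change: float, threshold: float = 5.0) -> str:
    """Label a change as improved, worsened, or within noise (assumed rule)."""
    if change > threshold:
        return "improved"
    if change < -threshold:
        return "worsened"
    return "within noise"


# (name, Ops, Ops on Repo HEAD), both in KOps/s, copied from the table.
rows = [
    ("test_membership_stacked_nested_last", 74.6014, 232.3188),
    ("test_unbind_speed", 4.1007, 3.7274),
    ("test_update", 48.3532, 50.8044),
]

for name, ops, head_ops in rows:
    change = percent_change(ops, head_ops)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Both inputs must be in the same unit (the table mixes Ops/s, KOps/s and MOps/s across rows, but each row uses one unit for both columns). Recomputed values can differ from the table in the last digit because the published figures are already rounded.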

@vmoens merged commit b4c91e8 into main on Mar 20, 2024
44 of 48 checks passed
@vmoens deleted the refactor-contiguous branch on March 20, 2024 09:15
vmoens added a commit that referenced this pull request Mar 24, 2024
vmoens added a commit that referenced this pull request Mar 24, 2024
(cherry picked from commit b4c91e8)
vmoens added a commit that referenced this pull request Mar 25, 2024
(cherry picked from commit b4c91e8)