[Refactor] use from_file instead of mmap+from_buffer for readonly files #808

Merged on Jun 10, 2024 (2 commits)

@vmoens vmoens commented Jun 10, 2024

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 10, 2024
@vmoens vmoens added the Refactor Refactoring code - not a new feature label Jun 10, 2024

github-actions bot commented Jun 10, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: 33. Worsened: 7.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.2727ms 18.5146μs 54.0116 KOps/s 57.9460 KOps/s $\textbf{\color{#d91a1a}-6.79\%}$
test_plain_set_stack_nested 0.2785ms 17.5949μs 56.8346 KOps/s 57.1607 KOps/s $\color{#d91a1a}-0.57\%$
test_plain_set_nested_inplace 52.9380μs 18.2429μs 54.8157 KOps/s 50.4429 KOps/s $\textbf{\color{#35bf28}+8.67\%}$
test_plain_set_stack_nested_inplace 45.6950μs 18.3260μs 54.5672 KOps/s 50.5923 KOps/s $\textbf{\color{#35bf28}+7.86\%}$
test_items 16.1500μs 2.6635μs 375.4445 KOps/s 384.7576 KOps/s $\color{#d91a1a}-2.42\%$
test_items_nested 0.4016ms 0.2655ms 3.7669 KOps/s 3.7647 KOps/s $\color{#35bf28}+0.06\%$
test_items_nested_locked 1.1236ms 0.2680ms 3.7313 KOps/s 3.7364 KOps/s $\color{#d91a1a}-0.14\%$
test_items_nested_leaf 0.1480ms 76.1620μs 13.1299 KOps/s 12.8894 KOps/s $\color{#35bf28}+1.87\%$
test_items_stack_nested 0.3405ms 0.2699ms 3.7051 KOps/s 3.6961 KOps/s $\color{#35bf28}+0.24\%$
test_items_stack_nested_leaf 0.1917ms 79.0886μs 12.6441 KOps/s 11.9649 KOps/s $\textbf{\color{#35bf28}+5.68\%}$
test_items_stack_nested_locked 1.0170ms 0.2692ms 3.7154 KOps/s 3.7118 KOps/s $\color{#35bf28}+0.09\%$
test_keys 24.4350μs 3.8029μs 262.9571 KOps/s 255.2959 KOps/s $\color{#35bf28}+3.00\%$
test_keys_nested 0.2529ms 0.1362ms 7.3441 KOps/s 7.1271 KOps/s $\color{#35bf28}+3.05\%$
test_keys_nested_locked 0.7401ms 0.1415ms 7.0681 KOps/s 6.9209 KOps/s $\color{#35bf28}+2.13\%$
test_keys_nested_leaf 0.2185ms 0.1166ms 8.5769 KOps/s 8.3734 KOps/s $\color{#35bf28}+2.43\%$
test_keys_stack_nested 0.2869ms 0.1369ms 7.3050 KOps/s 7.1770 KOps/s $\color{#35bf28}+1.78\%$
test_keys_stack_nested_leaf 0.2070ms 0.1149ms 8.7005 KOps/s 8.4141 KOps/s $\color{#35bf28}+3.40\%$
test_keys_stack_nested_locked 0.2008ms 0.1414ms 7.0722 KOps/s 6.9730 KOps/s $\color{#35bf28}+1.42\%$
test_values 7.7795μs 1.1804μs 847.1778 KOps/s 861.0168 KOps/s $\color{#d91a1a}-1.61\%$
test_values_nested 0.1860ms 51.9600μs 19.2456 KOps/s 19.6810 KOps/s $\color{#d91a1a}-2.21\%$
test_values_nested_locked 0.3054ms 51.9235μs 19.2591 KOps/s 19.7050 KOps/s $\color{#d91a1a}-2.26\%$
test_values_nested_leaf 0.1040ms 46.9658μs 21.2921 KOps/s 21.5650 KOps/s $\color{#d91a1a}-1.27\%$
test_values_stack_nested 97.9220μs 52.7980μs 18.9401 KOps/s 19.3199 KOps/s $\color{#d91a1a}-1.97\%$
test_values_stack_nested_leaf 77.7950μs 46.5575μs 21.4788 KOps/s 21.7479 KOps/s $\color{#d91a1a}-1.24\%$
test_values_stack_nested_locked 86.8210μs 52.7520μs 18.9566 KOps/s 19.4860 KOps/s $\color{#d91a1a}-2.72\%$
test_membership 16.0300μs 1.3555μs 737.7584 KOps/s 735.2388 KOps/s $\color{#35bf28}+0.34\%$
test_membership_nested 32.1200μs 3.5112μs 284.8028 KOps/s 288.7505 KOps/s $\color{#d91a1a}-1.37\%$
test_membership_nested_leaf 22.4920μs 3.5336μs 282.9990 KOps/s 292.3248 KOps/s $\color{#d91a1a}-3.19\%$
test_membership_stacked_nested 25.9490μs 3.4788μs 287.4529 KOps/s 258.5084 KOps/s $\textbf{\color{#35bf28}+11.20\%}$
test_membership_stacked_nested_leaf 0.1193ms 3.5389μs 282.5765 KOps/s 294.6031 KOps/s $\color{#d91a1a}-4.08\%$
test_membership_nested_last 0.2241ms 4.2577μs 234.8708 KOps/s 241.9792 KOps/s $\color{#d91a1a}-2.94\%$
test_membership_nested_leaf_last 30.8580μs 4.2638μs 234.5310 KOps/s 238.9135 KOps/s $\color{#d91a1a}-1.83\%$
test_membership_stacked_nested_last 28.1220μs 4.8239μs 207.3001 KOps/s 208.1500 KOps/s $\color{#d91a1a}-0.41\%$
test_membership_stacked_nested_leaf_last 21.8300μs 4.8551μs 205.9678 KOps/s 208.3851 KOps/s $\color{#d91a1a}-1.16\%$
test_nested_getleaf 52.3760μs 10.8712μs 91.9861 KOps/s 92.9227 KOps/s $\color{#d91a1a}-1.01\%$
test_nested_get 50.7850μs 10.2488μs 97.5729 KOps/s 98.6171 KOps/s $\color{#d91a1a}-1.06\%$
test_stacked_getleaf 0.2615ms 11.2197μs 89.1293 KOps/s 94.2741 KOps/s $\textbf{\color{#d91a1a}-5.46\%}$
test_stacked_get 57.3580μs 10.2121μs 97.9230 KOps/s 99.5787 KOps/s $\color{#d91a1a}-1.66\%$
test_nested_getitemleaf 45.1540μs 11.2492μs 88.8951 KOps/s 87.3725 KOps/s $\color{#35bf28}+1.74\%$
test_nested_getitem 30.6970μs 10.7002μs 93.4562 KOps/s 89.7092 KOps/s $\color{#35bf28}+4.18\%$
test_stacked_getitemleaf 49.9930μs 11.3728μs 87.9293 KOps/s 88.0735 KOps/s $\color{#d91a1a}-0.16\%$
test_stacked_getitem 31.6090μs 10.5192μs 95.0645 KOps/s 96.0376 KOps/s $\color{#d91a1a}-1.01\%$
test_lock_nested 60.1212ms 0.4054ms 2.4667 KOps/s 2.8716 KOps/s $\textbf{\color{#d91a1a}-14.10\%}$
test_lock_stack_nested 0.3618ms 0.3070ms 3.2577 KOps/s 3.1984 KOps/s $\color{#35bf28}+1.85\%$
test_unlock_nested 0.7674ms 0.3526ms 2.8360 KOps/s 2.4406 KOps/s $\textbf{\color{#35bf28}+16.20\%}$
test_unlock_stack_nested 0.6053ms 0.3155ms 3.1694 KOps/s 3.1344 KOps/s $\color{#35bf28}+1.12\%$
test_flatten_speed 0.2329ms 94.4436μs 10.5883 KOps/s 10.3727 KOps/s $\color{#35bf28}+2.08\%$
test_unflatten_speed 0.7386ms 0.4138ms 2.4165 KOps/s 2.4702 KOps/s $\color{#d91a1a}-2.17\%$
test_common_ops 4.1928ms 0.6737ms 1.4843 KOps/s 1.3485 KOps/s $\textbf{\color{#35bf28}+10.07\%}$
test_creation 25.5680μs 1.9351μs 516.7730 KOps/s 532.5191 KOps/s $\color{#d91a1a}-2.96\%$
test_creation_empty 26.5000μs 8.0864μs 123.6651 KOps/s 78.9303 KOps/s $\textbf{\color{#35bf28}+56.68\%}$
test_creation_nested_1 31.1580μs 10.9291μs 91.4984 KOps/s 67.4244 KOps/s $\textbf{\color{#35bf28}+35.71\%}$
test_creation_nested_2 43.0700μs 14.1709μs 70.5671 KOps/s 57.0033 KOps/s $\textbf{\color{#35bf28}+23.79\%}$
test_clone 0.1474ms 13.4029μs 74.6107 KOps/s 71.4109 KOps/s $\color{#35bf28}+4.48\%$
test_getitem[int] 35.4860μs 11.5314μs 86.7198 KOps/s 86.3591 KOps/s $\color{#35bf28}+0.42\%$
test_getitem[slice_int] 53.5600μs 22.6025μs 44.2428 KOps/s 43.1458 KOps/s $\color{#35bf28}+2.54\%$
test_getitem[range] 81.9330μs 59.4488μs 16.8212 KOps/s 13.7243 KOps/s $\textbf{\color{#35bf28}+22.56\%}$
test_getitem[tuple] 68.2270μs 19.0092μs 52.6060 KOps/s 51.7861 KOps/s $\color{#35bf28}+1.58\%$
test_getitem[list] 0.1499ms 39.7885μs 25.1329 KOps/s 24.0321 KOps/s $\color{#35bf28}+4.58\%$
test_setitem_dim[int] 71.6030μs 32.2757μs 30.9831 KOps/s 27.1151 KOps/s $\textbf{\color{#35bf28}+14.26\%}$
test_setitem_dim[slice_int] 88.7860μs 57.5912μs 17.3638 KOps/s 15.8124 KOps/s $\textbf{\color{#35bf28}+9.81\%}$
test_setitem_dim[range] 0.1375ms 79.6849μs 12.5494 KOps/s 11.5485 KOps/s $\textbf{\color{#35bf28}+8.67\%}$
test_setitem_dim[tuple] 0.1096ms 47.1099μs 21.2270 KOps/s 17.9798 KOps/s $\textbf{\color{#35bf28}+18.06\%}$
test_setitem 72.5450μs 18.5574μs 53.8868 KOps/s 47.4037 KOps/s $\textbf{\color{#35bf28}+13.68\%}$
test_set 0.2609ms 17.9965μs 55.5664 KOps/s 48.6647 KOps/s $\textbf{\color{#35bf28}+14.18\%}$
test_set_shared 3.4649ms 0.1435ms 6.9667 KOps/s 6.8637 KOps/s $\color{#35bf28}+1.50\%$
test_update 0.2668ms 18.6893μs 53.5065 KOps/s 42.8358 KOps/s $\textbf{\color{#35bf28}+24.91\%}$
test_update_nested 75.6810μs 26.4448μs 37.8146 KOps/s 32.4332 KOps/s $\textbf{\color{#35bf28}+16.59\%}$
test_update__nested 76.6530μs 25.2000μs 39.6825 KOps/s 39.7929 KOps/s $\color{#d91a1a}-0.28\%$
test_set_nested 0.2510ms 20.0234μs 49.9415 KOps/s 43.6842 KOps/s $\textbf{\color{#35bf28}+14.32\%}$
test_set_nested_new 83.6050μs 23.9420μs 41.7676 KOps/s 36.8316 KOps/s $\textbf{\color{#35bf28}+13.40\%}$
test_select 0.1065ms 38.9602μs 25.6672 KOps/s 23.7784 KOps/s $\textbf{\color{#35bf28}+7.94\%}$
test_select_nested 0.1535ms 59.5686μs 16.7874 KOps/s 16.5713 KOps/s $\color{#35bf28}+1.30\%$
test_exclude_nested 0.2666ms 0.1232ms 8.1170 KOps/s 8.4582 KOps/s $\color{#d91a1a}-4.03\%$
test_empty[True] 0.6725ms 0.3969ms 2.5197 KOps/s 2.5412 KOps/s $\color{#d91a1a}-0.85\%$
test_empty[False] 10.3568μs 1.1657μs 857.8258 KOps/s 847.4869 KOps/s $\color{#35bf28}+1.22\%$
test_unbind_speed 0.4370ms 0.2563ms 3.9015 KOps/s 3.8614 KOps/s $\color{#35bf28}+1.04\%$
test_unbind_speed_stack0 0.4567ms 0.2497ms 4.0048 KOps/s 3.9179 KOps/s $\color{#35bf28}+2.22\%$
test_unbind_speed_stack1 91.4027ms 0.7486ms 1.3358 KOps/s 1.2658 KOps/s $\textbf{\color{#35bf28}+5.54\%}$
test_split 74.7153ms 1.5943ms 627.2510 Ops/s 607.2579 Ops/s $\color{#35bf28}+3.29\%$
test_chunk 75.4501ms 1.6065ms 622.4547 Ops/s 610.0858 Ops/s $\color{#35bf28}+2.03\%$
test_creation[device0] 3.6787ms 90.7313μs 11.0215 KOps/s 11.5324 KOps/s $\color{#d91a1a}-4.43\%$
test_creation_from_tensor 0.2741ms 88.0888μs 11.3522 KOps/s 11.5337 KOps/s $\color{#d91a1a}-1.57\%$
test_add_one[memmap_tensor0] 0.1026ms 5.1638μs 193.6554 KOps/s 180.2155 KOps/s $\textbf{\color{#35bf28}+7.46\%}$
test_contiguous[memmap_tensor0] 14.5470μs 0.6470μs 1.5456 MOps/s 1.5280 MOps/s $\color{#35bf28}+1.15\%$
test_stack[memmap_tensor0] 27.0510μs 3.5061μs 285.2151 KOps/s 276.8686 KOps/s $\color{#35bf28}+3.01\%$
test_memmaptd_index 0.9813ms 0.2575ms 3.8835 KOps/s 3.5727 KOps/s $\textbf{\color{#35bf28}+8.70\%}$
test_memmaptd_index_astensor 0.7616ms 0.3325ms 3.0074 KOps/s 2.9703 KOps/s $\color{#35bf28}+1.25\%$
test_memmaptd_index_op 0.9619ms 0.5702ms 1.7536 KOps/s 1.5604 KOps/s $\textbf{\color{#35bf28}+12.38\%}$
test_serialize_model 0.1830s 0.1172s 8.5322 Ops/s 8.5727 Ops/s $\color{#d91a1a}-0.47\%$
test_serialize_model_pickle 0.4496s 0.3783s 2.6433 Ops/s 2.6097 Ops/s $\color{#35bf28}+1.29\%$
test_serialize_weights 0.1818s 0.1154s 8.6652 Ops/s 8.6607 Ops/s $\color{#35bf28}+0.05\%$
test_serialize_weights_returnearly 0.2049s 0.1401s 7.1376 Ops/s 7.7251 Ops/s $\textbf{\color{#d91a1a}-7.60\%}$
test_serialize_weights_pickle 0.6565s 0.4709s 2.1238 Ops/s 2.3445 Ops/s $\textbf{\color{#d91a1a}-9.42\%}$
test_serialize_weights_filesystem 0.1043s 96.0515ms 10.4111 Ops/s 10.2154 Ops/s $\color{#35bf28}+1.92\%$
test_serialize_model_filesystem 97.1398ms 95.7112ms 10.4481 Ops/s 10.4403 Ops/s $\color{#35bf28}+0.08\%$
test_reshape_pytree 85.1080μs 25.6878μs 38.9290 KOps/s 38.2473 KOps/s $\color{#35bf28}+1.78\%$
test_reshape_td 0.1228ms 34.1973μs 29.2421 KOps/s 28.6946 KOps/s $\color{#35bf28}+1.91\%$
test_view_pytree 72.2350μs 25.7987μs 38.7616 KOps/s 39.5824 KOps/s $\color{#d91a1a}-2.07\%$
test_view_td 0.2911ms 38.3156μs 26.0990 KOps/s 25.5811 KOps/s $\color{#35bf28}+2.02\%$
test_unbind_pytree 69.1790μs 29.2518μs 34.1859 KOps/s 34.0998 KOps/s $\color{#35bf28}+0.25\%$
test_unbind_td 0.3954ms 37.1197μs 26.9399 KOps/s 25.9088 KOps/s $\color{#35bf28}+3.98\%$
test_split_pytree 66.2440μs 29.6138μs 33.7681 KOps/s 33.7791 KOps/s $\color{#d91a1a}-0.03\%$
test_split_td 0.1438ms 40.1568μs 24.9024 KOps/s 24.2583 KOps/s $\color{#35bf28}+2.66\%$
test_add_pytree 97.8020μs 34.7723μs 28.7585 KOps/s 28.2876 KOps/s $\color{#35bf28}+1.66\%$
test_add_td 0.2394ms 50.9015μs 19.6458 KOps/s 17.1104 KOps/s $\textbf{\color{#35bf28}+14.82\%}$
test_distributed 0.2613ms 0.1022ms 9.7876 KOps/s 9.7104 KOps/s $\color{#35bf28}+0.79\%$
test_tdmodule 0.1167ms 16.7109μs 59.8412 KOps/s 53.9485 KOps/s $\textbf{\color{#35bf28}+10.92\%}$
test_tdmodule_dispatch 53.0890μs 31.9296μs 31.3189 KOps/s 26.9609 KOps/s $\textbf{\color{#35bf28}+16.16\%}$
test_tdseq 40.9260μs 19.1277μs 52.2803 KOps/s 46.1922 KOps/s $\textbf{\color{#35bf28}+13.18\%}$
test_tdseq_dispatch 56.5560μs 36.5074μs 27.3917 KOps/s 23.2326 KOps/s $\textbf{\color{#35bf28}+17.90\%}$
test_instantiation_functorch 1.6304ms 1.3385ms 747.0866 Ops/s 760.2517 Ops/s $\color{#d91a1a}-1.73\%$
test_instantiation_td 1.7355ms 1.0292ms 971.6668 Ops/s 979.2481 Ops/s $\color{#d91a1a}-0.77\%$
test_exec_functorch 0.3541ms 0.1602ms 6.2405 KOps/s 6.2446 KOps/s $\color{#d91a1a}-0.07\%$
test_exec_functional_call 0.2843ms 0.1511ms 6.6164 KOps/s 6.7112 KOps/s $\color{#d91a1a}-1.41\%$
test_exec_td 0.3508ms 0.1443ms 6.9281 KOps/s 6.9466 KOps/s $\color{#d91a1a}-0.27\%$
test_exec_td_decorator 1.0686ms 0.2242ms 4.4611 KOps/s 4.4832 KOps/s $\color{#d91a1a}-0.49\%$
test_vmap_mlp_speed[True-True] 0.8384ms 0.4844ms 2.0645 KOps/s 2.0342 KOps/s $\color{#35bf28}+1.49\%$
test_vmap_mlp_speed[True-False] 0.7777ms 0.4825ms 2.0725 KOps/s 1.9294 KOps/s $\textbf{\color{#35bf28}+7.42\%}$
test_vmap_mlp_speed[False-True] 1.8596ms 0.4092ms 2.4437 KOps/s 2.5014 KOps/s $\color{#d91a1a}-2.31\%$
test_vmap_mlp_speed[False-False] 0.5857ms 0.3947ms 2.5333 KOps/s 2.5091 KOps/s $\color{#35bf28}+0.96\%$
test_vmap_mlp_speed_decorator[True-True] 1.3080ms 0.5540ms 1.8052 KOps/s 1.7648 KOps/s $\color{#35bf28}+2.28\%$
test_vmap_mlp_speed_decorator[True-False] 0.8860ms 0.5522ms 1.8108 KOps/s 1.7740 KOps/s $\color{#35bf28}+2.07\%$
test_vmap_mlp_speed_decorator[False-True] 0.7375ms 0.4617ms 2.1658 KOps/s 2.1552 KOps/s $\color{#35bf28}+0.49\%$
test_vmap_mlp_speed_decorator[False-False] 1.0619ms 0.4643ms 2.1539 KOps/s 2.1535 KOps/s $\color{#35bf28}+0.02\%$
test_to_module_speed[True] 2.4842ms 1.7271ms 578.9984 Ops/s 591.8048 Ops/s $\color{#d91a1a}-2.16\%$
test_to_module_speed[False] 2.3633ms 1.7065ms 585.9995 Ops/s 603.2827 Ops/s $\color{#d91a1a}-2.86\%$
test_tc_init 54.8320μs 23.4081μs 42.7203 KOps/s 32.9424 KOps/s $\textbf{\color{#35bf28}+29.68\%}$
test_tc_init_nested 0.1555ms 44.9665μs 22.2388 KOps/s 17.0740 KOps/s $\textbf{\color{#35bf28}+30.25\%}$
test_tc_first_layer_tensor 4.9061μs 0.7001μs 1.4283 MOps/s 1.4565 MOps/s $\color{#d91a1a}-1.93\%$
test_tc_first_layer_nontensor 1.8605μs 0.6673μs 1.4986 MOps/s 1.4757 MOps/s $\color{#35bf28}+1.55\%$
test_tc_second_layer_tensor 26.3390μs 1.8322μs 545.7951 KOps/s 551.0102 KOps/s $\color{#d91a1a}-0.95\%$
test_tc_second_layer_nontensor 9.5277μs 1.5123μs 661.2433 KOps/s 667.6953 KOps/s $\color{#d91a1a}-0.97\%$
test_unbind 96.6495ms 8.6316ms 115.8528 Ops/s 118.7328 Ops/s $\color{#d91a1a}-2.43\%$
test_full_like 17.2720ms 12.3558ms 80.9335 Ops/s 85.6447 Ops/s $\textbf{\color{#d91a1a}-5.50\%}$
test_zeros_like 15.4765ms 6.3587ms 157.2643 Ops/s 155.8501 Ops/s $\color{#35bf28}+0.91\%$
test_ones_like 11.3124ms 6.7885ms 147.3073 Ops/s 147.9961 Ops/s $\color{#d91a1a}-0.47\%$
test_clone 16.1451ms 8.9150ms 112.1699 Ops/s 119.6761 Ops/s $\textbf{\color{#d91a1a}-6.27\%}$
test_squeeze 72.9960μs 14.4515μs 69.1971 KOps/s 72.4818 KOps/s $\color{#d91a1a}-4.53\%$
test_unsqueeze 0.1261ms 60.8614μs 16.4308 KOps/s 16.1226 KOps/s $\color{#35bf28}+1.91\%$
test_split 0.2116ms 0.1128ms 8.8622 KOps/s 8.9370 KOps/s $\color{#d91a1a}-0.84\%$
test_permute 0.3046ms 0.1286ms 7.7787 KOps/s 7.7788 KOps/s $-0.00\%$
test_stack 30.5940ms 24.4430ms 40.9116 Ops/s 41.9292 Ops/s $\color{#d91a1a}-2.43\%$
test_cat 31.6828ms 24.4839ms 40.8432 Ops/s 41.5337 Ops/s $\color{#d91a1a}-1.66\%$
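
The columns above seem to be related in a simple way: Ops looks like the reciprocal of Mean, and Change looks like the relative difference between Ops and Ops on Repo HEAD. The sketch below reproduces a few rows under those assumptions. The function names are hypothetical, and the 5% cutoff for "improved" or "worsened" is inferred from which rows are bolded in the table, not taken from the benchmark tooling itself.

```python
# Minimal sketch: recompute the derived benchmark columns from a few table rows.
# ASSUMPTIONS (not confirmed by the source): Ops = 1 / Mean,
# Change = (Ops - Ops_on_HEAD) / Ops_on_HEAD, and a +/-5% significance cutoff.


def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean runtime given in seconds."""
    return 1.0 / mean_seconds


def percent_change(ops: float, ops_head: float) -> float:
    """Relative change of the PR throughput versus repo HEAD, in percent."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a change; the 5% threshold is an assumption inferred from the bolded rows."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


# (name, Ops in KOps/s, Ops on Repo HEAD in KOps/s), copied from the CPU table.
rows = [
    ("test_plain_set_nested", 54.0116, 57.9460),
    ("test_creation_empty", 123.6651, 78.9303),
    ("test_items_nested", 3.7669, 3.7647),
]

for name, ops, ops_head in rows:
    change = percent_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Because the table values are already rounded, recomputed percentages can differ from the reported ones in the last digit for some rows.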


github-actions bot commented Jun 10, 2024

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 152. Improved: 25. Worsened: 3.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.1004ms 12.9904μs 76.9797 KOps/s 77.1153 KOps/s $\color{#d91a1a}-0.18\%$
test_plain_set_stack_nested 31.4810μs 13.1357μs 76.1287 KOps/s 76.3613 KOps/s $\color{#d91a1a}-0.30\%$
test_plain_set_nested_inplace 0.2067ms 14.2610μs 70.1214 KOps/s 70.1881 KOps/s $\color{#d91a1a}-0.09\%$
test_plain_set_stack_nested_inplace 0.1980ms 14.4189μs 69.3533 KOps/s 69.3531 KOps/s $+0.00\%$
test_items 0.1894ms 4.6691μs 214.1728 KOps/s 211.6821 KOps/s $\color{#35bf28}+1.18\%$
test_items_nested 0.5318ms 0.3421ms 2.9235 KOps/s 2.9362 KOps/s $\color{#d91a1a}-0.43\%$
test_items_nested_locked 0.5413ms 0.3489ms 2.8658 KOps/s 2.9243 KOps/s $\color{#d91a1a}-2.00\%$
test_items_nested_leaf 0.1133ms 84.1317μs 11.8861 KOps/s 11.9609 KOps/s $\color{#d91a1a}-0.62\%$
test_items_stack_nested 0.3739ms 0.3407ms 2.9349 KOps/s 2.8897 KOps/s $\color{#35bf28}+1.56\%$
test_items_stack_nested_leaf 0.1754ms 86.2343μs 11.5963 KOps/s 11.8780 KOps/s $\color{#d91a1a}-2.37\%$
test_items_stack_nested_locked 0.4000ms 0.3454ms 2.8950 KOps/s 2.9028 KOps/s $\color{#d91a1a}-0.27\%$
test_keys 21.6510μs 4.3779μs 228.4208 KOps/s 227.6078 KOps/s $\color{#35bf28}+0.36\%$
test_keys_nested 85.0320μs 67.8008μs 14.7491 KOps/s 14.8178 KOps/s $\color{#d91a1a}-0.46\%$
test_keys_nested_locked 0.7827ms 73.8347μs 13.5438 KOps/s 13.7844 KOps/s $\color{#d91a1a}-1.75\%$
test_keys_nested_leaf 93.4620μs 58.4386μs 17.1120 KOps/s 17.3024 KOps/s $\color{#d91a1a}-1.10\%$
test_keys_stack_nested 0.1028ms 68.7897μs 14.5371 KOps/s 14.8697 KOps/s $\color{#d91a1a}-2.24\%$
test_keys_stack_nested_leaf 82.3510μs 58.8872μs 16.9816 KOps/s 17.1654 KOps/s $\color{#d91a1a}-1.07\%$
test_keys_stack_nested_locked 0.1062ms 74.0430μs 13.5057 KOps/s 13.9233 KOps/s $\color{#d91a1a}-3.00\%$
test_values 9.3737μs 1.8346μs 545.0735 KOps/s 545.8520 KOps/s $\color{#d91a1a}-0.14\%$
test_values_nested 0.1566ms 35.3028μs 28.3263 KOps/s 28.4248 KOps/s $\color{#d91a1a}-0.35\%$
test_values_nested_locked 56.6310μs 37.2735μs 26.8287 KOps/s 26.5308 KOps/s $\color{#35bf28}+1.12\%$
test_values_nested_leaf 47.4310μs 31.3232μs 31.9252 KOps/s 32.1625 KOps/s $\color{#d91a1a}-0.74\%$
test_values_stack_nested 60.3410μs 35.8326μs 27.9075 KOps/s 27.5492 KOps/s $\color{#35bf28}+1.30\%$
test_values_stack_nested_leaf 47.8810μs 32.1079μs 31.1450 KOps/s 31.1228 KOps/s $\color{#35bf28}+0.07\%$
test_values_stack_nested_locked 52.8810μs 37.7996μs 26.4553 KOps/s 25.7928 KOps/s $\color{#35bf28}+2.57\%$
test_membership 3.3057μs 0.7381μs 1.3548 MOps/s 1.3562 MOps/s $\color{#d91a1a}-0.11\%$
test_membership_nested 61.5210μs 2.5739μs 388.5148 KOps/s 383.8238 KOps/s $\color{#35bf28}+1.22\%$
test_membership_nested_leaf 18.9410μs 2.5973μs 385.0203 KOps/s 386.0992 KOps/s $\color{#d91a1a}-0.28\%$
test_membership_stacked_nested 21.1310μs 2.5914μs 385.8970 KOps/s 384.3342 KOps/s $\color{#35bf28}+0.41\%$
test_membership_stacked_nested_leaf 19.4400μs 2.5859μs 386.7177 KOps/s 388.8727 KOps/s $\color{#d91a1a}-0.55\%$
test_membership_nested_last 18.5210μs 3.1158μs 320.9443 KOps/s 319.8583 KOps/s $\color{#35bf28}+0.34\%$
test_membership_nested_leaf_last 19.8200μs 3.1096μs 321.5804 KOps/s 320.4185 KOps/s $\color{#35bf28}+0.36\%$
test_membership_stacked_nested_last 24.5100μs 3.0984μs 322.7506 KOps/s 281.9671 KOps/s $\textbf{\color{#35bf28}+14.46\%}$
test_membership_stacked_nested_leaf_last 16.0500μs 3.1138μs 321.1544 KOps/s 281.0003 KOps/s $\textbf{\color{#35bf28}+14.29\%}$
test_nested_getleaf 53.6510μs 8.3491μs 119.7741 KOps/s 119.2427 KOps/s $\color{#35bf28}+0.45\%$
test_nested_get 23.2000μs 7.8833μs 126.8512 KOps/s 126.6812 KOps/s $\color{#35bf28}+0.13\%$
test_stacked_getleaf 30.1600μs 8.4057μs 118.9671 KOps/s 118.3994 KOps/s $\color{#35bf28}+0.48\%$
test_stacked_get 24.1910μs 7.9066μs 126.4767 KOps/s 125.1162 KOps/s $\color{#35bf28}+1.09\%$
test_nested_getitemleaf 0.1440ms 8.5540μs 116.9050 KOps/s 116.9990 KOps/s $\color{#d91a1a}-0.08\%$
test_nested_getitem 22.7300μs 8.0628μs 124.0258 KOps/s 124.0704 KOps/s $\color{#d91a1a}-0.04\%$
test_stacked_getitemleaf 22.6200μs 8.5978μs 116.3091 KOps/s 115.4514 KOps/s $\color{#35bf28}+0.74\%$
test_stacked_getitem 29.9600μs 8.0682μs 123.9440 KOps/s 122.8375 KOps/s $\color{#35bf28}+0.90\%$
test_lock_nested 57.5754ms 0.4063ms 2.4612 KOps/s 2.3387 KOps/s $\textbf{\color{#35bf28}+5.24\%}$
test_lock_stack_nested 0.3777ms 0.3063ms 3.2642 KOps/s 3.1988 KOps/s $\color{#35bf28}+2.04\%$
test_unlock_nested 59.2340ms 0.4101ms 2.4384 KOps/s 2.7459 KOps/s $\textbf{\color{#d91a1a}-11.20\%}$
test_unlock_stack_nested 0.4163ms 0.3134ms 3.1907 KOps/s 3.1077 KOps/s $\color{#35bf28}+2.67\%$
test_flatten_speed 0.2738ms 0.1032ms 9.6867 KOps/s 9.7332 KOps/s $\color{#d91a1a}-0.48\%$
test_unflatten_speed 0.4909ms 0.2978ms 3.3575 KOps/s 3.4481 KOps/s $\color{#d91a1a}-2.63\%$
test_common_ops 1.1765ms 0.6031ms 1.6580 KOps/s 1.5943 KOps/s $\color{#35bf28}+4.00\%$
test_creation 16.9000μs 1.6863μs 593.0283 KOps/s 592.7644 KOps/s $\color{#35bf28}+0.04\%$
test_creation_empty 0.2043ms 9.2940μs 107.5960 KOps/s 108.6429 KOps/s $\color{#d91a1a}-0.96\%$
test_creation_nested_1 37.8510μs 11.0453μs 90.5364 KOps/s 91.4645 KOps/s $\color{#d91a1a}-1.01\%$
test_creation_nested_2 31.1310μs 13.3503μs 74.9047 KOps/s 76.4294 KOps/s $\color{#d91a1a}-1.99\%$
test_clone 68.6310μs 11.6809μs 85.6096 KOps/s 78.2995 KOps/s $\textbf{\color{#35bf28}+9.34\%}$
test_getitem[int] 33.3110μs 10.9589μs 91.2499 KOps/s 87.7883 KOps/s $\color{#35bf28}+3.94\%$
test_getitem[slice_int] 43.1310μs 21.0283μs 47.5551 KOps/s 45.5897 KOps/s $\color{#35bf28}+4.31\%$
test_getitem[range] 67.5420μs 47.8410μs 20.9026 KOps/s 20.8156 KOps/s $\color{#35bf28}+0.42\%$
test_getitem[tuple] 48.5010μs 18.7960μs 53.2027 KOps/s 51.5726 KOps/s $\color{#35bf28}+3.16\%$
test_getitem[list] 0.1600ms 34.3182μs 29.1391 KOps/s 27.1697 KOps/s $\textbf{\color{#35bf28}+7.25\%}$
test_setitem_dim[int] 75.8110μs 30.1852μs 33.1288 KOps/s 31.4886 KOps/s $\textbf{\color{#35bf28}+5.21\%}$
test_setitem_dim[slice_int] 0.1757ms 50.7379μs 19.7091 KOps/s 19.0531 KOps/s $\color{#35bf28}+3.44\%$
test_setitem_dim[range] 88.3810μs 68.9637μs 14.5004 KOps/s 14.1992 KOps/s $\color{#35bf28}+2.12\%$
test_setitem_dim[tuple] 68.8210μs 45.2917μs 22.0791 KOps/s 21.6357 KOps/s $\color{#35bf28}+2.05\%$
test_setitem 0.1101ms 16.7773μs 59.6042 KOps/s 55.6846 KOps/s $\textbf{\color{#35bf28}+7.04\%}$
test_set 69.8310μs 16.3561μs 61.1392 KOps/s 57.2855 KOps/s $\textbf{\color{#35bf28}+6.73\%}$
test_set_shared 1.0396ms 99.4263μs 10.0577 KOps/s 9.3514 KOps/s $\textbf{\color{#35bf28}+7.55\%}$
test_update 83.2420μs 18.4937μs 54.0724 KOps/s 46.4774 KOps/s $\textbf{\color{#35bf28}+16.34\%}$
test_update_nested 68.3710μs 23.8762μs 41.8827 KOps/s 39.1844 KOps/s $\textbf{\color{#35bf28}+6.89\%}$
test_update__nested 0.1208ms 22.0631μs 45.3246 KOps/s 40.6638 KOps/s $\textbf{\color{#35bf28}+11.46\%}$
test_set_nested 54.8710μs 17.2856μs 57.8515 KOps/s 53.9638 KOps/s $\textbf{\color{#35bf28}+7.20\%}$
test_set_nested_new 59.5310μs 20.3965μs 49.0281 KOps/s 45.5441 KOps/s $\textbf{\color{#35bf28}+7.65\%}$
test_select 87.8720μs 33.2514μs 30.0739 KOps/s 26.9332 KOps/s $\textbf{\color{#35bf28}+11.66\%}$
test_select_nested 86.4310μs 55.6407μs 17.9724 KOps/s 18.2330 KOps/s $\color{#d91a1a}-1.43\%$
test_exclude_nested 0.1887ms 0.1102ms 9.0754 KOps/s 8.9946 KOps/s $\color{#35bf28}+0.90\%$
test_empty[True] 0.4136ms 0.3494ms 2.8620 KOps/s 2.8491 KOps/s $\color{#35bf28}+0.45\%$
test_empty[False] 2.0786μs 0.9240μs 1.0822 MOps/s 1.0711 MOps/s $\color{#35bf28}+1.04\%$
test_to 0.1032ms 78.1064μs 12.8031 KOps/s 12.7346 KOps/s $\color{#35bf28}+0.54\%$
test_to_nonblocking 0.2141ms 63.3908μs 15.7752 KOps/s 15.1690 KOps/s $\color{#35bf28}+4.00\%$
test_unbind_speed 0.3302ms 0.2654ms 3.7676 KOps/s 3.6315 KOps/s $\color{#35bf28}+3.75\%$
test_unbind_speed_stack0 0.3960ms 0.2663ms 3.7554 KOps/s 3.6596 KOps/s $\color{#35bf28}+2.62\%$
test_unbind_speed_stack1 75.0457ms 0.8380ms 1.1934 KOps/s 1.1579 KOps/s $\color{#35bf28}+3.06\%$
test_split 75.2094ms 1.6997ms 588.3245 Ops/s 640.8934 Ops/s $\textbf{\color{#d91a1a}-8.20\%}$
test_chunk 1.6080ms 1.5685ms 637.5373 Ops/s 594.8318 Ops/s $\textbf{\color{#35bf28}+7.18\%}$
test_creation[device0] 0.1992ms 59.1409μs 16.9088 KOps/s 15.6778 KOps/s $\textbf{\color{#35bf28}+7.85\%}$
test_creation_from_tensor 0.2042ms 56.3072μs 17.7597 KOps/s 16.6699 KOps/s $\textbf{\color{#35bf28}+6.54\%}$
test_add_one[memmap_tensor0] 0.1366ms 7.1421μs 140.0158 KOps/s 131.4054 KOps/s $\textbf{\color{#35bf28}+6.55\%}$
test_contiguous[memmap_tensor0] 20.3610μs 0.7185μs 1.3918 MOps/s 1.4124 MOps/s $\color{#d91a1a}-1.46\%$
test_stack[memmap_tensor0] 42.2410μs 5.0880μs 196.5407 KOps/s 195.0689 KOps/s $\color{#35bf28}+0.75\%$
test_memmaptd_index 1.1502ms 0.2953ms 3.3866 KOps/s 3.2982 KOps/s $\color{#35bf28}+2.68\%$
test_memmaptd_index_astensor 0.6388ms 0.3658ms 2.7334 KOps/s 2.6595 KOps/s $\color{#35bf28}+2.78\%$
test_memmaptd_index_op 1.1584ms 0.6796ms 1.4715 KOps/s 1.3986 KOps/s $\textbf{\color{#35bf28}+5.21\%}$
test_serialize_model 0.1064s 0.1034s 9.6732 Ops/s 8.5488 Ops/s $\textbf{\color{#35bf28}+13.15\%}$
test_serialize_model_pickle 1.3661s 1.2391s 0.8071 Ops/s 0.8074 Ops/s $\color{#d91a1a}-0.04\%$
test_serialize_weights 0.1802s 0.1104s 9.0557 Ops/s 8.7058 Ops/s $\color{#35bf28}+4.02\%$
test_serialize_weights_returnearly 0.2423s 0.1019s 9.8128 Ops/s 10.1879 Ops/s $\color{#d91a1a}-3.68\%$
test_serialize_weights_pickle 1.3533s 1.2365s 0.8087 Ops/s 0.8009 Ops/s $\color{#35bf28}+0.98\%$
test_reshape_pytree 93.3810μs 26.4478μs 37.8104 KOps/s 35.4348 KOps/s $\textbf{\color{#35bf28}+6.70\%}$
test_reshape_td 90.5820μs 31.9686μs 31.2807 KOps/s 31.0524 KOps/s $\color{#35bf28}+0.74\%$
test_view_pytree 0.1705ms 26.0873μs 38.3328 KOps/s 37.2699 KOps/s $\color{#35bf28}+2.85\%$
test_view_td 66.6510μs 36.1114μs 27.6921 KOps/s 26.4914 KOps/s $\color{#35bf28}+4.53\%$
test_unbind_pytree 64.7310μs 32.0318μs 31.2190 KOps/s 30.2990 KOps/s $\color{#35bf28}+3.04\%$
test_unbind_td 0.4014ms 41.2026μs 24.2703 KOps/s 23.7051 KOps/s $\color{#35bf28}+2.38\%$
test_split_pytree 57.6710μs 34.9665μs 28.5988 KOps/s 28.0048 KOps/s $\color{#35bf28}+2.12\%$
test_split_td 0.1073ms 40.5883μs 24.6376 KOps/s 24.7891 KOps/s $\color{#d91a1a}-0.61\%$
test_add_pytree 0.1834ms 39.5709μs 25.2711 KOps/s 23.7788 KOps/s $\textbf{\color{#35bf28}+6.28\%}$
test_add_td 0.2255ms 54.2830μs 18.4220 KOps/s 18.2438 KOps/s $\color{#35bf28}+0.98\%$
test_distributed 4.1594ms 80.7307μs 12.3869 KOps/s 10.4972 KOps/s $\textbf{\color{#35bf28}+18.00\%}$
test_tdmodule 0.1422ms 15.6800μs 63.7753 KOps/s 66.3276 KOps/s $\color{#d91a1a}-3.85\%$
test_tdmodule_dispatch 50.9010μs 30.1538μs 33.1633 KOps/s 33.3456 KOps/s $\color{#d91a1a}-0.55\%$
test_tdseq 34.0200μs 17.4082μs 57.4442 KOps/s 57.8409 KOps/s $\color{#d91a1a}-0.69\%$
test_tdseq_dispatch 50.1910μs 33.7372μs 29.6409 KOps/s 29.6784 KOps/s $\color{#d91a1a}-0.13\%$
test_instantiation_functorch 1.7200ms 1.5706ms 636.7174 Ops/s 629.2309 Ops/s $\color{#35bf28}+1.19\%$
test_instantiation_td 1.5387ms 1.0766ms 928.8247 Ops/s 851.7520 Ops/s $\textbf{\color{#35bf28}+9.05\%}$
test_exec_functorch 0.2239ms 0.1563ms 6.3975 KOps/s 6.2070 KOps/s $\color{#35bf28}+3.07\%$
test_exec_functional_call 0.3137ms 0.1435ms 6.9674 KOps/s 6.8144 KOps/s $\color{#35bf28}+2.24\%$
test_exec_td 0.1707ms 0.1387ms 7.2111 KOps/s 6.8823 KOps/s $\color{#35bf28}+4.78\%$
test_exec_td_decorator 0.4949ms 0.2188ms 4.5694 KOps/s 4.5753 KOps/s $\color{#d91a1a}-0.13\%$
test_vmap_mlp_speed[True-True] 0.8193ms 0.6203ms 1.6120 KOps/s 1.5972 KOps/s $\color{#35bf28}+0.92\%$
test_vmap_mlp_speed[True-False] 0.7819ms 0.6185ms 1.6167 KOps/s 1.5993 KOps/s $\color{#35bf28}+1.09\%$
test_vmap_mlp_speed[False-True] 0.7264ms 0.5688ms 1.7580 KOps/s 1.8068 KOps/s $\color{#d91a1a}-2.70\%$
test_vmap_mlp_speed[False-False] 0.7062ms 0.5494ms 1.8203 KOps/s 1.8127 KOps/s $\color{#35bf28}+0.42\%$
test_vmap_mlp_speed_decorator[True-True] 1.1395ms 0.6870ms 1.4557 KOps/s 1.4396 KOps/s $\color{#35bf28}+1.12\%$
test_vmap_mlp_speed_decorator[True-False] 0.8485ms 0.6849ms 1.4601 KOps/s 1.4479 KOps/s $\color{#35bf28}+0.84\%$
test_vmap_mlp_speed_decorator[False-True] 0.7765ms 0.6086ms 1.6431 KOps/s 1.6346 KOps/s $\color{#35bf28}+0.52\%$
test_vmap_mlp_speed_decorator[False-False] 0.7993ms 0.6092ms 1.6416 KOps/s 1.6390 KOps/s $\color{#35bf28}+0.16\%$
test_vmap_transformer_speed[True-True] 8.4705ms 8.1799ms 122.2509 Ops/s 120.1829 Ops/s $\color{#35bf28}+1.72\%$
test_vmap_transformer_speed[True-False] 8.6357ms 8.2111ms 121.7867 Ops/s 120.4085 Ops/s $\color{#35bf28}+1.14\%$
test_vmap_transformer_speed[False-True] 8.3459ms 8.1078ms 123.3381 Ops/s 121.4150 Ops/s $\color{#35bf28}+1.58\%$
test_vmap_transformer_speed[False-False] 8.8301ms 8.1633ms 122.5000 Ops/s 121.5368 Ops/s $\color{#35bf28}+0.79\%$
test_vmap_transformer_speed_decorator[True-True] 20.2955ms 19.9154ms 50.2124 Ops/s 49.6955 Ops/s $\color{#35bf28}+1.04\%$
test_vmap_transformer_speed_decorator[True-False] 20.4971ms 19.9208ms 50.1989 Ops/s 49.7917 Ops/s $\color{#35bf28}+0.82\%$
test_vmap_transformer_speed_decorator[False-True] 20.2115ms 19.7983ms 50.5094 Ops/s 49.9736 Ops/s $\color{#35bf28}+1.07\%$
test_vmap_transformer_speed_decorator[False-False] 20.0132ms 19.7769ms 50.5642 Ops/s 50.0538 Ops/s $\color{#35bf28}+1.02\%$
test_to_module_speed[True] 2.1141ms 1.5723ms 635.9973 Ops/s 642.4245 Ops/s $\color{#d91a1a}-1.00\%$
test_to_module_speed[False] 1.6535ms 1.5365ms 650.8329 Ops/s 647.7895 Ops/s $\color{#35bf28}+0.47\%$
test_tc_init 85.8310μs 25.3702μs 39.4163 KOps/s 39.3794 KOps/s $\color{#35bf28}+0.09\%$
test_tc_init_nested 93.7910μs 50.7250μs 19.7141 KOps/s 19.2576 KOps/s $\color{#35bf28}+2.37\%$
test_tc_first_layer_tensor 0.7558μs 0.3745μs 2.6700 MOps/s 2.6742 MOps/s $\color{#d91a1a}-0.16\%$
test_tc_first_layer_nontensor 4.0462μs 0.4038μs 2.4762 MOps/s 2.5066 MOps/s $\color{#d91a1a}-1.21\%$
test_tc_second_layer_tensor 4.2720μs 0.9975μs 1.0025 MOps/s 999.4594 KOps/s $\color{#35bf28}+0.31\%$
test_tc_second_layer_nontensor 4.4250μs 0.8523μs 1.1733 MOps/s 1.1656 MOps/s $\color{#35bf28}+0.66\%$
test_unbind 92.7126ms 6.6425ms 150.5448 Ops/s 188.2249 Ops/s $\textbf{\color{#d91a1a}-20.02\%}$
test_full_like 14.2627ms 13.4837ms 74.1633 Ops/s 73.1956 Ops/s $\color{#35bf28}+1.32\%$
test_zeros_like 8.2594ms 7.8389ms 127.5697 Ops/s 126.6101 Ops/s $\color{#35bf28}+0.76\%$
test_ones_like 8.3005ms 7.8327ms 127.6701 Ops/s 128.1007 Ops/s $\color{#d91a1a}-0.34\%$
test_clone 9.9532ms 9.4858ms 105.4208 Ops/s 105.0317 Ops/s $\color{#35bf28}+0.37\%$
test_squeeze 0.1438ms 11.3147μs 88.3808 KOps/s 89.4251 KOps/s $\color{#d91a1a}-1.17\%$
test_unsqueeze 0.1828ms 53.8997μs 18.5530 KOps/s 18.4146 KOps/s $\color{#35bf28}+0.75\%$
test_split 0.2280ms 0.1017ms 9.8323 KOps/s 9.7586 KOps/s $\color{#35bf28}+0.75\%$
test_permute 0.2351ms 0.1145ms 8.7370 KOps/s 8.7716 KOps/s $\color{#d91a1a}-0.39\%$
test_stack 28.8452ms 27.8106ms 35.9576 Ops/s 36.0777 Ops/s $\color{#d91a1a}-0.33\%$
test_cat 28.3947ms 27.7569ms 36.0271 Ops/s 36.3131 Ops/s $\color{#d91a1a}-0.79\%$

@vmoens vmoens merged commit 08be67d into main Jun 10, 2024
37 of 38 checks passed
@vmoens vmoens deleted the from-file-only branch June 10, 2024 17:21