[BugFix] Allow fake-tensor detection pass through in torch 2.0 #802

Merged: 1 commit into main from pt2.0-compat, Jun 4, 2024

Conversation

vmoens (Contributor) commented Jun 4, 2024

No description provided.

facebook-github-bot added the CLA Signed label Jun 4, 2024
vmoens added the bug and versioning labels Jun 4, 2024
vmoens merged commit 6128f73 into main Jun 4, 2024
21 of 28 checks passed
vmoens deleted the pt2.0-compat branch June 4, 2024 10:12

github-actions bot commented Jun 4, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: $\large\color{#35bf28}5$. Worsened: $\large\color{#d91a1a}6$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 39.0320μs 16.1588μs 61.8858 KOps/s 61.3000 KOps/s $\color{#35bf28}+0.96\%$
test_plain_set_stack_nested 72.8160μs 16.2233μs 61.6398 KOps/s 60.5802 KOps/s $\color{#35bf28}+1.75\%$
test_plain_set_nested_inplace 66.5140μs 18.3289μs 54.5586 KOps/s 53.9782 KOps/s $\color{#35bf28}+1.08\%$
test_plain_set_stack_nested_inplace 61.7550μs 18.5398μs 53.9379 KOps/s 53.6752 KOps/s $\color{#35bf28}+0.49\%$
test_items 37.7410μs 2.5505μs 392.0843 KOps/s 396.6388 KOps/s $\color{#d91a1a}-1.15\%$
test_items_nested 0.4382ms 0.2616ms 3.8223 KOps/s 3.7641 KOps/s $\color{#35bf28}+1.55\%$
test_items_nested_locked 0.4516ms 0.2642ms 3.7851 KOps/s 3.7176 KOps/s $\color{#35bf28}+1.81\%$
test_items_nested_leaf 0.1460ms 76.0457μs 13.1500 KOps/s 12.9220 KOps/s $\color{#35bf28}+1.76\%$
test_items_stack_nested 0.4371ms 0.2678ms 3.7339 KOps/s 3.6854 KOps/s $\color{#35bf28}+1.31\%$
test_items_stack_nested_leaf 0.1557ms 79.1637μs 12.6320 KOps/s 12.4015 KOps/s $\color{#35bf28}+1.86\%$
test_items_stack_nested_locked 0.4758ms 0.2682ms 3.7282 KOps/s 3.6675 KOps/s $\color{#35bf28}+1.66\%$
test_keys 35.1360μs 4.0383μs 247.6293 KOps/s 256.0010 KOps/s $\color{#d91a1a}-3.27\%$
test_keys_nested 0.2349ms 0.1378ms 7.2552 KOps/s 7.0548 KOps/s $\color{#35bf28}+2.84\%$
test_keys_nested_locked 0.8019ms 0.1427ms 7.0096 KOps/s 6.8315 KOps/s $\color{#35bf28}+2.61\%$
test_keys_nested_leaf 0.2313ms 0.1162ms 8.6094 KOps/s 8.4010 KOps/s $\color{#35bf28}+2.48\%$
test_keys_stack_nested 0.2867ms 0.1380ms 7.2489 KOps/s 7.1169 KOps/s $\color{#35bf28}+1.85\%$
test_keys_stack_nested_leaf 0.2008ms 0.1163ms 8.5990 KOps/s 8.4059 KOps/s $\color{#35bf28}+2.30\%$
test_keys_stack_nested_locked 0.2005ms 0.1426ms 7.0107 KOps/s 6.9339 KOps/s $\color{#35bf28}+1.11\%$
test_values 10.9127μs 1.1763μs 850.1248 KOps/s 844.3399 KOps/s $\color{#35bf28}+0.69\%$
test_values_nested 0.1128ms 50.4426μs 19.8245 KOps/s 19.5240 KOps/s $\color{#35bf28}+1.54\%$
test_values_nested_locked 0.1026ms 50.2507μs 19.9002 KOps/s 19.5781 KOps/s $\color{#35bf28}+1.65\%$
test_values_nested_leaf 0.1079ms 45.3544μs 22.0486 KOps/s 21.6229 KOps/s $\color{#35bf28}+1.97\%$
test_values_stack_nested 0.1162ms 51.2375μs 19.5169 KOps/s 19.2146 KOps/s $\color{#35bf28}+1.57\%$
test_values_stack_nested_leaf 91.9010μs 45.4129μs 22.0202 KOps/s 21.4565 KOps/s $\color{#35bf28}+2.63\%$
test_values_stack_nested_locked 0.1023ms 51.1352μs 19.5560 KOps/s 19.1736 KOps/s $\color{#35bf28}+1.99\%$
test_membership 16.7220μs 1.3790μs 725.1664 KOps/s 746.8887 KOps/s $\color{#d91a1a}-2.91\%$
test_membership_nested 47.4690μs 3.4365μs 290.9976 KOps/s 296.3589 KOps/s $\color{#d91a1a}-1.81\%$
test_membership_nested_leaf 39.2230μs 3.4786μs 287.4729 KOps/s 292.4235 KOps/s $\color{#d91a1a}-1.69\%$
test_membership_stacked_nested 31.9790μs 3.4248μs 291.9916 KOps/s 284.7108 KOps/s $\color{#35bf28}+2.56\%$
test_membership_stacked_nested_leaf 22.7230μs 3.4746μs 287.8039 KOps/s 291.2321 KOps/s $\color{#d91a1a}-1.18\%$
test_membership_nested_last 28.4140μs 4.2586μs 234.8199 KOps/s 240.2103 KOps/s $\color{#d91a1a}-2.24\%$
test_membership_nested_leaf_last 42.0180μs 4.2868μs 233.2734 KOps/s 240.1258 KOps/s $\color{#d91a1a}-2.85\%$
test_membership_stacked_nested_last 26.8100μs 4.3546μs 229.6438 KOps/s 174.7706 KOps/s $\textbf{\color{#35bf28}+31.40\%}$
test_membership_stacked_nested_leaf_last 25.1070μs 4.3175μs 231.6152 KOps/s 174.1648 KOps/s $\textbf{\color{#35bf28}+32.99\%}$
test_nested_getleaf 49.2920μs 10.6009μs 94.3317 KOps/s 93.6616 KOps/s $\color{#35bf28}+0.72\%$
test_nested_get 40.6350μs 9.9897μs 100.1026 KOps/s 98.5651 KOps/s $\color{#35bf28}+1.56\%$
test_stacked_getleaf 68.0270μs 10.4585μs 95.6160 KOps/s 95.3767 KOps/s $\color{#35bf28}+0.25\%$
test_stacked_get 42.3690μs 9.8978μs 101.0326 KOps/s 100.5283 KOps/s $\color{#35bf28}+0.50\%$
test_nested_getitemleaf 40.5760μs 11.0842μs 90.2185 KOps/s 89.1588 KOps/s $\color{#35bf28}+1.19\%$
test_nested_getitem 32.2400μs 10.1642μs 98.3841 KOps/s 97.0938 KOps/s $\color{#35bf28}+1.33\%$
test_stacked_getitemleaf 52.7590μs 10.8426μs 92.2288 KOps/s 91.4629 KOps/s $\color{#35bf28}+0.84\%$
test_stacked_getitem 32.9210μs 9.9882μs 100.1184 KOps/s 98.5618 KOps/s $\color{#35bf28}+1.58\%$
test_lock_nested 0.7804ms 0.3531ms 2.8322 KOps/s 2.4223 KOps/s $\textbf{\color{#35bf28}+16.92\%}$
test_lock_stack_nested 0.5496ms 0.3158ms 3.1670 KOps/s 3.2116 KOps/s $\color{#d91a1a}-1.39\%$
test_unlock_nested 0.7652ms 0.3597ms 2.7801 KOps/s 2.3859 KOps/s $\textbf{\color{#35bf28}+16.52\%}$
test_unlock_stack_nested 0.3844ms 0.3232ms 3.0944 KOps/s 3.1300 KOps/s $\color{#d91a1a}-1.14\%$
test_flatten_speed 0.5987ms 96.9153μs 10.3183 KOps/s 10.2819 KOps/s $\color{#35bf28}+0.35\%$
test_unflatten_speed 0.5829ms 0.4098ms 2.4401 KOps/s 2.4037 KOps/s $\color{#35bf28}+1.52\%$
test_common_ops 1.2262ms 0.7010ms 1.4265 KOps/s 1.4347 KOps/s $\color{#d91a1a}-0.57\%$
test_creation 18.6550μs 1.9216μs 520.4012 KOps/s 530.6737 KOps/s $\color{#d91a1a}-1.94\%$
test_creation_empty 21.7000μs 9.4038μs 106.3401 KOps/s 106.4745 KOps/s $\color{#d91a1a}-0.13\%$
test_creation_nested_1 40.3250μs 12.1691μs 82.1756 KOps/s 83.0445 KOps/s $\color{#d91a1a}-1.05\%$
test_creation_nested_2 40.0350μs 15.4847μs 64.5801 KOps/s 64.6259 KOps/s $\color{#d91a1a}-0.07\%$
test_clone 0.2212ms 14.0571μs 71.1382 KOps/s 74.0055 KOps/s $\color{#d91a1a}-3.87\%$
test_getitem[int] 35.3050μs 11.9258μs 83.8518 KOps/s 86.8195 KOps/s $\color{#d91a1a}-3.42\%$
test_getitem[slice_int] 68.6480μs 23.1653μs 43.1680 KOps/s 41.2798 KOps/s $\color{#35bf28}+4.57\%$
test_getitem[range] 83.5460μs 60.3228μs 16.5775 KOps/s 16.6638 KOps/s $\color{#d91a1a}-0.52\%$
test_getitem[tuple] 47.3480μs 19.3536μs 51.6700 KOps/s 50.8548 KOps/s $\color{#35bf28}+1.60\%$
test_getitem[list] 0.1386ms 42.2365μs 23.6762 KOps/s 23.0906 KOps/s $\color{#35bf28}+2.54\%$
test_setitem_dim[int] 54.6920μs 36.3203μs 27.5328 KOps/s 27.8146 KOps/s $\color{#d91a1a}-1.01\%$
test_setitem_dim[slice_int] 99.8370μs 64.0072μs 15.6232 KOps/s 15.4460 KOps/s $\color{#35bf28}+1.15\%$
test_setitem_dim[range] 0.1594ms 85.5726μs 11.6860 KOps/s 11.8942 KOps/s $\color{#d91a1a}-1.75\%$
test_setitem_dim[tuple] 93.1340μs 52.0960μs 19.1953 KOps/s 19.5656 KOps/s $\color{#d91a1a}-1.89\%$
test_setitem 80.6100μs 20.2472μs 49.3896 KOps/s 50.3635 KOps/s $\color{#d91a1a}-1.93\%$
test_set 59.0700μs 19.6366μs 50.9253 KOps/s 51.9080 KOps/s $\color{#d91a1a}-1.89\%$
test_set_shared 3.0161ms 0.1442ms 6.9359 KOps/s 7.0243 KOps/s $\color{#d91a1a}-1.26\%$
test_update 0.1473ms 21.1994μs 47.1711 KOps/s 48.0305 KOps/s $\color{#d91a1a}-1.79\%$
test_update_nested 0.1123ms 29.9873μs 33.3475 KOps/s 34.5010 KOps/s $\color{#d91a1a}-3.34\%$
test_update__nested 72.5550μs 26.0521μs 38.3847 KOps/s 37.5422 KOps/s $\color{#35bf28}+2.24\%$
test_set_nested 80.5970μs 21.7263μs 46.0272 KOps/s 46.9344 KOps/s $\color{#d91a1a}-1.93\%$
test_set_nested_new 79.7090μs 26.0790μs 38.3450 KOps/s 38.8650 KOps/s $\color{#d91a1a}-1.34\%$
test_select 0.1304ms 41.8725μs 23.8820 KOps/s 24.7004 KOps/s $\color{#d91a1a}-3.31\%$
test_select_nested 0.1398ms 60.9235μs 16.4140 KOps/s 16.4145 KOps/s $-0.00\%$
test_exclude_nested 0.2257ms 0.1238ms 8.0754 KOps/s 8.1286 KOps/s $\color{#d91a1a}-0.65\%$
test_empty[True] 0.5955ms 0.3993ms 2.5046 KOps/s 2.4755 KOps/s $\color{#35bf28}+1.18\%$
test_empty[False] 8.8264μs 1.1435μs 874.5098 KOps/s 865.7769 KOps/s $\color{#35bf28}+1.01\%$
test_unbind_speed 1.6786ms 0.2754ms 3.6315 KOps/s 3.8019 KOps/s $\color{#d91a1a}-4.48\%$
test_unbind_speed_stack0 0.6198ms 0.2582ms 3.8727 KOps/s 3.8867 KOps/s $\color{#d91a1a}-0.36\%$
test_unbind_speed_stack1 82.6902ms 0.7502ms 1.3330 KOps/s 1.2842 KOps/s $\color{#35bf28}+3.80\%$
test_split 77.6977ms 1.6480ms 606.8009 Ops/s 610.1766 Ops/s $\color{#d91a1a}-0.55\%$
test_chunk 76.3349ms 1.6453ms 607.7773 Ops/s 611.7764 Ops/s $\color{#d91a1a}-0.65\%$
test_creation[device0] 3.8271ms 87.5772μs 11.4185 KOps/s 11.6999 KOps/s $\color{#d91a1a}-2.40\%$
test_creation_from_tensor 0.2522ms 84.8678μs 11.7830 KOps/s 11.9101 KOps/s $\color{#d91a1a}-1.07\%$
test_add_one[memmap_tensor0] 0.1100ms 5.5033μs 181.7076 KOps/s 178.4003 KOps/s $\color{#35bf28}+1.85\%$
test_contiguous[memmap_tensor0] 22.1210μs 0.6418μs 1.5580 MOps/s 1.5576 MOps/s $\color{#35bf28}+0.02\%$
test_stack[memmap_tensor0] 25.4980μs 3.5576μs 281.0854 KOps/s 281.0261 KOps/s $\color{#35bf28}+0.02\%$
test_memmaptd_index 1.2153ms 0.2550ms 3.9215 KOps/s 3.7785 KOps/s $\color{#35bf28}+3.78\%$
test_memmaptd_index_astensor 0.7883ms 0.3298ms 3.0317 KOps/s 2.9386 KOps/s $\color{#35bf28}+3.17\%$
test_memmaptd_index_op 0.9361ms 0.6124ms 1.6330 KOps/s 1.6223 KOps/s $\color{#35bf28}+0.66\%$
test_serialize_model 0.2008s 0.1193s 8.3793 Ops/s 8.2886 Ops/s $\color{#35bf28}+1.09\%$
test_serialize_model_pickle 0.4508s 0.3779s 2.6461 Ops/s 2.6021 Ops/s $\color{#35bf28}+1.69\%$
test_serialize_weights 0.1917s 0.1183s 8.4555 Ops/s 8.5010 Ops/s $\color{#d91a1a}-0.53\%$
test_serialize_weights_returnearly 0.2155s 0.1398s 7.1527 Ops/s 7.5554 Ops/s $\textbf{\color{#d91a1a}-5.33\%}$
test_serialize_weights_pickle 1.0457s 0.5720s 1.7481 Ops/s 2.4341 Ops/s $\textbf{\color{#d91a1a}-28.18\%}$
test_serialize_weights_filesystem 0.1055s 99.0740ms 10.0935 Ops/s 10.0190 Ops/s $\color{#35bf28}+0.74\%$
test_serialize_model_filesystem 0.1811s 0.1090s 9.1778 Ops/s 9.6078 Ops/s $\color{#d91a1a}-4.48\%$
test_reshape_pytree 51.4860μs 25.5161μs 39.1909 KOps/s 38.6761 KOps/s $\color{#35bf28}+1.33\%$
test_reshape_td 88.0840μs 34.7638μs 28.7655 KOps/s 28.4107 KOps/s $\color{#35bf28}+1.25\%$
test_view_pytree 62.0260μs 25.2491μs 39.6053 KOps/s 39.2269 KOps/s $\color{#35bf28}+0.96\%$
test_view_td 97.9230μs 39.0496μs 25.6085 KOps/s 24.7689 KOps/s $\color{#35bf28}+3.39\%$
test_unbind_pytree 91.8020μs 29.6393μs 33.7389 KOps/s 33.5292 KOps/s $\color{#35bf28}+0.63\%$
test_unbind_td 0.4804ms 39.1626μs 25.5345 KOps/s 25.8894 KOps/s $\color{#d91a1a}-1.37\%$
test_split_pytree 76.8130μs 30.1044μs 33.2178 KOps/s 34.3658 KOps/s $\color{#d91a1a}-3.34\%$
test_split_td 0.5770ms 42.2799μs 23.6519 KOps/s 23.7001 KOps/s $\color{#d91a1a}-0.20\%$
test_add_pytree 92.8640μs 35.9642μs 27.8054 KOps/s 28.3652 KOps/s $\color{#d91a1a}-1.97\%$
test_add_td 0.1239ms 58.8353μs 16.9966 KOps/s 17.8589 KOps/s $\color{#d91a1a}-4.83\%$
test_distributed 0.2752ms 0.1033ms 9.6848 KOps/s 9.5727 KOps/s $\color{#35bf28}+1.17\%$
test_tdmodule 0.1097ms 17.3725μs 57.5624 KOps/s 59.4499 KOps/s $\color{#d91a1a}-3.17\%$
test_tdmodule_dispatch 50.1630μs 33.5715μs 29.7872 KOps/s 30.0091 KOps/s $\color{#d91a1a}-0.74\%$
test_tdseq 45.5450μs 20.3590μs 49.1183 KOps/s 49.7053 KOps/s $\color{#d91a1a}-1.18\%$
test_tdseq_dispatch 82.2740μs 39.4541μs 25.3459 KOps/s 25.6098 KOps/s $\color{#d91a1a}-1.03\%$
test_instantiation_functorch 2.1217ms 1.3387ms 746.9915 Ops/s 760.1598 Ops/s $\color{#d91a1a}-1.73\%$
test_instantiation_td 1.6560ms 1.0452ms 956.7279 Ops/s 975.2500 Ops/s $\color{#d91a1a}-1.90\%$
test_exec_functorch 0.2974ms 0.1607ms 6.2215 KOps/s 6.1036 KOps/s $\color{#35bf28}+1.93\%$
test_exec_functional_call 0.3550ms 0.1499ms 6.6722 KOps/s 6.4478 KOps/s $\color{#35bf28}+3.48\%$
test_exec_td 0.2243ms 0.1451ms 6.8907 KOps/s 6.6124 KOps/s $\color{#35bf28}+4.21\%$
test_exec_td_decorator 0.9652ms 0.2227ms 4.4894 KOps/s 4.4628 KOps/s $\color{#35bf28}+0.60\%$
test_vmap_mlp_speed[True-True] 0.6837ms 0.4820ms 2.0746 KOps/s 2.0649 KOps/s $\color{#35bf28}+0.47\%$
test_vmap_mlp_speed[True-False] 0.8482ms 0.4784ms 2.0904 KOps/s 2.0721 KOps/s $\color{#35bf28}+0.89\%$
test_vmap_mlp_speed[False-True] 0.6510ms 0.3871ms 2.5833 KOps/s 2.5206 KOps/s $\color{#35bf28}+2.49\%$
test_vmap_mlp_speed[False-False] 0.6244ms 0.3867ms 2.5860 KOps/s 2.5167 KOps/s $\color{#35bf28}+2.75\%$
test_vmap_mlp_speed_decorator[True-True] 1.3220ms 0.5531ms 1.8079 KOps/s 1.8102 KOps/s $\color{#d91a1a}-0.13\%$
test_vmap_mlp_speed_decorator[True-False] 1.0037ms 0.5519ms 1.8118 KOps/s 1.8156 KOps/s $\color{#d91a1a}-0.21\%$
test_vmap_mlp_speed_decorator[False-True] 0.9073ms 0.4527ms 2.2092 KOps/s 2.1402 KOps/s $\color{#35bf28}+3.22\%$
test_vmap_mlp_speed_decorator[False-False] 0.6836ms 0.4495ms 2.2247 KOps/s 2.0456 KOps/s $\textbf{\color{#35bf28}+8.76\%}$
test_to_module_speed[True] 2.0025ms 1.7044ms 586.7073 Ops/s 584.7119 Ops/s $\color{#35bf28}+0.34\%$
test_to_module_speed[False] 1.9457ms 1.6649ms 600.6396 Ops/s 598.9103 Ops/s $\color{#35bf28}+0.29\%$
test_tc_init 72.6360μs 26.1016μs 38.3119 KOps/s 38.9323 KOps/s $\color{#d91a1a}-1.59\%$
test_tc_init_nested 0.1452ms 50.2632μs 19.8953 KOps/s 19.0047 KOps/s $\color{#35bf28}+4.69\%$
test_tc_first_layer_tensor 2.7065μs 0.6713μs 1.4897 MOps/s 1.4449 MOps/s $\color{#35bf28}+3.10\%$
test_tc_first_layer_nontensor 2.5798μs 0.6827μs 1.4647 MOps/s 1.4646 MOps/s $+0.00\%$
test_tc_second_layer_tensor 26.8500μs 1.8265μs 547.5062 KOps/s 540.3751 KOps/s $\color{#35bf28}+1.32\%$
test_tc_second_layer_nontensor 15.1783μs 1.5426μs 648.2757 KOps/s 650.9552 KOps/s $\color{#d91a1a}-0.41\%$
test_unbind 0.1037s 8.4270ms 118.6656 Ops/s 128.9333 Ops/s $\textbf{\color{#d91a1a}-7.96\%}$
test_full_like 21.8453ms 12.7081ms 78.6898 Ops/s 83.9345 Ops/s $\textbf{\color{#d91a1a}-6.25\%}$
test_zeros_like 14.5463ms 6.4320ms 155.4722 Ops/s 153.2131 Ops/s $\color{#35bf28}+1.47\%$
test_ones_like 13.8814ms 7.0478ms 141.8880 Ops/s 149.6360 Ops/s $\textbf{\color{#d91a1a}-5.18\%}$
test_clone 17.5469ms 9.0347ms 110.6850 Ops/s 112.8175 Ops/s $\color{#d91a1a}-1.89\%$
test_squeeze 72.5450μs 14.5614μs 68.6746 KOps/s 68.0629 KOps/s $\color{#35bf28}+0.90\%$
test_unsqueeze 0.1275ms 62.6214μs 15.9690 KOps/s 16.0801 KOps/s $\color{#d91a1a}-0.69\%$
test_split 0.1928ms 0.1165ms 8.5820 KOps/s 8.8439 KOps/s $\color{#d91a1a}-2.96\%$
test_permute 0.2286ms 0.1330ms 7.5184 KOps/s 7.8225 KOps/s $\color{#d91a1a}-3.89\%$
test_stack 35.1629ms 25.4338ms 39.3178 Ops/s 41.6849 Ops/s $\textbf{\color{#d91a1a}-5.68\%}$
test_cat 31.4785ms 25.5851ms 39.0852 Ops/s 41.0581 Ops/s $\color{#d91a1a}-4.81\%$
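
For readers checking the table, the Change column appears to be the relative difference between the Ops column and the Ops on Repo HEAD column, and the Improved/Worsened totals match the number of rows whose change is at least 5% in magnitude (the bolded entries). The 5% threshold and the helper names below are inferred from the numbers in this report, not taken from the benchmark bot's source, so treat this as a sketch under those assumptions:

```python
# Hypothetical helpers: names and the 5% threshold are assumptions
# inferred from the report above, not the benchmark bot's actual code.

def relative_change(ops: float, ops_head: float) -> float:
    """Percent change in throughput of the PR relative to repo HEAD."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a row the way the summary line seems to count it."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) pairs copied from three rows of the CPU table.
# Units cancel in the ratio as long as both columns use the same unit.
rows = {
    "test_plain_set_nested": (61.8858, 61.3000),
    "test_membership_stacked_nested_last": (229.6438, 174.7706),
    "test_serialize_weights_pickle": (1.7481, 2.4341),
}

for name, (ops, ops_head) in rows.items():
    change = relative_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Under this reading, most of the small red and green entries fall inside the noise band and do not count toward either total.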


github-actions bot commented Jun 4, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 152. Improved: $\large\color{#35bf28}3$. Worsened: $\large\color{#d91a1a}8$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.4863ms 13.6204μs 73.4191 KOps/s 75.3939 KOps/s $\color{#d91a1a}-2.62\%$
test_plain_set_stack_nested 0.2013ms 13.8484μs 72.2105 KOps/s 74.6287 KOps/s $\color{#d91a1a}-3.24\%$
test_plain_set_nested_inplace 0.2047ms 15.0247μs 66.5573 KOps/s 68.7854 KOps/s $\color{#d91a1a}-3.24\%$
test_plain_set_stack_nested_inplace 0.2025ms 15.1571μs 65.9756 KOps/s 68.1298 KOps/s $\color{#d91a1a}-3.16\%$
test_items 0.1893ms 4.6895μs 213.2431 KOps/s 209.7541 KOps/s $\color{#35bf28}+1.66\%$
test_items_nested 0.5230ms 0.3370ms 2.9669 KOps/s 2.9297 KOps/s $\color{#35bf28}+1.27\%$
test_items_nested_locked 0.5247ms 0.3465ms 2.8864 KOps/s 2.9179 KOps/s $\color{#d91a1a}-1.08\%$
test_items_nested_leaf 0.1316ms 83.6175μs 11.9592 KOps/s 11.9973 KOps/s $\color{#d91a1a}-0.32\%$
test_items_stack_nested 0.5371ms 0.3481ms 2.8724 KOps/s 2.9169 KOps/s $\color{#d91a1a}-1.53\%$
test_items_stack_nested_leaf 0.2681ms 85.3337μs 11.7187 KOps/s 11.9648 KOps/s $\color{#d91a1a}-2.06\%$
test_items_stack_nested_locked 0.5266ms 0.3433ms 2.9132 KOps/s 2.9048 KOps/s $\color{#35bf28}+0.29\%$
test_keys 0.1887ms 4.3454μs 230.1279 KOps/s 230.5786 KOps/s $\color{#d91a1a}-0.20\%$
test_keys_nested 0.2489ms 67.1273μs 14.8971 KOps/s 14.8077 KOps/s $\color{#35bf28}+0.60\%$
test_keys_nested_locked 0.8232ms 73.5168μs 13.6023 KOps/s 13.8980 KOps/s $\color{#d91a1a}-2.13\%$
test_keys_nested_leaf 2.4231ms 58.5615μs 17.0761 KOps/s 17.2575 KOps/s $\color{#d91a1a}-1.05\%$
test_keys_stack_nested 99.6410μs 67.5642μs 14.8007 KOps/s 14.8025 KOps/s $\color{#d91a1a}-0.01\%$
test_keys_stack_nested_leaf 0.2540ms 58.5367μs 17.0833 KOps/s 17.2340 KOps/s $\color{#d91a1a}-0.87\%$
test_keys_stack_nested_locked 0.1183ms 72.2221μs 13.8462 KOps/s 13.7578 KOps/s $\color{#35bf28}+0.64\%$
test_values 55.4373μs 1.8091μs 552.7742 KOps/s 553.4950 KOps/s $\color{#d91a1a}-0.13\%$
test_values_nested 68.0810μs 36.0162μs 27.7653 KOps/s 28.5411 KOps/s $\color{#d91a1a}-2.72\%$
test_values_nested_locked 0.2292ms 38.1388μs 26.2200 KOps/s 27.0126 KOps/s $\color{#d91a1a}-2.93\%$
test_values_nested_leaf 63.7910μs 32.2912μs 30.9682 KOps/s 31.9641 KOps/s $\color{#d91a1a}-3.12\%$
test_values_stack_nested 95.6810μs 36.5620μs 27.3508 KOps/s 27.8548 KOps/s $\color{#d91a1a}-1.81\%$
test_values_stack_nested_leaf 56.7510μs 32.9404μs 30.3578 KOps/s 31.4608 KOps/s $\color{#d91a1a}-3.51\%$
test_values_stack_nested_locked 68.7310μs 38.8350μs 25.7500 KOps/s 26.6140 KOps/s $\color{#d91a1a}-3.25\%$
test_membership 1.8275μs 0.6984μs 1.4318 MOps/s 1.3882 MOps/s $\color{#35bf28}+3.14\%$
test_membership_nested 26.7300μs 2.5415μs 393.4703 KOps/s 389.0147 KOps/s $\color{#35bf28}+1.15\%$
test_membership_nested_leaf 26.8200μs 2.5410μs 393.5436 KOps/s 389.2237 KOps/s $\color{#35bf28}+1.11\%$
test_membership_stacked_nested 23.3000μs 2.5890μs 386.2436 KOps/s 393.4256 KOps/s $\color{#d91a1a}-1.83\%$
test_membership_stacked_nested_leaf 40.4910μs 2.5627μs 390.2127 KOps/s 391.7307 KOps/s $\color{#d91a1a}-0.39\%$
test_membership_nested_last 32.4710μs 3.0746μs 325.2477 KOps/s 324.6305 KOps/s $\color{#35bf28}+0.19\%$
test_membership_nested_leaf_last 27.5010μs 3.0950μs 323.1063 KOps/s 326.3676 KOps/s $\color{#d91a1a}-1.00\%$
test_membership_stacked_nested_last 47.2510μs 3.8511μs 259.6651 KOps/s 281.7535 KOps/s $\textbf{\color{#d91a1a}-7.84\%}$
test_membership_stacked_nested_leaf_last 93.5310μs 3.8611μs 258.9945 KOps/s 284.8311 KOps/s $\textbf{\color{#d91a1a}-9.07\%}$
test_nested_getleaf 25.5210μs 8.4010μs 119.0329 KOps/s 119.2282 KOps/s $\color{#d91a1a}-0.16\%$
test_nested_get 23.1900μs 7.8795μs 126.9111 KOps/s 127.1278 KOps/s $\color{#d91a1a}-0.17\%$
test_stacked_getleaf 29.2710μs 8.3757μs 119.3930 KOps/s 118.7707 KOps/s $\color{#35bf28}+0.52\%$
test_stacked_get 30.4200μs 7.8760μs 126.9676 KOps/s 126.3099 KOps/s $\color{#35bf28}+0.52\%$
test_nested_getitemleaf 34.1310μs 8.5480μs 116.9870 KOps/s 117.0606 KOps/s $\color{#d91a1a}-0.06\%$
test_nested_getitem 35.3000μs 8.0198μs 124.6916 KOps/s 124.6215 KOps/s $\color{#35bf28}+0.06\%$
test_stacked_getitemleaf 37.5700μs 8.5664μs 116.7351 KOps/s 116.2512 KOps/s $\color{#35bf28}+0.42\%$
test_stacked_getitem 27.1200μs 8.0798μs 123.7652 KOps/s 124.1249 KOps/s $\color{#d91a1a}-0.29\%$
test_lock_nested 58.8036ms 0.4132ms 2.4200 KOps/s 2.4194 KOps/s $\color{#35bf28}+0.02\%$
test_lock_stack_nested 0.3350ms 0.3067ms 3.2606 KOps/s 3.2528 KOps/s $\color{#35bf28}+0.24\%$
test_unlock_nested 0.7089ms 0.3547ms 2.8192 KOps/s 2.8441 KOps/s $\color{#d91a1a}-0.88\%$
test_unlock_stack_nested 0.4769ms 0.3160ms 3.1645 KOps/s 3.1771 KOps/s $\color{#d91a1a}-0.40\%$
test_flatten_speed 0.1854ms 0.1017ms 9.8293 KOps/s 9.7438 KOps/s $\color{#35bf28}+0.88\%$
test_unflatten_speed 0.4391ms 0.2914ms 3.4320 KOps/s 3.4185 KOps/s $\color{#35bf28}+0.40\%$
test_common_ops 1.2413ms 0.6214ms 1.6092 KOps/s 1.6558 KOps/s $\color{#d91a1a}-2.81\%$
test_creation 36.4410μs 1.6123μs 620.2436 KOps/s 614.7658 KOps/s $\color{#35bf28}+0.89\%$
test_creation_empty 28.8410μs 10.4105μs 96.0569 KOps/s 104.3335 KOps/s $\textbf{\color{#d91a1a}-7.93\%}$
test_creation_nested_1 48.9710μs 12.1868μs 82.0559 KOps/s 87.7691 KOps/s $\textbf{\color{#d91a1a}-6.51\%}$
test_creation_nested_2 55.2110μs 14.4164μs 69.3657 KOps/s 73.3193 KOps/s $\textbf{\color{#d91a1a}-5.39\%}$
test_clone 0.1557ms 12.0665μs 82.8742 KOps/s 83.8198 KOps/s $\color{#d91a1a}-1.13\%$
test_getitem[int] 32.2200μs 10.6667μs 93.7493 KOps/s 94.0208 KOps/s $\color{#d91a1a}-0.29\%$
test_getitem[slice_int] 72.1510μs 20.2076μs 49.4864 KOps/s 48.5753 KOps/s $\color{#35bf28}+1.88\%$
test_getitem[range] 65.1410μs 46.5708μs 21.4727 KOps/s 21.3982 KOps/s $\color{#35bf28}+0.35\%$
test_getitem[tuple] 64.1910μs 18.3839μs 54.3954 KOps/s 54.5319 KOps/s $\color{#d91a1a}-0.25\%$
test_getitem[list] 0.1682ms 33.5157μs 29.8368 KOps/s 26.7552 KOps/s $\textbf{\color{#35bf28}+11.52\%}$
test_setitem_dim[int] 53.7710μs 31.6383μs 31.6072 KOps/s 30.4085 KOps/s $\color{#35bf28}+3.94\%$
test_setitem_dim[slice_int] 75.2710μs 50.9875μs 19.6126 KOps/s 18.5519 KOps/s $\textbf{\color{#35bf28}+5.72\%}$
test_setitem_dim[range] 0.1907ms 67.9032μs 14.7269 KOps/s 14.5698 KOps/s $\color{#35bf28}+1.08\%$
test_setitem_dim[tuple] 73.4210μs 44.7011μs 22.3708 KOps/s 22.4290 KOps/s $\color{#d91a1a}-0.26\%$
test_setitem 0.1207ms 17.8129μs 56.1392 KOps/s 57.0409 KOps/s $\color{#d91a1a}-1.58\%$
test_set 56.6810μs 17.5571μs 56.9572 KOps/s 56.7724 KOps/s $\color{#35bf28}+0.33\%$
test_set_shared 1.2180ms 99.3159μs 10.0689 KOps/s 9.8520 KOps/s $\color{#35bf28}+2.20\%$
test_update 87.7820μs 20.2830μs 49.3023 KOps/s 51.3277 KOps/s $\color{#d91a1a}-3.95\%$
test_update_nested 67.7110μs 25.8202μs 38.7294 KOps/s 40.4575 KOps/s $\color{#d91a1a}-4.27\%$
test_update__nested 0.1116ms 22.8737μs 43.7183 KOps/s 43.7117 KOps/s $\color{#35bf28}+0.01\%$
test_set_nested 66.2110μs 18.5282μs 53.9719 KOps/s 55.7913 KOps/s $\color{#d91a1a}-3.26\%$
test_set_nested_new 68.2410μs 21.3545μs 46.8285 KOps/s 47.9562 KOps/s $\color{#d91a1a}-2.35\%$
test_select 0.1739ms 34.6949μs 28.8227 KOps/s 28.3728 KOps/s $\color{#35bf28}+1.59\%$
test_select_nested 0.6453ms 53.9186μs 18.5465 KOps/s 18.3648 KOps/s $\color{#35bf28}+0.99\%$
test_exclude_nested 0.1806ms 0.1101ms 9.0860 KOps/s 9.0905 KOps/s $\color{#d91a1a}-0.05\%$
test_empty[True] 0.5031ms 0.3419ms 2.9245 KOps/s 2.8723 KOps/s $\color{#35bf28}+1.82\%$
test_empty[False] 4.2821μs 0.9204μs 1.0865 MOps/s 1.0916 MOps/s $\color{#d91a1a}-0.47\%$
test_to 99.4420μs 77.1868μs 12.9556 KOps/s 13.0586 KOps/s $\color{#d91a1a}-0.79\%$
test_to_nonblocking 0.2659ms 61.4295μs 16.2788 KOps/s 15.9448 KOps/s $\color{#35bf28}+2.09\%$
test_unbind_speed 0.4051ms 0.2676ms 3.7363 KOps/s 3.7345 KOps/s $\color{#35bf28}+0.05\%$
test_unbind_speed_stack0 0.2954ms 0.2687ms 3.7209 KOps/s 3.6999 KOps/s $\color{#35bf28}+0.57\%$
test_unbind_speed_stack1 76.2574ms 0.8125ms 1.2308 KOps/s 1.2226 KOps/s $\color{#35bf28}+0.68\%$
test_split 76.6976ms 1.6596ms 602.5489 Ops/s 595.6292 Ops/s $\color{#35bf28}+1.16\%$
test_chunk 76.1169ms 1.6554ms 604.0763 Ops/s 594.9098 Ops/s $\color{#35bf28}+1.54\%$
test_creation[device0] 0.1847ms 58.3230μs 17.1459 KOps/s 17.2295 KOps/s $\color{#d91a1a}-0.49\%$
test_creation_from_tensor 0.2093ms 54.8854μs 18.2198 KOps/s 18.4720 KOps/s $\color{#d91a1a}-1.37\%$
test_add_one[memmap_tensor0] 81.4420μs 7.2150μs 138.6004 KOps/s 138.0541 KOps/s $\color{#35bf28}+0.40\%$
test_contiguous[memmap_tensor0] 23.2200μs 0.6744μs 1.4827 MOps/s 1.4539 MOps/s $\color{#35bf28}+1.98\%$
test_stack[memmap_tensor0] 34.4210μs 4.8348μs 206.8346 KOps/s 209.4806 KOps/s $\color{#d91a1a}-1.26\%$
test_memmaptd_index 1.1778ms 0.2898ms 3.4510 KOps/s 3.4808 KOps/s $\color{#d91a1a}-0.86\%$
test_memmaptd_index_astensor 0.6416ms 0.3637ms 2.7494 KOps/s 2.7777 KOps/s $\color{#d91a1a}-1.02\%$
test_memmaptd_index_op 1.2503ms 0.7002ms 1.4282 KOps/s 1.4469 KOps/s $\color{#d91a1a}-1.29\%$
test_serialize_model 0.1846s 0.1134s 8.8160 Ops/s 8.3816 Ops/s $\textbf{\color{#35bf28}+5.18\%}$
test_serialize_model_pickle 1.3859s 1.2394s 0.8068 Ops/s 0.8054 Ops/s $\color{#35bf28}+0.18\%$
test_serialize_weights 0.1822s 0.1103s 9.0652 Ops/s 8.7341 Ops/s $\color{#35bf28}+3.79\%$
test_serialize_weights_returnearly 0.2612s 99.4605ms 10.0542 Ops/s 9.9602 Ops/s $\color{#35bf28}+0.94\%$
test_serialize_weights_pickle 1.3588s 1.2482s 0.8012 Ops/s 0.8088 Ops/s $\color{#d91a1a}-0.94\%$
test_reshape_pytree 0.2179ms 26.1621μs 38.2233 KOps/s 38.2078 KOps/s $\color{#35bf28}+0.04\%$
test_reshape_td 61.0410μs 30.4621μs 32.8277 KOps/s 32.2042 KOps/s $\color{#35bf28}+1.94\%$
test_view_pytree 0.2090ms 25.6755μs 38.9476 KOps/s 38.4973 KOps/s $\color{#35bf28}+1.17\%$
test_view_td 0.1624ms 35.8910μs 27.8621 KOps/s 28.0209 KOps/s $\color{#d91a1a}-0.57\%$
test_unbind_pytree 0.1412ms 32.3539μs 30.9082 KOps/s 31.0808 KOps/s $\color{#d91a1a}-0.56\%$
test_unbind_td 0.4771ms 42.5262μs 23.5149 KOps/s 23.8881 KOps/s $\color{#d91a1a}-1.56\%$
test_split_pytree 73.6220μs 34.6331μs 28.8741 KOps/s 28.0923 KOps/s $\color{#35bf28}+2.78\%$
test_split_td 0.1275ms 39.0330μs 25.6194 KOps/s 25.7249 KOps/s $\color{#d91a1a}-0.41\%$
test_add_pytree 0.1728ms 39.7252μs 25.1730 KOps/s 25.5647 KOps/s $\color{#d91a1a}-1.53\%$
test_add_td 0.1567ms 53.5008μs 18.6913 KOps/s 18.1544 KOps/s $\color{#35bf28}+2.96\%$
test_distributed 2.5881ms 91.9217μs 10.8788 KOps/s 14.0333 KOps/s $\textbf{\color{#d91a1a}-22.48\%}$
test_tdmodule 0.1542ms 15.7650μs 63.4318 KOps/s 65.0990 KOps/s $\color{#d91a1a}-2.56\%$
test_tdmodule_dispatch 46.5910μs 30.3400μs 32.9597 KOps/s 33.0037 KOps/s $\color{#d91a1a}-0.13\%$
test_tdseq 33.9200μs 17.4314μs 57.3676 KOps/s 58.9104 KOps/s $\color{#d91a1a}-2.62\%$
test_tdseq_dispatch 0.1276ms 34.0904μs 29.3338 KOps/s 30.0560 KOps/s $\color{#d91a1a}-2.40\%$
test_instantiation_functorch 1.6417ms 1.5248ms 655.8318 Ops/s 654.3637 Ops/s $\color{#35bf28}+0.22\%$
test_instantiation_td 1.5141ms 1.0504ms 952.0224 Ops/s 967.5564 Ops/s $\color{#d91a1a}-1.61\%$
test_exec_functorch 0.2953ms 0.1549ms 6.4557 KOps/s 6.4470 KOps/s $\color{#35bf28}+0.13\%$
test_exec_functional_call 0.1948ms 0.1442ms 6.9363 KOps/s 6.9839 KOps/s $\color{#d91a1a}-0.68\%$
test_exec_td 0.2151ms 0.1424ms 7.0210 KOps/s 7.1132 KOps/s $\color{#d91a1a}-1.30\%$
test_exec_td_decorator 0.4124ms 0.2170ms 4.6088 KOps/s 4.6431 KOps/s $\color{#d91a1a}-0.74\%$
test_vmap_mlp_speed[True-True] 0.7598ms 0.5893ms 1.6970 KOps/s 1.7060 KOps/s $\color{#d91a1a}-0.53\%$
test_vmap_mlp_speed[True-False] 0.7411ms 0.5856ms 1.7077 KOps/s 1.7035 KOps/s $\color{#35bf28}+0.25\%$
test_vmap_mlp_speed[False-True] 0.6701ms 0.5139ms 1.9460 KOps/s 1.9236 KOps/s $\color{#35bf28}+1.16\%$
test_vmap_mlp_speed[False-False] 0.6822ms 0.5153ms 1.9406 KOps/s 1.9496 KOps/s $\color{#d91a1a}-0.46\%$
test_vmap_mlp_speed_decorator[True-True] 1.1394ms 0.6532ms 1.5310 KOps/s 1.5385 KOps/s $\color{#d91a1a}-0.49\%$
test_vmap_mlp_speed_decorator[True-False] 0.8044ms 0.6495ms 1.5395 KOps/s 1.5381 KOps/s $\color{#35bf28}+0.09\%$
test_vmap_mlp_speed_decorator[False-True] 0.7363ms 0.5732ms 1.7444 KOps/s 1.7443 KOps/s $+0.01\%$
test_vmap_mlp_speed_decorator[False-False] 0.7310ms 0.5740ms 1.7423 KOps/s 1.7490 KOps/s $\color{#d91a1a}-0.38\%$
test_vmap_transformer_speed[True-True] 7.8831ms 7.7146ms 129.6246 Ops/s 130.7554 Ops/s $\color{#d91a1a}-0.86\%$
test_vmap_transformer_speed[True-False] 8.1360ms 7.8055ms 128.1151 Ops/s 129.8291 Ops/s $\color{#d91a1a}-1.32\%$
test_vmap_transformer_speed[False-True] 8.0755ms 7.7213ms 129.5117 Ops/s 131.4826 Ops/s $\color{#d91a1a}-1.50\%$
test_vmap_transformer_speed[False-False] 8.0660ms 7.6984ms 129.8975 Ops/s 130.6666 Ops/s $\color{#d91a1a}-0.59\%$
test_vmap_transformer_speed_decorator[True-True] 19.6687ms 18.8459ms 53.0620 Ops/s 52.9373 Ops/s $\color{#35bf28}+0.24\%$
test_vmap_transformer_speed_decorator[True-False] 19.8342ms 18.8950ms 52.9242 Ops/s 53.2641 Ops/s $\color{#d91a1a}-0.64\%$
test_vmap_transformer_speed_decorator[False-True] 19.3570ms 18.6896ms 53.5056 Ops/s 53.7910 Ops/s $\color{#d91a1a}-0.53\%$
test_vmap_transformer_speed_decorator[False-False] 19.1860ms 18.6155ms 53.7188 Ops/s 53.6210 Ops/s $\color{#35bf28}+0.18\%$
test_to_module_speed[True] 1.7390ms 1.5143ms 660.3522 Ops/s 646.5715 Ops/s $\color{#35bf28}+2.13\%$
test_to_module_speed[False] 1.6207ms 1.5021ms 665.7400 Ops/s 656.2163 Ops/s $\color{#35bf28}+1.45\%$
test_tc_init 58.7110μs 27.9355μs 35.7967 KOps/s 37.1084 KOps/s $\color{#d91a1a}-3.53\%$
test_tc_init_nested 83.0110μs 58.3372μs 17.1417 KOps/s 19.1756 KOps/s $\textbf{\color{#d91a1a}-10.61\%}$
test_tc_first_layer_tensor 0.7555μs 0.3660μs 2.7322 MOps/s 2.7409 MOps/s $\color{#d91a1a}-0.32\%$
test_tc_first_layer_nontensor 1.3916μs 0.3945μs 2.5351 MOps/s 2.5604 MOps/s $\color{#d91a1a}-0.99\%$
test_tc_second_layer_tensor 4.2582μs 0.9806μs 1.0198 MOps/s 1.0223 MOps/s $\color{#d91a1a}-0.25\%$
test_tc_second_layer_nontensor 4.3118μs 0.8343μs 1.1986 MOps/s 1.2046 MOps/s $\color{#d91a1a}-0.49\%$
test_unbind 0.1039s 8.0845ms 123.6929 Ops/s 147.4537 Ops/s $\textbf{\color{#d91a1a}-16.11\%}$
test_full_like 13.8535ms 13.2983ms 75.1975 Ops/s 72.4146 Ops/s $\color{#35bf28}+3.84\%$
test_zeros_like 8.0761ms 7.8372ms 127.5960 Ops/s 125.9241 Ops/s $\color{#35bf28}+1.33\%$
test_ones_like 8.1415ms 7.8352ms 127.6299 Ops/s 126.3662 Ops/s $\color{#35bf28}+1.00\%$
test_clone 10.8754ms 9.5871ms 104.3063 Ops/s 102.3626 Ops/s $\color{#35bf28}+1.90\%$
test_squeeze 55.5400μs 11.0222μs 90.7260 KOps/s 91.5703 KOps/s $\color{#d91a1a}-0.92\%$
test_unsqueeze 0.1826ms 50.9967μs 19.6091 KOps/s 19.8140 KOps/s $\color{#d91a1a}-1.03\%$
test_split 0.2367ms 95.1297μs 10.5120 KOps/s 10.5573 KOps/s $\color{#d91a1a}-0.43\%$
test_permute 0.2486ms 0.1085ms 9.2133 KOps/s 9.2312 KOps/s $\color{#d91a1a}-0.19\%$
test_stack 28.7399ms 27.9310ms 35.8025 Ops/s 35.4222 Ops/s $\color{#35bf28}+1.07\%$
test_cat 28.6212ms 27.8993ms 35.8432 Ops/s 35.3417 Ops/s $\color{#35bf28}+1.42\%$

Labels
bug, CLA Signed, versioning