[BugFix] Fix h5 auto batch size #798

Merged. @vmoens merged 2 commits into main from fix-autobatchsize-h5 on May 30, 2024.

@vmoens (Contributor) commented on May 30, 2024:

No description provided.

@facebook-github-bot added the CLA Signed label on May 30, 2024.

github-actions bot commented May 30, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: 20. Worsened: 5.

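Columns: Name, Max (slowest run), Mean (mean run time), Ops (throughput on this PR), Ops on Repo HEAD (throughput on the repository HEAD), Change (relative change in Ops)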
test_plain_set_nested 39.8840μs 16.9042μs 59.1569 KOps/s 54.9812 KOps/s $\textbf{\color{#35bf28}+7.59\%}$
test_plain_set_stack_nested 37.3200μs 17.0638μs 58.6036 KOps/s 53.8870 KOps/s $\textbf{\color{#35bf28}+8.75\%}$
test_plain_set_nested_inplace 57.1960μs 19.2225μs 52.0224 KOps/s 48.0917 KOps/s $\textbf{\color{#35bf28}+8.17\%}$
test_plain_set_stack_nested_inplace 60.5830μs 19.2570μs 51.9292 KOps/s 48.8966 KOps/s $\textbf{\color{#35bf28}+6.20\%}$
test_items 17.2420μs 2.4914μs 401.3818 KOps/s 366.4193 KOps/s $\textbf{\color{#35bf28}+9.54\%}$
test_items_nested 0.4358ms 0.2742ms 3.6471 KOps/s 3.8047 KOps/s $\color{#d91a1a}-4.14\%$
test_items_nested_locked 1.1363ms 0.2762ms 3.6210 KOps/s 3.7464 KOps/s $\color{#d91a1a}-3.35\%$
test_items_nested_leaf 0.1629ms 78.7390μs 12.7002 KOps/s 12.6152 KOps/s $\color{#35bf28}+0.67\%$
test_items_stack_nested 1.7757ms 0.2764ms 3.6174 KOps/s 3.7383 KOps/s $\color{#d91a1a}-3.24\%$
test_items_stack_nested_leaf 0.1476ms 79.0930μs 12.6433 KOps/s 12.3574 KOps/s $\color{#35bf28}+2.31\%$
test_items_stack_nested_locked 0.4160ms 0.2737ms 3.6531 KOps/s 3.7767 KOps/s $\color{#d91a1a}-3.27\%$
test_keys 16.8310μs 3.9968μs 250.1998 KOps/s 259.4504 KOps/s $\color{#d91a1a}-3.57\%$
test_keys_nested 0.2410ms 0.1398ms 7.1518 KOps/s 7.0534 KOps/s $\color{#35bf28}+1.39\%$
test_keys_nested_locked 0.6922ms 0.1436ms 6.9629 KOps/s 6.7488 KOps/s $\color{#35bf28}+3.17\%$
test_keys_nested_leaf 0.2124ms 0.1185ms 8.4391 KOps/s 8.4939 KOps/s $\color{#d91a1a}-0.65\%$
test_keys_stack_nested 0.2448ms 0.1405ms 7.1198 KOps/s 6.9813 KOps/s $\color{#35bf28}+1.98\%$
test_keys_stack_nested_leaf 0.2358ms 0.1181ms 8.4661 KOps/s 8.1772 KOps/s $\color{#35bf28}+3.53\%$
test_keys_stack_nested_locked 0.2114ms 0.1438ms 6.9560 KOps/s 6.7030 KOps/s $\color{#35bf28}+3.77\%$
test_values 4.6446μs 1.1389μs 878.0284 KOps/s 850.7945 KOps/s $\color{#35bf28}+3.20\%$
test_values_nested 89.2760μs 50.7111μs 19.7195 KOps/s 18.8829 KOps/s $\color{#35bf28}+4.43\%$
test_values_nested_locked 88.0740μs 50.9424μs 19.6300 KOps/s 18.9382 KOps/s $\color{#35bf28}+3.65\%$
test_values_nested_leaf 76.2230μs 45.9874μs 21.7451 KOps/s 20.7698 KOps/s $\color{#35bf28}+4.70\%$
test_values_stack_nested 87.3230μs 50.9253μs 19.6366 KOps/s 18.7600 KOps/s $\color{#35bf28}+4.67\%$
test_values_stack_nested_leaf 92.8130μs 45.6432μs 21.9091 KOps/s 21.0965 KOps/s $\color{#35bf28}+3.85\%$
test_values_stack_nested_locked 0.1053ms 50.8282μs 19.6741 KOps/s 18.4395 KOps/s $\textbf{\color{#35bf28}+6.70\%}$
test_membership 11.4310μs 1.3400μs 746.2479 KOps/s 742.7522 KOps/s $\color{#35bf28}+0.47\%$
test_membership_nested 19.2860μs 3.5288μs 283.3828 KOps/s 288.1074 KOps/s $\color{#d91a1a}-1.64\%$
test_membership_nested_leaf 28.0330μs 3.5635μs 280.6266 KOps/s 286.8593 KOps/s $\color{#d91a1a}-2.17\%$
test_membership_stacked_nested 20.6180μs 3.4917μs 286.3929 KOps/s 286.6680 KOps/s $\color{#d91a1a}-0.10\%$
test_membership_stacked_nested_leaf 29.3840μs 3.5519μs 281.5380 KOps/s 288.7939 KOps/s $\color{#d91a1a}-2.51\%$
test_membership_nested_last 25.9390μs 4.3427μs 230.2691 KOps/s 234.9774 KOps/s $\color{#d91a1a}-2.00\%$
test_membership_nested_leaf_last 24.8870μs 4.3930μs 227.6366 KOps/s 234.1534 KOps/s $\color{#d91a1a}-2.78\%$
test_membership_stacked_nested_last 18.1940μs 4.3385μs 230.4954 KOps/s 239.9229 KOps/s $\color{#d91a1a}-3.93\%$
test_membership_stacked_nested_leaf_last 19.6060μs 4.3733μs 228.6582 KOps/s 235.5802 KOps/s $\color{#d91a1a}-2.94\%$
test_nested_getleaf 36.1180μs 10.5870μs 94.4557 KOps/s 93.8552 KOps/s $\color{#35bf28}+0.64\%$
test_nested_get 38.8830μs 10.0031μs 99.9690 KOps/s 101.0254 KOps/s $\color{#d91a1a}-1.05\%$
test_stacked_getleaf 29.9460μs 10.5405μs 94.8724 KOps/s 95.4387 KOps/s $\color{#d91a1a}-0.59\%$
test_stacked_get 33.3320μs 10.1562μs 98.4623 KOps/s 101.9049 KOps/s $\color{#d91a1a}-3.38\%$
test_nested_getitemleaf 32.7710μs 11.2108μs 89.1993 KOps/s 91.4450 KOps/s $\color{#d91a1a}-2.46\%$
test_nested_getitem 29.2040μs 10.3851μs 96.2920 KOps/s 97.7728 KOps/s $\color{#d91a1a}-1.51\%$
test_stacked_getitemleaf 33.0410μs 11.1298μs 89.8487 KOps/s 91.3947 KOps/s $\color{#d91a1a}-1.69\%$
test_stacked_getitem 29.8960μs 10.2874μs 97.2065 KOps/s 98.8913 KOps/s $\color{#d91a1a}-1.70\%$
test_lock_nested 52.2181ms 0.4099ms 2.4395 KOps/s 2.8178 KOps/s $\textbf{\color{#d91a1a}-13.43\%}$
test_lock_stack_nested 0.4472ms 0.3164ms 3.1603 KOps/s 3.1435 KOps/s $\color{#35bf28}+0.53\%$
test_unlock_nested 0.7355ms 0.3540ms 2.8251 KOps/s 2.4072 KOps/s $\textbf{\color{#35bf28}+17.36\%}$
test_unlock_stack_nested 0.3964ms 0.3255ms 3.0719 KOps/s 3.0804 KOps/s $\color{#d91a1a}-0.28\%$
test_flatten_speed 0.1805ms 0.1002ms 9.9781 KOps/s 10.2295 KOps/s $\color{#d91a1a}-2.46\%$
test_unflatten_speed 0.5820ms 0.4149ms 2.4103 KOps/s 2.3663 KOps/s $\color{#35bf28}+1.86\%$
test_common_ops 4.5301ms 0.7228ms 1.3836 KOps/s 1.3395 KOps/s $\color{#35bf28}+3.29\%$
test_creation 19.1150μs 1.9481μs 513.3234 KOps/s 508.7629 KOps/s $\color{#35bf28}+0.90\%$
test_creation_empty 23.2830μs 10.9334μs 91.4631 KOps/s 86.2039 KOps/s $\textbf{\color{#35bf28}+6.10\%}$
test_creation_nested_1 78.8170μs 13.7863μs 72.5355 KOps/s 67.6625 KOps/s $\textbf{\color{#35bf28}+7.20\%}$
test_creation_nested_2 41.0870μs 17.1754μs 58.2227 KOps/s 55.3384 KOps/s $\textbf{\color{#35bf28}+5.21\%}$
test_clone 66.9650μs 13.9756μs 71.5532 KOps/s 71.6028 KOps/s $\color{#d91a1a}-0.07\%$
test_getitem[int] 39.4040μs 11.8298μs 84.5321 KOps/s 86.2776 KOps/s $\color{#d91a1a}-2.02\%$
test_getitem[slice_int] 55.7640μs 23.1956μs 43.1116 KOps/s 42.4458 KOps/s $\color{#35bf28}+1.57\%$
test_getitem[range] 86.8010μs 61.1618μs 16.3501 KOps/s 16.6228 KOps/s $\color{#d91a1a}-1.64\%$
test_getitem[tuple] 58.5590μs 19.5286μs 51.2070 KOps/s 52.0666 KOps/s $\color{#d91a1a}-1.65\%$
test_getitem[list] 93.5440μs 42.0249μs 23.7954 KOps/s 24.1733 KOps/s $\color{#d91a1a}-1.56\%$
test_setitem_dim[int] 67.4460μs 35.6789μs 28.0278 KOps/s 27.4384 KOps/s $\color{#35bf28}+2.15\%$
test_setitem_dim[slice_int] 0.1123ms 62.3747μs 16.0321 KOps/s 15.6062 KOps/s $\color{#35bf28}+2.73\%$
test_setitem_dim[range] 0.1303ms 84.5542μs 11.8267 KOps/s 11.7214 KOps/s $\color{#35bf28}+0.90\%$
test_setitem_dim[tuple] 76.0420μs 51.7563μs 19.3213 KOps/s 19.1554 KOps/s $\color{#35bf28}+0.87\%$
test_setitem 60.1620μs 21.3618μs 46.8125 KOps/s 47.5460 KOps/s $\color{#d91a1a}-1.54\%$
test_set 57.3770μs 20.7092μs 48.2878 KOps/s 48.1711 KOps/s $\color{#35bf28}+0.24\%$
test_set_shared 1.5910ms 0.1400ms 7.1411 KOps/s 7.0319 KOps/s $\color{#35bf28}+1.55\%$
test_update 0.1251ms 22.8014μs 43.8570 KOps/s 43.0510 KOps/s $\color{#35bf28}+1.87\%$
test_update_nested 71.8540μs 31.4308μs 31.8159 KOps/s 31.3856 KOps/s $\color{#35bf28}+1.37\%$
test_update__nested 66.8250μs 26.4980μs 37.7387 KOps/s 38.5816 KOps/s $\color{#d91a1a}-2.18\%$
test_set_nested 64.9810μs 22.6472μs 44.1556 KOps/s 44.2922 KOps/s $\color{#d91a1a}-0.31\%$
test_set_nested_new 78.4270μs 27.6934μs 36.1096 KOps/s 37.4120 KOps/s $\color{#d91a1a}-3.48\%$
test_select 0.1033ms 42.5794μs 23.4855 KOps/s 23.5324 KOps/s $\color{#d91a1a}-0.20\%$
test_select_nested 0.1299ms 61.6022μs 16.2332 KOps/s 16.4699 KOps/s $\color{#d91a1a}-1.44\%$
test_exclude_nested 0.2603ms 0.1246ms 8.0282 KOps/s 8.0862 KOps/s $\color{#d91a1a}-0.72\%$
test_empty[True] 0.7271ms 0.4005ms 2.4969 KOps/s 2.4522 KOps/s $\color{#35bf28}+1.82\%$
test_empty[False] 7.7445μs 1.2132μs 824.2481 KOps/s 854.1842 KOps/s $\color{#d91a1a}-3.50\%$
test_unbind_speed 0.3339ms 0.2667ms 3.7497 KOps/s 3.7327 KOps/s $\color{#35bf28}+0.46\%$
test_unbind_speed_stack0 4.3208ms 0.2649ms 3.7755 KOps/s 3.8542 KOps/s $\color{#d91a1a}-2.04\%$
test_unbind_speed_stack1 67.3243ms 0.7400ms 1.3514 KOps/s 1.2967 KOps/s $\color{#35bf28}+4.22\%$
test_split 66.9079ms 1.6449ms 607.9273 Ops/s 612.7234 Ops/s $\color{#d91a1a}-0.78\%$
test_chunk 67.8922ms 1.6336ms 612.1582 Ops/s 616.2573 Ops/s $\color{#d91a1a}-0.67\%$
test_creation[device0] 0.1636ms 83.4964μs 11.9766 KOps/s 11.6825 KOps/s $\color{#35bf28}+2.52\%$
test_creation_from_tensor 3.2344ms 85.4653μs 11.7007 KOps/s 11.5131 KOps/s $\color{#35bf28}+1.63\%$
test_add_one[memmap_tensor0] 70.5220μs 5.3572μs 186.6630 KOps/s 182.1874 KOps/s $\color{#35bf28}+2.46\%$
test_contiguous[memmap_tensor0] 10.0380μs 0.6372μs 1.5694 MOps/s 1.5277 MOps/s $\color{#35bf28}+2.73\%$
test_stack[memmap_tensor0] 17.2720μs 3.6345μs 275.1396 KOps/s 279.1328 KOps/s $\color{#d91a1a}-1.43\%$
test_memmaptd_index 0.9440ms 0.2565ms 3.8989 KOps/s 3.8838 KOps/s $\color{#35bf28}+0.39\%$
test_memmaptd_index_astensor 0.7734ms 0.3302ms 3.0287 KOps/s 2.9856 KOps/s $\color{#35bf28}+1.44\%$
test_memmaptd_index_op 0.8723ms 0.6221ms 1.6073 KOps/s 1.5496 KOps/s $\color{#35bf28}+3.72\%$
test_serialize_model 0.1861s 0.1166s 8.5750 Ops/s 8.3742 Ops/s $\color{#35bf28}+2.40\%$
test_serialize_model_pickle 0.4694s 0.3768s 2.6543 Ops/s 2.6380 Ops/s $\color{#35bf28}+0.62\%$
test_serialize_weights 0.1072s 0.1015s 9.8562 Ops/s 8.6474 Ops/s $\textbf{\color{#35bf28}+13.98\%}$
test_serialize_weights_returnearly 0.1979s 0.1386s 7.2150 Ops/s 7.7724 Ops/s $\textbf{\color{#d91a1a}-7.17\%}$
test_serialize_weights_pickle 0.7530s 0.5111s 1.9567 Ops/s 1.5691 Ops/s $\textbf{\color{#35bf28}+24.70\%}$
test_serialize_weights_filesystem 98.1820ms 92.5309ms 10.8072 Ops/s 10.7730 Ops/s $\color{#35bf28}+0.32\%$
test_serialize_model_filesystem 0.1581s 0.1019s 9.8123 Ops/s 9.8609 Ops/s $\color{#d91a1a}-0.49\%$
test_reshape_pytree 76.3630μs 25.9817μs 38.4886 KOps/s 37.7756 KOps/s $\color{#35bf28}+1.89\%$
test_reshape_td 89.6270μs 35.7596μs 27.9645 KOps/s 29.4512 KOps/s $\textbf{\color{#d91a1a}-5.05\%}$
test_view_pytree 72.6150μs 25.9395μs 38.5513 KOps/s 37.7384 KOps/s $\color{#35bf28}+2.15\%$
test_view_td 82.9240μs 40.2937μs 24.8178 KOps/s 26.1042 KOps/s $\color{#d91a1a}-4.93\%$
test_unbind_pytree 77.3140μs 29.4605μs 33.9438 KOps/s 33.5687 KOps/s $\color{#35bf28}+1.12\%$
test_unbind_td 0.3982ms 39.1518μs 25.5416 KOps/s 25.9114 KOps/s $\color{#d91a1a}-1.43\%$
test_split_pytree 94.9070μs 30.1545μs 33.1625 KOps/s 33.1057 KOps/s $\color{#35bf28}+0.17\%$
test_split_td 0.1312ms 41.8029μs 23.9218 KOps/s 24.3492 KOps/s $\color{#d91a1a}-1.76\%$
test_add_pytree 0.1316ms 35.6944μs 28.0156 KOps/s 27.6750 KOps/s $\color{#35bf28}+1.23\%$
test_add_td 0.1298ms 56.3849μs 17.7353 KOps/s 17.7132 KOps/s $\color{#35bf28}+0.12\%$
test_distributed 0.2193ms 0.1043ms 9.5909 KOps/s 9.7540 KOps/s $\color{#d91a1a}-1.67\%$
test_tdmodule 50.4140μs 17.4178μs 57.4125 KOps/s 53.0395 KOps/s $\textbf{\color{#35bf28}+8.24\%}$
test_tdmodule_dispatch 56.5860μs 35.1021μs 28.4884 KOps/s 26.6778 KOps/s $\textbf{\color{#35bf28}+6.79\%}$
test_tdseq 38.2910μs 20.7489μs 48.1954 KOps/s 45.7738 KOps/s $\textbf{\color{#35bf28}+5.29\%}$
test_tdseq_dispatch 79.4080μs 40.8601μs 24.4737 KOps/s 23.1413 KOps/s $\textbf{\color{#35bf28}+5.76\%}$
test_instantiation_functorch 1.6106ms 1.3301ms 751.8091 Ops/s 745.5155 Ops/s $\color{#35bf28}+0.84\%$
test_instantiation_td 1.4828ms 1.0445ms 957.3548 Ops/s 968.3705 Ops/s $\color{#d91a1a}-1.14\%$
test_exec_functorch 0.3546ms 0.1611ms 6.2060 KOps/s 6.1904 KOps/s $\color{#35bf28}+0.25\%$
test_exec_functional_call 0.2322ms 0.1493ms 6.6985 KOps/s 6.6469 KOps/s $\color{#35bf28}+0.78\%$
test_exec_td 0.2310ms 0.1474ms 6.7863 KOps/s 6.9352 KOps/s $\color{#d91a1a}-2.15\%$
test_exec_td_decorator 0.3037ms 0.2201ms 4.5441 KOps/s 4.4947 KOps/s $\color{#35bf28}+1.10\%$
test_vmap_mlp_speed[True-True] 0.7693ms 0.4927ms 2.0297 KOps/s 2.0176 KOps/s $\color{#35bf28}+0.60\%$
test_vmap_mlp_speed[True-False] 0.7294ms 0.4896ms 2.0423 KOps/s 2.0268 KOps/s $\color{#35bf28}+0.77\%$
test_vmap_mlp_speed[False-True] 0.8265ms 0.3989ms 2.5066 KOps/s 2.5226 KOps/s $\color{#d91a1a}-0.64\%$
test_vmap_mlp_speed[False-False] 0.6342ms 0.3976ms 2.5151 KOps/s 2.5193 KOps/s $\color{#d91a1a}-0.17\%$
test_vmap_mlp_speed_decorator[True-True] 0.6645ms 0.5596ms 1.7870 KOps/s 1.7759 KOps/s $\color{#35bf28}+0.62\%$
test_vmap_mlp_speed_decorator[True-False] 0.8672ms 0.5622ms 1.7786 KOps/s 1.7735 KOps/s $\color{#35bf28}+0.29\%$
test_vmap_mlp_speed_decorator[False-True] 0.6817ms 0.4602ms 2.1730 KOps/s 2.1513 KOps/s $\color{#35bf28}+1.01\%$
test_vmap_mlp_speed_decorator[False-False] 0.5785ms 0.4594ms 2.1769 KOps/s 2.1664 KOps/s $\color{#35bf28}+0.49\%$
test_to_module_speed[True] 1.7909ms 1.7029ms 587.2507 Ops/s 576.9294 Ops/s $\color{#35bf28}+1.79\%$
test_to_module_speed[False] 1.7391ms 1.6659ms 600.2683 Ops/s 585.6720 Ops/s $\color{#35bf28}+2.49\%$
test_tc_init 56.0450μs 28.7631μs 34.7667 KOps/s 32.0511 KOps/s $\textbf{\color{#35bf28}+8.47\%}$
test_tc_init_nested 96.7510μs 61.6728μs 16.2146 KOps/s 15.7168 KOps/s $\color{#35bf28}+3.17\%$
test_tc_first_layer_tensor 1.6776μs 0.6832μs 1.4638 MOps/s 1.4257 MOps/s $\color{#35bf28}+2.67\%$
test_tc_first_layer_nontensor 1.9636μs 0.6771μs 1.4768 MOps/s 1.4425 MOps/s $\color{#35bf28}+2.38\%$
test_tc_second_layer_tensor 14.3670μs 1.8437μs 542.3916 KOps/s 538.7432 KOps/s $\color{#35bf28}+0.68\%$
test_tc_second_layer_nontensor 8.1953μs 1.5342μs 651.8146 KOps/s 674.5839 KOps/s $\color{#d91a1a}-3.38\%$
test_unbind 79.1462ms 6.8547ms 145.8848 Ops/s 154.3698 Ops/s $\textbf{\color{#d91a1a}-5.50\%}$
test_full_like 18.0895ms 10.9475ms 91.3452 Ops/s 83.6799 Ops/s $\textbf{\color{#35bf28}+9.16\%}$
test_zeros_like 10.3735ms 5.9990ms 166.6958 Ops/s 158.0867 Ops/s $\textbf{\color{#35bf28}+5.45\%}$
test_ones_like 14.6328ms 6.6459ms 150.4690 Ops/s 158.2326 Ops/s $\color{#d91a1a}-4.91\%$
test_clone 14.8330ms 8.3505ms 119.7538 Ops/s 122.2998 Ops/s $\color{#d91a1a}-2.08\%$
test_squeeze 79.6690μs 14.6946μs 68.0520 KOps/s 64.5233 KOps/s $\textbf{\color{#35bf28}+5.47\%}$
test_unsqueeze 73.3677ms 88.7009μs 11.2738 KOps/s 14.4112 KOps/s $\textbf{\color{#d91a1a}-21.77\%}$
test_split 0.2185ms 0.1141ms 8.7648 KOps/s 8.6529 KOps/s $\color{#35bf28}+1.29\%$
test_permute 0.2458ms 0.1407ms 7.1091 KOps/s 7.2022 KOps/s $\color{#d91a1a}-1.29\%$
test_stack 24.9314ms 24.2079ms 41.3088 Ops/s 41.8391 Ops/s $\color{#d91a1a}-1.27\%$
test_cat 28.4918ms 24.5079ms 40.8031 Ops/s 41.7958 Ops/s $\color{#d91a1a}-2.38\%$
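The bot does not say how its columns relate, but every row is consistent with Ops being the reciprocal of Mean (for example, 1 / 16.9042 μs ≈ 59.157 KOps/s for test_plain_set_nested). The following Python sketch assumes exactly that relationship and recomputes Ops from Mean for three CPU rows that use different time units; `parse_duration` and `ops_per_second` are hypothetical helpers, not part of tensordict or of the benchmark tooling.

```python
# Sketch: recover the "Ops" column from the "Mean" column.
# Assumption (inferred from the table, not documented by the bot):
# Ops = 1 / Mean, i.e. runs per second.

UNIT_SECONDS = {"s": 1.0, "ms": 1e-3, "μs": 1e-6}


def parse_duration(text: str) -> float:
    """Convert a duration such as '16.9042μs' or '0.2742ms' to seconds."""
    # Check the two-character suffixes first so 'ms'/'μs' are not read as 's'.
    for suffix in ("μs", "ms", "s"):
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * UNIT_SECONDS[suffix]
    raise ValueError(f"unknown unit in {text!r}")


def ops_per_second(mean: str) -> float:
    """Throughput implied by a mean run time."""
    return 1.0 / parse_duration(mean)


# (Mean, reported Ops in ops/s), copied from the CPU table above.
rows = {
    "test_plain_set_nested": ("16.9042μs", 59.1569e3),
    "test_items_nested": ("0.2742ms", 3.6471e3),
    "test_serialize_model": ("0.1166s", 8.5750),
}

for name, (mean, reported) in rows.items():
    computed = ops_per_second(mean)
    print(f"{name}: computed {computed:,.1f} ops/s, reported {reported:,.1f} ops/s")
```

Small differences in the last digit come from Mean being printed with only four or five significant figures.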

@vmoens added the bug label on May 30, 2024.
@vmoens merged commit 2454623 into main on May 30, 2024.
33 of 37 checks passed.
@vmoens deleted the fix-autobatchsize-h5 branch on May 30, 2024 at 09:23.

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 152. Improved: 13. Worsened: 4.

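Columns: Name, Max (slowest run), Mean (mean run time), Ops (throughput on this PR), Ops on Repo HEAD (throughput on the repository HEAD), Change (relative change in Ops)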
test_plain_set_nested 0.5087ms 12.9486μs 77.2287 KOps/s 79.0829 KOps/s $\color{#d91a1a}-2.34\%$
test_plain_set_stack_nested 40.9510μs 13.0513μs 76.6207 KOps/s 78.7070 KOps/s $\color{#d91a1a}-2.65\%$
test_plain_set_nested_inplace 49.0510μs 14.0674μs 71.0865 KOps/s 71.9462 KOps/s $\color{#d91a1a}-1.19\%$
test_plain_set_stack_nested_inplace 56.4920μs 14.2086μs 70.3797 KOps/s 71.3893 KOps/s $\color{#d91a1a}-1.41\%$
test_items 26.2000μs 4.6318μs 215.9006 KOps/s 210.2393 KOps/s $\color{#35bf28}+2.69\%$
test_items_nested 0.3958ms 0.3433ms 2.9128 KOps/s 2.9061 KOps/s $\color{#35bf28}+0.23\%$
test_items_nested_locked 0.4223ms 0.3525ms 2.8366 KOps/s 2.8576 KOps/s $\color{#d91a1a}-0.73\%$
test_items_nested_leaf 0.1041ms 83.1412μs 12.0277 KOps/s 12.0233 KOps/s $\color{#35bf28}+0.04\%$
test_items_stack_nested 0.4123ms 0.3526ms 2.8361 KOps/s 2.8582 KOps/s $\color{#d91a1a}-0.77\%$
test_items_stack_nested_leaf 0.1242ms 85.5698μs 11.6864 KOps/s 11.9961 KOps/s $\color{#d91a1a}-2.58\%$
test_items_stack_nested_locked 0.4142ms 0.3576ms 2.7962 KOps/s 2.8536 KOps/s $\color{#d91a1a}-2.01\%$
test_keys 18.8000μs 4.4064μs 226.9438 KOps/s 229.6933 KOps/s $\color{#d91a1a}-1.20\%$
test_keys_nested 97.2220μs 67.2066μs 14.8795 KOps/s 14.9336 KOps/s $\color{#d91a1a}-0.36\%$
test_keys_nested_locked 0.7832ms 72.2174μs 13.8471 KOps/s 13.7619 KOps/s $\color{#35bf28}+0.62\%$
test_keys_nested_leaf 91.3320μs 57.4265μs 17.4136 KOps/s 17.3117 KOps/s $\color{#35bf28}+0.59\%$
test_keys_stack_nested 0.1025ms 67.6897μs 14.7733 KOps/s 15.0243 KOps/s $\color{#d91a1a}-1.67\%$
test_keys_stack_nested_leaf 90.8620μs 58.0075μs 17.2391 KOps/s 17.2595 KOps/s $\color{#d91a1a}-0.12\%$
test_keys_stack_nested_locked 97.6810μs 72.6458μs 13.7654 KOps/s 14.1516 KOps/s $\color{#d91a1a}-2.73\%$
test_values 8.5300μs 1.8157μs 550.7565 KOps/s 554.4631 KOps/s $\color{#d91a1a}-0.67\%$
test_values_nested 62.3610μs 34.9287μs 28.6298 KOps/s 28.6734 KOps/s $\color{#d91a1a}-0.15\%$
test_values_nested_locked 60.1310μs 37.1891μs 26.8896 KOps/s 26.9573 KOps/s $\color{#d91a1a}-0.25\%$
test_values_nested_leaf 51.3410μs 31.0512μs 32.2048 KOps/s 32.1957 KOps/s $\color{#35bf28}+0.03\%$
test_values_stack_nested 60.2210μs 35.5422μs 28.1356 KOps/s 27.7653 KOps/s $\color{#35bf28}+1.33\%$
test_values_stack_nested_leaf 60.9910μs 31.6568μs 31.5888 KOps/s 31.3352 KOps/s $\color{#35bf28}+0.81\%$
test_values_stack_nested_locked 58.0010μs 37.6011μs 26.5950 KOps/s 26.4220 KOps/s $\color{#35bf28}+0.65\%$
test_membership 2.0050μs 0.7172μs 1.3942 MOps/s 1.4006 MOps/s $\color{#d91a1a}-0.45\%$
test_membership_nested 29.8700μs 2.5828μs 387.1773 KOps/s 392.3771 KOps/s $\color{#d91a1a}-1.33\%$
test_membership_nested_leaf 20.6210μs 2.6126μs 382.7536 KOps/s 386.7910 KOps/s $\color{#d91a1a}-1.04\%$
test_membership_stacked_nested 14.4210μs 2.5997μs 384.6649 KOps/s 389.4211 KOps/s $\color{#d91a1a}-1.22\%$
test_membership_stacked_nested_leaf 33.6610μs 2.6368μs 379.2514 KOps/s 386.8834 KOps/s $\color{#d91a1a}-1.97\%$
test_membership_nested_last 21.6890μs 3.1101μs 321.5331 KOps/s 325.3120 KOps/s $\color{#d91a1a}-1.16\%$
test_membership_nested_leaf_last 34.3100μs 3.1265μs 319.8467 KOps/s 324.5298 KOps/s $\color{#d91a1a}-1.44\%$
test_membership_stacked_nested_last 23.4010μs 3.5981μs 277.9229 KOps/s 160.0610 KOps/s $\textbf{\color{#35bf28}+73.64\%}$
test_membership_stacked_nested_leaf_last 34.5200μs 3.5715μs 279.9907 KOps/s 159.4279 KOps/s $\textbf{\color{#35bf28}+75.62\%}$
test_nested_getleaf 28.1800μs 8.3800μs 119.3320 KOps/s 119.2979 KOps/s $\color{#35bf28}+0.03\%$
test_nested_get 35.4910μs 7.9052μs 126.4997 KOps/s 127.1482 KOps/s $\color{#d91a1a}-0.51\%$
test_stacked_getleaf 36.3510μs 8.4285μs 118.6455 KOps/s 119.1471 KOps/s $\color{#d91a1a}-0.42\%$
test_stacked_get 25.8310μs 7.9241μs 126.1976 KOps/s 126.4059 KOps/s $\color{#d91a1a}-0.16\%$
test_nested_getitemleaf 33.5100μs 8.5570μs 116.8635 KOps/s 117.0851 KOps/s $\color{#d91a1a}-0.19\%$
test_nested_getitem 30.0010μs 8.0682μs 123.9434 KOps/s 124.0931 KOps/s $\color{#d91a1a}-0.12\%$
test_stacked_getitemleaf 32.1210μs 8.6246μs 115.9478 KOps/s 116.1020 KOps/s $\color{#d91a1a}-0.13\%$
test_stacked_getitem 37.5210μs 8.0493μs 124.2340 KOps/s 123.5406 KOps/s $\color{#35bf28}+0.56\%$
test_lock_nested 58.6341ms 0.4187ms 2.3885 KOps/s 2.3522 KOps/s $\color{#35bf28}+1.54\%$
test_lock_stack_nested 0.3490ms 0.3146ms 3.1785 KOps/s 3.1814 KOps/s $\color{#d91a1a}-0.09\%$
test_unlock_nested 0.7349ms 0.3596ms 2.7812 KOps/s 2.7429 KOps/s $\color{#35bf28}+1.40\%$
test_unlock_stack_nested 0.3501ms 0.3212ms 3.1133 KOps/s 3.1142 KOps/s $\color{#d91a1a}-0.03\%$
test_flatten_speed 0.1853ms 0.1021ms 9.7941 KOps/s 9.8931 KOps/s $\color{#d91a1a}-1.00\%$
test_unflatten_speed 0.3282ms 0.2919ms 3.4261 KOps/s 3.4503 KOps/s $\color{#d91a1a}-0.70\%$
test_common_ops 1.1758ms 0.5997ms 1.6674 KOps/s 1.7091 KOps/s $\color{#d91a1a}-2.44\%$
test_creation 33.5500μs 1.6718μs 598.1561 KOps/s 604.8181 KOps/s $\color{#d91a1a}-1.10\%$
test_creation_empty 39.8010μs 8.7883μs 113.7880 KOps/s 122.0020 KOps/s $\textbf{\color{#d91a1a}-6.73\%}$
test_creation_nested_1 31.0510μs 10.4932μs 95.2994 KOps/s 98.6446 KOps/s $\color{#d91a1a}-3.39\%$
test_creation_nested_2 30.9410μs 12.5667μs 79.5752 KOps/s 80.7912 KOps/s $\color{#d91a1a}-1.51\%$
test_clone 85.5100μs 12.1442μs 82.3441 KOps/s 78.7927 KOps/s $\color{#35bf28}+4.51\%$
test_getitem[int] 27.4810μs 11.4487μs 87.3460 KOps/s 86.7855 KOps/s $\color{#35bf28}+0.65\%$
test_getitem[slice_int] 40.9200μs 21.7532μs 45.9703 KOps/s 45.8073 KOps/s $\color{#35bf28}+0.36\%$
test_getitem[range] 68.1010μs 51.1966μs 19.5325 KOps/s 20.0541 KOps/s $\color{#d91a1a}-2.60\%$
test_getitem[tuple] 71.9910μs 19.4603μs 51.3867 KOps/s 50.9324 KOps/s $\color{#35bf28}+0.89\%$
test_getitem[list] 0.1308ms 35.7500μs 27.9720 KOps/s 27.1341 KOps/s $\color{#35bf28}+3.09\%$
test_setitem_dim[int] 47.5010μs 30.7123μs 32.5602 KOps/s 30.8863 KOps/s $\textbf{\color{#35bf28}+5.42\%}$
test_setitem_dim[slice_int] 74.0810μs 51.0646μs 19.5831 KOps/s 18.4935 KOps/s $\textbf{\color{#35bf28}+5.89\%}$
test_setitem_dim[range] 87.5810μs 68.5389μs 14.5903 KOps/s 13.7388 KOps/s $\textbf{\color{#35bf28}+6.20\%}$
test_setitem_dim[tuple] 87.3020μs 44.9667μs 22.2387 KOps/s 21.0890 KOps/s $\textbf{\color{#35bf28}+5.45\%}$
test_setitem 46.1000μs 17.5346μs 57.0302 KOps/s 53.4372 KOps/s $\textbf{\color{#35bf28}+6.72\%}$
test_set 49.7100μs 16.9124μs 59.1281 KOps/s 54.5767 KOps/s $\textbf{\color{#35bf28}+8.34\%}$
test_set_shared 1.2967ms 0.1003ms 9.9741 KOps/s 10.0812 KOps/s $\color{#d91a1a}-1.06\%$
test_update 87.1220μs 18.9685μs 52.7190 KOps/s 50.5892 KOps/s $\color{#35bf28}+4.21\%$
test_update_nested 63.1010μs 24.0689μs 41.5474 KOps/s 40.1110 KOps/s $\color{#35bf28}+3.58\%$
test_update__nested 57.1210μs 23.3213μs 42.8792 KOps/s 38.4539 KOps/s $\textbf{\color{#35bf28}+11.51\%}$
test_set_nested 77.6210μs 18.1812μs 55.0020 KOps/s 52.3960 KOps/s $\color{#35bf28}+4.97\%$
test_set_nested_new 53.0710μs 20.8883μs 47.8737 KOps/s 46.3749 KOps/s $\color{#35bf28}+3.23\%$
test_select 67.9610μs 33.6312μs 29.7343 KOps/s 28.9451 KOps/s $\color{#35bf28}+2.73\%$
test_select_nested 0.5384ms 54.6517μs 18.2977 KOps/s 18.5457 KOps/s $\color{#d91a1a}-1.34\%$
test_exclude_nested 0.1532ms 0.1108ms 9.0272 KOps/s 8.8895 KOps/s $\color{#35bf28}+1.55\%$
test_empty[True] 0.4183ms 0.3528ms 2.8343 KOps/s 2.8504 KOps/s $\color{#d91a1a}-0.57\%$
test_empty[False] 3.1251μs 0.9191μs 1.0881 MOps/s 1.0897 MOps/s $\color{#d91a1a}-0.15\%$
test_to 0.1043ms 76.9391μs 12.9973 KOps/s 12.8627 KOps/s $\color{#35bf28}+1.05\%$
test_to_nonblocking 0.1008ms 62.3691μs 16.0336 KOps/s 15.5832 KOps/s $\color{#35bf28}+2.89\%$
test_unbind_speed 0.3202ms 0.2773ms 3.6067 KOps/s 3.5513 KOps/s $\color{#35bf28}+1.56\%$
test_unbind_speed_stack0 0.3139ms 0.2799ms 3.5730 KOps/s 3.5934 KOps/s $\color{#d91a1a}-0.57\%$
test_unbind_speed_stack1 75.4955ms 0.8324ms 1.2013 KOps/s 1.2255 KOps/s $\color{#d91a1a}-1.98\%$
test_split 75.7234ms 1.7017ms 587.6613 Ops/s 579.1127 Ops/s $\color{#35bf28}+1.48\%$
test_chunk 75.5595ms 1.7068ms 585.8750 Ops/s 576.6418 Ops/s $\color{#35bf28}+1.60\%$
test_creation[device0] 0.1333ms 62.1641μs 16.0864 KOps/s 15.9254 KOps/s $\color{#35bf28}+1.01\%$
test_creation_from_tensor 0.1315ms 58.7775μs 17.0133 KOps/s 16.4219 KOps/s $\color{#35bf28}+3.60\%$
test_add_one[memmap_tensor0] 68.9510μs 7.4886μs 133.5358 KOps/s 131.3234 KOps/s $\color{#35bf28}+1.68\%$
test_contiguous[memmap_tensor0] 25.0810μs 0.6771μs 1.4768 MOps/s 1.4707 MOps/s $\color{#35bf28}+0.41\%$
test_stack[memmap_tensor0] 36.3110μs 4.9785μs 200.8648 KOps/s 200.5286 KOps/s $\color{#35bf28}+0.17\%$
test_memmaptd_index 1.1360ms 0.2984ms 3.3518 KOps/s 3.1444 KOps/s $\textbf{\color{#35bf28}+6.60\%}$
test_memmaptd_index_astensor 0.7067ms 0.3696ms 2.7059 KOps/s 2.6634 KOps/s $\color{#35bf28}+1.60\%$
test_memmaptd_index_op 1.1790ms 0.6933ms 1.4423 KOps/s 1.4307 KOps/s $\color{#35bf28}+0.81\%$
test_serialize_model 0.1827s 0.1114s 8.9793 Ops/s 8.5946 Ops/s $\color{#35bf28}+4.48\%$
test_serialize_model_pickle 1.3499s 1.2355s 0.8094 Ops/s 0.8084 Ops/s $\color{#35bf28}+0.11\%$
test_serialize_weights 0.1808s 0.1095s 9.1331 Ops/s 8.7115 Ops/s $\color{#35bf28}+4.84\%$
test_serialize_weights_returnearly 0.2485s 0.1007s 9.9276 Ops/s 10.2564 Ops/s $\color{#d91a1a}-3.21\%$
test_serialize_weights_pickle 1.3743s 1.2542s 0.7973 Ops/s 0.7983 Ops/s $\color{#d91a1a}-0.12\%$
test_reshape_pytree 55.9800μs 26.7315μs 37.4090 KOps/s 37.6085 KOps/s $\color{#d91a1a}-0.53\%$
test_reshape_td 60.5710μs 32.5192μs 30.7511 KOps/s 30.8583 KOps/s $\color{#d91a1a}-0.35\%$
test_view_pytree 0.2608ms 26.5862μs 37.6135 KOps/s 38.0519 KOps/s $\color{#d91a1a}-1.15\%$
test_view_td 60.6910μs 37.4968μs 26.6690 KOps/s 27.1947 KOps/s $\color{#d91a1a}-1.93\%$
test_unbind_pytree 0.2282ms 33.1357μs 30.1789 KOps/s 30.0098 KOps/s $\color{#35bf28}+0.56\%$
test_unbind_td 0.4250ms 43.5101μs 22.9831 KOps/s 21.8757 KOps/s $\textbf{\color{#35bf28}+5.06\%}$
test_split_pytree 70.5020μs 36.9178μs 27.0872 KOps/s 27.7575 KOps/s $\color{#d91a1a}-2.41\%$
test_split_td 0.4965ms 42.1038μs 23.7508 KOps/s 23.1926 KOps/s $\color{#35bf28}+2.41\%$
test_add_pytree 0.2393ms 40.0350μs 24.9781 KOps/s 24.9722 KOps/s $\color{#35bf28}+0.02\%$
test_add_td 83.8910μs 52.6096μs 19.0079 KOps/s 18.4559 KOps/s $\color{#35bf28}+2.99\%$
test_distributed 1.6277ms 68.7884μs 14.5373 KOps/s 11.4025 KOps/s $\textbf{\color{#35bf28}+27.49\%}$
test_tdmodule 30.2110μs 14.5790μs 68.5916 KOps/s 67.7339 KOps/s $\color{#35bf28}+1.27\%$
test_tdmodule_dispatch 44.4420μs 28.6593μs 34.8927 KOps/s 34.1641 KOps/s $\color{#35bf28}+2.13\%$
test_tdseq 31.5410μs 16.8016μs 59.5182 KOps/s 58.3701 KOps/s $\color{#35bf28}+1.97\%$
test_tdseq_dispatch 54.2800μs 32.3959μs 30.8681 KOps/s 30.6300 KOps/s $\color{#35bf28}+0.78\%$
test_instantiation_functorch 1.7459ms 1.5641ms 639.3429 Ops/s 635.7864 Ops/s $\color{#35bf28}+0.56\%$
test_instantiation_td 1.5808ms 1.0757ms 929.5855 Ops/s 927.7306 Ops/s $\color{#35bf28}+0.20\%$
test_exec_functorch 0.1996ms 0.1554ms 6.4353 KOps/s 6.4280 KOps/s $\color{#35bf28}+0.11\%$
test_exec_functional_call 0.2140ms 0.1443ms 6.9299 KOps/s 6.8789 KOps/s $\color{#35bf28}+0.74\%$
test_exec_td 0.1983ms 0.1460ms 6.8515 KOps/s 6.9176 KOps/s $\color{#d91a1a}-0.96\%$
test_exec_td_decorator 0.7028ms 0.2178ms 4.5914 KOps/s 4.6914 KOps/s $\color{#d91a1a}-2.13\%$
test_vmap_mlp_speed[True-True] 0.6956ms 0.6097ms 1.6403 KOps/s 1.6500 KOps/s $\color{#d91a1a}-0.59\%$
test_vmap_mlp_speed[True-False] 0.6787ms 0.6079ms 1.6451 KOps/s 1.6544 KOps/s $\color{#d91a1a}-0.56\%$
test_vmap_mlp_speed[False-True] 0.6213ms 0.5615ms 1.7811 KOps/s 1.8373 KOps/s $\color{#d91a1a}-3.06\%$
test_vmap_mlp_speed[False-False] 0.6296ms 0.5642ms 1.7725 KOps/s 1.8499 KOps/s $\color{#d91a1a}-4.18\%$
test_vmap_mlp_speed_decorator[True-True] 1.5088ms 0.6746ms 1.4823 KOps/s 1.5043 KOps/s $\color{#d91a1a}-1.46\%$
test_vmap_mlp_speed_decorator[True-False] 0.7818ms 0.6668ms 1.4998 KOps/s 1.5020 KOps/s $\color{#d91a1a}-0.15\%$
test_vmap_mlp_speed_decorator[False-True] 0.7152ms 0.5929ms 1.6866 KOps/s 1.6811 KOps/s $\color{#35bf28}+0.33\%$
test_vmap_mlp_speed_decorator[False-False] 0.7031ms 0.6110ms 1.6366 KOps/s 1.6665 KOps/s $\color{#d91a1a}-1.79\%$
test_vmap_transformer_speed[True-True] 8.2149ms 8.1024ms 123.4197 Ops/s 122.9363 Ops/s $\color{#35bf28}+0.39\%$
test_vmap_transformer_speed[True-False] 8.1591ms 8.0841ms 123.7000 Ops/s 122.2583 Ops/s $\color{#35bf28}+1.18\%$
test_vmap_transformer_speed[False-True] 8.1480ms 8.0387ms 124.3977 Ops/s 124.1601 Ops/s $\color{#35bf28}+0.19\%$
test_vmap_transformer_speed[False-False] 8.3889ms 8.0407ms 124.3666 Ops/s 124.5204 Ops/s $\color{#d91a1a}-0.12\%$
test_vmap_transformer_speed_decorator[True-True] 20.5266ms 19.7107ms 50.7339 Ops/s 50.7390 Ops/s $\color{#d91a1a}-0.01\%$
test_vmap_transformer_speed_decorator[True-False] 19.7376ms 19.6267ms 50.9511 Ops/s 50.9657 Ops/s $\color{#d91a1a}-0.03\%$
test_vmap_transformer_speed_decorator[False-True] 20.2623ms 19.5328ms 51.1959 Ops/s 51.1612 Ops/s $\color{#35bf28}+0.07\%$
test_vmap_transformer_speed_decorator[False-False] 20.3291ms 19.6007ms 51.0185 Ops/s 51.3245 Ops/s $\color{#d91a1a}-0.60\%$
test_to_module_speed[True] 1.6399ms 1.5141ms 660.4553 Ops/s 645.9906 Ops/s $\color{#35bf28}+2.24\%$
test_to_module_speed[False] 1.6464ms 1.4934ms 669.6290 Ops/s 663.9585 Ops/s $\color{#35bf28}+0.85\%$
test_tc_init 50.6310μs 24.2625μs 41.2159 KOps/s 41.9176 KOps/s $\color{#d91a1a}-1.67\%$
test_tc_init_nested 91.6320μs 51.9301μs 19.2566 KOps/s 20.3631 KOps/s $\textbf{\color{#d91a1a}-5.43\%}$
test_tc_first_layer_tensor 0.8145μs 0.3561μs 2.8084 MOps/s 2.8093 MOps/s $\color{#d91a1a}-0.03\%$
test_tc_first_layer_nontensor 1.5985μs 0.3859μs 2.5912 MOps/s 2.5761 MOps/s $\color{#35bf28}+0.59\%$
test_tc_second_layer_tensor 14.9500μs 1.0739μs 931.2265 KOps/s 939.2171 KOps/s $\color{#d91a1a}-0.85\%$
test_tc_second_layer_nontensor 6.1942μs 0.8260μs 1.2106 MOps/s 1.2322 MOps/s $\color{#d91a1a}-1.76\%$
test_unbind 0.1017s 8.1962ms 122.0078 Ops/s 123.7158 Ops/s $\color{#d91a1a}-1.38\%$
test_full_like 13.7560ms 13.2656ms 75.3829 Ops/s 87.6672 Ops/s $\textbf{\color{#d91a1a}-14.01\%}$
test_zeros_like 96.5304ms 8.2778ms 120.8056 Ops/s 142.1129 Ops/s $\textbf{\color{#d91a1a}-14.99\%}$
test_ones_like 7.9737ms 7.8280ms 127.7462 Ops/s 126.3143 Ops/s $\color{#35bf28}+1.13\%$
test_clone 9.7805ms 9.5722ms 104.4687 Ops/s 105.1390 Ops/s $\color{#d91a1a}-0.64\%$
test_squeeze 63.6610μs 10.9105μs 91.6546 KOps/s 83.6619 KOps/s $\textbf{\color{#35bf28}+9.55\%}$
test_unsqueeze 0.1143ms 61.5723μs 16.2411 KOps/s 15.8346 KOps/s $\color{#35bf28}+2.57\%$
test_split 0.1586ms 0.1010ms 9.9031 KOps/s 9.7364 KOps/s $\color{#35bf28}+1.71\%$
test_permute 0.2050ms 0.1278ms 7.8234 KOps/s 7.9535 KOps/s $\color{#d91a1a}-1.64\%$
test_stack 27.4730ms 27.2812ms 36.6553 Ops/s 36.1249 Ops/s $\color{#35bf28}+1.47\%$
test_cat 27.6724ms 27.1845ms 36.7857 Ops/s 36.3662 Ops/s $\color{#35bf28}+1.15\%$
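The Improved and Worsened totals in both summaries appear to count the rows whose Change exceeds 5% in magnitude, which are the bold entries (20 and 5 on CPU, 13 and 4 on GPU), with Change being the relative difference between this PR's Ops and the HEAD Ops. The Python sketch below recomputes that for a few GPU rows. Both the formula and the 5% threshold are inferred from the tables, not taken from the benchmark tooling's documentation, and `relative_change` and `classify` are hypothetical helpers.

```python
# Sketch: recompute the "Change" column and the Improved/Worsened tallies.
# Assumptions (inferred from the tables): Change = Ops / Ops_on_HEAD - 1,
# and a row counts as improved/worsened when |Change| > 5%.

THRESHOLD = 0.05


def relative_change(ops: float, ops_head: float) -> float:
    """Relative throughput change of this PR versus the repository HEAD."""
    return ops / ops_head - 1.0


def classify(change: float, threshold: float = THRESHOLD) -> str:
    if change > threshold:
        return "improved"
    if change < -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD), copied from the GPU table above.
# Both values in a row share a unit, so the unit cancels in the ratio.
rows = {
    "test_membership_stacked_nested_last": (277.9229, 160.0610),
    "test_creation_empty": (113.7880, 122.0020),
    "test_unsqueeze": (16.2411, 15.8346),
    "test_zeros_like": (120.8056, 142.1129),
}

tally = {"improved": 0, "worsened": 0, "unchanged": 0}
for name, (ops, ops_head) in rows.items():
    change = relative_change(ops, ops_head)
    label = classify(change)
    tally[label] += 1
    print(f"{name}: {change:+.2%} ({label})")
print(tally)
```

Dividing before subtracting keeps the result unit-free, so rows reported in Ops/s, KOps/s, or MOps/s are handled the same way.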

Labels: bug, CLA Signed
Projects: None yet
2 participants