[BugFix] resilient _exclude_td_from_pytree #1038

Merged
vmoens merged 1 commit into gh/vmoens/29/base from gh/vmoens/29/head on Oct 11, 2024

vmoens (Contributor) commented on Oct 11, 2024

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Oct 11, 2024
ghstack-source-id: 7b3ee829689779777d301f0cfff119e48567f9bb
Pull Request resolved: #1038
facebook-github-bot added the "CLA Signed" label on Oct 11, 2024
vmoens merged commit 20460d8 into gh/vmoens/29/base on Oct 11, 2024
17 of 30 checks passed
vmoens deleted the gh/vmoens/29/head branch on October 11, 2024, 07:48

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 216. Improved: $\large\color{#35bf28}17$. Worsened: $\large\color{#d91a1a}12$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 47.8590μs 24.9025μs 40.1566 KOps/s 40.2328 KOps/s $\color{#d91a1a}-0.19\%$
test_plain_set_stack_nested 56.4550μs 25.4234μs 39.3339 KOps/s 39.4117 KOps/s $\color{#d91a1a}-0.20\%$
test_plain_set_nested_inplace 88.2250μs 27.7254μs 36.0680 KOps/s 36.5994 KOps/s $\color{#d91a1a}-1.45\%$
test_plain_set_stack_nested_inplace 89.9470μs 27.8796μs 35.8686 KOps/s 36.8665 KOps/s $\color{#d91a1a}-2.71\%$
test_items 23.7240μs 4.1947μs 238.3958 KOps/s 236.7938 KOps/s $\color{#35bf28}+0.68\%$
test_items_nested 0.5428ms 0.3821ms 2.6172 KOps/s 2.6046 KOps/s $\color{#35bf28}+0.48\%$
test_items_nested_locked 0.4815ms 0.3811ms 2.6240 KOps/s 2.6012 KOps/s $\color{#35bf28}+0.88\%$
test_items_nested_leaf 0.1574ms 79.6355μs 12.5572 KOps/s 12.5738 KOps/s $\color{#d91a1a}-0.13\%$
test_items_stack_nested 0.5263ms 0.3853ms 2.5956 KOps/s 2.5701 KOps/s $\color{#35bf28}+0.99\%$
test_items_stack_nested_leaf 0.1628ms 80.6736μs 12.3956 KOps/s 12.1720 KOps/s $\color{#35bf28}+1.84\%$
test_items_stack_nested_locked 0.5203ms 0.3806ms 2.6273 KOps/s 2.5887 KOps/s $\color{#35bf28}+1.49\%$
test_keys 22.4520μs 3.4524μs 289.6503 KOps/s 282.8045 KOps/s $\color{#35bf28}+2.42\%$
test_keys_nested 0.1822ms 0.1353ms 7.3931 KOps/s 7.4870 KOps/s $\color{#d91a1a}-1.25\%$
test_keys_nested_locked 1.5988ms 0.1412ms 7.0801 KOps/s 7.1202 KOps/s $\color{#d91a1a}-0.56\%$
test_keys_nested_leaf 0.2249ms 0.1189ms 8.4091 KOps/s 8.5241 KOps/s $\color{#d91a1a}-1.35\%$
test_keys_stack_nested 0.2316ms 0.1346ms 7.4303 KOps/s 7.4326 KOps/s $\color{#d91a1a}-0.03\%$
test_keys_stack_nested_leaf 0.2059ms 0.1186ms 8.4314 KOps/s 8.5496 KOps/s $\color{#d91a1a}-1.38\%$
test_keys_stack_nested_locked 0.2615ms 0.1412ms 7.0838 KOps/s 7.1226 KOps/s $\color{#d91a1a}-0.54\%$
test_values 5.7166μs 1.0312μs 969.7041 KOps/s 937.7524 KOps/s $\color{#35bf28}+3.41\%$
test_values_nested 0.1648ms 93.1940μs 10.7303 KOps/s 10.6201 KOps/s $\color{#35bf28}+1.04\%$
test_values_nested_locked 0.1678ms 92.8763μs 10.7670 KOps/s 10.6408 KOps/s $\color{#35bf28}+1.19\%$
test_values_nested_leaf 0.1543ms 79.0898μs 12.6439 KOps/s 12.3814 KOps/s $\color{#35bf28}+2.12\%$
test_values_stack_nested 0.1678ms 93.3006μs 10.7180 KOps/s 9.5891 KOps/s $\textbf{\color{#35bf28}+11.77\%}$
test_values_stack_nested_leaf 0.1413ms 78.8265μs 12.6861 KOps/s 12.4319 KOps/s $\color{#35bf28}+2.04\%$
test_values_stack_nested_locked 0.1803ms 92.9499μs 10.7585 KOps/s 10.5249 KOps/s $\color{#35bf28}+2.22\%$
test_membership 1.8164μs 0.6987μs 1.4312 MOps/s 1.3017 MOps/s $\textbf{\color{#35bf28}+9.95\%}$
test_membership_nested 28.0520μs 2.7105μs 368.9320 KOps/s 360.1632 KOps/s $\color{#35bf28}+2.43\%$
test_membership_nested_leaf 33.7030μs 2.7364μs 365.4418 KOps/s 358.0169 KOps/s $\color{#35bf28}+2.07\%$
test_membership_stacked_nested 30.0860μs 2.7056μs 369.6009 KOps/s 362.7811 KOps/s $\color{#35bf28}+1.88\%$
test_membership_stacked_nested_leaf 25.2570μs 2.7046μs 369.7382 KOps/s 350.3620 KOps/s $\textbf{\color{#35bf28}+5.53\%}$
test_membership_nested_last 25.3770μs 4.3399μs 230.4214 KOps/s 233.4599 KOps/s $\color{#d91a1a}-1.30\%$
test_membership_nested_leaf_last 48.6920μs 4.2729μs 234.0307 KOps/s 234.1689 KOps/s $\color{#d91a1a}-0.06\%$
test_membership_stacked_nested_last 30.7970μs 4.2344μs 236.1612 KOps/s 236.0254 KOps/s $\color{#35bf28}+0.06\%$
test_membership_stacked_nested_leaf_last 41.8680μs 4.2320μs 236.2948 KOps/s 235.1342 KOps/s $\color{#35bf28}+0.49\%$
test_nested_getleaf 38.8720μs 10.5054μs 95.1889 KOps/s 91.8571 KOps/s $\color{#35bf28}+3.63\%$
test_nested_get 34.9350μs 10.0232μs 99.7686 KOps/s 96.9353 KOps/s $\color{#35bf28}+2.92\%$
test_stacked_getleaf 40.6660μs 10.3789μs 96.3495 KOps/s 92.4905 KOps/s $\color{#35bf28}+4.17\%$
test_stacked_get 29.5460μs 9.8807μs 101.2077 KOps/s 95.8552 KOps/s $\textbf{\color{#35bf28}+5.58\%}$
test_nested_getitemleaf 40.7160μs 10.8012μs 92.5823 KOps/s 87.6254 KOps/s $\textbf{\color{#35bf28}+5.66\%}$
test_nested_getitem 32.6220μs 10.1768μs 98.2630 KOps/s 93.4708 KOps/s $\textbf{\color{#35bf28}+5.13\%}$
test_stacked_getitemleaf 40.3060μs 10.6660μs 93.7556 KOps/s 88.9655 KOps/s $\textbf{\color{#35bf28}+5.38\%}$
test_stacked_getitem 34.1440μs 10.1615μs 98.4105 KOps/s 94.9043 KOps/s $\color{#35bf28}+3.69\%$
test_lock_nested 84.7345ms 0.5853ms 1.7084 KOps/s 2.0280 KOps/s $\textbf{\color{#d91a1a}-15.76\%}$
test_lock_stack_nested 0.7480ms 0.4706ms 2.1251 KOps/s 2.1192 KOps/s $\color{#35bf28}+0.28\%$
test_unlock_nested 84.1843ms 0.5034ms 1.9864 KOps/s 2.4051 KOps/s $\textbf{\color{#d91a1a}-17.41\%}$
test_unlock_stack_nested 0.6173ms 0.3857ms 2.5929 KOps/s 2.5741 KOps/s $\color{#35bf28}+0.73\%$
test_flatten_speed 0.2292ms 99.7233μs 10.0277 KOps/s 9.9678 KOps/s $\color{#35bf28}+0.60\%$
test_unflatten_speed 0.6603ms 0.5167ms 1.9352 KOps/s 1.9254 KOps/s $\color{#35bf28}+0.51\%$
test_common_ops 6.6219ms 1.1453ms 873.0968 Ops/s 846.7849 Ops/s $\color{#35bf28}+3.11\%$
test_creation 19.6970μs 2.1257μs 470.4402 KOps/s 489.7822 KOps/s $\color{#d91a1a}-3.95\%$
test_creation_empty 56.9770μs 18.9991μs 52.6341 KOps/s 50.2218 KOps/s $\color{#35bf28}+4.80\%$
test_creation_nested_1 69.2190μs 22.3221μs 44.7987 KOps/s 42.7459 KOps/s $\color{#35bf28}+4.80\%$
test_creation_nested_2 81.9040μs 26.7397μs 37.3976 KOps/s 35.8504 KOps/s $\color{#35bf28}+4.32\%$
test_clone 56.4460μs 16.8448μs 59.3655 KOps/s 60.0019 KOps/s $\color{#d91a1a}-1.06\%$
test_getitem[int] 1.2183ms 16.7785μs 59.6000 KOps/s 60.9349 KOps/s $\color{#d91a1a}-2.19\%$
test_getitem[slice_int] 0.1867ms 32.6784μs 30.6013 KOps/s 32.5435 KOps/s $\textbf{\color{#d91a1a}-5.97\%}$
test_getitem[range] 0.1664ms 57.2898μs 17.4551 KOps/s 17.6573 KOps/s $\color{#d91a1a}-1.15\%$
test_getitem[tuple] 0.1292ms 24.8273μs 40.2783 KOps/s 39.4964 KOps/s $\color{#35bf28}+1.98\%$
test_getitem[list] 0.1822ms 53.2733μs 18.7711 KOps/s 19.1940 KOps/s $\color{#d91a1a}-2.20\%$
test_setitem_dim[int] 85.1600μs 33.7832μs 29.6005 KOps/s 30.6127 KOps/s $\color{#d91a1a}-3.31\%$
test_setitem_dim[slice_int] 0.1155ms 60.2938μs 16.5855 KOps/s 16.3409 KOps/s $\color{#35bf28}+1.50\%$
test_setitem_dim[range] 0.1228ms 84.0006μs 11.9047 KOps/s 11.9423 KOps/s $\color{#d91a1a}-0.31\%$
test_setitem_dim[tuple] 87.0430μs 49.9849μs 20.0060 KOps/s 20.0348 KOps/s $\color{#d91a1a}-0.14\%$
test_setitem 0.1041ms 30.9390μs 32.3216 KOps/s 32.3022 KOps/s $\color{#35bf28}+0.06\%$
test_set 73.6980μs 30.1152μs 33.2059 KOps/s 32.6935 KOps/s $\color{#35bf28}+1.57\%$
test_set_shared 3.4884ms 0.2180ms 4.5869 KOps/s 4.6109 KOps/s $\color{#d91a1a}-0.52\%$
test_update 88.0240μs 38.1768μs 26.1939 KOps/s 24.8336 KOps/s $\textbf{\color{#35bf28}+5.48\%}$
test_update_nested 0.1204ms 48.8224μs 20.4824 KOps/s 19.6445 KOps/s $\color{#35bf28}+4.27\%$
test_update__nested 0.9436ms 44.9286μs 22.2576 KOps/s 22.5212 KOps/s $\color{#d91a1a}-1.17\%$
test_set_nested 79.8600μs 32.5582μs 30.7142 KOps/s 29.6633 KOps/s $\color{#35bf28}+3.54\%$
test_set_nested_new 89.2770μs 38.1944μs 26.1819 KOps/s 25.6224 KOps/s $\color{#35bf28}+2.18\%$
test_select 0.1480ms 57.1330μs 17.5030 KOps/s 17.6731 KOps/s $\color{#d91a1a}-0.96\%$
test_select_nested 0.1312ms 58.7042μs 17.0345 KOps/s 16.6980 KOps/s $\color{#35bf28}+2.02\%$
test_exclude_nested 0.1701ms 74.5447μs 13.4148 KOps/s 13.2256 KOps/s $\color{#35bf28}+1.43\%$
test_empty[True] 0.5278ms 0.3522ms 2.8389 KOps/s 2.8633 KOps/s $\color{#d91a1a}-0.85\%$
test_empty[False] 10.6200μs 1.2143μs 823.5515 KOps/s 721.0087 KOps/s $\textbf{\color{#35bf28}+14.22\%}$
test_unbind_speed 0.4643ms 0.2984ms 3.3512 KOps/s 3.2952 KOps/s $\color{#35bf28}+1.70\%$
test_unbind_speed_stack0 0.4719ms 0.2941ms 3.4004 KOps/s 3.4042 KOps/s $\color{#d91a1a}-0.11\%$
test_unbind_speed_stack1 88.1382ms 0.8018ms 1.2472 KOps/s 1.3411 KOps/s $\textbf{\color{#d91a1a}-7.00\%}$
test_split 2.2117ms 1.9942ms 501.4432 Ops/s 455.7540 Ops/s $\textbf{\color{#35bf28}+10.02\%}$
test_chunk 88.5112ms 2.3400ms 427.3574 Ops/s 452.8576 Ops/s $\textbf{\color{#d91a1a}-5.63\%}$
test_creation[device0] 0.2378ms 0.1157ms 8.6444 KOps/s 8.3383 KOps/s $\color{#35bf28}+3.67\%$
test_creation_from_tensor 4.1101ms 0.1165ms 8.5855 KOps/s 8.5148 KOps/s $\color{#35bf28}+0.83\%$
test_add_one[memmap_tensor0] 0.1844ms 7.2142μs 138.6165 KOps/s 137.7844 KOps/s $\color{#35bf28}+0.60\%$
test_contiguous[memmap_tensor0] 20.2280μs 1.8887μs 529.4776 KOps/s 505.4972 KOps/s $\color{#35bf28}+4.74\%$
test_stack[memmap_tensor0] 44.5740μs 5.5934μs 178.7828 KOps/s 175.9485 KOps/s $\color{#35bf28}+1.61\%$
test_memmaptd_index 1.0411ms 0.4046ms 2.4715 KOps/s 2.4529 KOps/s $\color{#35bf28}+0.76\%$
test_memmaptd_index_astensor 0.8031ms 0.5036ms 1.9855 KOps/s 1.9754 KOps/s $\color{#35bf28}+0.51\%$
test_memmaptd_index_op 1.6889ms 1.0517ms 950.8416 Ops/s 927.3544 Ops/s $\color{#35bf28}+2.53\%$
test_serialize_model 0.1257s 0.1180s 8.4716 Ops/s 8.4951 Ops/s $\color{#d91a1a}-0.28\%$
test_serialize_model_pickle 0.4799s 0.3945s 2.5351 Ops/s 2.4992 Ops/s $\color{#35bf28}+1.44\%$
test_serialize_weights 0.1276s 0.1160s 8.6221 Ops/s 7.6408 Ops/s $\textbf{\color{#35bf28}+12.84\%}$
test_serialize_weights_returnearly 0.2468s 0.1714s 5.8359 Ops/s 6.3096 Ops/s $\textbf{\color{#d91a1a}-7.51\%}$
test_serialize_weights_pickle 0.4638s 0.4063s 2.4611 Ops/s 2.3054 Ops/s $\textbf{\color{#35bf28}+6.75\%}$
test_serialize_weights_filesystem 0.1552s 0.1426s 7.0112 Ops/s 6.9886 Ops/s $\color{#35bf28}+0.32\%$
test_serialize_model_filesystem 0.1597s 0.1512s 6.6118 Ops/s 6.5406 Ops/s $\color{#35bf28}+1.09\%$
test_reshape_pytree 80.9210μs 38.8267μs 25.7555 KOps/s 25.7157 KOps/s $\color{#35bf28}+0.15\%$
test_reshape_td 0.1057ms 46.9208μs 21.3125 KOps/s 21.5592 KOps/s $\color{#d91a1a}-1.14\%$
test_view_pytree 83.6370μs 38.6088μs 25.9008 KOps/s 25.8212 KOps/s $\color{#35bf28}+0.31\%$
test_view_td 0.1228ms 51.8958μs 19.2694 KOps/s 19.3438 KOps/s $\color{#d91a1a}-0.38\%$
test_unbind_pytree 79.9700μs 35.1158μs 28.4772 KOps/s 27.7581 KOps/s $\color{#35bf28}+2.59\%$
test_unbind_td 0.2960ms 44.2590μs 22.5943 KOps/s 22.6074 KOps/s $\color{#d91a1a}-0.06\%$
test_split_pytree 92.2230μs 37.6964μs 26.5277 KOps/s 25.9729 KOps/s $\color{#35bf28}+2.14\%$
test_split_td 89.8265ms 68.9549μs 14.5022 KOps/s 14.9003 KOps/s $\color{#d91a1a}-2.67\%$
test_add_pytree 0.1594ms 45.1077μs 22.1692 KOps/s 22.4952 KOps/s $\color{#d91a1a}-1.45\%$
test_add_td 0.1783ms 88.1836μs 11.3400 KOps/s 10.8403 KOps/s $\color{#35bf28}+4.61\%$
test_compile_add_one_nested[tensordict-compile] 0.1242ms 58.3411μs 17.1406 KOps/s 16.7433 KOps/s $\color{#35bf28}+2.37\%$
test_compile_add_one_nested[tensordict-eager] 0.2791ms 0.1945ms 5.1411 KOps/s 5.1915 KOps/s $\color{#d91a1a}-0.97\%$
test_compile_add_one_nested[pytree-compile] 0.1242ms 57.1694μs 17.4919 KOps/s 17.4729 KOps/s $\color{#35bf28}+0.11\%$
test_compile_add_one_nested[pytree-eager] 0.2541ms 0.1406ms 7.1103 KOps/s 7.0967 KOps/s $\color{#35bf28}+0.19\%$
test_compile_copy_nested[tensordict-compile] 76.3730μs 23.0316μs 43.4186 KOps/s 41.9433 KOps/s $\color{#35bf28}+3.52\%$
test_compile_copy_nested[tensordict-eager] 0.1364ms 73.9945μs 13.5145 KOps/s 13.5204 KOps/s $\color{#d91a1a}-0.04\%$
test_compile_copy_nested[pytree-compile] 0.1612ms 75.2433μs 13.2902 KOps/s 13.2070 KOps/s $\color{#35bf28}+0.63\%$
test_compile_copy_nested[pytree-eager] 0.1372ms 67.7897μs 14.7515 KOps/s 14.5365 KOps/s $\color{#35bf28}+1.48\%$
test_compile_add_one_flat[tensordict-compile] 0.3130ms 0.1816ms 5.5077 KOps/s 5.4791 KOps/s $\color{#35bf28}+0.52\%$
test_compile_add_one_flat[tensordict-eager] 1.0700ms 0.2424ms 4.1258 KOps/s 4.1133 KOps/s $\color{#35bf28}+0.30\%$
test_compile_add_one_flat[tensorclass-compile] 0.1136ms 47.7843μs 20.9274 KOps/s 20.5138 KOps/s $\color{#35bf28}+2.02\%$
test_compile_add_one_flat[tensorclass-eager] 0.4334ms 76.0826μs 13.1436 KOps/s 13.0515 KOps/s $\color{#35bf28}+0.71\%$
test_compile_add_one_flat[pytree-compile] 0.2529ms 0.1752ms 5.7091 KOps/s 5.7699 KOps/s $\color{#d91a1a}-1.05\%$
test_compile_add_one_flat[pytree-eager] 0.4440ms 0.2881ms 3.4715 KOps/s 3.5182 KOps/s $\color{#d91a1a}-1.33\%$
test_compile_add_self_flat[tensordict-eager] 0.3772ms 0.2737ms 3.6532 KOps/s 3.6622 KOps/s $\color{#d91a1a}-0.25\%$
test_compile_add_self_flat[tensordict-compile] 0.3585ms 0.1852ms 5.3991 KOps/s 5.5376 KOps/s $\color{#d91a1a}-2.50\%$
test_compile_add_self_flat[tensorclass-eager] 0.9248ms 75.1730μs 13.3026 KOps/s 13.5118 KOps/s $\color{#d91a1a}-1.55\%$
test_compile_add_self_flat[tensorclass-compile] 0.1217ms 48.0763μs 20.8003 KOps/s 20.3033 KOps/s $\color{#35bf28}+2.45\%$
test_compile_add_self_flat[pytree-eager] 0.3388ms 0.2319ms 4.3126 KOps/s 4.3552 KOps/s $\color{#d91a1a}-0.98\%$
test_compile_add_self_flat[pytree-compile] 0.3197ms 0.1771ms 5.6472 KOps/s 5.6579 KOps/s $\color{#d91a1a}-0.19\%$
test_compile_copy_flat[tensordict-compile] 0.2063ms 0.1127ms 8.8692 KOps/s 9.0484 KOps/s $\color{#d91a1a}-1.98\%$
test_compile_copy_flat[tensordict-eager] 0.1541ms 79.9095μs 12.5142 KOps/s 12.8533 KOps/s $\color{#d91a1a}-2.64\%$
test_compile_copy_flat[pytree-compile] 0.1392ms 76.4839μs 13.0746 KOps/s 12.5282 KOps/s $\color{#35bf28}+4.36\%$
test_compile_copy_flat[pytree-eager] 0.1542ms 68.4037μs 14.6191 KOps/s 14.1098 KOps/s $\color{#35bf28}+3.61\%$
test_compile_assign_and_add[tensordict-compile] 0.3593ms 0.1923ms 5.2013 KOps/s 5.2408 KOps/s $\color{#d91a1a}-0.75\%$
test_compile_assign_and_add[tensordict-eager] 2.7401ms 1.6843ms 593.7020 Ops/s 563.1431 Ops/s $\textbf{\color{#35bf28}+5.43\%}$
test_compile_assign_and_add[pytree-compile] 0.2464ms 0.1898ms 5.2694 KOps/s 5.1743 KOps/s $\color{#35bf28}+1.84\%$
test_compile_assign_and_add[pytree-eager] 1.3212ms 1.0797ms 926.2034 Ops/s 906.0108 Ops/s $\color{#35bf28}+2.23\%$
test_compile_assign_and_add_stack[compile] 0.7593ms 0.4131ms 2.4209 KOps/s 2.4561 KOps/s $\color{#d91a1a}-1.43\%$
test_compile_assign_and_add_stack[eager] 6.0357ms 4.0423ms 247.3860 Ops/s 233.8505 Ops/s $\textbf{\color{#35bf28}+5.79\%}$
test_compile_indexing[tensor-tensordict-compile] 89.4980μs 33.6915μs 29.6810 KOps/s 28.9459 KOps/s $\color{#35bf28}+2.54\%$
test_compile_indexing[tensor-tensordict-eager] 1.0385ms 47.9238μs 20.8665 KOps/s 20.7659 KOps/s $\color{#35bf28}+0.48\%$
test_compile_indexing[tensor-tensorclass-compile] 72.4350μs 30.0759μs 33.2492 KOps/s 32.2785 KOps/s $\color{#35bf28}+3.01\%$
test_compile_indexing[tensor-tensorclass-eager] 77.9960μs 28.9538μs 34.5378 KOps/s 34.3578 KOps/s $\color{#35bf28}+0.52\%$
test_compile_indexing[tensor-pytree-compile] 78.3770μs 29.7954μs 33.5622 KOps/s 32.8341 KOps/s $\color{#35bf28}+2.22\%$
test_compile_indexing[tensor-pytree-eager] 78.9080μs 29.0014μs 34.4811 KOps/s 34.5129 KOps/s $\color{#d91a1a}-0.09\%$
test_compile_indexing[slice-tensordict-compile] 0.1704ms 74.0510μs 13.5042 KOps/s 13.2776 KOps/s $\color{#35bf28}+1.71\%$
test_compile_indexing[slice-tensordict-eager] 0.5560ms 27.4642μs 36.4111 KOps/s 36.8001 KOps/s $\color{#d91a1a}-1.06\%$
test_compile_indexing[slice-tensorclass-compile] 0.1495ms 68.5955μs 14.5782 KOps/s 14.3800 KOps/s $\color{#35bf28}+1.38\%$
test_compile_indexing[slice-tensorclass-eager] 79.9620μs 23.2499μs 43.0109 KOps/s 42.6981 KOps/s $\color{#35bf28}+0.73\%$
test_compile_indexing[slice-pytree-compile] 0.1393ms 68.1877μs 14.6654 KOps/s 14.4760 KOps/s $\color{#35bf28}+1.31\%$
test_compile_indexing[slice-pytree-eager] 67.7460μs 23.5081μs 42.5386 KOps/s 42.3670 KOps/s $\color{#35bf28}+0.40\%$
test_compile_indexing[int-tensordict-compile] 0.1704ms 73.4805μs 13.6091 KOps/s 13.3997 KOps/s $\color{#35bf28}+1.56\%$
test_compile_indexing[int-tensordict-eager] 0.8180ms 27.6123μs 36.2158 KOps/s 37.1174 KOps/s $\color{#d91a1a}-2.43\%$
test_compile_indexing[int-tensorclass-compile] 0.1464ms 68.5861μs 14.5802 KOps/s 14.4345 KOps/s $\color{#35bf28}+1.01\%$
test_compile_indexing[int-tensorclass-eager] 56.2350μs 23.1873μs 43.1271 KOps/s 43.0572 KOps/s $\color{#35bf28}+0.16\%$
test_compile_indexing[int-pytree-compile] 0.4411ms 68.5328μs 14.5916 KOps/s 14.5958 KOps/s $\color{#d91a1a}-0.03\%$
test_compile_indexing[int-pytree-eager] 86.1010μs 23.0814μs 43.3249 KOps/s 43.1787 KOps/s $\color{#35bf28}+0.34\%$
test_mod_add[eager] 70.7330μs 26.4841μs 37.7585 KOps/s 37.7909 KOps/s $\color{#d91a1a}-0.09\%$
test_mod_add[compile] 98.6050μs 37.3366μs 26.7834 KOps/s 25.9009 KOps/s $\color{#35bf28}+3.41\%$
test_mod_add[compile-overhead] 0.1061ms 37.7893μs 26.4625 KOps/s 25.9674 KOps/s $\color{#35bf28}+1.91\%$
test_mod_wrap[eager] 0.3534ms 0.2068ms 4.8355 KOps/s 4.8068 KOps/s $\color{#35bf28}+0.60\%$
test_mod_wrap[compile] 0.3164ms 0.2279ms 4.3872 KOps/s 4.3053 KOps/s $\color{#35bf28}+1.90\%$
test_mod_wrap[compile-overhead] 0.3118ms 0.2284ms 4.3781 KOps/s 4.3093 KOps/s $\color{#35bf28}+1.60\%$
test_mod_wrap_and_backward[eager] 12.8119ms 11.1678ms 89.5430 Ops/s 92.7169 Ops/s $\color{#d91a1a}-3.42\%$
test_mod_wrap_and_backward[compile] 15.6033ms 12.1785ms 82.1117 Ops/s 92.6137 Ops/s $\textbf{\color{#d91a1a}-11.34\%}$
test_mod_wrap_and_backward[compile-overhead] 14.6949ms 12.2775ms 81.4500 Ops/s 92.6182 Ops/s $\textbf{\color{#d91a1a}-12.06\%}$
test_seq_add[eager] 0.1739ms 95.7190μs 10.4472 KOps/s 10.5615 KOps/s $\color{#d91a1a}-1.08\%$
test_seq_add[compile] 0.1428ms 64.1545μs 15.5874 KOps/s 15.4522 KOps/s $\color{#35bf28}+0.87\%$
test_seq_add[compile-overhead] 0.1290ms 62.3782μs 16.0312 KOps/s 15.6807 KOps/s $\color{#35bf28}+2.24\%$
test_seq_wrap[eager] 0.7352ms 0.3893ms 2.5688 KOps/s 2.4606 KOps/s $\color{#35bf28}+4.40\%$
test_seq_wrap[compile] 0.4866ms 0.2699ms 3.7053 KOps/s 3.6662 KOps/s $\color{#35bf28}+1.07\%$
test_seq_wrap[compile-overhead] 0.4112ms 0.2703ms 3.6996 KOps/s 3.6868 KOps/s $\color{#35bf28}+0.35\%$
test_func_call_runtime[False-eager] 0.8463ms 0.5130ms 1.9494 KOps/s 1.9161 KOps/s $\color{#35bf28}+1.74\%$
test_func_call_runtime[False-compile] 0.6115ms 0.4968ms 2.0129 KOps/s 1.9857 KOps/s $\color{#35bf28}+1.37\%$
test_func_call_runtime[False-compile-overhead] 1.0378ms 0.5020ms 1.9922 KOps/s 2.0088 KOps/s $\color{#d91a1a}-0.83\%$
test_func_call_runtime[True-eager] 0.8752ms 0.7365ms 1.3578 KOps/s 1.3636 KOps/s $\color{#d91a1a}-0.42\%$
test_func_call_runtime[True-compile] 0.8691ms 0.5193ms 1.9256 KOps/s 1.8709 KOps/s $\color{#35bf28}+2.92\%$
test_func_call_runtime[True-compile-overhead] 0.8858ms 0.5137ms 1.9466 KOps/s 1.9652 KOps/s $\color{#d91a1a}-0.95\%$
test_func_call_cm_runtime[False-eager] 1.1536ms 0.5105ms 1.9590 KOps/s 1.9495 KOps/s $\color{#35bf28}+0.49\%$
test_func_call_cm_runtime[False-compile] 1.0388ms 0.5092ms 1.9638 KOps/s 1.9835 KOps/s $\color{#d91a1a}-0.99\%$
test_func_call_cm_runtime[False-compile-overhead] 0.6275ms 0.4995ms 2.0020 KOps/s 1.9805 KOps/s $\color{#35bf28}+1.08\%$
test_func_call_cm_runtime[True-eager] 1.0937ms 0.8841ms 1.1312 KOps/s 1.1333 KOps/s $\color{#d91a1a}-0.19\%$
test_func_call_cm_runtime[True-compile] 1.0171ms 0.7303ms 1.3693 KOps/s 1.3594 KOps/s $\color{#35bf28}+0.73\%$
test_func_call_cm_runtime[True-compile-overhead] 0.8722ms 0.7276ms 1.3743 KOps/s 1.3502 KOps/s $\color{#35bf28}+1.79\%$
test_vmap_func_call_cm_runtime[eager] 2.8229ms 1.8973ms 527.0728 Ops/s 520.2379 Ops/s $\color{#35bf28}+1.31\%$
test_vmap_func_call_cm_runtime[compile] 2.5925ms 1.9235ms 519.8859 Ops/s 507.6825 Ops/s $\color{#35bf28}+2.40\%$
test_vmap_func_call_cm_runtime[compile-overhead] 2.8626ms 1.9297ms 518.2069 Ops/s 507.9333 Ops/s $\color{#35bf28}+2.02\%$
test_distributed 0.2771ms 0.1277ms 7.8292 KOps/s 7.6850 KOps/s $\color{#35bf28}+1.88\%$
test_tdmodule 46.8070μs 19.3110μs 51.7839 KOps/s 53.2886 KOps/s $\color{#d91a1a}-2.82\%$
test_tdmodule_dispatch 76.8740μs 37.7396μs 26.4974 KOps/s 25.5024 KOps/s $\color{#35bf28}+3.90\%$
test_tdseq 37.1400μs 21.0097μs 47.5970 KOps/s 44.0399 KOps/s $\textbf{\color{#35bf28}+8.08\%}$
test_tdseq_dispatch 67.2260μs 42.7828μs 23.3739 KOps/s 22.0312 KOps/s $\textbf{\color{#35bf28}+6.09\%}$
test_instantiation_functorch 1.8423ms 1.5883ms 629.6023 Ops/s 627.7135 Ops/s $\color{#35bf28}+0.30\%$
test_exec_functorch 0.4096ms 0.1855ms 5.3894 KOps/s 5.4128 KOps/s $\color{#d91a1a}-0.43\%$
test_exec_functional_call 0.2844ms 0.1725ms 5.7964 KOps/s 5.7631 KOps/s $\color{#35bf28}+0.58\%$
test_exec_td_decorator 0.4734ms 0.2339ms 4.2747 KOps/s 4.2396 KOps/s $\color{#35bf28}+0.83\%$
test_vmap_mlp_speed_decorator[True-True] 0.9960ms 0.6387ms 1.5656 KOps/s 1.5376 KOps/s $\color{#35bf28}+1.83\%$
test_vmap_mlp_speed_decorator[True-False] 1.0174ms 0.6488ms 1.5414 KOps/s 1.5442 KOps/s $\color{#d91a1a}-0.18\%$
test_vmap_mlp_speed_decorator[False-True] 0.9311ms 0.5220ms 1.9156 KOps/s 1.8758 KOps/s $\color{#35bf28}+2.12\%$
test_vmap_mlp_speed_decorator[False-False] 0.7828ms 0.5219ms 1.9162 KOps/s 1.8483 KOps/s $\color{#35bf28}+3.68\%$
test_to_module_speed[True] 1.9533ms 1.4075ms 710.4713 Ops/s 699.6878 Ops/s $\color{#35bf28}+1.54\%$
test_to_module_speed[False] 1.4813ms 1.3737ms 727.9630 Ops/s 724.0173 Ops/s $\color{#35bf28}+0.54\%$
test_tc_init 80.6110μs 47.4070μs 21.0939 KOps/s 20.2308 KOps/s $\color{#35bf28}+4.27\%$
test_tc_init_nested 0.1645ms 95.5965μs 10.4606 KOps/s 10.1395 KOps/s $\color{#35bf28}+3.17\%$
test_tc_first_layer_tensor 19.8170μs 1.5622μs 640.1326 KOps/s 656.1862 KOps/s $\color{#d91a1a}-2.45\%$
test_tc_first_layer_nontensor 41.9290μs 4.7240μs 211.6866 KOps/s 213.7321 KOps/s $\color{#d91a1a}-0.96\%$
test_tc_second_layer_tensor 30.1860μs 2.8603μs 349.6149 KOps/s 355.6859 KOps/s $\color{#d91a1a}-1.71\%$
test_tc_second_layer_nontensor 47.9800μs 6.1201μs 163.3952 KOps/s 164.6178 KOps/s $\color{#d91a1a}-0.74\%$
test_unbind 7.7313ms 7.4730ms 133.8145 Ops/s 75.2138 Ops/s $\textbf{\color{#35bf28}+77.91\%}$
test_full_like 17.9703ms 11.4354ms 87.4475 Ops/s 140.6018 Ops/s $\textbf{\color{#d91a1a}-37.80\%}$
test_zeros_like 12.4020ms 7.6559ms 130.6185 Ops/s 361.4289 Ops/s $\textbf{\color{#d91a1a}-63.86\%}$
test_ones_like 14.4237ms 7.6713ms 130.3552 Ops/s 167.2743 Ops/s $\textbf{\color{#d91a1a}-22.07\%}$
test_clone 17.8084ms 9.2173ms 108.4919 Ops/s 127.4136 Ops/s $\textbf{\color{#d91a1a}-14.85\%}$
test_squeeze 48.2500μs 12.8555μs 77.7876 KOps/s 80.8392 KOps/s $\color{#d91a1a}-3.77\%$
test_unsqueeze 0.1871ms 92.0212μs 10.8671 KOps/s 10.7993 KOps/s $\color{#35bf28}+0.63\%$
test_split 0.4278ms 0.1923ms 5.2014 KOps/s 5.0656 KOps/s $\color{#35bf28}+2.68\%$
test_permute 0.4216ms 0.2214ms 4.5157 KOps/s 4.5189 KOps/s $\color{#d91a1a}-0.07\%$
test_stack 27.8654ms 25.1260ms 39.7994 Ops/s 38.9083 Ops/s $\color{#35bf28}+2.29\%$
test_cat 29.6801ms 25.3070ms 39.5147 Ops/s 38.6441 Ops/s $\color{#35bf28}+2.25\%$
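
The Change column in the table above appears to be the relative difference between the Ops column and the Ops on Repo HEAD column, and the bolded entries (the ones counted as Improved or Worsened) all differ from HEAD by more than 5%. The sketch below reproduces that arithmetic for three rows of the CPU table. The function names and the 5% cutoff are assumptions inferred from the reported numbers, not taken from the benchmark tooling itself.

```python
def relative_change(ops: float, head_ops: float) -> float:
    """Percent change in throughput relative to repo HEAD.

    Both values must use the same unit (e.g. both in KOps/s).
    """
    if head_ops <= 0:
        raise ValueError("head_ops must be positive")
    return (ops - head_ops) / head_ops * 100.0


def classify(change_pct: float, threshold_pct: float = 5.0) -> str:
    """Label a change; the 5% threshold is an assumption inferred from the table."""
    if change_pct > threshold_pct:
        return "improved"
    if change_pct < -threshold_pct:
        return "worsened"
    return "within threshold"


# (Ops, Ops on Repo HEAD) pairs copied from the CPU results above.
rows = {
    "test_unbind": (133.8145, 75.2138),      # Ops/s
    "test_full_like": (87.4475, 140.6018),   # Ops/s
    "test_tc_init": (21.0939, 20.2308),      # KOps/s
}

for name, (ops, head_ops) in rows.items():
    change = relative_change(ops, head_ops)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Under these assumptions the three rows come out as improved, worsened and within threshold respectively, which matches how they are highlighted in the table.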


$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 218. Improved: $\large\color{#35bf28}9$. Worsened: $\large\color{#d91a1a}11$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.1483ms 18.0638μs 55.3593 KOps/s 56.4384 KOps/s $\color{#d91a1a}-1.91\%$
test_plain_set_stack_nested 43.1810μs 17.9559μs 55.6920 KOps/s 56.3291 KOps/s $\color{#d91a1a}-1.13\%$
test_plain_set_nested_inplace 46.4310μs 19.2979μs 51.8191 KOps/s 52.6664 KOps/s $\color{#d91a1a}-1.61\%$
test_plain_set_stack_nested_inplace 48.4310μs 19.3663μs 51.6360 KOps/s 52.3707 KOps/s $\color{#d91a1a}-1.40\%$
test_items 26.7700μs 2.9039μs 344.3629 KOps/s 342.8539 KOps/s $\color{#35bf28}+0.44\%$
test_items_nested 0.3844ms 0.3410ms 2.9327 KOps/s 2.9340 KOps/s $\color{#d91a1a}-0.04\%$
test_items_nested_locked 0.3703ms 0.3463ms 2.8876 KOps/s 2.9404 KOps/s $\color{#d91a1a}-1.79\%$
test_items_nested_leaf 91.8820μs 63.4437μs 15.7620 KOps/s 15.9183 KOps/s $\color{#d91a1a}-0.98\%$
test_items_stack_nested 0.3941ms 0.3413ms 2.9296 KOps/s 2.9426 KOps/s $\color{#d91a1a}-0.44\%$
test_items_stack_nested_leaf 96.3220μs 64.3499μs 15.5401 KOps/s 15.5431 KOps/s $\color{#d91a1a}-0.02\%$
test_items_stack_nested_locked 0.4267ms 0.3426ms 2.9191 KOps/s 2.9379 KOps/s $\color{#d91a1a}-0.64\%$
test_keys 24.2810μs 3.4670μs 288.4376 KOps/s 292.5903 KOps/s $\color{#d91a1a}-1.42\%$
test_keys_nested 0.1103ms 70.9496μs 14.0945 KOps/s 14.1196 KOps/s $\color{#d91a1a}-0.18\%$
test_keys_nested_locked 2.2955ms 76.2961μs 13.1068 KOps/s 12.9725 KOps/s $\color{#35bf28}+1.04\%$
test_keys_nested_leaf 93.9120μs 61.6013μs 16.2334 KOps/s 16.2090 KOps/s $\color{#35bf28}+0.15\%$
test_keys_stack_nested 0.1046ms 71.4201μs 14.0017 KOps/s 13.8287 KOps/s $\color{#35bf28}+1.25\%$
test_keys_stack_nested_leaf 96.6520μs 62.9852μs 15.8767 KOps/s 15.6656 KOps/s $\color{#35bf28}+1.35\%$
test_keys_stack_nested_locked 0.1209ms 77.2678μs 12.9420 KOps/s 12.8116 KOps/s $\color{#35bf28}+1.02\%$
test_values 5.3017μs 0.8405μs 1.1897 MOps/s 1.1889 MOps/s $\color{#35bf28}+0.06\%$
test_values_nested 88.6620μs 49.0464μs 20.3889 KOps/s 20.3251 KOps/s $\color{#35bf28}+0.31\%$
test_values_nested_locked 85.7310μs 50.6971μs 19.7250 KOps/s 19.2980 KOps/s $\color{#35bf28}+2.21\%$
test_values_nested_leaf 83.7610μs 42.7692μs 23.3813 KOps/s 23.3987 KOps/s $\color{#d91a1a}-0.07\%$
test_values_stack_nested 80.5320μs 49.8860μs 20.0457 KOps/s 19.9321 KOps/s $\color{#35bf28}+0.57\%$
test_values_stack_nested_leaf 81.3820μs 43.4116μs 23.0353 KOps/s 22.6656 KOps/s $\color{#35bf28}+1.63\%$
test_values_stack_nested_locked 85.3710μs 51.3638μs 19.4690 KOps/s 19.0426 KOps/s $\color{#35bf28}+2.24\%$
test_membership 1.8181μs 0.5016μs 1.9937 MOps/s 1.9811 MOps/s $\color{#35bf28}+0.64\%$
test_membership_nested 15.8405μs 1.9264μs 519.1120 KOps/s 528.1899 KOps/s $\color{#d91a1a}-1.72\%$
test_membership_nested_leaf 23.3105μs 1.9000μs 526.3089 KOps/s 525.1248 KOps/s $\color{#35bf28}+0.23\%$
test_membership_stacked_nested 27.5610μs 2.0157μs 496.1019 KOps/s 518.7334 KOps/s $\color{#d91a1a}-4.36\%$
test_membership_stacked_nested_leaf 30.1310μs 2.0069μs 498.2852 KOps/s 514.5171 KOps/s $\color{#d91a1a}-3.15\%$
test_membership_nested_last 82.4230μs 2.9976μs 333.6052 KOps/s 335.3111 KOps/s $\color{#d91a1a}-0.51\%$
test_membership_nested_leaf_last 26.9010μs 3.0794μs 324.7434 KOps/s 336.1542 KOps/s $\color{#d91a1a}-3.39\%$
test_membership_stacked_nested_last 33.2710μs 3.1007μs 322.5100 KOps/s 337.1262 KOps/s $\color{#d91a1a}-4.34\%$
test_membership_stacked_nested_leaf_last 29.8510μs 3.0372μs 329.2500 KOps/s 337.6696 KOps/s $\color{#d91a1a}-2.49\%$
test_nested_getleaf 31.4310μs 6.0319μs 165.7840 KOps/s 164.1809 KOps/s $\color{#35bf28}+0.98\%$
test_nested_get 34.6210μs 5.6968μs 175.5386 KOps/s 171.2446 KOps/s $\color{#35bf28}+2.51\%$
test_stacked_getleaf 32.9910μs 6.0009μs 166.6429 KOps/s 167.2311 KOps/s $\color{#d91a1a}-0.35\%$
test_stacked_get 35.3210μs 5.6379μs 177.3708 KOps/s 175.4768 KOps/s $\color{#35bf28}+1.08\%$
test_nested_getitemleaf 31.2410μs 6.1630μs 162.2579 KOps/s 162.7275 KOps/s $\color{#d91a1a}-0.29\%$
test_nested_getitem 28.5810μs 5.7762μs 173.1250 KOps/s 171.7771 KOps/s $\color{#35bf28}+0.78\%$
test_stacked_getitemleaf 39.6100μs 6.0820μs 164.4202 KOps/s 165.9381 KOps/s $\color{#d91a1a}-0.91\%$
test_stacked_getitem 33.8000μs 5.7734μs 173.2078 KOps/s 176.9318 KOps/s $\color{#d91a1a}-2.10\%$
test_lock_nested 4.8905ms 0.4332ms 2.3086 KOps/s 2.3675 KOps/s $\color{#d91a1a}-2.49\%$
test_lock_stack_nested 0.4290ms 0.3955ms 2.5283 KOps/s 2.5763 KOps/s $\color{#d91a1a}-1.86\%$
test_unlock_nested 0.7579ms 0.3670ms 2.7247 KOps/s 2.7813 KOps/s $\color{#d91a1a}-2.04\%$
test_unlock_stack_nested 0.3782ms 0.3335ms 2.9986 KOps/s 3.0637 KOps/s $\color{#d91a1a}-2.12\%$
test_flatten_speed 0.1061ms 77.1517μs 12.9615 KOps/s 12.9426 KOps/s $\color{#35bf28}+0.15\%$
test_unflatten_speed 0.3806ms 0.3276ms 3.0523 KOps/s 3.0821 KOps/s $\color{#d91a1a}-0.97\%$
test_common_ops 1.6404ms 1.3112ms 762.6332 Ops/s 779.5054 Ops/s $\color{#d91a1a}-2.16\%$
test_creation 24.8200μs 1.4988μs 667.1844 KOps/s 684.7589 KOps/s $\color{#d91a1a}-2.57\%$
test_creation_empty 62.9610μs 17.9553μs 55.6939 KOps/s 57.5964 KOps/s $\color{#d91a1a}-3.30\%$
test_creation_nested_1 53.2300μs 19.5961μs 51.0306 KOps/s 51.3939 KOps/s $\color{#d91a1a}-0.71\%$
test_creation_nested_2 1.2094ms 23.1743μs 43.1513 KOps/s 45.7836 KOps/s $\textbf{\color{#d91a1a}-5.75\%}$
test_clone 62.4210μs 29.0315μs 34.4454 KOps/s 35.0890 KOps/s $\color{#d91a1a}-1.83\%$
test_getitem[int] 1.1914ms 15.7246μs 63.5948 KOps/s 65.0397 KOps/s $\color{#d91a1a}-2.22\%$
test_getitem[slice_int] 91.7979ms 37.7283μs 26.5053 KOps/s 37.5638 KOps/s $\textbf{\color{#d91a1a}-29.44\%}$
test_getitem[range] 0.1525ms 0.1082ms 9.2403 KOps/s 9.0429 KOps/s $\color{#35bf28}+2.18\%$
test_getitem[tuple] 0.1196ms 23.6326μs 42.3145 KOps/s 43.1940 KOps/s $\color{#d91a1a}-2.04\%$
test_getitem[list] 0.1958ms 99.4508μs 10.0552 KOps/s 10.0924 KOps/s $\color{#d91a1a}-0.37\%$
test_setitem_dim[int] 85.9420μs 44.6245μs 22.4092 KOps/s 22.3615 KOps/s $\color{#35bf28}+0.21\%$
test_setitem_dim[slice_int] 92.8410μs 66.1024μs 15.1280 KOps/s 15.1605 KOps/s $\color{#d91a1a}-0.21\%$
test_setitem_dim[range] 0.1775ms 0.1271ms 7.8698 KOps/s 7.8958 KOps/s $\color{#d91a1a}-0.33\%$
test_setitem_dim[tuple] 98.7210μs 60.3583μs 16.5677 KOps/s 16.5362 KOps/s $\color{#35bf28}+0.19\%$
test_setitem 86.5720μs 43.7005μs 22.8831 KOps/s 23.4001 KOps/s $\color{#d91a1a}-2.21\%$
test_set 88.8020μs 42.2651μs 23.6602 KOps/s 24.2706 KOps/s $\color{#d91a1a}-2.51\%$
test_set_shared 0.3550ms 53.7504μs 18.6045 KOps/s 18.6582 KOps/s $\color{#d91a1a}-0.29\%$
test_update 93.7220μs 53.4180μs 18.7203 KOps/s 19.3506 KOps/s $\color{#d91a1a}-3.26\%$
test_update_nested 0.1083ms 60.5287μs 16.5211 KOps/s 16.9179 KOps/s $\color{#d91a1a}-2.35\%$
test_update__nested 0.4234ms 66.3846μs 15.0637 KOps/s 16.4703 KOps/s $\textbf{\color{#d91a1a}-8.54\%}$
test_set_nested 80.9620μs 45.0522μs 22.1965 KOps/s 22.3654 KOps/s $\color{#d91a1a}-0.76\%$
test_set_nested_new 90.3720μs 49.2572μs 20.3016 KOps/s 20.8345 KOps/s $\color{#d91a1a}-2.56\%$
test_select 0.1033ms 61.9745μs 16.1357 KOps/s 16.0731 KOps/s $\color{#35bf28}+0.39\%$
test_select_nested 0.2048ms 41.6282μs 24.0222 KOps/s 23.7256 KOps/s $\color{#35bf28}+1.25\%$
test_exclude_nested 0.1055ms 59.2315μs 16.8829 KOps/s 16.7479 KOps/s $\color{#35bf28}+0.81\%$
test_empty[True] 0.2944ms 0.2589ms 3.8625 KOps/s 3.8529 KOps/s $\color{#35bf28}+0.25\%$
test_empty[False] 3.0370μs 0.7381μs 1.3549 MOps/s 1.3588 MOps/s $\color{#d91a1a}-0.29\%$
test_to 54.7610μs 26.8923μs 37.1853 KOps/s 37.2858 KOps/s $\color{#d91a1a}-0.27\%$
test_to_nonblocking 66.8810μs 25.7167μs 38.8853 KOps/s 38.5008 KOps/s $\color{#35bf28}+1.00\%$
test_unbind_speed 1.0681ms 0.2756ms 3.6282 KOps/s 3.6618 KOps/s $\color{#d91a1a}-0.92\%$
test_unbind_speed_stack0 0.3259ms 0.2781ms 3.5957 KOps/s 3.6308 KOps/s $\color{#d91a1a}-0.97\%$
test_unbind_speed_stack1 91.2836ms 0.7101ms 1.4082 KOps/s 1.4036 KOps/s $\color{#35bf28}+0.33\%$
test_split 92.6550ms 2.1243ms 470.7334 Ops/s 473.1984 Ops/s $\color{#d91a1a}-0.52\%$
test_chunk 94.7604ms 2.1241ms 470.7960 Ops/s 471.7039 Ops/s $\color{#d91a1a}-0.19\%$
test_creation[device0] 0.3820ms 0.1266ms 7.8996 KOps/s 7.9174 KOps/s $\color{#d91a1a}-0.22\%$
test_creation_from_tensor 0.4225ms 0.1289ms 7.7560 KOps/s 7.7836 KOps/s $\color{#d91a1a}-0.35\%$
test_add_one[memmap_tensor0] 0.1425ms 9.0299μs 110.7432 KOps/s 116.5694 KOps/s $\color{#d91a1a}-5.00\%$
test_contiguous[memmap_tensor0] 30.7710μs 2.1336μs 468.6900 KOps/s 466.7331 KOps/s $\color{#35bf28}+0.42\%$
test_stack[memmap_tensor0] 36.1510μs 6.2260μs 160.6163 KOps/s 157.4833 KOps/s $\color{#35bf28}+1.99\%$
test_memmaptd_index 1.0559ms 0.4189ms 2.3870 KOps/s 2.3758 KOps/s $\color{#35bf28}+0.47\%$
test_memmaptd_index_astensor 0.7417ms 0.4896ms 2.0423 KOps/s 2.0267 KOps/s $\color{#35bf28}+0.77\%$
test_memmaptd_index_op 1.4813ms 1.0797ms 926.2164 Ops/s 957.9826 Ops/s $\color{#d91a1a}-3.32\%$
test_serialize_model 0.1311s 0.1298s 7.7054 Ops/s 7.6687 Ops/s $\color{#35bf28}+0.48\%$
test_serialize_model_pickle 1.3520s 1.2128s 0.8246 Ops/s 0.8233 Ops/s $\color{#35bf28}+0.15\%$
test_serialize_weights 0.2225s 0.1426s 7.0116 Ops/s 6.9884 Ops/s $\color{#35bf28}+0.33\%$
test_serialize_weights_returnearly 0.2318s 56.1929ms 17.7959 Ops/s 17.8920 Ops/s $\color{#d91a1a}-0.54\%$
test_serialize_weights_pickle 1.3818s 1.2212s 0.8189 Ops/s 0.8381 Ops/s $\color{#d91a1a}-2.29\%$
test_reshape_pytree 79.5010μs 37.3009μs 26.8090 KOps/s 28.0037 KOps/s $\color{#d91a1a}-4.27\%$
test_reshape_td 81.3820μs 42.0655μs 23.7725 KOps/s 25.0742 KOps/s $\textbf{\color{#d91a1a}-5.19\%}$
test_view_pytree 72.7720μs 37.1674μs 26.9053 KOps/s 28.3346 KOps/s $\textbf{\color{#d91a1a}-5.04\%}$
test_view_td 82.0110μs 49.2171μs 20.3181 KOps/s 21.6453 KOps/s $\textbf{\color{#d91a1a}-6.13\%}$
test_unbind_pytree 64.4810μs 33.6881μs 29.6841 KOps/s 29.4779 KOps/s $\color{#35bf28}+0.70\%$
test_unbind_td 0.4217ms 43.3084μs 23.0902 KOps/s 23.6345 KOps/s $\color{#d91a1a}-2.30\%$
test_split_pytree 83.6620μs 46.0459μs 21.7175 KOps/s 21.4592 KOps/s $\color{#35bf28}+1.20\%$
test_split_td 0.7109ms 58.4007μs 17.1231 KOps/s 16.0120 KOps/s $\textbf{\color{#35bf28}+6.94\%}$
test_add_pytree 0.1043ms 62.1235μs 16.0970 KOps/s 17.6801 KOps/s $\textbf{\color{#d91a1a}-8.95\%}$
test_add_td 0.1533ms 0.1051ms 9.5174 KOps/s 10.6454 KOps/s $\textbf{\color{#d91a1a}-10.60\%}$
test_compile_add_one_nested[tensordict-compile] 0.2372ms 0.1597ms 6.2605 KOps/s 6.3133 KOps/s $\color{#d91a1a}-0.84\%$
test_compile_add_one_nested[tensordict-eager] 0.5666ms 0.1623ms 6.1621 KOps/s 6.2314 KOps/s $\color{#d91a1a}-1.11\%$
test_compile_add_one_nested[pytree-compile] 0.5688ms 0.1522ms 6.5704 KOps/s 6.5319 KOps/s $\color{#35bf28}+0.59\%$
test_compile_add_one_nested[pytree-eager] 0.5975ms 0.1864ms 5.3641 KOps/s 5.4787 KOps/s $\color{#d91a1a}-2.09\%$
test_compile_copy_nested[tensordict-compile] 73.7120μs 21.1682μs 47.2408 KOps/s 47.1951 KOps/s $\color{#35bf28}+0.10\%$
test_compile_copy_nested[tensordict-eager] 93.0920μs 48.4568μs 20.6369 KOps/s 20.3624 KOps/s $\color{#35bf28}+1.35\%$
test_compile_copy_nested[pytree-compile] 0.4139ms 65.0099μs 15.3823 KOps/s 15.6214 KOps/s $\color{#d91a1a}-1.53\%$
test_compile_copy_nested[pytree-eager] 97.2210μs 49.8361μs 20.0658 KOps/s 20.1228 KOps/s $\color{#d91a1a}-0.28\%$
test_compile_add_one_flat[tensordict-compile] 0.4082ms 0.3134ms 3.1906 KOps/s 3.1910 KOps/s $\color{#d91a1a}-0.01\%$
test_compile_add_one_flat[tensordict-eager] 0.3515ms 0.2323ms 4.3052 KOps/s 4.2792 KOps/s $\color{#35bf28}+0.61\%$
test_compile_add_one_flat[tensorclass-compile] 0.2092ms 0.1256ms 7.9634 KOps/s 7.9525 KOps/s $\color{#35bf28}+0.14\%$
test_compile_add_one_flat[tensorclass-eager] 0.1270ms 65.3582μs 15.3003 KOps/s 15.3143 KOps/s $\color{#d91a1a}-0.09\%$
test_compile_add_one_flat[pytree-compile] 0.3861ms 0.3210ms 3.1152 KOps/s 3.1258 KOps/s $\color{#d91a1a}-0.34\%$
test_compile_add_one_flat[pytree-eager] 1.0086ms 0.6313ms 1.5841 KOps/s 1.6416 KOps/s $\color{#d91a1a}-3.50\%$
test_compile_add_self_flat[tensordict-eager] 0.6768ms 0.2837ms 3.5252 KOps/s 3.5211 KOps/s $\color{#35bf28}+0.12\%$
test_compile_add_self_flat[tensordict-compile] 0.7371ms 0.3145ms 3.1796 KOps/s 3.1920 KOps/s $\color{#d91a1a}-0.39\%$
test_compile_add_self_flat[tensorclass-eager] 0.1589ms 77.1733μs 12.9578 KOps/s 13.0189 KOps/s $\color{#d91a1a}-0.47\%$
test_compile_add_self_flat[tensorclass-compile] 0.1716ms 0.1261ms 7.9289 KOps/s 7.9253 KOps/s $\color{#35bf28}+0.05\%$
test_compile_add_self_flat[pytree-eager] 0.9306ms 0.5272ms 1.8968 KOps/s 1.9488 KOps/s $\color{#d91a1a}-2.67\%$
test_compile_add_self_flat[pytree-compile] 0.4604ms 0.3214ms 3.1116 KOps/s 3.1063 KOps/s $\color{#35bf28}+0.17\%$
test_compile_copy_flat[tensordict-compile] 63.3710μs 18.8214μs 53.1310 KOps/s 51.8708 KOps/s $\color{#35bf28}+2.43\%$
test_compile_copy_flat[tensordict-eager] 86.6210μs 38.1913μs 26.1840 KOps/s 26.0002 KOps/s $\color{#35bf28}+0.71\%$
test_compile_copy_flat[pytree-compile] 0.1202ms 70.5995μs 14.1644 KOps/s 14.3671 KOps/s $\color{#d91a1a}-1.41\%$
test_compile_copy_flat[pytree-eager] 0.1065ms 52.1177μs 19.1873 KOps/s 19.3521 KOps/s $\color{#d91a1a}-0.85\%$
test_compile_assign_and_add[tensordict-compile] 2.3720ms 0.8192ms 1.2206 KOps/s 1.1335 KOps/s $\textbf{\color{#35bf28}+7.68\%}$
test_compile_assign_and_add[tensordict-eager] 3.5613ms 3.2035ms 312.1579 Ops/s 310.7484 Ops/s $\color{#35bf28}+0.45\%$
test_compile_assign_and_add[pytree-compile] 2.4265ms 0.8335ms 1.1998 KOps/s 1.1188 KOps/s $\textbf{\color{#35bf28}+7.24\%}$
test_compile_assign_and_add[pytree-eager] 3.4542ms 3.2175ms 310.8048 Ops/s 319.5724 Ops/s $\color{#d91a1a}-2.74\%$
test_compile_indexing[tensor-tensordict-compile] 0.2143ms 0.1160ms 8.6242 KOps/s 8.5717 KOps/s $\color{#35bf28}+0.61\%$
test_compile_indexing[tensor-tensordict-eager] 0.1878ms 61.5092μs 16.2577 KOps/s 16.2976 KOps/s $\color{#d91a1a}-0.24\%$
test_compile_indexing[tensor-tensorclass-compile] 0.1594ms 0.1111ms 8.9986 KOps/s 8.9943 KOps/s $\color{#35bf28}+0.05\%$
test_compile_indexing[tensor-tensorclass-eager] 0.1660ms 43.6985μs 22.8841 KOps/s 22.9767 KOps/s $\color{#d91a1a}-0.40\%$
test_compile_indexing[tensor-pytree-compile] 0.1786ms 0.1122ms 8.9155 KOps/s 8.8773 KOps/s $\color{#35bf28}+0.43\%$
test_compile_indexing[tensor-pytree-eager] 95.6220μs 43.5732μs 22.9499 KOps/s 23.2376 KOps/s $\color{#d91a1a}-1.24\%$
test_compile_indexing[slice-tensordict-compile] 0.2050ms 0.1424ms 7.0207 KOps/s 7.0094 KOps/s $\color{#35bf28}+0.16\%$
test_compile_indexing[slice-tensordict-eager] 0.1540ms 23.9177μs 41.8100 KOps/s 40.7004 KOps/s $\color{#35bf28}+2.73\%$
test_compile_indexing[slice-tensorclass-compile] 0.1796ms 0.1361ms 7.3490 KOps/s 7.2640 KOps/s $\color{#35bf28}+1.17\%$
test_compile_indexing[slice-tensorclass-eager] 58.5010μs 20.3556μs 49.1266 KOps/s 49.2384 KOps/s $\color{#d91a1a}-0.23\%$
test_compile_indexing[slice-pytree-compile] 0.1766ms 0.1370ms 7.2979 KOps/s 7.1966 KOps/s $\color{#35bf28}+1.41\%$
test_compile_indexing[slice-pytree-eager] 56.4210μs 20.4526μs 48.8936 KOps/s 49.6139 KOps/s $\color{#d91a1a}-1.45\%$
test_compile_indexing[int-tensordict-compile] 0.2337ms 0.1427ms 7.0090 KOps/s 6.9638 KOps/s $\color{#35bf28}+0.65\%$
test_compile_indexing[int-tensordict-eager] 0.4912ms 23.5979μs 42.3767 KOps/s 41.4399 KOps/s $\color{#35bf28}+2.26\%$
test_compile_indexing[int-tensorclass-compile] 0.2088ms 0.1379ms 7.2501 KOps/s 7.2392 KOps/s $\color{#35bf28}+0.15\%$
test_compile_indexing[int-tensorclass-eager] 54.0510μs 20.6576μs 48.4084 KOps/s 49.4356 KOps/s $\color{#d91a1a}-2.08\%$
test_compile_indexing[int-pytree-compile] 0.1763ms 0.1378ms 7.2589 KOps/s 7.2299 KOps/s $\color{#35bf28}+0.40\%$
test_compile_indexing[int-pytree-eager] 52.8500μs 20.6513μs 48.4232 KOps/s 49.5870 KOps/s $\color{#d91a1a}-2.35\%$
test_mod_add[eager] 76.7610μs 32.8487μs 30.4426 KOps/s 30.4290 KOps/s $\color{#35bf28}+0.04\%$
test_mod_add[compile] 0.1200ms 78.5084μs 12.7375 KOps/s 12.4441 KOps/s $\color{#35bf28}+2.36\%$
test_mod_add[compile-overhead] 0.3019ms 0.1496ms 6.6848 KOps/s 6.2572 KOps/s $\textbf{\color{#35bf28}+6.83\%}$
test_mod_wrap[eager] 0.3285ms 0.2428ms 4.1192 KOps/s 4.1048 KOps/s $\color{#35bf28}+0.35\%$
test_mod_wrap[compile] 1.3413ms 0.2910ms 3.4366 KOps/s 3.3347 KOps/s $\color{#35bf28}+3.06\%$
test_mod_wrap[compile-overhead] 7.6277ms 4.0815ms 245.0076 Ops/s 254.8212 Ops/s $\color{#d91a1a}-3.85\%$
test_mod_wrap_and_backward[eager] 1.5299ms 1.3767ms 726.3633 Ops/s 693.9433 Ops/s $\color{#35bf28}+4.67\%$
test_mod_wrap_and_backward[compile] 1.5614ms 1.3288ms 752.5729 Ops/s 701.2918 Ops/s $\textbf{\color{#35bf28}+7.31\%}$
test_mod_wrap_and_backward[compile-overhead] 1.3301ms 0.9069ms 1.1026 KOps/s 1.0015 KOps/s $\textbf{\color{#35bf28}+10.09\%}$
test_seq_add[eager] 0.1522ms 0.1055ms 9.4788 KOps/s 9.9569 KOps/s $\color{#d91a1a}-4.80\%$
test_seq_add[compile] 0.1352ms 88.3021μs 11.3248 KOps/s 11.2235 KOps/s $\color{#35bf28}+0.90\%$
test_seq_add[compile-overhead] 0.1940ms 0.1221ms 8.1923 KOps/s 8.0865 KOps/s $\color{#35bf28}+1.31\%$
test_seq_wrap[eager] 0.6791ms 0.3866ms 2.5868 KOps/s 2.5188 KOps/s $\color{#35bf28}+2.70\%$
test_seq_wrap[compile] 0.4011ms 0.3083ms 3.2433 KOps/s 3.1933 KOps/s $\color{#35bf28}+1.56\%$
test_seq_wrap[compile-overhead] 0.2918ms 0.2156ms 4.6386 KOps/s 4.5845 KOps/s $\color{#35bf28}+1.18\%$
test_func_call_runtime[False-eager] 0.8173ms 0.7365ms 1.3578 KOps/s 1.3193 KOps/s $\color{#35bf28}+2.92\%$
test_func_call_runtime[False-compile] 0.8254ms 0.7726ms 1.2944 KOps/s 1.2911 KOps/s $\color{#35bf28}+0.25\%$
test_func_call_runtime[False-compile-overhead] 0.4045ms 0.3510ms 2.8492 KOps/s 2.8143 KOps/s $\color{#35bf28}+1.24\%$
test_func_call_runtime[True-eager] 1.0224ms 0.8933ms 1.1195 KOps/s 1.0935 KOps/s $\color{#35bf28}+2.38\%$
test_func_call_runtime[True-compile] 0.9183ms 0.7931ms 1.2609 KOps/s 1.2451 KOps/s $\color{#35bf28}+1.27\%$
test_func_call_runtime[True-compile-overhead] 0.4530ms 0.3740ms 2.6739 KOps/s 2.6671 KOps/s $\color{#35bf28}+0.25\%$
test_func_call_cm_runtime[False-eager] 0.8034ms 0.7314ms 1.3673 KOps/s 1.2667 KOps/s $\textbf{\color{#35bf28}+7.94\%}$
test_func_call_cm_runtime[False-compile] 0.8967ms 0.7770ms 1.2870 KOps/s 1.2602 KOps/s $\color{#35bf28}+2.13\%$
test_func_call_cm_runtime[False-compile-overhead] 0.4539ms 0.3547ms 2.8192 KOps/s 2.8110 KOps/s $\color{#35bf28}+0.29\%$
test_func_call_cm_runtime[True-eager] 1.1744ms 1.0056ms 994.4637 Ops/s 986.7863 Ops/s $\color{#35bf28}+0.78\%$
test_func_call_cm_runtime[True-compile] 0.9289ms 0.8202ms 1.2192 KOps/s 1.1698 KOps/s $\color{#35bf28}+4.23\%$
test_func_call_cm_runtime[True-compile-overhead] 0.4530ms 0.4005ms 2.4968 KOps/s 2.4834 KOps/s $\color{#35bf28}+0.54\%$
test_vmap_func_call_cm_runtime[eager] 2.5819ms 2.1064ms 474.7479 Ops/s 473.1135 Ops/s $\color{#35bf28}+0.35\%$
test_vmap_func_call_cm_runtime[compile] 0.9035ms 0.8394ms 1.1913 KOps/s 1.1192 KOps/s $\textbf{\color{#35bf28}+6.44\%}$
test_vmap_func_call_cm_runtime[compile-overhead] 0.4567ms 0.4032ms 2.4802 KOps/s 2.4718 KOps/s $\color{#35bf28}+0.34\%$
test_distributed 5.5627ms 0.3233ms 3.0927 KOps/s 8.8801 KOps/s $\textbf{\color{#d91a1a}-65.17\%}$
test_tdmodule 0.1332ms 16.6154μs 60.1852 KOps/s 63.6959 KOps/s $\textbf{\color{#d91a1a}-5.51\%}$
test_tdmodule_dispatch 68.7710μs 32.9397μs 30.3585 KOps/s 32.0603 KOps/s $\textbf{\color{#d91a1a}-5.31\%}$
test_tdseq 40.6810μs 17.5152μs 57.0932 KOps/s 59.7826 KOps/s $\color{#d91a1a}-4.50\%$
test_tdseq_dispatch 59.1010μs 35.1283μs 28.4671 KOps/s 29.2913 KOps/s $\color{#d91a1a}-2.81\%$
test_instantiation_functorch 2.0049ms 1.8400ms 543.4930 Ops/s 538.4400 Ops/s $\color{#35bf28}+0.94\%$
test_exec_functorch 0.2968ms 0.2024ms 4.9409 KOps/s 4.7798 KOps/s $\color{#35bf28}+3.37\%$
test_exec_functional_call 0.2656ms 0.2055ms 4.8673 KOps/s 4.8302 KOps/s $\color{#35bf28}+0.77\%$
test_exec_td_decorator 0.4403ms 0.2576ms 3.8826 KOps/s 3.8574 KOps/s $\color{#35bf28}+0.65\%$
test_vmap_mlp_speed_decorator[True-True] 0.8057ms 0.6903ms 1.4487 KOps/s 1.4508 KOps/s $\color{#d91a1a}-0.15\%$
test_vmap_mlp_speed_decorator[True-False] 0.8347ms 0.6981ms 1.4325 KOps/s 1.4519 KOps/s $\color{#d91a1a}-1.33\%$
test_vmap_mlp_speed_decorator[False-True] 0.7401ms 0.6064ms 1.6491 KOps/s 1.6580 KOps/s $\color{#d91a1a}-0.54\%$
test_vmap_mlp_speed_decorator[False-False] 0.6989ms 0.6182ms 1.6177 KOps/s 1.6515 KOps/s $\color{#d91a1a}-2.05\%$
test_vmap_transformer_speed_decorator[True-True] 20.3144ms 19.7261ms 50.6941 Ops/s 50.7759 Ops/s $\color{#d91a1a}-0.16\%$
test_vmap_transformer_speed_decorator[True-False] 20.4404ms 19.7449ms 50.6459 Ops/s 50.5623 Ops/s $\color{#35bf28}+0.17\%$
test_vmap_transformer_speed_decorator[False-True] 19.7108ms 19.5606ms 51.1232 Ops/s 50.9117 Ops/s $\color{#35bf28}+0.42\%$
test_vmap_transformer_speed_decorator[False-False] 20.3778ms 19.7071ms 50.7431 Ops/s 50.6922 Ops/s $\color{#35bf28}+0.10\%$
test_to_module_speed[True] 1.3897ms 0.9996ms 1.0004 KOps/s 992.5223 Ops/s $\color{#35bf28}+0.79\%$
test_to_module_speed[False] 1.4220ms 0.9699ms 1.0310 KOps/s 1.0179 KOps/s $\color{#35bf28}+1.29\%$
test_tc_init 67.0410μs 39.1855μs 25.5196 KOps/s 26.3361 KOps/s $\color{#d91a1a}-3.10\%$
test_tc_init_nested 0.1152ms 78.5995μs 12.7227 KOps/s 13.0666 KOps/s $\color{#d91a1a}-2.63\%$
test_tc_first_layer_tensor 4.6901μs 0.6729μs 1.4860 MOps/s 1.4843 MOps/s $\color{#35bf28}+0.12\%$
test_tc_first_layer_nontensor 18.4800μs 2.2583μs 442.8018 KOps/s 448.1177 KOps/s $\color{#d91a1a}-1.19\%$
test_tc_second_layer_tensor 10.2127μs 1.3633μs 733.5161 KOps/s 732.5715 KOps/s $\color{#35bf28}+0.13\%$
test_tc_second_layer_nontensor 32.8300μs 2.9307μs 341.2157 KOps/s 336.5638 KOps/s $\color{#35bf28}+1.38\%$
test_unbind 0.1885s 9.4041ms 106.3370 Ops/s 93.9671 Ops/s $\textbf{\color{#35bf28}+13.16\%}$
test_full_like 0.6553ms 0.5738ms 1.7429 KOps/s 1.7390 KOps/s $\color{#35bf28}+0.22\%$
test_zeros_like 0.2866ms 0.1979ms 5.0536 KOps/s 5.0513 KOps/s $\color{#35bf28}+0.05\%$
test_ones_like 0.2333ms 0.1977ms 5.0571 KOps/s 5.0554 KOps/s $\color{#35bf28}+0.03\%$
test_clone 0.4488ms 0.4147ms 2.4114 KOps/s 2.4119 KOps/s $\color{#d91a1a}-0.02\%$
test_squeeze 35.3510μs 9.7996μs 102.0455 KOps/s 102.3003 KOps/s $\color{#d91a1a}-0.25\%$
test_unsqueeze 0.2157ms 75.4801μs 13.2485 KOps/s 13.2780 KOps/s $\color{#d91a1a}-0.22\%$
test_split 0.3801ms 0.1531ms 6.5308 KOps/s 6.3988 KOps/s $\color{#35bf28}+2.06\%$
test_permute 0.2645ms 0.1804ms 5.5419 KOps/s 5.5729 KOps/s $\color{#d91a1a}-0.56\%$
test_stack 1.2640ms 0.8689ms 1.1509 KOps/s 1.1495 KOps/s $\color{#35bf28}+0.12\%$
test_cat 1.2669ms 1.2310ms 812.3335 Ops/s 812.1075 Ops/s $\color{#35bf28}+0.03\%$
