[BugFix] Fix _foreach_copy_ for older versions of PT #1035

Merged on Oct 8, 2024 (1 commit)

@vmoens (Contributor) commented Oct 8, 2024

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Oct 8, 2024
ghstack-source-id: 682b96483f0ffdad4ef8e7cdd35f133587c2c828
Pull Request resolved: #1035
@facebook-github-bot added the CLA Signed label Oct 8, 2024
@vmoens merged commit 24b1855 into gh/vmoens/28/base Oct 8, 2024
24 of 37 checks passed
@vmoens deleted the gh/vmoens/28/head branch October 8, 2024 12:33
github-actions bot commented Oct 8, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 216. Improved: $\large\color{#35bf28}34$. Worsened: $\large\color{#d91a1a}9$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 52.8800μs 23.5445μs 42.4728 KOps/s 40.9754 KOps/s $\color{#35bf28}+3.65\%$
test_plain_set_stack_nested 59.8920μs 23.6062μs 42.3617 KOps/s 39.8222 KOps/s $\textbf{\color{#35bf28}+6.38\%}$
test_plain_set_nested_inplace 65.4130μs 25.5057μs 39.2070 KOps/s 36.5333 KOps/s $\textbf{\color{#35bf28}+7.32\%}$
test_plain_set_stack_nested_inplace 90.0200μs 25.5773μs 39.0971 KOps/s 36.6362 KOps/s $\textbf{\color{#35bf28}+6.72\%}$
test_items 34.0140μs 4.0986μs 243.9850 KOps/s 214.6165 KOps/s $\textbf{\color{#35bf28}+13.68\%}$
test_items_nested 0.5990ms 0.3775ms 2.6490 KOps/s 2.6006 KOps/s $\color{#35bf28}+1.86\%$
test_items_nested_locked 0.5826ms 0.3807ms 2.6271 KOps/s 2.5826 KOps/s $\color{#35bf28}+1.72\%$
test_items_nested_leaf 0.1523ms 80.1237μs 12.4807 KOps/s 12.2991 KOps/s $\color{#35bf28}+1.48\%$
test_items_stack_nested 0.5711ms 0.3845ms 2.6006 KOps/s 2.5736 KOps/s $\color{#35bf28}+1.05\%$
test_items_stack_nested_leaf 0.1671ms 83.9236μs 11.9156 KOps/s 11.9296 KOps/s $\color{#d91a1a}-0.12\%$
test_items_stack_nested_locked 0.6217ms 0.3841ms 2.6038 KOps/s 2.5926 KOps/s $\color{#35bf28}+0.43\%$
test_keys 25.1670μs 3.6341μs 275.1734 KOps/s 275.3318 KOps/s $\color{#d91a1a}-0.06\%$
test_keys_nested 0.2510ms 0.1349ms 7.4124 KOps/s 7.1638 KOps/s $\color{#35bf28}+3.47\%$
test_keys_nested_locked 0.6918ms 0.1436ms 6.9620 KOps/s 6.9163 KOps/s $\color{#35bf28}+0.66\%$
test_keys_nested_leaf 0.1948ms 0.1179ms 8.4834 KOps/s 8.1121 KOps/s $\color{#35bf28}+4.58\%$
test_keys_stack_nested 0.2492ms 0.1363ms 7.3367 KOps/s 7.1401 KOps/s $\color{#35bf28}+2.75\%$
test_keys_stack_nested_leaf 0.1868ms 0.1201ms 8.3280 KOps/s 8.0927 KOps/s $\color{#35bf28}+2.91\%$
test_keys_stack_nested_locked 0.2611ms 0.1420ms 7.0406 KOps/s 6.8039 KOps/s $\color{#35bf28}+3.48\%$
test_values 8.5760μs 1.0437μs 958.1477 KOps/s 945.6531 KOps/s $\color{#35bf28}+1.32\%$
test_values_nested 0.1687ms 92.6996μs 10.7875 KOps/s 10.1541 KOps/s $\textbf{\color{#35bf28}+6.24\%}$
test_values_nested_locked 0.1688ms 93.6565μs 10.6773 KOps/s 10.4638 KOps/s $\color{#35bf28}+2.04\%$
test_values_nested_leaf 0.1464ms 78.8075μs 12.6891 KOps/s 12.2573 KOps/s $\color{#35bf28}+3.52\%$
test_values_stack_nested 0.1636ms 93.1914μs 10.7306 KOps/s 10.5700 KOps/s $\color{#35bf28}+1.52\%$
test_values_stack_nested_leaf 0.1616ms 78.3801μs 12.7583 KOps/s 12.2240 KOps/s $\color{#35bf28}+4.37\%$
test_values_stack_nested_locked 0.1594ms 94.6361μs 10.5668 KOps/s 10.3204 KOps/s $\color{#35bf28}+2.39\%$
test_membership 6.0499μs 0.7272μs 1.3751 MOps/s 1.1519 MOps/s $\textbf{\color{#35bf28}+19.38\%}$
test_membership_nested 28.7540μs 2.7318μs 366.0596 KOps/s 367.3263 KOps/s $\color{#d91a1a}-0.34\%$
test_membership_nested_leaf 28.3530μs 2.7204μs 367.5987 KOps/s 361.7287 KOps/s $\color{#35bf28}+1.62\%$
test_membership_stacked_nested 26.3590μs 2.7440μs 364.4311 KOps/s 367.2817 KOps/s $\color{#d91a1a}-0.78\%$
test_membership_stacked_nested_leaf 43.7820μs 2.7066μs 369.4620 KOps/s 363.5475 KOps/s $\color{#35bf28}+1.63\%$
test_membership_nested_last 30.8980μs 4.2620μs 234.6307 KOps/s 238.6715 KOps/s $\color{#d91a1a}-1.69\%$
test_membership_nested_leaf_last 50.3050μs 4.2124μs 237.3930 KOps/s 235.4393 KOps/s $\color{#35bf28}+0.83\%$
test_membership_stacked_nested_last 45.6160μs 5.8279μs 171.5871 KOps/s 201.0910 KOps/s $\textbf{\color{#d91a1a}-14.67\%}$
test_membership_stacked_nested_leaf_last 43.1210μs 5.8993μs 169.5125 KOps/s 198.3664 KOps/s $\textbf{\color{#d91a1a}-14.55\%}$
test_nested_getleaf 55.2340μs 10.5302μs 94.9645 KOps/s 93.9151 KOps/s $\color{#35bf28}+1.12\%$
test_nested_get 38.4920μs 10.0357μs 99.6439 KOps/s 99.3022 KOps/s $\color{#35bf28}+0.34\%$
test_stacked_getleaf 50.5550μs 10.5022μs 95.2179 KOps/s 94.8649 KOps/s $\color{#35bf28}+0.37\%$
test_stacked_get 35.7870μs 10.0645μs 99.3593 KOps/s 98.6879 KOps/s $\color{#35bf28}+0.68\%$
test_nested_getitemleaf 32.4810μs 10.9496μs 91.3276 KOps/s 91.3077 KOps/s $\color{#35bf28}+0.02\%$
test_nested_getitem 33.4330μs 10.3508μs 96.6107 KOps/s 96.7918 KOps/s $\color{#d91a1a}-0.19\%$
test_stacked_getitemleaf 29.7450μs 10.9078μs 91.6777 KOps/s 89.9944 KOps/s $\color{#35bf28}+1.87\%$
test_stacked_getitem 49.1230μs 10.2073μs 97.9693 KOps/s 96.8683 KOps/s $\color{#35bf28}+1.14\%$
test_lock_nested 88.9962ms 0.6058ms 1.6508 KOps/s 1.9551 KOps/s $\textbf{\color{#d91a1a}-15.56\%}$
test_lock_stack_nested 0.8422ms 0.4668ms 2.1420 KOps/s 2.0722 KOps/s $\color{#35bf28}+3.37\%$
test_unlock_nested 90.6623ms 0.5207ms 1.9205 KOps/s 2.3270 KOps/s $\textbf{\color{#d91a1a}-17.47\%}$
test_unlock_stack_nested 0.5251ms 0.3766ms 2.6554 KOps/s 2.5015 KOps/s $\textbf{\color{#35bf28}+6.16\%}$
test_flatten_speed 0.3155ms 0.1031ms 9.6948 KOps/s 9.9331 KOps/s $\color{#d91a1a}-2.40\%$
test_unflatten_speed 0.7335ms 0.5149ms 1.9422 KOps/s 1.9364 KOps/s $\color{#35bf28}+0.30\%$
test_common_ops 3.8587ms 1.1047ms 905.2573 Ops/s 870.6569 Ops/s $\color{#35bf28}+3.97\%$
test_creation 74.9210μs 2.0569μs 486.1687 KOps/s 478.9606 KOps/s $\color{#35bf28}+1.50\%$
test_creation_empty 48.4810μs 17.6140μs 56.7731 KOps/s 51.7927 KOps/s $\textbf{\color{#35bf28}+9.62\%}$
test_creation_nested_1 59.2110μs 21.2707μs 47.0130 KOps/s 43.9139 KOps/s $\textbf{\color{#35bf28}+7.06\%}$
test_creation_nested_2 73.8290μs 25.5743μs 39.1018 KOps/s 37.4714 KOps/s $\color{#35bf28}+4.35\%$
test_clone 0.1531ms 17.0243μs 58.7396 KOps/s 57.7807 KOps/s $\color{#35bf28}+1.66\%$
test_getitem[int] 1.0153ms 16.3852μs 61.0307 KOps/s 57.9631 KOps/s $\textbf{\color{#35bf28}+5.29\%}$
test_getitem[slice_int] 0.1350ms 29.7008μs 33.6692 KOps/s 31.4830 KOps/s $\textbf{\color{#35bf28}+6.94\%}$
test_getitem[range] 0.5151ms 57.7821μs 17.3064 KOps/s 17.2950 KOps/s $\color{#35bf28}+0.07\%$
test_getitem[tuple] 0.1611ms 25.1329μs 39.7885 KOps/s 38.2233 KOps/s $\color{#35bf28}+4.09\%$
test_getitem[list] 0.1794ms 51.8156μs 19.2992 KOps/s 18.8411 KOps/s $\color{#35bf28}+2.43\%$
test_setitem_dim[int] 68.2080μs 31.7322μs 31.5138 KOps/s 30.2722 KOps/s $\color{#35bf28}+4.10\%$
test_setitem_dim[slice_int] 95.8810μs 58.7430μs 17.0233 KOps/s 15.8141 KOps/s $\textbf{\color{#35bf28}+7.65\%}$
test_setitem_dim[range] 0.1299ms 82.3134μs 12.1487 KOps/s 12.0315 KOps/s $\color{#35bf28}+0.97\%$
test_setitem_dim[tuple] 84.3780μs 47.6539μs 20.9846 KOps/s 20.1249 KOps/s $\color{#35bf28}+4.27\%$
test_setitem 85.4610μs 29.4237μs 33.9862 KOps/s 33.2186 KOps/s $\color{#35bf28}+2.31\%$
test_set 0.2562ms 28.8338μs 34.6815 KOps/s 33.8476 KOps/s $\color{#35bf28}+2.46\%$
test_set_shared 3.1137ms 0.2151ms 4.6482 KOps/s 4.6240 KOps/s $\color{#35bf28}+0.52\%$
test_update 0.2789ms 36.6901μs 27.2553 KOps/s 25.7331 KOps/s $\textbf{\color{#35bf28}+5.92\%}$
test_update_nested 0.1719ms 47.2561μs 21.1613 KOps/s 20.0760 KOps/s $\textbf{\color{#35bf28}+5.41\%}$
test_update__nested 0.9757ms 44.9865μs 22.2289 KOps/s 22.1386 KOps/s $\color{#35bf28}+0.41\%$
test_set_nested 0.2562ms 32.7204μs 30.5619 KOps/s 31.2523 KOps/s $\color{#d91a1a}-2.21\%$
test_set_nested_new 0.2499ms 36.9826μs 27.0397 KOps/s 27.2985 KOps/s $\color{#d91a1a}-0.95\%$
test_select 0.2444ms 53.8883μs 18.5569 KOps/s 17.9964 KOps/s $\color{#35bf28}+3.11\%$
test_select_nested 0.1175ms 59.8319μs 16.7135 KOps/s 16.6925 KOps/s $\color{#35bf28}+0.13\%$
test_exclude_nested 0.1450ms 75.4418μs 13.2553 KOps/s 13.2511 KOps/s $\color{#35bf28}+0.03\%$
test_empty[True] 0.4772ms 0.3534ms 2.8299 KOps/s 2.8191 KOps/s $\color{#35bf28}+0.38\%$
test_empty[False] 6.6084μs 1.1885μs 841.3719 KOps/s 793.5746 KOps/s $\textbf{\color{#35bf28}+6.02\%}$
test_unbind_speed 0.4077ms 0.3074ms 3.2526 KOps/s 3.2169 KOps/s $\color{#35bf28}+1.11\%$
test_unbind_speed_stack0 0.4291ms 0.2912ms 3.4337 KOps/s 3.2457 KOps/s $\textbf{\color{#35bf28}+5.79\%}$
test_unbind_speed_stack1 0.1012s 0.7958ms 1.2565 KOps/s 1.3309 KOps/s $\textbf{\color{#d91a1a}-5.59\%}$
test_split 3.1298ms 1.9929ms 501.7789 Ops/s 435.0395 Ops/s $\textbf{\color{#35bf28}+15.34\%}$
test_chunk 93.3572ms 2.1852ms 457.6277 Ops/s 436.1655 Ops/s $\color{#35bf28}+4.92\%$
test_creation[device0] 0.2330ms 0.1144ms 8.7401 KOps/s 8.4894 KOps/s $\color{#35bf28}+2.95\%$
test_creation_from_tensor 4.8757ms 0.1162ms 8.6078 KOps/s 8.3873 KOps/s $\color{#35bf28}+2.63\%$
test_add_one[memmap_tensor0] 0.2311ms 6.9211μs 144.4856 KOps/s 134.1030 KOps/s $\textbf{\color{#35bf28}+7.74\%}$
test_contiguous[memmap_tensor0] 28.1530μs 1.8648μs 536.2458 KOps/s 522.0150 KOps/s $\color{#35bf28}+2.73\%$
test_stack[memmap_tensor0] 63.9500μs 5.4033μs 185.0729 KOps/s 172.5575 KOps/s $\textbf{\color{#35bf28}+7.25\%}$
test_memmaptd_index 1.2009ms 0.4107ms 2.4348 KOps/s 2.3665 KOps/s $\color{#35bf28}+2.89\%$
test_memmaptd_index_astensor 0.7984ms 0.5091ms 1.9642 KOps/s 1.9003 KOps/s $\color{#35bf28}+3.36\%$
test_memmaptd_index_op 1.6331ms 1.0091ms 991.0137 Ops/s 925.6551 Ops/s $\textbf{\color{#35bf28}+7.06\%}$
test_serialize_model 0.2160s 0.1290s 7.7490 Ops/s 8.3823 Ops/s $\textbf{\color{#d91a1a}-7.56\%}$
test_serialize_model_pickle 0.4987s 0.4072s 2.4557 Ops/s 2.5857 Ops/s $\textbf{\color{#d91a1a}-5.03\%}$
test_serialize_weights 0.1294s 0.1166s 8.5775 Ops/s 7.5196 Ops/s $\textbf{\color{#35bf28}+14.07\%}$
test_serialize_weights_returnearly 0.2489s 0.1715s 5.8317 Ops/s 6.2534 Ops/s $\textbf{\color{#d91a1a}-6.74\%}$
test_serialize_weights_pickle 0.5001s 0.4159s 2.4046 Ops/s 2.3993 Ops/s $\color{#35bf28}+0.22\%$
test_serialize_weights_filesystem 0.1508s 0.1439s 6.9492 Ops/s 6.9915 Ops/s $\color{#d91a1a}-0.61\%$
test_serialize_model_filesystem 0.1608s 0.1525s 6.5594 Ops/s 6.1106 Ops/s $\textbf{\color{#35bf28}+7.34\%}$
test_reshape_pytree 0.1138ms 38.6713μs 25.8590 KOps/s 26.0610 KOps/s $\color{#d91a1a}-0.78\%$
test_reshape_td 0.1022ms 48.7562μs 20.5102 KOps/s 20.7172 KOps/s $\color{#d91a1a}-1.00\%$
test_view_pytree 0.1001ms 37.9878μs 26.3242 KOps/s 25.9792 KOps/s $\color{#35bf28}+1.33\%$
test_view_td 0.1074ms 54.1056μs 18.4824 KOps/s 19.2610 KOps/s $\color{#d91a1a}-4.04\%$
test_unbind_pytree 75.5420μs 35.3564μs 28.2834 KOps/s 27.3340 KOps/s $\color{#35bf28}+3.47\%$
test_unbind_td 0.3266ms 46.8382μs 21.3501 KOps/s 21.6995 KOps/s $\color{#d91a1a}-1.61\%$
test_split_pytree 81.2730μs 37.3479μs 26.7753 KOps/s 26.3193 KOps/s $\color{#35bf28}+1.73\%$
test_split_td 0.2043ms 57.5127μs 17.3875 KOps/s 16.5817 KOps/s $\color{#35bf28}+4.86\%$
test_add_pytree 93.6870μs 43.9351μs 22.7609 KOps/s 22.0571 KOps/s $\color{#35bf28}+3.19\%$
test_add_td 0.2280ms 85.2647μs 11.7282 KOps/s 11.4311 KOps/s $\color{#35bf28}+2.60\%$
test_compile_add_one_nested[tensordict-compile] 0.1384ms 58.2395μs 17.1705 KOps/s 17.4359 KOps/s $\color{#d91a1a}-1.52\%$
test_compile_add_one_nested[tensordict-eager] 0.4305ms 0.1980ms 5.0515 KOps/s 5.1825 KOps/s $\color{#d91a1a}-2.53\%$
test_compile_add_one_nested[pytree-compile] 0.1512ms 57.1783μs 17.4891 KOps/s 17.8618 KOps/s $\color{#d91a1a}-2.09\%$
test_compile_add_one_nested[pytree-eager] 0.2656ms 0.1376ms 7.2650 KOps/s 7.0667 KOps/s $\color{#35bf28}+2.81\%$
test_compile_copy_nested[tensordict-compile] 61.0950μs 23.9636μs 41.7299 KOps/s 42.9859 KOps/s $\color{#d91a1a}-2.92\%$
test_compile_copy_nested[tensordict-eager] 0.1554ms 75.2659μs 13.2862 KOps/s 13.3175 KOps/s $\color{#d91a1a}-0.23\%$
test_compile_copy_nested[pytree-compile] 0.1479ms 74.2219μs 13.4731 KOps/s 13.4254 KOps/s $\color{#35bf28}+0.36\%$
test_compile_copy_nested[pytree-eager] 0.1251ms 67.8824μs 14.7314 KOps/s 14.5859 KOps/s $\color{#35bf28}+1.00\%$
test_compile_add_one_flat[tensordict-compile] 0.3685ms 0.1810ms 5.5242 KOps/s 5.5297 KOps/s $\color{#d91a1a}-0.10\%$
test_compile_add_one_flat[tensordict-eager] 0.3907ms 0.2399ms 4.1692 KOps/s 4.1640 KOps/s $\color{#35bf28}+0.13\%$
test_compile_add_one_flat[tensorclass-compile] 0.1016ms 47.4678μs 21.0669 KOps/s 21.2365 KOps/s $\color{#d91a1a}-0.80\%$
test_compile_add_one_flat[tensorclass-eager] 0.4136ms 76.7435μs 13.0304 KOps/s 12.7776 KOps/s $\color{#35bf28}+1.98\%$
test_compile_add_one_flat[pytree-compile] 0.2739ms 0.1734ms 5.7676 KOps/s 5.7290 KOps/s $\color{#35bf28}+0.67\%$
test_compile_add_one_flat[pytree-eager] 0.4689ms 0.2777ms 3.6014 KOps/s 3.4333 KOps/s $\color{#35bf28}+4.90\%$
test_compile_add_self_flat[tensordict-eager] 0.4646ms 0.2757ms 3.6276 KOps/s 3.5951 KOps/s $\color{#35bf28}+0.90\%$
test_compile_add_self_flat[tensordict-compile] 0.3781ms 0.1802ms 5.5508 KOps/s 5.5785 KOps/s $\color{#d91a1a}-0.50\%$
test_compile_add_self_flat[tensorclass-eager] 0.1801ms 73.3073μs 13.6412 KOps/s 13.5762 KOps/s $\color{#35bf28}+0.48\%$
test_compile_add_self_flat[tensorclass-compile] 0.1582ms 47.9641μs 20.8489 KOps/s 20.9809 KOps/s $\color{#d91a1a}-0.63\%$
test_compile_add_self_flat[pytree-eager] 0.4585ms 0.2295ms 4.3577 KOps/s 4.3225 KOps/s $\color{#35bf28}+0.81\%$
test_compile_add_self_flat[pytree-compile] 0.3797ms 0.1723ms 5.8035 KOps/s 5.6850 KOps/s $\color{#35bf28}+2.08\%$
test_compile_copy_flat[tensordict-compile] 0.2596ms 0.1092ms 9.1582 KOps/s 8.8344 KOps/s $\color{#35bf28}+3.66\%$
test_compile_copy_flat[tensordict-eager] 0.1939ms 78.2738μs 12.7757 KOps/s 11.9726 KOps/s $\textbf{\color{#35bf28}+6.71\%}$
test_compile_copy_flat[pytree-compile] 0.1459ms 77.2340μs 12.9477 KOps/s 13.2093 KOps/s $\color{#d91a1a}-1.98\%$
test_compile_copy_flat[pytree-eager] 0.1276ms 68.2885μs 14.6438 KOps/s 14.5260 KOps/s $\color{#35bf28}+0.81\%$
test_compile_assign_and_add[tensordict-compile] 0.4013ms 0.1981ms 5.0473 KOps/s 5.1815 KOps/s $\color{#d91a1a}-2.59\%$
test_compile_assign_and_add[tensordict-eager] 1.9826ms 1.7012ms 587.8192 Ops/s 567.8482 Ops/s $\color{#35bf28}+3.52\%$
test_compile_assign_and_add[pytree-compile] 0.3985ms 0.1956ms 5.1125 KOps/s 5.2560 KOps/s $\color{#d91a1a}-2.73\%$
test_compile_assign_and_add[pytree-eager] 1.7908ms 1.1071ms 903.2470 Ops/s 887.1507 Ops/s $\color{#35bf28}+1.81\%$
test_compile_assign_and_add_stack[compile] 0.5352ms 0.4204ms 2.3789 KOps/s 2.3854 KOps/s $\color{#d91a1a}-0.27\%$
test_compile_assign_and_add_stack[eager] 4.1010ms 3.8754ms 258.0404 Ops/s 244.4281 Ops/s $\textbf{\color{#35bf28}+5.57\%}$
test_compile_indexing[tensor-tensordict-compile] 83.2570μs 34.2601μs 29.1885 KOps/s 29.7841 KOps/s $\color{#d91a1a}-2.00\%$
test_compile_indexing[tensor-tensordict-eager] 0.6434ms 46.4505μs 21.5283 KOps/s 20.5776 KOps/s $\color{#35bf28}+4.62\%$
test_compile_indexing[tensor-tensorclass-compile] 4.8755ms 29.9281μs 33.4134 KOps/s 34.2630 KOps/s $\color{#d91a1a}-2.48\%$
test_compile_indexing[tensor-tensorclass-eager] 84.0080μs 27.9204μs 35.8161 KOps/s 35.2769 KOps/s $\color{#35bf28}+1.53\%$
test_compile_indexing[tensor-pytree-compile] 81.1020μs 29.6591μs 33.7165 KOps/s 34.8974 KOps/s $\color{#d91a1a}-3.38\%$
test_compile_indexing[tensor-pytree-eager] 0.1124ms 27.4074μs 36.4866 KOps/s 35.5108 KOps/s $\color{#35bf28}+2.75\%$
test_compile_indexing[slice-tensordict-compile] 0.1410ms 72.9023μs 13.7170 KOps/s 13.6028 KOps/s $\color{#35bf28}+0.84\%$
test_compile_indexing[slice-tensordict-eager] 0.5500ms 27.5079μs 36.3532 KOps/s 35.1045 KOps/s $\color{#35bf28}+3.56\%$
test_compile_indexing[slice-tensorclass-compile] 0.1294ms 66.5336μs 15.0300 KOps/s 14.8106 KOps/s $\color{#35bf28}+1.48\%$
test_compile_indexing[slice-tensorclass-eager] 61.8360μs 22.9128μs 43.6438 KOps/s 43.7330 KOps/s $\color{#d91a1a}-0.20\%$
test_compile_indexing[slice-pytree-compile] 0.1533ms 66.6310μs 15.0080 KOps/s 14.9089 KOps/s $\color{#35bf28}+0.67\%$
test_compile_indexing[slice-pytree-eager] 81.9840μs 22.9614μs 43.5514 KOps/s 43.7475 KOps/s $\color{#d91a1a}-0.45\%$
test_compile_indexing[int-tensordict-compile] 0.1411ms 72.3422μs 13.8232 KOps/s 13.5899 KOps/s $\color{#35bf28}+1.72\%$
test_compile_indexing[int-tensordict-eager] 0.9131ms 27.5132μs 36.3461 KOps/s 35.2534 KOps/s $\color{#35bf28}+3.10\%$
test_compile_indexing[int-tensorclass-compile] 0.1428ms 66.3083μs 15.0811 KOps/s 14.7630 KOps/s $\color{#35bf28}+2.15\%$
test_compile_indexing[int-tensorclass-eager] 91.8920μs 22.8708μs 43.7239 KOps/s 43.3926 KOps/s $\color{#35bf28}+0.76\%$
test_compile_indexing[int-pytree-compile] 0.1432ms 66.0903μs 15.1308 KOps/s 14.4321 KOps/s $\color{#35bf28}+4.84\%$
test_compile_indexing[int-pytree-eager] 0.1133ms 22.7839μs 43.8906 KOps/s 43.7835 KOps/s $\color{#35bf28}+0.24\%$
test_mod_add[eager] 0.1054ms 24.1882μs 41.3425 KOps/s 39.3816 KOps/s $\color{#35bf28}+4.98\%$
test_mod_add[compile] 0.1073ms 37.4242μs 26.7207 KOps/s 26.3768 KOps/s $\color{#35bf28}+1.30\%$
test_mod_add[compile-overhead] 86.5130μs 37.7638μs 26.4804 KOps/s 27.0643 KOps/s $\color{#d91a1a}-2.16\%$
test_mod_wrap[eager] 0.3347ms 0.2080ms 4.8068 KOps/s 4.7259 KOps/s $\color{#35bf28}+1.71\%$
test_mod_wrap[compile] 0.3753ms 0.2325ms 4.3011 KOps/s 4.2919 KOps/s $\color{#35bf28}+0.22\%$
test_mod_wrap[compile-overhead] 0.3345ms 0.2305ms 4.3386 KOps/s 4.3312 KOps/s $\color{#35bf28}+0.17\%$
test_mod_wrap_and_backward[eager] 12.3936ms 10.7155ms 93.3230 Ops/s 86.4881 Ops/s $\textbf{\color{#35bf28}+7.90\%}$
test_mod_wrap_and_backward[compile] 12.5517ms 10.6348ms 94.0309 Ops/s 80.7215 Ops/s $\textbf{\color{#35bf28}+16.49\%}$
test_mod_wrap_and_backward[compile-overhead] 11.5605ms 10.6760ms 93.6678 Ops/s 78.2329 Ops/s $\textbf{\color{#35bf28}+19.73\%}$
test_seq_add[eager] 0.2769ms 89.8437μs 11.1304 KOps/s 10.8707 KOps/s $\color{#35bf28}+2.39\%$
test_seq_add[compile] 0.1865ms 63.3898μs 15.7754 KOps/s 15.5066 KOps/s $\color{#35bf28}+1.73\%$
test_seq_add[compile-overhead] 0.1615ms 61.8316μs 16.1730 KOps/s 15.9605 KOps/s $\color{#35bf28}+1.33\%$
test_seq_wrap[eager] 1.0108ms 0.3797ms 2.6338 KOps/s 2.5352 KOps/s $\color{#35bf28}+3.89\%$
test_seq_wrap[compile] 0.4119ms 0.2664ms 3.7537 KOps/s 3.6813 KOps/s $\color{#35bf28}+1.97\%$
test_seq_wrap[compile-overhead] 0.5166ms 0.2698ms 3.7067 KOps/s 3.6683 KOps/s $\color{#35bf28}+1.05\%$
test_func_call_runtime[False-eager] 0.8943ms 0.5263ms 1.9000 KOps/s 1.8735 KOps/s $\color{#35bf28}+1.41\%$
test_func_call_runtime[False-compile] 0.6168ms 0.4982ms 2.0072 KOps/s 1.9695 KOps/s $\color{#35bf28}+1.91\%$
test_func_call_runtime[False-compile-overhead] 0.6372ms 0.4991ms 2.0035 KOps/s 1.9400 KOps/s $\color{#35bf28}+3.28\%$
test_func_call_runtime[True-eager] 0.8771ms 0.7375ms 1.3560 KOps/s 1.3336 KOps/s $\color{#35bf28}+1.68\%$
test_func_call_runtime[True-compile] 1.0889ms 0.5175ms 1.9322 KOps/s 1.9371 KOps/s $\color{#d91a1a}-0.25\%$
test_func_call_runtime[True-compile-overhead] 0.6270ms 0.5155ms 1.9398 KOps/s 1.9274 KOps/s $\color{#35bf28}+0.64\%$
test_func_call_cm_runtime[False-eager] 0.8172ms 0.5275ms 1.8958 KOps/s 1.8904 KOps/s $\color{#35bf28}+0.29\%$
test_func_call_cm_runtime[False-compile] 1.0801ms 0.5035ms 1.9860 KOps/s 1.9620 KOps/s $\color{#35bf28}+1.22\%$
test_func_call_cm_runtime[False-compile-overhead] 0.5823ms 0.5017ms 1.9930 KOps/s 1.9579 KOps/s $\color{#35bf28}+1.80\%$
test_func_call_cm_runtime[True-eager] 1.2377ms 0.9093ms 1.0997 KOps/s 1.0860 KOps/s $\color{#35bf28}+1.26\%$
test_func_call_cm_runtime[True-compile] 0.9060ms 0.7497ms 1.3338 KOps/s 1.3168 KOps/s $\color{#35bf28}+1.29\%$
test_func_call_cm_runtime[True-compile-overhead] 1.0720ms 0.7589ms 1.3177 KOps/s 1.2953 KOps/s $\color{#35bf28}+1.73\%$
test_vmap_func_call_cm_runtime[eager] 3.5418ms 1.9432ms 514.6263 Ops/s 504.8087 Ops/s $\color{#35bf28}+1.94\%$
test_vmap_func_call_cm_runtime[compile] 2.9311ms 2.0044ms 498.9140 Ops/s 489.9636 Ops/s $\color{#35bf28}+1.83\%$
test_vmap_func_call_cm_runtime[compile-overhead] 2.6846ms 1.9943ms 501.4372 Ops/s 488.2073 Ops/s $\color{#35bf28}+2.71\%$
test_distributed 0.2797ms 0.1276ms 7.8364 KOps/s 7.6358 KOps/s $\color{#35bf28}+2.63\%$
test_tdmodule 31.2190μs 17.5192μs 57.0804 KOps/s 51.4164 KOps/s $\textbf{\color{#35bf28}+11.02\%}$
test_tdmodule_dispatch 56.3960μs 34.8917μs 28.6601 KOps/s 26.7990 KOps/s $\textbf{\color{#35bf28}+6.94\%}$
test_tdseq 39.3440μs 20.2508μs 49.3808 KOps/s 45.8632 KOps/s $\textbf{\color{#35bf28}+7.67\%}$
test_tdseq_dispatch 80.9230μs 40.2942μs 24.8175 KOps/s 23.4576 KOps/s $\textbf{\color{#35bf28}+5.80\%}$
test_instantiation_functorch 1.6797ms 1.5385ms 649.9991 Ops/s 624.5841 Ops/s $\color{#35bf28}+4.07\%$
test_exec_functorch 0.3189ms 0.1816ms 5.5058 KOps/s 5.3165 KOps/s $\color{#35bf28}+3.56\%$
test_exec_functional_call 0.3976ms 0.1727ms 5.7919 KOps/s 5.5966 KOps/s $\color{#35bf28}+3.49\%$
test_exec_td_decorator 0.5318ms 0.2340ms 4.2740 KOps/s 4.1758 KOps/s $\color{#35bf28}+2.35\%$
test_vmap_mlp_speed_decorator[True-True] 0.9642ms 0.6475ms 1.5445 KOps/s 1.5305 KOps/s $\color{#35bf28}+0.91\%$
test_vmap_mlp_speed_decorator[True-False] 0.9376ms 0.6467ms 1.5463 KOps/s 1.5106 KOps/s $\color{#35bf28}+2.37\%$
test_vmap_mlp_speed_decorator[False-True] 0.7186ms 0.5330ms 1.8761 KOps/s 1.8348 KOps/s $\color{#35bf28}+2.25\%$
test_vmap_mlp_speed_decorator[False-False] 0.8697ms 0.5353ms 1.8679 KOps/s 1.8528 KOps/s $\color{#35bf28}+0.82\%$
test_to_module_speed[True] 2.3320ms 1.4107ms 708.8656 Ops/s 699.8337 Ops/s $\color{#35bf28}+1.29\%$
test_to_module_speed[False] 1.9965ms 1.3653ms 732.4277 Ops/s 720.0742 Ops/s $\color{#35bf28}+1.72\%$
test_tc_init 92.8450μs 45.4230μs 22.0153 KOps/s 20.6985 KOps/s $\textbf{\color{#35bf28}+6.36\%}$
test_tc_init_nested 0.1589ms 90.6594μs 11.0303 KOps/s 10.5114 KOps/s $\color{#35bf28}+4.94\%$
test_tc_first_layer_tensor 15.7890μs 1.5974μs 626.0218 KOps/s 615.9985 KOps/s $\color{#35bf28}+1.63\%$
test_tc_first_layer_nontensor 23.2740μs 4.7056μs 212.5119 KOps/s 205.6748 KOps/s $\color{#35bf28}+3.32\%$
test_tc_second_layer_tensor 40.6570μs 2.8598μs 349.6726 KOps/s 340.0885 KOps/s $\color{#35bf28}+2.82\%$
test_tc_second_layer_nontensor 52.0780μs 6.0911μs 164.1742 KOps/s 162.5731 KOps/s $\color{#35bf28}+0.98\%$
test_unbind 0.4750s 13.5066ms 74.0380 Ops/s 75.1760 Ops/s $\color{#d91a1a}-1.51\%$
test_full_like 8.5786ms 7.4698ms 133.8720 Ops/s 141.3096 Ops/s $\textbf{\color{#d91a1a}-5.26\%}$
test_zeros_like 3.3634ms 2.7703ms 360.9698 Ops/s 363.3751 Ops/s $\color{#d91a1a}-0.66\%$
test_ones_like 4.1743ms 3.2060ms 311.9197 Ops/s 144.7759 Ops/s $\textbf{\color{#35bf28}+115.45\%}$
test_clone 5.4791ms 4.9492ms 202.0528 Ops/s 116.7959 Ops/s $\textbf{\color{#35bf28}+73.00\%}$
test_squeeze 64.9130μs 12.3877μs 80.7253 KOps/s 80.8775 KOps/s $\color{#d91a1a}-0.19\%$
test_unsqueeze 0.3571ms 94.5528μs 10.5761 KOps/s 10.6916 KOps/s $\color{#d91a1a}-1.08\%$
test_split 0.3912ms 0.1927ms 5.1894 KOps/s 4.9619 KOps/s $\color{#35bf28}+4.58\%$
test_permute 0.4011ms 0.2236ms 4.4718 KOps/s 4.4253 KOps/s $\color{#35bf28}+1.05\%$
test_stack 31.1722ms 24.6929ms 40.4975 Ops/s 40.7536 Ops/s $\color{#d91a1a}-0.63\%$
test_cat 29.0726ms 24.5817ms 40.6806 Ops/s 41.6093 Ops/s $\color{#d91a1a}-2.23\%$
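
For reading the table above: the Change column is consistent with the relative difference between the Ops column (this PR) and the Ops on Repo HEAD column. The sketch below recomputes it for three rows copied from the CPU table. The function name `relative_change` is hypothetical and not taken from the benchmark tooling, and the formula is inferred from the reported numbers rather than from the bot's source.

```python
def relative_change(ops: float, ops_on_head: float) -> float:
    """Percent change in throughput of this PR relative to the repo HEAD.

    Assumption: inferred from the table values, not from the bot's code.
    """
    return (ops - ops_on_head) / ops_on_head * 100.0


# (Ops, Ops on Repo HEAD) pairs copied from the CPU table, both in KOps/s.
rows = {
    "test_plain_set_nested": (42.4728, 40.9754),
    "test_items": (243.9850, 214.6165),
    "test_lock_nested": (1.6508, 1.9551),
}

for name, (ops, head) in rows.items():
    print(f"{name}: {relative_change(ops, head):+.2f}%")
# test_plain_set_nested: +3.65%
# test_items: +13.68%
# test_lock_nested: -15.56%
```

Both columns must be in the same unit before comparing; rows reported in Ops/s or MOps/s work the same way as long as the pair shares a unit.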

github-actions bot commented Oct 8, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 218. Improved: $\large\color{#35bf28}21$. Worsened: $\large\color{#d91a1a}7$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.1476ms 16.4937μs 60.6294 KOps/s 57.1575 KOps/s $\textbf{\color{#35bf28}+6.07\%}$
test_plain_set_stack_nested 37.2900μs 16.5780μs 60.3209 KOps/s 57.0105 KOps/s $\textbf{\color{#35bf28}+5.81\%}$
test_plain_set_nested_inplace 47.6200μs 17.6462μs 56.6695 KOps/s 53.8312 KOps/s $\textbf{\color{#35bf28}+5.27\%}$
test_plain_set_stack_nested_inplace 50.3110μs 17.5804μs 56.8814 KOps/s 53.3699 KOps/s $\textbf{\color{#35bf28}+6.58\%}$
test_items 21.4800μs 2.8458μs 351.3926 KOps/s 348.0382 KOps/s $\color{#35bf28}+0.96\%$
test_items_nested 0.3729ms 0.3382ms 2.9566 KOps/s 2.9438 KOps/s $\color{#35bf28}+0.44\%$
test_items_nested_locked 0.3833ms 0.3396ms 2.9449 KOps/s 2.9674 KOps/s $\color{#d91a1a}-0.76\%$
test_items_nested_leaf 91.7120μs 62.1810μs 16.0821 KOps/s 15.9196 KOps/s $\color{#35bf28}+1.02\%$
test_items_stack_nested 0.4219ms 0.3398ms 2.9425 KOps/s 2.9236 KOps/s $\color{#35bf28}+0.65\%$
test_items_stack_nested_leaf 91.3120μs 62.8614μs 15.9080 KOps/s 15.5087 KOps/s $\color{#35bf28}+2.57\%$
test_items_stack_nested_locked 0.3737ms 0.3434ms 2.9119 KOps/s 2.9123 KOps/s $\color{#d91a1a}-0.01\%$
test_keys 25.1410μs 3.3987μs 294.2317 KOps/s 289.6323 KOps/s $\color{#35bf28}+1.59\%$
test_keys_nested 98.8520μs 71.0264μs 14.0793 KOps/s 13.9689 KOps/s $\color{#35bf28}+0.79\%$
test_keys_nested_locked 2.5988ms 76.9714μs 12.9918 KOps/s 12.9071 KOps/s $\color{#35bf28}+0.66\%$
test_keys_nested_leaf 94.6110μs 61.9635μs 16.1385 KOps/s 16.1277 KOps/s $\color{#35bf28}+0.07\%$
test_keys_stack_nested 96.6310μs 70.6640μs 14.1515 KOps/s 14.2790 KOps/s $\color{#d91a1a}-0.89\%$
test_keys_stack_nested_leaf 88.4610μs 61.2269μs 16.3327 KOps/s 16.2387 KOps/s $\color{#35bf28}+0.58\%$
test_keys_stack_nested_locked 0.1077ms 77.1728μs 12.9579 KOps/s 13.0060 KOps/s $\color{#d91a1a}-0.37\%$
test_values 4.1467μs 0.8362μs 1.1958 MOps/s 1.1568 MOps/s $\color{#35bf28}+3.38\%$
test_values_nested 78.7720μs 48.5228μs 20.6089 KOps/s 20.4514 KOps/s $\color{#35bf28}+0.77\%$
test_values_nested_locked 84.2320μs 49.8793μs 20.0484 KOps/s 19.8734 KOps/s $\color{#35bf28}+0.88\%$
test_values_nested_leaf 69.3210μs 42.6768μs 23.4319 KOps/s 23.3726 KOps/s $\color{#35bf28}+0.25\%$
test_values_stack_nested 86.1010μs 49.0237μs 20.3983 KOps/s 20.0809 KOps/s $\color{#35bf28}+1.58\%$
test_values_stack_nested_leaf 66.5710μs 43.1082μs 23.1974 KOps/s 23.1062 KOps/s $\color{#35bf28}+0.39\%$
test_values_stack_nested_locked 80.2520μs 50.3434μs 19.8636 KOps/s 19.3857 KOps/s $\color{#35bf28}+2.47\%$
test_membership 1.7145μs 0.5007μs 1.9971 MOps/s 1.9645 MOps/s $\color{#35bf28}+1.66\%$
test_membership_nested 16.5005μs 1.8925μs 528.3984 KOps/s 507.3444 KOps/s $\color{#35bf28}+4.15\%$
test_membership_nested_leaf 14.9600μs 1.8971μs 527.1171 KOps/s 538.5882 KOps/s $\color{#d91a1a}-2.13\%$
test_membership_stacked_nested 36.2510μs 1.9316μs 517.6935 KOps/s 520.1716 KOps/s $\color{#d91a1a}-0.48\%$
test_membership_stacked_nested_leaf 27.4900μs 1.9624μs 509.5718 KOps/s 518.6846 KOps/s $\color{#d91a1a}-1.76\%$
test_membership_nested_last 36.1300μs 2.9862μs 334.8743 KOps/s 336.2173 KOps/s $\color{#d91a1a}-0.40\%$
test_membership_nested_leaf_last 23.8100μs 3.0051μs 332.7637 KOps/s 328.6513 KOps/s $\color{#35bf28}+1.25\%$
test_membership_stacked_nested_last 26.6900μs 3.0007μs 333.2570 KOps/s 160.5686 KOps/s $\textbf{\color{#35bf28}+107.55\%}$
test_membership_stacked_nested_leaf_last 28.8800μs 2.9598μs 337.8610 KOps/s 160.9732 KOps/s $\textbf{\color{#35bf28}+109.89\%}$
test_nested_getleaf 35.3710μs 6.0377μs 165.6267 KOps/s 162.5076 KOps/s $\color{#35bf28}+1.92\%$
test_nested_get 37.0910μs 5.6267μs 177.7237 KOps/s 172.4554 KOps/s $\color{#35bf28}+3.05\%$
test_stacked_getleaf 37.6200μs 5.9844μs 167.1001 KOps/s 165.2412 KOps/s $\color{#35bf28}+1.13\%$
test_stacked_get 24.8700μs 5.5846μs 179.0636 KOps/s 174.5539 KOps/s $\color{#35bf28}+2.58\%$
test_nested_getitemleaf 32.6710μs 6.0719μs 164.6935 KOps/s 162.5136 KOps/s $\color{#35bf28}+1.34\%$
test_nested_getitem 32.0100μs 5.6400μs 177.3046 KOps/s 175.4921 KOps/s $\color{#35bf28}+1.03\%$
test_stacked_getitemleaf 44.8910μs 6.1537μs 162.5048 KOps/s 164.3897 KOps/s $\color{#d91a1a}-1.15\%$
test_stacked_getitem 0.9800ms 5.6126μs 178.1703 KOps/s 174.5915 KOps/s $\color{#35bf28}+2.05\%$
test_lock_nested 7.0441ms 0.4327ms 2.3111 KOps/s 2.3030 KOps/s $\color{#35bf28}+0.35\%$
test_lock_stack_nested 0.4730ms 0.3918ms 2.5520 KOps/s 2.5780 KOps/s $\color{#d91a1a}-1.01\%$
test_unlock_nested 0.7924ms 0.3642ms 2.7455 KOps/s 2.7061 KOps/s $\color{#35bf28}+1.46\%$
test_unlock_stack_nested 0.3784ms 0.3306ms 3.0252 KOps/s 3.0724 KOps/s $\color{#d91a1a}-1.54\%$
test_flatten_speed 0.1557ms 75.9714μs 13.1628 KOps/s 12.9415 KOps/s $\color{#35bf28}+1.71\%$
test_unflatten_speed 0.3625ms 0.3201ms 3.1241 KOps/s 3.1505 KOps/s $\color{#d91a1a}-0.84\%$
test_common_ops 1.5908ms 1.2767ms 783.2405 Ops/s 776.0090 Ops/s $\color{#35bf28}+0.93\%$
test_creation 27.4910μs 1.4597μs 685.0593 KOps/s 674.2716 KOps/s $\color{#35bf28}+1.60\%$
test_creation_empty 45.6810μs 15.0762μs 66.3298 KOps/s 58.9033 KOps/s $\textbf{\color{#35bf28}+12.61\%}$
test_creation_nested_1 64.6010μs 16.9982μs 58.8298 KOps/s 53.6792 KOps/s $\textbf{\color{#35bf28}+9.60\%}$
test_creation_nested_2 43.7710μs 19.4175μs 51.5000 KOps/s 47.7138 KOps/s $\textbf{\color{#35bf28}+7.94\%}$
test_clone 71.7410μs 29.0431μs 34.4316 KOps/s 34.3150 KOps/s $\color{#35bf28}+0.34\%$
test_getitem[int] 1.3447ms 16.1603μs 61.8802 KOps/s 60.9231 KOps/s $\color{#35bf28}+1.57\%$
test_getitem[slice_int] 0.1269ms 28.0133μs 35.6974 KOps/s 35.5897 KOps/s $\color{#35bf28}+0.30\%$
test_getitem[range] 0.2281ms 0.1121ms 8.9207 KOps/s 8.8508 KOps/s $\color{#35bf28}+0.79\%$
test_getitem[tuple] 0.1197ms 23.8852μs 41.8670 KOps/s 41.1253 KOps/s $\color{#35bf28}+1.80\%$
test_getitem[list] 0.2046ms 0.1010ms 9.8967 KOps/s 9.9256 KOps/s $\color{#d91a1a}-0.29\%$
test_setitem_dim[int] 77.7110μs 45.3270μs 22.0619 KOps/s 21.9718 KOps/s $\color{#35bf28}+0.41\%$
test_setitem_dim[slice_int] 93.9810μs 67.7299μs 14.7645 KOps/s 14.8387 KOps/s $\color{#d91a1a}-0.50\%$
test_setitem_dim[range] 0.1633ms 0.1290ms 7.7518 KOps/s 7.7545 KOps/s $\color{#d91a1a}-0.03\%$
test_setitem_dim[tuple] 84.7410μs 61.5486μs 16.2473 KOps/s 16.3998 KOps/s $\color{#d91a1a}-0.93\%$
test_setitem 92.3520μs 42.7086μs 23.4145 KOps/s 23.2248 KOps/s $\color{#35bf28}+0.82\%$
test_set 72.0310μs 41.6464μs 24.0117 KOps/s 23.8555 KOps/s $\color{#35bf28}+0.65\%$
test_set_shared 0.3533ms 55.0887μs 18.1526 KOps/s 18.2615 KOps/s $\color{#d91a1a}-0.60\%$
test_update 98.4020μs 50.7435μs 19.7069 KOps/s 19.3192 KOps/s $\color{#35bf28}+2.01\%$
test_update_nested 0.1115ms 58.1351μs 17.2013 KOps/s 16.8590 KOps/s $\color{#35bf28}+2.03\%$
test_update__nested 0.4233ms 65.1936μs 15.3389 KOps/s 16.2805 KOps/s $\textbf{\color{#d91a1a}-5.78\%}$
test_set_nested 95.7720μs 44.8888μs 22.2773 KOps/s 22.6925 KOps/s $\color{#d91a1a}-1.83\%$
test_set_nested_new 85.7610μs 49.8259μs 20.0699 KOps/s 20.8435 KOps/s $\color{#d91a1a}-3.71\%$
test_select 0.1047ms 60.5045μs 16.5277 KOps/s 16.2479 KOps/s $\color{#35bf28}+1.72\%$
test_select_nested 80.8420μs 41.7226μs 23.9678 KOps/s 24.2177 KOps/s $\color{#d91a1a}-1.03\%$
test_exclude_nested 90.2710μs 59.1456μs 16.9074 KOps/s 17.1021 KOps/s $\color{#d91a1a}-1.14\%$
test_empty[True] 0.3050ms 0.2587ms 3.8648 KOps/s 3.8964 KOps/s $\color{#d91a1a}-0.81\%$
test_empty[False] 5.0691μs 0.7433μs 1.3453 MOps/s 1.3508 MOps/s $\color{#d91a1a}-0.41\%$
test_to 57.1500μs 26.5358μs 37.6850 KOps/s 37.4427 KOps/s $\color{#35bf28}+0.65\%$
test_to_nonblocking 69.1610μs 25.2376μs 39.6235 KOps/s 39.0636 KOps/s $\color{#35bf28}+1.43\%$
test_unbind_speed 1.4633ms 0.2802ms 3.5687 KOps/s 3.5774 KOps/s $\color{#d91a1a}-0.24\%$
test_unbind_speed_stack0 0.3580ms 0.2731ms 3.6614 KOps/s 3.6276 KOps/s $\color{#35bf28}+0.93\%$
test_unbind_speed_stack1 91.7594ms 0.7075ms 1.4135 KOps/s 1.4244 KOps/s $\color{#d91a1a}-0.77\%$
test_split 93.6574ms 2.1555ms 463.9246 Ops/s 450.5751 Ops/s $\color{#35bf28}+2.96\%$
test_chunk 93.3817ms 2.1499ms 465.1467 Ops/s 447.0926 Ops/s $\color{#35bf28}+4.04\%$
test_creation[device0] 0.3474ms 0.1294ms 7.7261 KOps/s 7.8266 KOps/s $\color{#d91a1a}-1.28\%$
test_creation_from_tensor 0.3601ms 0.1363ms 7.3394 KOps/s 7.5515 KOps/s $\color{#d91a1a}-2.81\%$
test_add_one[memmap_tensor0] 0.2336ms 9.0939μs 109.9633 KOps/s 113.4235 KOps/s $\color{#d91a1a}-3.05\%$
test_contiguous[memmap_tensor0] 30.9700μs 2.1924μs 456.1139 KOps/s 455.4627 KOps/s $\color{#35bf28}+0.14\%$
test_stack[memmap_tensor0] 42.8600μs 6.6754μs 149.8036 KOps/s 141.2164 KOps/s $\textbf{\color{#35bf28}+6.08\%}$
test_memmaptd_index 1.3590ms 0.4246ms 2.3550 KOps/s 2.2560 KOps/s $\color{#35bf28}+4.39\%$
test_memmaptd_index_astensor 0.7461ms 0.5022ms 1.9911 KOps/s 1.9301 KOps/s $\color{#35bf28}+3.16\%$
test_memmaptd_index_op 1.4521ms 1.0404ms 961.1486 Ops/s 928.6160 Ops/s $\color{#35bf28}+3.50\%$
test_serialize_model 0.1322s 0.1301s 7.6879 Ops/s 7.6955 Ops/s $\color{#d91a1a}-0.10\%$
test_serialize_model_pickle 1.3477s 1.2132s 0.8243 Ops/s 0.8239 Ops/s $\color{#35bf28}+0.04\%$
test_serialize_weights 0.2245s 0.1430s 6.9906 Ops/s 7.0059 Ops/s $\color{#d91a1a}-0.22\%$
test_serialize_weights_returnearly 0.2140s 56.9957ms 17.5452 Ops/s 17.7991 Ops/s $\color{#d91a1a}-1.43\%$
test_serialize_weights_pickle 1.3721s 1.2183s 0.8208 Ops/s 0.8220 Ops/s $\color{#d91a1a}-0.14\%$
test_reshape_pytree 76.9010μs 35.2999μs 28.3287 KOps/s 28.1059 KOps/s $\color{#35bf28}+0.79\%$
test_reshape_td 81.9710μs 41.0699μs 24.3487 KOps/s 23.4307 KOps/s $\color{#35bf28}+3.92\%$
test_view_pytree 66.5510μs 34.7618μs 28.7672 KOps/s 28.3663 KOps/s $\color{#35bf28}+1.41\%$
test_view_td 81.9410μs 45.4914μs 21.9822 KOps/s 21.2266 KOps/s $\color{#35bf28}+3.56\%$
test_unbind_pytree 69.7810μs 33.5423μs 29.8131 KOps/s 29.4293 KOps/s $\color{#35bf28}+1.30\%$
test_unbind_td 0.5296ms 42.0449μs 23.7841 KOps/s 23.3351 KOps/s $\color{#35bf28}+1.92\%$
test_split_pytree 0.1034ms 46.3063μs 21.5953 KOps/s 21.7514 KOps/s $\color{#d91a1a}-0.72\%$
test_split_td 93.7510ms 64.3986μs 15.5283 KOps/s 17.4197 KOps/s $\textbf{\color{#d91a1a}-10.86\%}$
test_add_pytree 0.1031ms 56.9738μs 17.5519 KOps/s 17.6258 KOps/s $\color{#d91a1a}-0.42\%$
test_add_td 0.1666ms 96.4142μs 10.3719 KOps/s 10.3460 KOps/s $\color{#35bf28}+0.25\%$
test_compile_add_one_nested[tensordict-compile] 0.2112ms 0.1603ms 6.2369 KOps/s 6.1721 KOps/s $\color{#35bf28}+1.05\%$
test_compile_add_one_nested[tensordict-eager] 0.2866ms 0.1644ms 6.0825 KOps/s 6.0360 KOps/s $\color{#35bf28}+0.77\%$
test_compile_add_one_nested[pytree-compile] 0.1824ms 0.1442ms 6.9364 KOps/s 6.9177 KOps/s $\color{#35bf28}+0.27\%$
test_compile_add_one_nested[pytree-eager] 0.2361ms 0.1850ms 5.4060 KOps/s 5.4390 KOps/s $\color{#d91a1a}-0.61\%$
test_compile_copy_nested[tensordict-compile] 64.7210μs 21.7761μs 45.9219 KOps/s 45.4771 KOps/s $\color{#35bf28}+0.98\%$
test_compile_copy_nested[tensordict-eager] 81.4710μs 49.3786μs 20.2517 KOps/s 20.1006 KOps/s $\color{#35bf28}+0.75\%$
test_compile_copy_nested[pytree-compile] 0.2258ms 65.4696μs 15.2743 KOps/s 15.4375 KOps/s $\color{#d91a1a}-1.06\%$
test_compile_copy_nested[pytree-eager] 92.8120μs 49.4949μs 20.2041 KOps/s 19.9807 KOps/s $\color{#35bf28}+1.12\%$
test_compile_add_one_flat[tensordict-compile] 0.3832ms 0.3214ms 3.1112 KOps/s 3.0904 KOps/s $\color{#35bf28}+0.67\%$
test_compile_add_one_flat[tensordict-eager] 0.3919ms 0.2335ms 4.2821 KOps/s 4.1641 KOps/s $\color{#35bf28}+2.84\%$
test_compile_add_one_flat[tensorclass-compile] 0.1766ms 0.1271ms 7.8702 KOps/s 7.7663 KOps/s $\color{#35bf28}+1.34\%$
test_compile_add_one_flat[tensorclass-eager] 0.1233ms 65.3770μs 15.2959 KOps/s 14.2221 KOps/s $\textbf{\color{#35bf28}+7.55\%}$
test_compile_add_one_flat[pytree-compile] 0.4589ms 0.3231ms 3.0953 KOps/s 3.1273 KOps/s $\color{#d91a1a}-1.02\%$
test_compile_add_one_flat[pytree-eager] 0.7966ms 0.6631ms 1.5080 KOps/s 1.5979 KOps/s $\textbf{\color{#d91a1a}-5.62\%}$
test_compile_add_self_flat[tensordict-eager] 0.4284ms 0.2853ms 3.5048 KOps/s 3.4478 KOps/s $\color{#35bf28}+1.65\%$
test_compile_add_self_flat[tensordict-compile] 0.4377ms 0.3272ms 3.0559 KOps/s 3.0825 KOps/s $\color{#d91a1a}-0.86\%$
test_compile_add_self_flat[tensorclass-eager] 0.1883ms 78.4725μs 12.7433 KOps/s 12.6609 KOps/s $\color{#35bf28}+0.65\%$
test_compile_add_self_flat[tensorclass-compile] 0.2012ms 0.1353ms 7.3920 KOps/s 7.5996 KOps/s $\color{#d91a1a}-2.73\%$
test_compile_add_self_flat[pytree-eager] 0.7211ms 0.5420ms 1.8450 KOps/s 1.9084 KOps/s $\color{#d91a1a}-3.32\%$
test_compile_add_self_flat[pytree-compile] 0.6153ms 0.3279ms 3.0496 KOps/s 3.1362 KOps/s $\color{#d91a1a}-2.76\%$
test_compile_copy_flat[tensordict-compile] 0.1109ms 20.2313μs 49.4284 KOps/s 49.4949 KOps/s $\color{#d91a1a}-0.13\%$
test_compile_copy_flat[tensordict-eager] 0.1343ms 37.9104μs 26.3780 KOps/s 25.3332 KOps/s $\color{#35bf28}+4.12\%$
test_compile_copy_flat[pytree-compile] 0.1525ms 69.9791μs 14.2900 KOps/s 14.3131 KOps/s $\color{#d91a1a}-0.16\%$
test_compile_copy_flat[pytree-eager] 0.1393ms 51.5883μs 19.3842 KOps/s 19.3669 KOps/s $\color{#35bf28}+0.09\%$
test_compile_assign_and_add[tensordict-compile] 2.3505ms 0.7776ms 1.2861 KOps/s 1.1132 KOps/s $\textbf{\color{#35bf28}+15.52\%}$
test_compile_assign_and_add[tensordict-eager] 3.5642ms 3.3344ms 299.9068 Ops/s 309.9770 Ops/s $\color{#d91a1a}-3.25\%$
test_compile_assign_and_add[pytree-compile] 2.3142ms 0.8312ms 1.2030 KOps/s 1.1320 KOps/s $\textbf{\color{#35bf28}+6.28\%}$
test_compile_assign_and_add[pytree-eager] 3.5630ms 3.2595ms 306.7947 Ops/s 311.4700 Ops/s $\color{#d91a1a}-1.50\%$
test_compile_indexing[tensor-tensordict-compile] 0.1766ms 0.1081ms 9.2486 KOps/s 9.0168 KOps/s $\color{#35bf28}+2.57\%$
test_compile_indexing[tensor-tensordict-eager] 0.1975ms 61.9772μs 16.1350 KOps/s 15.5495 KOps/s $\color{#35bf28}+3.77\%$
test_compile_indexing[tensor-tensorclass-compile] 0.1921ms 0.1024ms 9.7682 KOps/s 9.7515 KOps/s $\color{#35bf28}+0.17\%$
test_compile_indexing[tensor-tensorclass-eager] 98.7120μs 43.7683μs 22.8476 KOps/s 22.6404 KOps/s $\color{#35bf28}+0.92\%$
test_compile_indexing[tensor-pytree-compile] 0.1782ms 0.1033ms 9.6847 KOps/s 9.6102 KOps/s $\color{#35bf28}+0.78\%$
test_compile_indexing[tensor-pytree-eager] 0.1112ms 45.1800μs 22.1337 KOps/s 22.1053 KOps/s $\color{#35bf28}+0.13\%$
test_compile_indexing[slice-tensordict-compile] 0.1914ms 0.1378ms 7.2548 KOps/s 7.2143 KOps/s $\color{#35bf28}+0.56\%$
test_compile_indexing[slice-tensordict-eager] 0.1548ms 24.8740μs 40.2027 KOps/s 39.6582 KOps/s $\color{#35bf28}+1.37\%$
test_compile_indexing[slice-tensorclass-compile] 0.1845ms 0.1363ms 7.3392 KOps/s 7.6196 KOps/s $\color{#d91a1a}-3.68\%$
test_compile_indexing[slice-tensorclass-eager] 77.9810μs 20.6584μs 48.4065 KOps/s 47.8995 KOps/s $\color{#35bf28}+1.06\%$
test_compile_indexing[slice-pytree-compile] 0.2090ms 0.1315ms 7.6044 KOps/s 7.5623 KOps/s $\color{#35bf28}+0.56\%$
test_compile_indexing[slice-pytree-eager] 61.0710μs 20.4194μs 48.9731 KOps/s 47.6644 KOps/s $\color{#35bf28}+2.75\%$
test_compile_indexing[int-tensordict-compile] 0.2054ms 0.1410ms 7.0935 KOps/s 7.0558 KOps/s $\color{#35bf28}+0.53\%$
test_compile_indexing[int-tensordict-eager] 0.5054ms 24.5594μs 40.7176 KOps/s 39.4309 KOps/s $\color{#35bf28}+3.26\%$
test_compile_indexing[int-tensorclass-compile] 0.1999ms 0.1372ms 7.2884 KOps/s 7.5599 KOps/s $\color{#d91a1a}-3.59\%$
test_compile_indexing[int-tensorclass-eager] 70.0910μs 21.4300μs 46.6636 KOps/s 47.7780 KOps/s $\color{#d91a1a}-2.33\%$
test_compile_indexing[int-pytree-compile] 0.2253ms 0.1401ms 7.1386 KOps/s 7.5694 KOps/s $\textbf{\color{#d91a1a}-5.69\%}$
test_compile_indexing[int-pytree-eager] 65.1710μs 21.0618μs 47.4792 KOps/s 47.3638 KOps/s $\color{#35bf28}+0.24\%$
test_mod_add[eager] 83.9410μs 33.8109μs 29.5763 KOps/s 29.8813 KOps/s $\color{#d91a1a}-1.02\%$
test_mod_add[compile] 0.3053ms 73.2673μs 13.6487 KOps/s 13.4030 KOps/s $\color{#35bf28}+1.83\%$
test_mod_add[compile-overhead] 0.2685ms 0.1354ms 7.3841 KOps/s 7.1167 KOps/s $\color{#35bf28}+3.76\%$
test_mod_wrap[eager] 0.4463ms 0.2412ms 4.1452 KOps/s 4.0216 KOps/s $\color{#35bf28}+3.07\%$
test_mod_wrap[compile] 1.4139ms 0.3022ms 3.3095 KOps/s 3.3143 KOps/s $\color{#d91a1a}-0.15\%$
test_mod_wrap[compile-overhead] 7.3649ms 3.9822ms 251.1184 Ops/s 252.0569 Ops/s $\color{#d91a1a}-0.37\%$
test_mod_wrap_and_backward[eager] 2.0268ms 1.4318ms 698.4413 Ops/s 687.0644 Ops/s $\color{#35bf28}+1.66\%$
test_mod_wrap_and_backward[compile] 1.7242ms 1.4171ms 705.6732 Ops/s 691.6834 Ops/s $\color{#35bf28}+2.02\%$
test_mod_wrap_and_backward[compile-overhead] 1.7533ms 1.0582ms 944.9720 Ops/s 973.7104 Ops/s $\color{#d91a1a}-2.95\%$
test_seq_add[eager] 0.1583ms 0.1070ms 9.3453 KOps/s 9.2932 KOps/s $\color{#35bf28}+0.56\%$
test_seq_add[compile] 0.1558ms 86.1136μs 11.6126 KOps/s 11.5218 KOps/s $\color{#35bf28}+0.79\%$
test_seq_add[compile-overhead] 0.2180ms 0.1188ms 8.4155 KOps/s 8.6730 KOps/s $\color{#d91a1a}-2.97\%$
test_seq_wrap[eager] 0.4521ms 0.3819ms 2.6185 KOps/s 2.4201 KOps/s $\textbf{\color{#35bf28}+8.20\%}$
test_seq_wrap[compile] 0.4373ms 0.3140ms 3.1851 KOps/s 3.0768 KOps/s $\color{#35bf28}+3.52\%$
test_seq_wrap[compile-overhead] 0.2752ms 0.2190ms 4.5671 KOps/s 4.4660 KOps/s $\color{#35bf28}+2.26\%$
test_func_call_runtime[False-eager] 1.2684ms 0.7970ms 1.2547 KOps/s 1.2393 KOps/s $\color{#35bf28}+1.25\%$
test_func_call_runtime[False-compile] 0.9849ms 0.7937ms 1.2600 KOps/s 1.2175 KOps/s $\color{#35bf28}+3.48\%$
test_func_call_runtime[False-compile-overhead] 0.4179ms 0.3605ms 2.7736 KOps/s 2.7324 KOps/s $\color{#35bf28}+1.51\%$
test_func_call_runtime[True-eager] 0.9941ms 0.9092ms 1.0999 KOps/s 1.0679 KOps/s $\color{#35bf28}+3.00\%$
test_func_call_runtime[True-compile] 0.8969ms 0.8160ms 1.2255 KOps/s 1.1920 KOps/s $\color{#35bf28}+2.81\%$
test_func_call_runtime[True-compile-overhead] 0.4657ms 0.3834ms 2.6079 KOps/s 2.5721 KOps/s $\color{#35bf28}+1.39\%$
test_func_call_cm_runtime[False-eager] 0.7914ms 0.7377ms 1.3555 KOps/s 1.2405 KOps/s $\textbf{\color{#35bf28}+9.27\%}$
test_func_call_cm_runtime[False-compile] 1.2303ms 0.7948ms 1.2581 KOps/s 1.1909 KOps/s $\textbf{\color{#35bf28}+5.64\%}$
test_func_call_cm_runtime[False-compile-overhead] 0.4058ms 0.3613ms 2.7677 KOps/s 2.7388 KOps/s $\color{#35bf28}+1.05\%$
test_func_call_cm_runtime[True-eager] 1.1324ms 1.0216ms 978.9032 Ops/s 972.3170 Ops/s $\color{#35bf28}+0.68\%$
test_func_call_cm_runtime[True-compile] 0.8967ms 0.8409ms 1.1892 KOps/s 1.1620 KOps/s $\color{#35bf28}+2.34\%$
test_func_call_cm_runtime[True-compile-overhead] 0.5159ms 0.4080ms 2.4511 KOps/s 2.4219 KOps/s $\color{#35bf28}+1.20\%$
test_vmap_func_call_cm_runtime[eager] 2.7968ms 2.1052ms 475.0093 Ops/s 474.2261 Ops/s $\color{#35bf28}+0.17\%$
test_vmap_func_call_cm_runtime[compile] 1.3419ms 0.8655ms 1.1554 KOps/s 1.1463 KOps/s $\color{#35bf28}+0.79\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.4759ms 0.4084ms 2.4487 KOps/s 2.4139 KOps/s $\color{#35bf28}+1.44\%$
test_distributed 2.4353ms 0.1257ms 7.9528 KOps/s 8.4203 KOps/s $\textbf{\color{#d91a1a}-5.55\%}$
test_tdmodule 37.1610μs 14.5963μs 68.5107 KOps/s 60.5615 KOps/s $\textbf{\color{#35bf28}+13.13\%}$
test_tdmodule_dispatch 53.3410μs 30.0704μs 33.2553 KOps/s 31.3113 KOps/s $\textbf{\color{#35bf28}+6.21\%}$
test_tdseq 37.8400μs 15.9109μs 62.8499 KOps/s 56.5135 KOps/s $\textbf{\color{#35bf28}+11.21\%}$
test_tdseq_dispatch 54.7310μs 31.8573μs 31.3900 KOps/s 28.4562 KOps/s $\textbf{\color{#35bf28}+10.31\%}$
test_instantiation_functorch 2.0141ms 1.8627ms 536.8466 Ops/s 530.8640 Ops/s $\color{#35bf28}+1.13\%$
test_exec_functorch 0.2668ms 0.2093ms 4.7772 KOps/s 4.7214 KOps/s $\color{#35bf28}+1.18\%$
test_exec_functional_call 0.2520ms 0.2092ms 4.7808 KOps/s 4.6584 KOps/s $\color{#35bf28}+2.63\%$
test_exec_td_decorator 0.4421ms 0.2676ms 3.7368 KOps/s 3.7168 KOps/s $\color{#35bf28}+0.54\%$
test_vmap_mlp_speed_decorator[True-True] 0.8354ms 0.7107ms 1.4070 KOps/s 1.4522 KOps/s $\color{#d91a1a}-3.11\%$
test_vmap_mlp_speed_decorator[True-False] 0.8927ms 0.7115ms 1.4055 KOps/s 1.4608 KOps/s $\color{#d91a1a}-3.79\%$
test_vmap_mlp_speed_decorator[False-True] 0.7600ms 0.6207ms 1.6111 KOps/s 1.6654 KOps/s $\color{#d91a1a}-3.26\%$
test_vmap_mlp_speed_decorator[False-False] 0.7755ms 0.6269ms 1.5951 KOps/s 1.6647 KOps/s $\color{#d91a1a}-4.18\%$
test_vmap_transformer_speed_decorator[True-True] 20.0367ms 19.4618ms 51.3827 Ops/s 51.1022 Ops/s $\color{#35bf28}+0.55\%$
test_vmap_transformer_speed_decorator[True-False] 20.2373ms 19.5138ms 51.2458 Ops/s 51.1248 Ops/s $\color{#35bf28}+0.24\%$
test_vmap_transformer_speed_decorator[False-True] 20.5781ms 19.3984ms 51.5508 Ops/s 51.6004 Ops/s $\color{#d91a1a}-0.10\%$
test_vmap_transformer_speed_decorator[False-False] 19.4538ms 19.3764ms 51.6093 Ops/s 51.3569 Ops/s $\color{#35bf28}+0.49\%$
test_to_module_speed[True] 1.2691ms 0.9998ms 1.0002 KOps/s 994.2266 Ops/s $\color{#35bf28}+0.60\%$
test_to_module_speed[False] 1.4137ms 0.9737ms 1.0271 KOps/s 1.0162 KOps/s $\color{#35bf28}+1.07\%$
test_tc_init 74.1410μs 34.7403μs 28.7851 KOps/s 27.3324 KOps/s $\textbf{\color{#35bf28}+5.31\%}$
test_tc_init_nested 0.1329ms 71.2356μs 14.0379 KOps/s 13.4444 KOps/s $\color{#35bf28}+4.41\%$
test_tc_first_layer_tensor 10.3416μs 0.6703μs 1.4918 MOps/s 1.4770 MOps/s $\color{#35bf28}+1.01\%$
test_tc_first_layer_nontensor 25.7010μs 2.2314μs 448.1436 KOps/s 446.5971 KOps/s $\color{#35bf28}+0.35\%$
test_tc_second_layer_tensor 11.6778μs 1.3431μs 744.5461 KOps/s 728.0689 KOps/s $\color{#35bf28}+2.26\%$
test_tc_second_layer_nontensor 22.2600μs 2.9199μs 342.4771 KOps/s 339.8466 KOps/s $\color{#35bf28}+0.77\%$
test_unbind 0.1835s 11.9713ms 83.5331 Ops/s 93.2731 Ops/s $\textbf{\color{#d91a1a}-10.44\%}$
test_full_like 0.6545ms 0.5733ms 1.7444 KOps/s 1.7440 KOps/s $\color{#35bf28}+0.02\%$
test_zeros_like 0.2872ms 0.1979ms 5.0532 KOps/s 5.0555 KOps/s $\color{#d91a1a}-0.05\%$
test_ones_like 0.2353ms 0.1977ms 5.0576 KOps/s 5.0540 KOps/s $\color{#35bf28}+0.07\%$
test_clone 0.4450ms 0.4146ms 2.4118 KOps/s 2.4100 KOps/s $\color{#35bf28}+0.07\%$
test_squeeze 33.5610μs 9.7537μs 102.5252 KOps/s 101.6714 KOps/s $\color{#35bf28}+0.84\%$
test_unsqueeze 0.2234ms 74.3563μs 13.4488 KOps/s 13.6498 KOps/s $\color{#d91a1a}-1.47\%$
test_split 0.4252ms 0.1552ms 6.4443 KOps/s 6.3827 KOps/s $\color{#35bf28}+0.96\%$
test_permute 0.2214ms 0.1805ms 5.5414 KOps/s 5.4155 KOps/s $\color{#35bf28}+2.33\%$
test_stack 1.2510ms 0.8598ms 1.1631 KOps/s 1.2530 KOps/s $\textbf{\color{#d91a1a}-7.18\%}$
test_cat 1.2531ms 1.2311ms 812.2777 Ops/s 812.1694 Ops/s $\color{#35bf28}+0.01\%$
