[Feature] TD+NJT to(device) support #1022

Merged: 11 commits, Oct 16, 2024

vmoens (Contributor) commented on Oct 2, 2024

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Oct 2, 2024
ghstack-source-id: 792ce21cfa30eb2d62f4b30a469f30312d25909d
Pull Request resolved: #1022
@facebook-github-bot added the "CLA Signed" label on Oct 2, 2024. Label description: this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed.

github-actions bot commented Oct 2, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 216. Improved: $\large\color{#35bf28}21$. Worsened: $\large\color{#d91a1a}9$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 60.2840μs 24.2236μs 41.2821 KOps/s 41.8118 KOps/s $\color{#d91a1a}-1.27\%$
test_plain_set_stack_nested 52.0980μs 24.6545μs 40.5605 KOps/s 39.9663 KOps/s $\color{#35bf28}+1.49\%$
test_plain_set_nested_inplace 56.3060μs 26.5761μs 37.6278 KOps/s 36.6333 KOps/s $\color{#35bf28}+2.71\%$
test_plain_set_stack_nested_inplace 67.9070μs 26.5238μs 37.7019 KOps/s 36.6728 KOps/s $\color{#35bf28}+2.81\%$
test_items 30.4570μs 4.1312μs 242.0584 KOps/s 241.4267 KOps/s $\color{#35bf28}+0.26\%$
test_items_nested 0.9022ms 0.3911ms 2.5567 KOps/s 2.6695 KOps/s $\color{#d91a1a}-4.22\%$
test_items_nested_locked 0.5320ms 0.3865ms 2.5876 KOps/s 2.6492 KOps/s $\color{#d91a1a}-2.32\%$
test_items_nested_leaf 0.1557ms 80.1320μs 12.4794 KOps/s 12.4413 KOps/s $\color{#35bf28}+0.31\%$
test_items_stack_nested 0.4532ms 0.3876ms 2.5798 KOps/s 2.6076 KOps/s $\color{#d91a1a}-1.06\%$
test_items_stack_nested_leaf 0.1895ms 83.0969μs 12.0341 KOps/s 12.1389 KOps/s $\color{#d91a1a}-0.86\%$
test_items_stack_nested_locked 0.8127ms 0.3913ms 2.5553 KOps/s 2.6211 KOps/s $\color{#d91a1a}-2.51\%$
test_keys 21.7410μs 3.4292μs 291.6123 KOps/s 288.0792 KOps/s $\color{#35bf28}+1.23\%$
test_keys_nested 0.2497ms 0.1339ms 7.4702 KOps/s 7.6575 KOps/s $\color{#d91a1a}-2.45\%$
test_keys_nested_locked 0.8288ms 0.1388ms 7.2034 KOps/s 7.2152 KOps/s $\color{#d91a1a}-0.16\%$
test_keys_nested_leaf 0.1770ms 0.1182ms 8.4606 KOps/s 8.6588 KOps/s $\color{#d91a1a}-2.29\%$
test_keys_stack_nested 0.2342ms 0.1335ms 7.4932 KOps/s 7.6051 KOps/s $\color{#d91a1a}-1.47\%$
test_keys_stack_nested_leaf 0.2140ms 0.1148ms 8.7081 KOps/s 8.8055 KOps/s $\color{#d91a1a}-1.11\%$
test_keys_stack_nested_locked 0.2685ms 0.1396ms 7.1632 KOps/s 7.3353 KOps/s $\color{#d91a1a}-2.35\%$
test_values 7.8066μs 0.9730μs 1.0277 MOps/s 892.0130 KOps/s $\textbf{\color{#35bf28}+15.21\%}$
test_values_nested 0.1645ms 93.2604μs 10.7227 KOps/s 10.7249 KOps/s $\color{#d91a1a}-0.02\%$
test_values_nested_locked 0.1717ms 93.4814μs 10.6973 KOps/s 10.6026 KOps/s $\color{#35bf28}+0.89\%$
test_values_nested_leaf 0.1344ms 79.0718μs 12.6467 KOps/s 12.5050 KOps/s $\color{#35bf28}+1.13\%$
test_values_stack_nested 0.1765ms 94.1674μs 10.6194 KOps/s 10.7493 KOps/s $\color{#d91a1a}-1.21\%$
test_values_stack_nested_leaf 0.1358ms 79.3764μs 12.5982 KOps/s 13.1503 KOps/s $\color{#d91a1a}-4.20\%$
test_values_stack_nested_locked 0.1649ms 95.4184μs 10.4802 KOps/s 10.6197 KOps/s $\color{#d91a1a}-1.31\%$
test_membership 2.2713μs 0.6987μs 1.4313 MOps/s 1.4167 MOps/s $\color{#35bf28}+1.03\%$
test_membership_nested 26.4590μs 2.6826μs 372.7704 KOps/s 371.2618 KOps/s $\color{#35bf28}+0.41\%$
test_membership_nested_leaf 24.7860μs 2.6921μs 371.4531 KOps/s 371.5758 KOps/s $\color{#d91a1a}-0.03\%$
test_membership_stacked_nested 22.5020μs 2.6834μs 372.6603 KOps/s 363.8044 KOps/s $\color{#35bf28}+2.43\%$
test_membership_stacked_nested_leaf 21.9110μs 2.6896μs 371.8012 KOps/s 364.8475 KOps/s $\color{#35bf28}+1.91\%$
test_membership_nested_last 30.2270μs 4.1098μs 243.3198 KOps/s 245.8942 KOps/s $\color{#d91a1a}-1.05\%$
test_membership_nested_leaf_last 26.3000μs 4.1302μs 242.1173 KOps/s 246.7042 KOps/s $\color{#d91a1a}-1.86\%$
test_membership_stacked_nested_last 51.0520μs 4.9154μs 203.4412 KOps/s 145.4860 KOps/s $\textbf{\color{#35bf28}+39.84\%}$
test_membership_stacked_nested_leaf_last 26.2490μs 4.9672μs 201.3216 KOps/s 145.1583 KOps/s $\textbf{\color{#35bf28}+38.69\%}$
test_nested_getleaf 32.3410μs 10.5564μs 94.7289 KOps/s 94.4945 KOps/s $\color{#35bf28}+0.25\%$
test_nested_get 38.7830μs 10.0642μs 99.3616 KOps/s 99.7340 KOps/s $\color{#d91a1a}-0.37\%$
test_stacked_getleaf 47.3220μs 10.4271μs 95.9038 KOps/s 94.9662 KOps/s $\color{#35bf28}+0.99\%$
test_stacked_get 30.6280μs 10.0199μs 99.8015 KOps/s 100.9739 KOps/s $\color{#d91a1a}-1.16\%$
test_nested_getitemleaf 38.1010μs 10.9154μs 91.6137 KOps/s 92.6658 KOps/s $\color{#d91a1a}-1.14\%$
test_nested_getitem 41.3850μs 10.3716μs 96.4170 KOps/s 97.4346 KOps/s $\color{#d91a1a}-1.04\%$
test_stacked_getitemleaf 38.8430μs 10.9232μs 91.5482 KOps/s 90.9344 KOps/s $\color{#35bf28}+0.67\%$
test_stacked_getitem 32.5610μs 10.2449μs 97.6092 KOps/s 98.4391 KOps/s $\color{#d91a1a}-0.84\%$
test_lock_nested 84.1560ms 0.5825ms 1.7168 KOps/s 2.0244 KOps/s $\textbf{\color{#d91a1a}-15.20\%}$
test_lock_stack_nested 0.8284ms 0.4635ms 2.1574 KOps/s 2.2012 KOps/s $\color{#d91a1a}-1.99\%$
test_unlock_nested 82.7726ms 0.4998ms 2.0009 KOps/s 2.4180 KOps/s $\textbf{\color{#d91a1a}-17.25\%}$
test_unlock_stack_nested 0.5820ms 0.3776ms 2.6480 KOps/s 2.6575 KOps/s $\color{#d91a1a}-0.36\%$
test_flatten_speed 0.2037ms 0.1001ms 9.9930 KOps/s 9.9377 KOps/s $\color{#35bf28}+0.56\%$
test_unflatten_speed 0.9190ms 0.5150ms 1.9417 KOps/s 1.9577 KOps/s $\color{#d91a1a}-0.82\%$
test_common_ops 4.1091ms 1.1414ms 876.1197 Ops/s 859.7592 Ops/s $\color{#35bf28}+1.90\%$
test_creation 66.0040μs 2.0490μs 488.0547 KOps/s 485.7155 KOps/s $\color{#35bf28}+0.48\%$
test_creation_empty 63.7890μs 18.0584μs 55.3758 KOps/s 51.8638 KOps/s $\textbf{\color{#35bf28}+6.77\%}$
test_creation_nested_1 66.8950μs 22.2996μs 44.8438 KOps/s 44.0170 KOps/s $\color{#35bf28}+1.88\%$
test_creation_nested_2 78.2670μs 25.9565μs 38.5261 KOps/s 37.6964 KOps/s $\color{#35bf28}+2.20\%$
test_clone 97.9440μs 16.8346μs 59.4016 KOps/s 57.5680 KOps/s $\color{#35bf28}+3.19\%$
test_getitem[int] 0.8884ms 16.7023μs 59.8721 KOps/s 59.5780 KOps/s $\color{#35bf28}+0.49\%$
test_getitem[slice_int] 0.2314ms 31.9646μs 31.2847 KOps/s 32.5272 KOps/s $\color{#d91a1a}-3.82\%$
test_getitem[range] 0.1721ms 59.0312μs 16.9402 KOps/s 17.3340 KOps/s $\color{#d91a1a}-2.27\%$
test_getitem[tuple] 0.1394ms 25.7455μs 38.8417 KOps/s 39.8110 KOps/s $\color{#d91a1a}-2.43\%$
test_getitem[list] 0.2410ms 52.3686μs 19.0954 KOps/s 18.8745 KOps/s $\color{#35bf28}+1.17\%$
test_setitem_dim[int] 51.3660μs 31.7871μs 31.4593 KOps/s 30.8478 KOps/s $\color{#35bf28}+1.98\%$
test_setitem_dim[slice_int] 0.1093ms 59.5354μs 16.7967 KOps/s 16.1424 KOps/s $\color{#35bf28}+4.05\%$
test_setitem_dim[range] 0.1643ms 83.2407μs 12.0134 KOps/s 11.9495 KOps/s $\color{#35bf28}+0.53\%$
test_setitem_dim[tuple] 67.8380μs 47.5254μs 21.0414 KOps/s 20.5443 KOps/s $\color{#35bf28}+2.42\%$
test_setitem 0.1097ms 30.1703μs 33.1452 KOps/s 32.1422 KOps/s $\color{#35bf28}+3.12\%$
test_set 0.1092ms 29.3477μs 34.0742 KOps/s 33.0152 KOps/s $\color{#35bf28}+3.21\%$
test_set_shared 1.3135ms 0.2133ms 4.6873 KOps/s 4.5941 KOps/s $\color{#35bf28}+2.03\%$
test_update 0.1506ms 37.8792μs 26.3997 KOps/s 24.7757 KOps/s $\textbf{\color{#35bf28}+6.55\%}$
test_update_nested 0.1356ms 48.8203μs 20.4833 KOps/s 19.2506 KOps/s $\textbf{\color{#35bf28}+6.40\%}$
test_update__nested 0.3514ms 44.7314μs 22.3556 KOps/s 22.5315 KOps/s $\color{#d91a1a}-0.78\%$
test_set_nested 89.8480μs 32.7119μs 30.5699 KOps/s 29.7710 KOps/s $\color{#35bf28}+2.68\%$
test_set_nested_new 0.1259ms 37.1504μs 26.9176 KOps/s 25.8804 KOps/s $\color{#35bf28}+4.01\%$
test_select 0.1346ms 55.6227μs 17.9783 KOps/s 17.6949 KOps/s $\color{#35bf28}+1.60\%$
test_select_nested 0.1285ms 59.1612μs 16.9030 KOps/s 16.9014 KOps/s $+0.01\%$
test_exclude_nested 0.1524ms 74.5453μs 13.4147 KOps/s 13.2552 KOps/s $\color{#35bf28}+1.20\%$
test_empty[True] 0.6417ms 0.3512ms 2.8475 KOps/s 2.8463 KOps/s $\color{#35bf28}+0.04\%$
test_empty[False] 6.0594μs 1.1905μs 839.9576 KOps/s 849.9397 KOps/s $\color{#d91a1a}-1.17\%$
test_unbind_speed 0.6239ms 0.3019ms 3.3125 KOps/s 3.4289 KOps/s $\color{#d91a1a}-3.39\%$
test_unbind_speed_stack0 0.4020ms 0.2918ms 3.4270 KOps/s 3.4847 KOps/s $\color{#d91a1a}-1.66\%$
test_unbind_speed_stack1 87.3163ms 0.8343ms 1.1985 KOps/s 1.4043 KOps/s $\textbf{\color{#d91a1a}-14.65\%}$
test_split 75.0711ms 2.1303ms 469.4183 Ops/s 469.3346 Ops/s $\color{#35bf28}+0.02\%$
test_chunk 2.1482ms 1.9954ms 501.1569 Ops/s 464.6569 Ops/s $\textbf{\color{#35bf28}+7.86\%}$
test_creation[device0] 0.2317ms 0.1171ms 8.5416 KOps/s 8.4670 KOps/s $\color{#35bf28}+0.88\%$
test_creation_from_tensor 3.0337ms 0.1155ms 8.6543 KOps/s 8.6753 KOps/s $\color{#d91a1a}-0.24\%$
test_add_one[memmap_tensor0] 0.2954ms 7.4464μs 134.2928 KOps/s 134.0100 KOps/s $\color{#35bf28}+0.21\%$
test_contiguous[memmap_tensor0] 22.0210μs 1.9345μs 516.9284 KOps/s 516.0961 KOps/s $\color{#35bf28}+0.16\%$
test_stack[memmap_tensor0] 50.2040μs 5.6178μs 178.0046 KOps/s 174.5384 KOps/s $\color{#35bf28}+1.99\%$
test_memmaptd_index 1.1868ms 0.4095ms 2.4417 KOps/s 2.4495 KOps/s $\color{#d91a1a}-0.32\%$
test_memmaptd_index_astensor 86.3985ms 0.5505ms 1.8166 KOps/s 1.9662 KOps/s $\textbf{\color{#d91a1a}-7.60\%}$
test_memmaptd_index_op 1.9341ms 1.0536ms 949.1524 Ops/s 930.8164 Ops/s $\color{#35bf28}+1.97\%$
test_serialize_model 0.1270s 0.1188s 8.4192 Ops/s 8.5205 Ops/s $\color{#d91a1a}-1.19\%$
test_serialize_model_pickle 0.4761s 0.3964s 2.5229 Ops/s 2.5367 Ops/s $\color{#d91a1a}-0.54\%$
test_serialize_weights 0.1188s 0.1130s 8.8494 Ops/s 7.5998 Ops/s $\textbf{\color{#35bf28}+16.44\%}$
test_serialize_weights_returnearly 0.2397s 0.1738s 5.7527 Ops/s 6.3589 Ops/s $\textbf{\color{#d91a1a}-9.53\%}$
test_serialize_weights_pickle 0.5358s 0.4197s 2.3825 Ops/s 2.4891 Ops/s $\color{#d91a1a}-4.29\%$
test_serialize_weights_filesystem 0.1455s 0.1411s 7.0853 Ops/s 7.1468 Ops/s $\color{#d91a1a}-0.86\%$
test_serialize_model_filesystem 0.1617s 0.1470s 6.8012 Ops/s 6.1395 Ops/s $\textbf{\color{#35bf28}+10.78\%}$
test_reshape_pytree 0.1049ms 39.4591μs 25.3427 KOps/s 25.0509 KOps/s $\color{#35bf28}+1.16\%$
test_reshape_td 95.2080μs 45.9448μs 21.7653 KOps/s 21.8526 KOps/s $\color{#d91a1a}-0.40\%$
test_view_pytree 95.6590μs 39.4134μs 25.3721 KOps/s 25.7766 KOps/s $\color{#d91a1a}-1.57\%$
test_view_td 0.1277ms 52.9120μs 18.8993 KOps/s 19.4589 KOps/s $\color{#d91a1a}-2.88\%$
test_unbind_pytree 81.7130μs 36.3285μs 27.5266 KOps/s 28.1439 KOps/s $\color{#d91a1a}-2.19\%$
test_unbind_td 0.2875ms 44.8109μs 22.3160 KOps/s 22.7015 KOps/s $\color{#d91a1a}-1.70\%$
test_split_pytree 81.0620μs 37.6361μs 26.5702 KOps/s 25.5594 KOps/s $\color{#35bf28}+3.95\%$
test_split_td 88.3124ms 67.4802μs 14.8192 KOps/s 17.4333 KOps/s $\textbf{\color{#d91a1a}-15.00\%}$
test_add_pytree 0.1440ms 46.4542μs 21.5266 KOps/s 22.4601 KOps/s $\color{#d91a1a}-4.16\%$
test_add_td 0.1894ms 84.2600μs 11.8680 KOps/s 11.1793 KOps/s $\textbf{\color{#35bf28}+6.16\%}$
test_compile_add_one_nested[tensordict-compile] 0.1265ms 57.4676μs 17.4011 KOps/s 17.2000 KOps/s $\color{#35bf28}+1.17\%$
test_compile_add_one_nested[tensordict-eager] 0.4234ms 0.1963ms 5.0946 KOps/s 5.1414 KOps/s $\color{#d91a1a}-0.91\%$
test_compile_add_one_nested[pytree-compile] 0.1142ms 56.6732μs 17.6450 KOps/s 17.6196 KOps/s $\color{#35bf28}+0.14\%$
test_compile_add_one_nested[pytree-eager] 0.3259ms 0.1429ms 6.9990 KOps/s 7.1179 KOps/s $\color{#d91a1a}-1.67\%$
test_compile_copy_nested[tensordict-compile] 69.8910μs 22.7041μs 44.0450 KOps/s 41.9281 KOps/s $\textbf{\color{#35bf28}+5.05\%}$
test_compile_copy_nested[tensordict-eager] 0.1633ms 74.0692μs 13.5009 KOps/s 13.5507 KOps/s $\color{#d91a1a}-0.37\%$
test_compile_copy_nested[pytree-compile] 0.1411ms 75.4338μs 13.2567 KOps/s 13.1528 KOps/s $\color{#35bf28}+0.79\%$
test_compile_copy_nested[pytree-eager] 0.1471ms 68.7222μs 14.5513 KOps/s 14.7187 KOps/s $\color{#d91a1a}-1.14\%$
test_compile_add_one_flat[tensordict-compile] 0.3648ms 0.1813ms 5.5143 KOps/s 5.4999 KOps/s $\color{#35bf28}+0.26\%$
test_compile_add_one_flat[tensordict-eager] 0.3530ms 0.2446ms 4.0891 KOps/s 4.2072 KOps/s $\color{#d91a1a}-2.81\%$
test_compile_add_one_flat[tensorclass-compile] 0.1161ms 47.4577μs 21.0714 KOps/s 21.3610 KOps/s $\color{#d91a1a}-1.36\%$
test_compile_add_one_flat[tensorclass-eager] 0.3810ms 78.4410μs 12.7484 KOps/s 12.6500 KOps/s $\color{#35bf28}+0.78\%$
test_compile_add_one_flat[pytree-compile] 0.3429ms 0.1748ms 5.7214 KOps/s 5.8091 KOps/s $\color{#d91a1a}-1.51\%$
test_compile_add_one_flat[pytree-eager] 0.4526ms 0.2842ms 3.5183 KOps/s 3.4907 KOps/s $\color{#35bf28}+0.79\%$
test_compile_add_self_flat[tensordict-eager] 0.5520ms 0.2799ms 3.5727 KOps/s 3.6785 KOps/s $\color{#d91a1a}-2.87\%$
test_compile_add_self_flat[tensordict-compile] 0.3467ms 0.1851ms 5.4017 KOps/s 5.5043 KOps/s $\color{#d91a1a}-1.86\%$
test_compile_add_self_flat[tensorclass-eager] 0.1709ms 73.9918μs 13.5150 KOps/s 13.6863 KOps/s $\color{#d91a1a}-1.25\%$
test_compile_add_self_flat[tensorclass-compile] 0.1379ms 48.0046μs 20.8313 KOps/s 20.6289 KOps/s $\color{#35bf28}+0.98\%$
test_compile_add_self_flat[pytree-eager] 0.4466ms 0.2319ms 4.3127 KOps/s 4.2649 KOps/s $\color{#35bf28}+1.12\%$
test_compile_add_self_flat[pytree-compile] 0.3686ms 0.1789ms 5.5903 KOps/s 5.7659 KOps/s $\color{#d91a1a}-3.04\%$
test_compile_copy_flat[tensordict-compile] 0.2063ms 0.1117ms 8.9499 KOps/s 9.0741 KOps/s $\color{#d91a1a}-1.37\%$
test_compile_copy_flat[tensordict-eager] 0.1504ms 76.2909μs 13.1077 KOps/s 12.3806 KOps/s $\textbf{\color{#35bf28}+5.87\%}$
test_compile_copy_flat[pytree-compile] 0.1886ms 78.1301μs 12.7992 KOps/s 12.5602 KOps/s $\color{#35bf28}+1.90\%$
test_compile_copy_flat[pytree-eager] 0.1339ms 68.0955μs 14.6853 KOps/s 14.6621 KOps/s $\color{#35bf28}+0.16\%$
test_compile_assign_and_add[tensordict-compile] 0.2586ms 0.1923ms 5.1990 KOps/s 5.2268 KOps/s $\color{#d91a1a}-0.53\%$
test_compile_assign_and_add[tensordict-eager] 1.9332ms 1.7367ms 575.8187 Ops/s 561.4636 Ops/s $\color{#35bf28}+2.56\%$
test_compile_assign_and_add[pytree-compile] 0.3788ms 0.1929ms 5.1832 KOps/s 5.0960 KOps/s $\color{#35bf28}+1.71\%$
test_compile_assign_and_add[pytree-eager] 1.2150ms 1.0962ms 912.2289 Ops/s 902.9832 Ops/s $\color{#35bf28}+1.02\%$
test_compile_assign_and_add_stack[compile] 0.7477ms 0.4139ms 2.4158 KOps/s 2.4101 KOps/s $\color{#35bf28}+0.24\%$
test_compile_assign_and_add_stack[eager] 4.4658ms 4.0180ms 248.8810 Ops/s 244.4880 Ops/s $\color{#35bf28}+1.80\%$
test_compile_indexing[tensor-tensordict-compile] 82.7560μs 33.7392μs 29.6391 KOps/s 29.3661 KOps/s $\color{#35bf28}+0.93\%$
test_compile_indexing[tensor-tensordict-eager] 0.8048ms 47.4350μs 21.0815 KOps/s 20.5879 KOps/s $\color{#35bf28}+2.40\%$
test_compile_indexing[tensor-tensorclass-compile] 95.1280μs 29.9568μs 33.3814 KOps/s 34.2075 KOps/s $\color{#d91a1a}-2.41\%$
test_compile_indexing[tensor-tensorclass-eager] 69.4300μs 29.2082μs 34.2369 KOps/s 34.6861 KOps/s $\color{#d91a1a}-1.29\%$
test_compile_indexing[tensor-pytree-compile] 76.9740μs 29.7382μs 33.6268 KOps/s 32.6604 KOps/s $\color{#35bf28}+2.96\%$
test_compile_indexing[tensor-pytree-eager] 86.7330μs 28.9922μs 34.4920 KOps/s 35.2652 KOps/s $\color{#d91a1a}-2.19\%$
test_compile_indexing[slice-tensordict-compile] 0.1468ms 76.1901μs 13.1251 KOps/s 13.6208 KOps/s $\color{#d91a1a}-3.64\%$
test_compile_indexing[slice-tensordict-eager] 0.4188ms 27.8991μs 35.8435 KOps/s 35.6230 KOps/s $\color{#35bf28}+0.62\%$
test_compile_indexing[slice-tensorclass-compile] 0.1210ms 68.7964μs 14.5357 KOps/s 14.6928 KOps/s $\color{#d91a1a}-1.07\%$
test_compile_indexing[slice-tensorclass-eager] 69.2000μs 23.1130μs 43.2656 KOps/s 42.8497 KOps/s $\color{#35bf28}+0.97\%$
test_compile_indexing[slice-pytree-compile] 0.1536ms 68.6189μs 14.5732 KOps/s 14.7104 KOps/s $\color{#d91a1a}-0.93\%$
test_compile_indexing[slice-pytree-eager] 57.5280μs 23.3430μs 42.8393 KOps/s 42.8797 KOps/s $\color{#d91a1a}-0.09\%$
test_compile_indexing[int-tensordict-compile] 0.1337ms 74.5561μs 13.4127 KOps/s 13.7452 KOps/s $\color{#d91a1a}-2.42\%$
test_compile_indexing[int-tensordict-eager] 0.8845ms 27.3666μs 36.5409 KOps/s 36.2310 KOps/s $\color{#35bf28}+0.86\%$
test_compile_indexing[int-tensorclass-compile] 0.1265ms 68.1529μs 14.6729 KOps/s 14.8023 KOps/s $\color{#d91a1a}-0.87\%$
test_compile_indexing[int-tensorclass-eager] 66.1740μs 23.3290μs 42.8650 KOps/s 43.8347 KOps/s $\color{#d91a1a}-2.21\%$
test_compile_indexing[int-pytree-compile] 0.1419ms 68.8612μs 14.5220 KOps/s 15.2932 KOps/s $\textbf{\color{#d91a1a}-5.04\%}$
test_compile_indexing[int-pytree-eager] 95.2120μs 23.1996μs 43.1042 KOps/s 43.4891 KOps/s $\color{#d91a1a}-0.89\%$
test_mod_add[eager] 79.7490μs 25.2258μs 39.6420 KOps/s 37.7587 KOps/s $\color{#35bf28}+4.99\%$
test_mod_add[compile] 85.1590μs 38.8868μs 25.7157 KOps/s 26.2546 KOps/s $\color{#d91a1a}-2.05\%$
test_mod_add[compile-overhead] 81.6430μs 38.7507μs 25.8060 KOps/s 25.9805 KOps/s $\color{#d91a1a}-0.67\%$
test_mod_wrap[eager] 0.2885ms 0.2075ms 4.8196 KOps/s 4.7671 KOps/s $\color{#35bf28}+1.10\%$
test_mod_wrap[compile] 0.3341ms 0.2314ms 4.3217 KOps/s 4.2369 KOps/s $\color{#35bf28}+2.00\%$
test_mod_wrap[compile-overhead] 0.3575ms 0.2282ms 4.3820 KOps/s 4.2820 KOps/s $\color{#35bf28}+2.33\%$
test_mod_wrap_and_backward[eager] 12.5830ms 10.6842ms 93.5961 Ops/s 86.4794 Ops/s $\textbf{\color{#35bf28}+8.23\%}$
test_mod_wrap_and_backward[compile] 11.7456ms 10.5685ms 94.6212 Ops/s 81.3863 Ops/s $\textbf{\color{#35bf28}+16.26\%}$
test_mod_wrap_and_backward[compile-overhead] 11.2693ms 10.5597ms 94.6997 Ops/s 85.1734 Ops/s $\textbf{\color{#35bf28}+11.18\%}$
test_seq_add[eager] 0.2492ms 90.9683μs 10.9928 KOps/s 10.7897 KOps/s $\color{#35bf28}+1.88\%$
test_seq_add[compile] 0.1447ms 64.7227μs 15.4505 KOps/s 15.3636 KOps/s $\color{#35bf28}+0.57\%$
test_seq_add[compile-overhead] 0.1235ms 63.2175μs 15.8184 KOps/s 15.5001 KOps/s $\color{#35bf28}+2.05\%$
test_seq_wrap[eager] 0.5700ms 0.3853ms 2.5952 KOps/s 2.5800 KOps/s $\color{#35bf28}+0.59\%$
test_seq_wrap[compile] 0.5416ms 0.2712ms 3.6871 KOps/s 3.6673 KOps/s $\color{#35bf28}+0.54\%$
test_seq_wrap[compile-overhead] 0.4958ms 0.2653ms 3.7698 KOps/s 3.6526 KOps/s $\color{#35bf28}+3.21\%$
test_func_call_runtime[False-eager] 0.8506ms 0.5134ms 1.9477 KOps/s 1.9752 KOps/s $\color{#d91a1a}-1.39\%$
test_func_call_runtime[False-compile] 0.8203ms 0.4955ms 2.0180 KOps/s 1.9814 KOps/s $\color{#35bf28}+1.85\%$
test_func_call_runtime[False-compile-overhead] 0.9124ms 0.4954ms 2.0187 KOps/s 1.9733 KOps/s $\color{#35bf28}+2.30\%$
test_func_call_runtime[True-eager] 1.0643ms 0.7389ms 1.3534 KOps/s 1.3557 KOps/s $\color{#d91a1a}-0.17\%$
test_func_call_runtime[True-compile] 0.5901ms 0.5079ms 1.9690 KOps/s 1.9416 KOps/s $\color{#35bf28}+1.41\%$
test_func_call_runtime[True-compile-overhead] 0.6807ms 0.5031ms 1.9876 KOps/s 1.9389 KOps/s $\color{#35bf28}+2.51\%$
test_func_call_cm_runtime[False-eager] 0.8771ms 0.5114ms 1.9554 KOps/s 1.9751 KOps/s $\color{#d91a1a}-1.00\%$
test_func_call_cm_runtime[False-compile] 0.9287ms 0.4958ms 2.0170 KOps/s 1.9909 KOps/s $\color{#35bf28}+1.31\%$
test_func_call_cm_runtime[False-compile-overhead] 0.6023ms 0.4939ms 2.0246 KOps/s 2.0149 KOps/s $\color{#35bf28}+0.48\%$
test_func_call_cm_runtime[True-eager] 1.2099ms 0.8744ms 1.1436 KOps/s 1.1291 KOps/s $\color{#35bf28}+1.28\%$
test_func_call_cm_runtime[True-compile] 1.4620ms 0.7256ms 1.3781 KOps/s 1.3584 KOps/s $\color{#35bf28}+1.45\%$
test_func_call_cm_runtime[True-compile-overhead] 0.8803ms 0.7135ms 1.4016 KOps/s 1.3594 KOps/s $\color{#35bf28}+3.11\%$
test_vmap_func_call_cm_runtime[eager] 2.3859ms 1.8718ms 534.2461 Ops/s 529.1913 Ops/s $\color{#35bf28}+0.96\%$
test_vmap_func_call_cm_runtime[compile] 2.5380ms 1.8995ms 526.4413 Ops/s 509.5502 Ops/s $\color{#35bf28}+3.31\%$
test_vmap_func_call_cm_runtime[compile-overhead] 2.6699ms 1.9330ms 517.3307 Ops/s 507.0062 Ops/s $\color{#35bf28}+2.04\%$
test_distributed 0.2225ms 0.1268ms 7.8854 KOps/s 7.7157 KOps/s $\color{#35bf28}+2.20\%$
test_tdmodule 70.4620μs 18.0195μs 55.4953 KOps/s 51.1454 KOps/s $\textbf{\color{#35bf28}+8.51\%}$
test_tdmodule_dispatch 63.5590μs 36.2494μs 27.5867 KOps/s 26.2214 KOps/s $\textbf{\color{#35bf28}+5.21\%}$
test_tdseq 44.0830μs 20.9180μs 47.8058 KOps/s 44.2148 KOps/s $\textbf{\color{#35bf28}+8.12\%}$
test_tdseq_dispatch 68.9790μs 41.3659μs 24.1745 KOps/s 22.8464 KOps/s $\textbf{\color{#35bf28}+5.81\%}$
test_instantiation_functorch 1.7939ms 1.5572ms 642.1638 Ops/s 628.5840 Ops/s $\color{#35bf28}+2.16\%$
test_exec_functorch 0.2676ms 0.1810ms 5.5234 KOps/s 5.4126 KOps/s $\color{#35bf28}+2.05\%$
test_exec_functional_call 0.2674ms 0.1729ms 5.7824 KOps/s 5.9611 KOps/s $\color{#d91a1a}-3.00\%$
test_exec_td_decorator 0.4700ms 0.2334ms 4.2843 KOps/s 4.3537 KOps/s $\color{#d91a1a}-1.59\%$
test_vmap_mlp_speed_decorator[True-True] 0.9156ms 0.6311ms 1.5846 KOps/s 1.5386 KOps/s $\color{#35bf28}+2.99\%$
test_vmap_mlp_speed_decorator[True-False] 0.8806ms 0.6358ms 1.5729 KOps/s 1.5053 KOps/s $\color{#35bf28}+4.49\%$
test_vmap_mlp_speed_decorator[False-True] 0.8930ms 0.5232ms 1.9113 KOps/s 1.7732 KOps/s $\textbf{\color{#35bf28}+7.79\%}$
test_vmap_mlp_speed_decorator[False-False] 0.6557ms 0.5219ms 1.9162 KOps/s 1.6818 KOps/s $\textbf{\color{#35bf28}+13.93\%}$
test_to_module_speed[True] 2.3225ms 1.4403ms 694.3082 Ops/s 712.7067 Ops/s $\color{#d91a1a}-2.58\%$
test_to_module_speed[False] 1.6762ms 1.3895ms 719.6918 Ops/s 729.6667 Ops/s $\color{#d91a1a}-1.37\%$
test_tc_init 0.1110ms 45.9697μs 21.7534 KOps/s 21.3986 KOps/s $\color{#35bf28}+1.66\%$
test_tc_init_nested 0.1891ms 93.0233μs 10.7500 KOps/s 10.7240 KOps/s $\color{#35bf28}+0.24\%$
test_tc_first_layer_tensor 17.0210μs 1.5255μs 655.5219 KOps/s 657.9079 KOps/s $\color{#d91a1a}-0.36\%$
test_tc_first_layer_nontensor 47.3050μs 4.6337μs 215.8121 KOps/s 217.5200 KOps/s $\color{#d91a1a}-0.79\%$
test_tc_second_layer_tensor 25.2380μs 2.7567μs 362.7571 KOps/s 358.0951 KOps/s $\color{#35bf28}+1.30\%$
test_tc_second_layer_nontensor 30.5070μs 5.8913μs 169.7425 KOps/s 168.0896 KOps/s $\color{#35bf28}+0.98\%$
test_unbind 0.4712s 13.0604ms 76.5673 Ops/s 78.0733 Ops/s $\color{#d91a1a}-1.93\%$
test_full_like 8.0689ms 7.0764ms 141.3151 Ops/s 142.9196 Ops/s $\color{#d91a1a}-1.12\%$
test_zeros_like 3.0802ms 2.7143ms 368.4125 Ops/s 378.7263 Ops/s $\color{#d91a1a}-2.72\%$
test_ones_like 11.3424ms 6.0383ms 165.6101 Ops/s 308.0212 Ops/s $\textbf{\color{#d91a1a}-46.23\%}$
test_clone 16.6946ms 7.9779ms 125.3462 Ops/s 206.1655 Ops/s $\textbf{\color{#d91a1a}-39.20\%}$
test_squeeze 61.5550μs 12.6954μs 78.7687 KOps/s 80.5539 KOps/s $\color{#d91a1a}-2.22\%$
test_unsqueeze 0.3435ms 93.2422μs 10.7248 KOps/s 10.8858 KOps/s $\color{#d91a1a}-1.48\%$
test_split 0.3748ms 0.1937ms 5.1633 KOps/s 5.1639 KOps/s $\color{#d91a1a}-0.01\%$
test_permute 0.3770ms 0.2156ms 4.6376 KOps/s 4.5404 KOps/s $\color{#35bf28}+2.14\%$
test_stack 31.3457ms 24.9280ms 40.1156 Ops/s 40.3925 Ops/s $\color{#d91a1a}-0.69\%$
test_cat 28.2500ms 24.4273ms 40.9379 Ops/s 40.7908 Ops/s $\color{#35bf28}+0.36\%$
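
For readers checking the table above: the Change column appears to be the relative difference between the Ops column (this PR) and the Ops on Repo HEAD column. The sketch below is an assumption based on the published numbers, not the benchmark bot's actual code, and `relative_change` is a hypothetical helper name. It reproduces two of the CPU rows.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change of the PR throughput relative to the repo HEAD throughput.

    Assumed formula, inferred from the table values; both inputs must use
    the same unit (e.g. KOps/s).
    """
    return (ops_pr - ops_head) / ops_head * 100.0


# test_plain_set_nested (CPU): 41.2821 KOps/s on the PR vs 41.8118 KOps/s on HEAD
plain_set_nested = relative_change(41.2821, 41.8118)

# test_values (CPU): 1.0277 MOps/s on the PR vs 892.0130 KOps/s on HEAD.
# The units differ in the table, so convert MOps/s to KOps/s first.
values = relative_change(1.0277 * 1000.0, 892.0130)

print(f"test_plain_set_nested: {plain_set_nested:+.2f}%")
print(f"test_values: {values:+.2f}%")
```

Both results match the table (-1.27% and +15.21%) to two decimals. Going by the rows shown, changes with a magnitude of roughly 5% or more seem to be the ones rendered in bold (for example +5.05% is bold while +4.99% is not), though the exact threshold is not stated in the report.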


github-actions bot commented Oct 2, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 218. Improved: $\large\color{#35bf28}15$. Worsened: $\large\color{#d91a1a}13$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.1492ms 16.6220μs 60.1611 KOps/s 60.1800 KOps/s $\color{#d91a1a}-0.03\%$
test_plain_set_stack_nested 0.3915ms 16.6041μs 60.2262 KOps/s 60.7220 KOps/s $\color{#d91a1a}-0.82\%$
test_plain_set_nested_inplace 47.7510μs 17.7504μs 56.3368 KOps/s 56.2550 KOps/s $\color{#35bf28}+0.15\%$
test_plain_set_stack_nested_inplace 0.4007ms 17.7847μs 56.2281 KOps/s 57.8237 KOps/s $\color{#d91a1a}-2.76\%$
test_items 0.3840ms 2.8619μs 349.4200 KOps/s 345.3645 KOps/s $\color{#35bf28}+1.17\%$
test_items_nested 0.7300ms 0.3498ms 2.8585 KOps/s 2.8467 KOps/s $\color{#35bf28}+0.41\%$
test_items_nested_locked 0.4096ms 0.3485ms 2.8694 KOps/s 2.8385 KOps/s $\color{#35bf28}+1.09\%$
test_items_nested_leaf 0.4449ms 62.6363μs 15.9652 KOps/s 16.0078 KOps/s $\color{#d91a1a}-0.27\%$
test_items_stack_nested 0.5298ms 0.3466ms 2.8855 KOps/s 2.8563 KOps/s $\color{#35bf28}+1.02\%$
test_items_stack_nested_leaf 0.1020ms 62.6751μs 15.9553 KOps/s 15.4977 KOps/s $\color{#35bf28}+2.95\%$
test_items_stack_nested_locked 0.5266ms 0.3513ms 2.8467 KOps/s 2.8130 KOps/s $\color{#35bf28}+1.20\%$
test_keys 22.6800μs 3.4876μs 286.7339 KOps/s 272.3354 KOps/s $\textbf{\color{#35bf28}+5.29\%}$
test_keys_nested 98.1120μs 70.2703μs 14.2308 KOps/s 14.0143 KOps/s $\color{#35bf28}+1.54\%$
test_keys_nested_locked 2.3870ms 76.1699μs 13.1285 KOps/s 12.9691 KOps/s $\color{#35bf28}+1.23\%$
test_keys_nested_leaf 0.1045ms 61.8880μs 16.1582 KOps/s 15.9934 KOps/s $\color{#35bf28}+1.03\%$
test_keys_stack_nested 0.1155ms 69.7953μs 14.3276 KOps/s 14.0778 KOps/s $\color{#35bf28}+1.77\%$
test_keys_stack_nested_leaf 98.7510μs 61.7131μs 16.2040 KOps/s 15.8822 KOps/s $\color{#35bf28}+2.03\%$
test_keys_stack_nested_locked 0.1275ms 76.8881μs 13.0059 KOps/s 13.0091 KOps/s $\color{#d91a1a}-0.02\%$
test_values 4.4683μs 0.8344μs 1.1985 MOps/s 1.1587 MOps/s $\color{#35bf28}+3.43\%$
test_values_nested 81.2510μs 48.4438μs 20.6425 KOps/s 20.5189 KOps/s $\color{#35bf28}+0.60\%$
test_values_nested_locked 78.7020μs 50.7065μs 19.7213 KOps/s 19.7910 KOps/s $\color{#d91a1a}-0.35\%$
test_values_nested_leaf 75.4020μs 42.4023μs 23.5836 KOps/s 23.3607 KOps/s $\color{#35bf28}+0.95\%$
test_values_stack_nested 85.3010μs 49.0567μs 20.3846 KOps/s 20.0711 KOps/s $\color{#35bf28}+1.56\%$
test_values_stack_nested_leaf 73.5810μs 43.1496μs 23.1752 KOps/s 23.0483 KOps/s $\color{#35bf28}+0.55\%$
test_values_stack_nested_locked 80.8810μs 50.9508μs 19.6268 KOps/s 19.2803 KOps/s $\color{#35bf28}+1.80\%$
test_membership 1.5926μs 0.5106μs 1.9584 MOps/s 2.0040 MOps/s $\color{#d91a1a}-2.27\%$
test_membership_nested 17.4905μs 1.8965μs 527.2736 KOps/s 511.3687 KOps/s $\color{#35bf28}+3.11\%$
test_membership_nested_leaf 13.8467μs 1.8638μs 536.5457 KOps/s 537.8368 KOps/s $\color{#d91a1a}-0.24\%$
test_membership_stacked_nested 33.9010μs 1.9327μs 517.4216 KOps/s 510.1132 KOps/s $\color{#35bf28}+1.43\%$
test_membership_stacked_nested_leaf 25.9400μs 1.9451μs 514.1034 KOps/s 506.1495 KOps/s $\color{#35bf28}+1.57\%$
test_membership_nested_last 26.7700μs 3.0264μs 330.4217 KOps/s 336.4528 KOps/s $\color{#d91a1a}-1.79\%$
test_membership_nested_leaf_last 30.4410μs 3.0206μs 331.0604 KOps/s 331.9962 KOps/s $\color{#d91a1a}-0.28\%$
test_membership_stacked_nested_last 29.0910μs 2.9971μs 333.6549 KOps/s 122.1304 KOps/s $\textbf{\color{#35bf28}+173.20\%}$
test_membership_stacked_nested_leaf_last 27.0900μs 2.9952μs 333.8681 KOps/s 122.2034 KOps/s $\textbf{\color{#35bf28}+173.21\%}$
test_nested_getleaf 31.2400μs 6.1107μs 163.6471 KOps/s 163.2516 KOps/s $\color{#35bf28}+0.24\%$
test_nested_get 33.2700μs 5.7740μs 173.1901 KOps/s 173.1733 KOps/s $+0.01\%$
test_stacked_getleaf 37.1500μs 6.0648μs 164.8849 KOps/s 165.5486 KOps/s $\color{#d91a1a}-0.40\%$
test_stacked_get 32.7710μs 5.6987μs 175.4791 KOps/s 175.7518 KOps/s $\color{#d91a1a}-0.16\%$
test_nested_getitemleaf 40.7510μs 6.1463μs 162.7006 KOps/s 161.2979 KOps/s $\color{#35bf28}+0.87\%$
test_nested_getitem 31.0010μs 5.8957μs 169.6145 KOps/s 171.7504 KOps/s $\color{#d91a1a}-1.24\%$
test_stacked_getitemleaf 39.2210μs 6.1728μs 162.0014 KOps/s 162.4643 KOps/s $\color{#d91a1a}-0.28\%$
test_stacked_getitem 34.0810μs 5.7420μs 174.1565 KOps/s 172.1604 KOps/s $\color{#35bf28}+1.16\%$
test_lock_nested 4.8928ms 0.4290ms 2.3311 KOps/s 2.3297 KOps/s $\color{#35bf28}+0.06\%$
test_lock_stack_nested 0.5316ms 0.3928ms 2.5461 KOps/s 2.6206 KOps/s $\color{#d91a1a}-2.84\%$
test_unlock_nested 0.7662ms 0.3645ms 2.7438 KOps/s 2.7350 KOps/s $\color{#35bf28}+0.32\%$
test_unlock_stack_nested 0.4063ms 0.3299ms 3.0315 KOps/s 3.1229 KOps/s $\color{#d91a1a}-2.93\%$
test_flatten_speed 0.1553ms 76.6775μs 13.0416 KOps/s 12.9138 KOps/s $\color{#35bf28}+0.99\%$
test_unflatten_speed 0.3905ms 0.3265ms 3.0625 KOps/s 3.0880 KOps/s $\color{#d91a1a}-0.83\%$
test_common_ops 1.4844ms 1.2287ms 813.8540 Ops/s 807.8529 Ops/s $\color{#35bf28}+0.74\%$
test_creation 24.0800μs 1.5024μs 665.6112 KOps/s 667.8802 KOps/s $\color{#d91a1a}-0.34\%$
test_creation_empty 42.9400μs 15.3886μs 64.9831 KOps/s 67.2230 KOps/s $\color{#d91a1a}-3.33\%$
test_creation_nested_1 48.3610μs 17.0169μs 58.7651 KOps/s 60.2488 KOps/s $\color{#d91a1a}-2.46\%$
test_creation_nested_2 50.8000μs 19.7367μs 50.6671 KOps/s 51.6879 KOps/s $\color{#d91a1a}-1.97\%$
test_clone 72.4110μs 27.9791μs 35.7410 KOps/s 34.9239 KOps/s $\color{#35bf28}+2.34\%$
test_getitem[int] 1.2945ms 15.8342μs 63.1544 KOps/s 63.2800 KOps/s $\color{#d91a1a}-0.20\%$
test_getitem[slice_int] 0.1201ms 26.6973μs 37.4569 KOps/s 36.4511 KOps/s $\color{#35bf28}+2.76\%$
test_getitem[range] 0.2179ms 0.1078ms 9.2725 KOps/s 9.2464 KOps/s $\color{#35bf28}+0.28\%$
test_getitem[tuple] 0.1202ms 23.6495μs 42.2842 KOps/s 42.0702 KOps/s $\color{#35bf28}+0.51\%$
test_getitem[list] 0.1936ms 96.5096μs 10.3617 KOps/s 10.0156 KOps/s $\color{#35bf28}+3.46\%$
test_setitem_dim[int] 67.6620μs 44.2192μs 22.6146 KOps/s 21.5801 KOps/s $\color{#35bf28}+4.79\%$
test_setitem_dim[slice_int] 90.6910μs 64.1498μs 15.5885 KOps/s 15.3227 KOps/s $\color{#35bf28}+1.73\%$
test_setitem_dim[range] 0.1503ms 0.1233ms 8.1119 KOps/s 7.8415 KOps/s $\color{#35bf28}+3.45\%$
test_setitem_dim[tuple] 96.3610μs 58.2458μs 17.1686 KOps/s 17.0143 KOps/s $\color{#35bf28}+0.91\%$
test_setitem 79.6710μs 40.2458μs 24.8473 KOps/s 24.1776 KOps/s $\color{#35bf28}+2.77\%$
test_set 87.5710μs 42.0251μs 23.7953 KOps/s 25.2149 KOps/s $\textbf{\color{#d91a1a}-5.63\%}$
test_set_shared 0.3822ms 54.4254μs 18.3738 KOps/s 18.6017 KOps/s $\color{#d91a1a}-1.23\%$
test_update 90.0410μs 49.7449μs 20.1026 KOps/s 19.5427 KOps/s $\color{#35bf28}+2.86\%$
test_update_nested 99.2710μs 59.7307μs 16.7418 KOps/s 16.9511 KOps/s $\color{#d91a1a}-1.23\%$
test_update__nested 0.1608ms 60.8926μs 16.4224 KOps/s 15.0337 KOps/s $\textbf{\color{#35bf28}+9.24\%}$
test_set_nested 82.9210μs 42.9205μs 23.2989 KOps/s 23.4258 KOps/s $\color{#d91a1a}-0.54\%$
test_set_nested_new 91.8010μs 48.6058μs 20.5737 KOps/s 21.7995 KOps/s $\textbf{\color{#d91a1a}-5.62\%}$
test_select 0.1020ms 63.4615μs 15.7576 KOps/s 16.8386 KOps/s $\textbf{\color{#d91a1a}-6.42\%}$
test_select_nested 73.5410μs 42.2931μs 23.6445 KOps/s 23.9804 KOps/s $\color{#d91a1a}-1.40\%$
test_exclude_nested 90.4510μs 60.1127μs 16.6354 KOps/s 16.7186 KOps/s $\color{#d91a1a}-0.50\%$
test_empty[True] 0.3933ms 0.2661ms 3.7584 KOps/s 3.7622 KOps/s $\color{#d91a1a}-0.10\%$
test_empty[False] 2.8011μs 0.7431μs 1.3458 MOps/s 1.3458 MOps/s $-0.00\%$
test_to 63.6410μs 26.5139μs 37.7160 KOps/s 38.2096 KOps/s $\color{#d91a1a}-1.29\%$
test_to_nonblocking 61.8210μs 25.2478μs 39.6074 KOps/s 39.5748 KOps/s $\color{#35bf28}+0.08\%$
test_unbind_speed 0.3257ms 0.2779ms 3.5982 KOps/s 3.5354 KOps/s $\color{#35bf28}+1.78\%$
test_unbind_speed_stack0 0.4089ms 0.2736ms 3.6556 KOps/s 3.6795 KOps/s $\color{#d91a1a}-0.65\%$
test_unbind_speed_stack1 92.2033ms 0.7116ms 1.4053 KOps/s 1.5578 KOps/s $\textbf{\color{#d91a1a}-9.79\%}$
test_split 93.5394ms 2.1605ms 462.8644 Ops/s 455.7592 Ops/s $\color{#35bf28}+1.56\%$
test_chunk 94.2277ms 2.1700ms 460.8229 Ops/s 455.3258 Ops/s $\color{#35bf28}+1.21\%$
test_creation[device0] 0.3565ms 0.1258ms 7.9518 KOps/s 7.9602 KOps/s $\color{#d91a1a}-0.11\%$
test_creation_from_tensor 0.3848ms 0.1325ms 7.5472 KOps/s 7.5491 KOps/s $\color{#d91a1a}-0.02\%$
test_add_one[memmap_tensor0] 0.2354ms 8.8253μs 113.3106 KOps/s 114.2236 KOps/s $\color{#d91a1a}-0.80\%$
test_contiguous[memmap_tensor0] 17.5100μs 2.1328μs 468.8716 KOps/s 474.1445 KOps/s $\color{#d91a1a}-1.11\%$
test_stack[memmap_tensor0] 37.0100μs 6.6570μs 150.2184 KOps/s 146.6851 KOps/s $\color{#35bf28}+2.41\%$
test_memmaptd_index 1.4572ms 0.4280ms 2.3365 KOps/s 2.3407 KOps/s $\color{#d91a1a}-0.18\%$
test_memmaptd_index_astensor 0.7669ms 0.5002ms 1.9990 KOps/s 1.9895 KOps/s $\color{#35bf28}+0.48\%$
test_memmaptd_index_op 1.4355ms 1.0173ms 983.0127 Ops/s 972.8972 Ops/s $\color{#35bf28}+1.04\%$
test_serialize_model 0.1308s 0.1297s 7.7121 Ops/s 7.6537 Ops/s $\color{#35bf28}+0.76\%$
test_serialize_model_pickle 1.3700s 1.2181s 0.8210 Ops/s 0.8219 Ops/s $\color{#d91a1a}-0.11\%$
test_serialize_weights 0.2216s 0.1426s 7.0112 Ops/s 7.6755 Ops/s $\textbf{\color{#d91a1a}-8.65\%}$
test_serialize_weights_returnearly 0.2150s 57.0347ms 17.5332 Ops/s 20.8190 Ops/s $\textbf{\color{#d91a1a}-15.78\%}$
test_serialize_weights_pickle 1.3464s 1.1858s 0.8433 Ops/s 0.8369 Ops/s $\color{#35bf28}+0.77\%$
test_reshape_pytree 79.5110μs 37.1218μs 26.9383 KOps/s 28.2268 KOps/s $\color{#d91a1a}-4.56\%$
test_reshape_td 0.1698ms 44.4624μs 22.4909 KOps/s 23.8956 KOps/s $\textbf{\color{#d91a1a}-5.88\%}$
test_view_pytree 79.4510μs 37.4137μs 26.7282 KOps/s 28.8165 KOps/s $\textbf{\color{#d91a1a}-7.25\%}$
test_view_td 97.2410μs 49.1858μs 20.3311 KOps/s 22.0470 KOps/s $\textbf{\color{#d91a1a}-7.78\%}$
test_unbind_pytree 78.0720μs 34.9508μs 28.6117 KOps/s 29.2415 KOps/s $\color{#d91a1a}-2.15\%$
test_unbind_td 0.5457ms 42.3191μs 23.6300 KOps/s 23.2616 KOps/s $\color{#35bf28}+1.58\%$
test_split_pytree 73.7410μs 46.0915μs 21.6960 KOps/s 21.7559 KOps/s $\color{#d91a1a}-0.28\%$
test_split_td 0.7132ms 55.0364μs 18.1698 KOps/s 17.4699 KOps/s $\color{#35bf28}+4.01\%$
test_add_pytree 96.7820μs 55.5916μs 17.9883 KOps/s 17.0890 KOps/s $\textbf{\color{#35bf28}+5.26\%}$
test_add_td 0.1223ms 89.4392μs 11.1808 KOps/s 11.0456 KOps/s $\color{#35bf28}+1.22\%$
test_compile_add_one_nested[tensordict-compile] 0.2574ms 0.1589ms 6.2927 KOps/s 6.0436 KOps/s $\color{#35bf28}+4.12\%$
test_compile_add_one_nested[tensordict-eager] 0.2979ms 0.1597ms 6.2616 KOps/s 6.2927 KOps/s $\color{#d91a1a}-0.49\%$
test_compile_add_one_nested[pytree-compile] 0.1960ms 0.1507ms 6.6339 KOps/s 6.5450 KOps/s $\color{#35bf28}+1.36\%$
test_compile_add_one_nested[pytree-eager] 0.2309ms 0.1816ms 5.5078 KOps/s 5.4522 KOps/s $\color{#35bf28}+1.02\%$
test_compile_copy_nested[tensordict-compile] 58.2010μs 21.7524μs 45.9718 KOps/s 47.8408 KOps/s $\color{#d91a1a}-3.91\%$
test_compile_copy_nested[tensordict-eager] 0.1052ms 48.3999μs 20.6612 KOps/s 20.8042 KOps/s $\color{#d91a1a}-0.69\%$
test_compile_copy_nested[pytree-compile] 1.1456ms 65.5340μs 15.2593 KOps/s 15.4350 KOps/s $\color{#d91a1a}-1.14\%$
test_compile_copy_nested[pytree-eager] 87.2520μs 50.3415μs 19.8643 KOps/s 20.2174 KOps/s $\color{#d91a1a}-1.75\%$
test_compile_add_one_flat[tensordict-compile] 0.3623ms 0.3137ms 3.1881 KOps/s 3.1781 KOps/s $\color{#35bf28}+0.32\%$
test_compile_add_one_flat[tensordict-eager] 0.3157ms 0.2313ms 4.3236 KOps/s 4.3156 KOps/s $\color{#35bf28}+0.19\%$
test_compile_add_one_flat[tensorclass-compile] 0.2011ms 0.1283ms 7.7942 KOps/s 7.8385 KOps/s $\color{#d91a1a}-0.56\%$
test_compile_add_one_flat[tensorclass-eager] 0.1279ms 66.8238μs 14.9647 KOps/s 15.1896 KOps/s $\color{#d91a1a}-1.48\%$
test_compile_add_one_flat[pytree-compile] 0.3994ms 0.3230ms 3.0956 KOps/s 3.0878 KOps/s $\color{#35bf28}+0.25\%$
test_compile_add_one_flat[pytree-eager] 0.7446ms 0.6202ms 1.6124 KOps/s 1.6027 KOps/s $\color{#35bf28}+0.61\%$
test_compile_add_self_flat[tensordict-eager] 0.4150ms 0.2820ms 3.5467 KOps/s 3.5173 KOps/s $\color{#35bf28}+0.83\%$
test_compile_add_self_flat[tensordict-compile] 0.3656ms 0.3150ms 3.1746 KOps/s 3.1778 KOps/s $\color{#d91a1a}-0.10\%$
test_compile_add_self_flat[tensorclass-eager] 0.1909ms 79.1544μs 12.6335 KOps/s 13.0880 KOps/s $\color{#d91a1a}-3.47\%$
test_compile_add_self_flat[tensorclass-compile] 0.1802ms 0.1314ms 7.6098 KOps/s 7.8336 KOps/s $\color{#d91a1a}-2.86\%$
test_compile_add_self_flat[pytree-eager] 0.6794ms 0.5673ms 1.7626 KOps/s 1.8924 KOps/s $\textbf{\color{#d91a1a}-6.86\%}$
test_compile_add_self_flat[pytree-compile] 0.3729ms 0.3217ms 3.1087 KOps/s 3.0885 KOps/s $\color{#35bf28}+0.66\%$
test_compile_copy_flat[tensordict-compile] 63.1610μs 20.5378μs 48.6907 KOps/s 50.1036 KOps/s $\color{#d91a1a}-2.82\%$
test_compile_copy_flat[tensordict-eager] 72.5620μs 38.3241μs 26.0932 KOps/s 26.0229 KOps/s $\color{#35bf28}+0.27\%$
test_compile_copy_flat[pytree-compile] 0.1010ms 69.2937μs 14.4313 KOps/s 14.4083 KOps/s $\color{#35bf28}+0.16\%$
test_compile_copy_flat[pytree-eager] 96.2410μs 51.6226μs 19.3713 KOps/s 19.4751 KOps/s $\color{#d91a1a}-0.53\%$
test_compile_assign_and_add[tensordict-compile] 2.3452ms 0.8124ms 1.2310 KOps/s 1.1297 KOps/s $\textbf{\color{#35bf28}+8.97\%}$
test_compile_assign_and_add[tensordict-eager] 3.3313ms 3.2017ms 312.3356 Ops/s 309.4735 Ops/s $\color{#35bf28}+0.92\%$
test_compile_assign_and_add[pytree-compile] 2.4001ms 0.8262ms 1.2103 KOps/s 1.1191 KOps/s $\textbf{\color{#35bf28}+8.15\%}$
test_compile_assign_and_add[pytree-eager] 3.1621ms 3.0909ms 323.5282 Ops/s 318.2927 Ops/s $\color{#35bf28}+1.64\%$
test_compile_indexing[tensor-tensordict-compile] 0.2579ms 0.1156ms 8.6506 KOps/s 8.5982 KOps/s $\color{#35bf28}+0.61\%$
test_compile_indexing[tensor-tensordict-eager] 0.1845ms 58.4270μs 17.1154 KOps/s 16.5010 KOps/s $\color{#35bf28}+3.72\%$
test_compile_indexing[tensor-tensorclass-compile] 0.2251ms 0.1156ms 8.6510 KOps/s 8.9649 KOps/s $\color{#d91a1a}-3.50\%$
test_compile_indexing[tensor-tensorclass-eager] 88.7620μs 47.2184μs 21.1782 KOps/s 21.5347 KOps/s $\color{#d91a1a}-1.66\%$
test_compile_indexing[tensor-pytree-compile] 0.1565ms 0.1168ms 8.5582 KOps/s 8.4444 KOps/s $\color{#35bf28}+1.35\%$
test_compile_indexing[tensor-pytree-eager] 0.1129ms 47.0080μs 21.2730 KOps/s 21.8449 KOps/s $\color{#d91a1a}-2.62\%$
test_compile_indexing[slice-tensordict-compile] 0.1854ms 0.1486ms 6.7305 KOps/s 6.9993 KOps/s $\color{#d91a1a}-3.84\%$
test_compile_indexing[slice-tensordict-eager] 0.1591ms 26.4017μs 37.8763 KOps/s 39.9515 KOps/s $\textbf{\color{#d91a1a}-5.19\%}$
test_compile_indexing[slice-tensorclass-compile] 0.1829ms 0.1431ms 6.9873 KOps/s 7.3201 KOps/s $\color{#d91a1a}-4.55\%$
test_compile_indexing[slice-tensorclass-eager] 50.8710μs 21.5494μs 46.4050 KOps/s 48.4246 KOps/s $\color{#d91a1a}-4.17\%$
test_compile_indexing[slice-pytree-compile] 0.1807ms 0.1424ms 7.0222 KOps/s 7.1238 KOps/s $\color{#d91a1a}-1.43\%$
test_compile_indexing[slice-pytree-eager] 52.8610μs 20.5452μs 48.6732 KOps/s 48.1657 KOps/s $\color{#35bf28}+1.05\%$
test_compile_indexing[int-tensordict-compile] 0.3022ms 0.1468ms 6.8127 KOps/s 6.6342 KOps/s $\color{#35bf28}+2.69\%$
test_compile_indexing[int-tensordict-eager] 0.5085ms 24.9081μs 40.1475 KOps/s 37.9919 KOps/s $\textbf{\color{#35bf28}+5.67\%}$
test_compile_indexing[int-tensorclass-compile] 0.2138ms 0.1409ms 7.0965 KOps/s 6.9236 KOps/s $\color{#35bf28}+2.50\%$
test_compile_indexing[int-tensorclass-eager] 52.5210μs 20.9929μs 47.6351 KOps/s 48.7406 KOps/s $\color{#d91a1a}-2.27\%$
test_compile_indexing[int-pytree-compile] 0.2092ms 0.1406ms 7.1133 KOps/s 7.1447 KOps/s $\color{#d91a1a}-0.44\%$
test_compile_indexing[int-pytree-eager] 56.2500μs 20.7605μs 48.1684 KOps/s 48.6143 KOps/s $\color{#d91a1a}-0.92\%$
test_mod_add[eager] 73.8810μs 34.2619μs 29.1869 KOps/s 30.9765 KOps/s $\textbf{\color{#d91a1a}-5.78\%}$
test_mod_add[compile] 0.2290ms 79.9246μs 12.5118 KOps/s 12.4223 KOps/s $\color{#35bf28}+0.72\%$
test_mod_add[compile-overhead] 0.3045ms 0.1489ms 6.7154 KOps/s 6.4663 KOps/s $\color{#35bf28}+3.85\%$
test_mod_wrap[eager] 0.3109ms 0.2490ms 4.0158 KOps/s 4.0841 KOps/s $\color{#d91a1a}-1.67\%$
test_mod_wrap[compile] 1.3315ms 0.2904ms 3.4439 KOps/s 3.4179 KOps/s $\color{#35bf28}+0.76\%$
test_mod_wrap[compile-overhead] 7.7486ms 4.0875ms 244.6475 Ops/s 241.9572 Ops/s $\color{#35bf28}+1.11\%$
test_mod_wrap_and_backward[eager] 1.4364ms 1.2963ms 771.4522 Ops/s 716.7777 Ops/s $\textbf{\color{#35bf28}+7.63\%}$
test_mod_wrap_and_backward[compile] 1.5502ms 1.2968ms 771.1442 Ops/s 709.8358 Ops/s $\textbf{\color{#35bf28}+8.64\%}$
test_mod_wrap_and_backward[compile-overhead] 1.3115ms 0.8812ms 1.1348 KOps/s 1.0026 KOps/s $\textbf{\color{#35bf28}+13.19\%}$
test_seq_add[eager] 0.1623ms 98.2292μs 10.1803 KOps/s 10.0477 KOps/s $\color{#35bf28}+1.32\%$
test_seq_add[compile] 0.1479ms 89.7097μs 11.1471 KOps/s 10.4982 KOps/s $\textbf{\color{#35bf28}+6.18\%}$
test_seq_add[compile-overhead] 0.1672ms 0.1219ms 8.2060 KOps/s 8.1553 KOps/s $\color{#35bf28}+0.62\%$
test_seq_wrap[eager] 0.4377ms 0.3674ms 2.7220 KOps/s 2.5625 KOps/s $\textbf{\color{#35bf28}+6.23\%}$
test_seq_wrap[compile] 0.3672ms 0.3074ms 3.2532 KOps/s 3.2171 KOps/s $\color{#35bf28}+1.12\%$
test_seq_wrap[compile-overhead] 0.2612ms 0.2150ms 4.6519 KOps/s 4.5882 KOps/s $\color{#35bf28}+1.39\%$
test_func_call_runtime[False-eager] 0.8794ms 0.7321ms 1.3659 KOps/s 1.3978 KOps/s $\color{#d91a1a}-2.28\%$
test_func_call_runtime[False-compile] 0.8584ms 0.7728ms 1.2939 KOps/s 1.2749 KOps/s $\color{#35bf28}+1.49\%$
test_func_call_runtime[False-compile-overhead] 0.4061ms 0.3533ms 2.8306 KOps/s 2.8387 KOps/s $\color{#d91a1a}-0.28\%$
test_func_call_runtime[True-eager] 0.9513ms 0.8763ms 1.1412 KOps/s 1.1440 KOps/s $\color{#d91a1a}-0.25\%$
test_func_call_runtime[True-compile] 0.8688ms 0.7946ms 1.2586 KOps/s 1.2478 KOps/s $\color{#35bf28}+0.86\%$
test_func_call_runtime[True-compile-overhead] 0.4198ms 0.3724ms 2.6853 KOps/s 2.6702 KOps/s $\color{#35bf28}+0.57\%$
test_func_call_cm_runtime[False-eager] 0.7855ms 0.7222ms 1.3847 KOps/s 1.4131 KOps/s $\color{#d91a1a}-2.01\%$
test_func_call_cm_runtime[False-compile] 0.8522ms 0.7718ms 1.2956 KOps/s 1.2829 KOps/s $\color{#35bf28}+0.99\%$
test_func_call_cm_runtime[False-compile-overhead] 0.3966ms 0.3528ms 2.8341 KOps/s 2.8029 KOps/s $\color{#35bf28}+1.11\%$
test_func_call_cm_runtime[True-eager] 1.0859ms 0.9687ms 1.0324 KOps/s 1.0221 KOps/s $\color{#35bf28}+1.01\%$
test_func_call_cm_runtime[True-compile] 0.8756ms 0.8172ms 1.2236 KOps/s 1.2106 KOps/s $\color{#35bf28}+1.08\%$
test_func_call_cm_runtime[True-compile-overhead] 0.5236ms 0.3974ms 2.5167 KOps/s 2.4893 KOps/s $\color{#35bf28}+1.10\%$
test_vmap_func_call_cm_runtime[eager] 2.4836ms 2.0315ms 492.2496 Ops/s 486.2142 Ops/s $\color{#35bf28}+1.24\%$
test_vmap_func_call_cm_runtime[compile] 0.9432ms 0.8305ms 1.2041 KOps/s 1.1601 KOps/s $\color{#35bf28}+3.79\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.4622ms 0.4046ms 2.4714 KOps/s 2.4912 KOps/s $\color{#d91a1a}-0.79\%$
test_distributed 5.3532ms 0.2268ms 4.4092 KOps/s 8.8368 KOps/s $\textbf{\color{#d91a1a}-50.10\%}$
test_tdmodule 0.1266ms 14.7310μs 67.8839 KOps/s 62.6205 KOps/s $\textbf{\color{#35bf28}+8.41\%}$
test_tdmodule_dispatch 51.5210μs 28.5277μs 35.0537 KOps/s 35.4019 KOps/s $\color{#d91a1a}-0.98\%$
test_tdseq 36.0110μs 15.7330μs 63.5607 KOps/s 61.4916 KOps/s $\color{#35bf28}+3.36\%$
test_tdseq_dispatch 51.3610μs 31.5879μs 31.6577 KOps/s 30.7304 KOps/s $\color{#35bf28}+3.02\%$
test_instantiation_functorch 2.0047ms 1.8253ms 547.8561 Ops/s 532.0969 Ops/s $\color{#35bf28}+2.96\%$
test_exec_functorch 0.2400ms 0.2018ms 4.9551 KOps/s 4.8506 KOps/s $\color{#35bf28}+2.15\%$
test_exec_functional_call 0.3101ms 0.1992ms 5.0203 KOps/s 4.9600 KOps/s $\color{#35bf28}+1.22\%$
test_exec_td_decorator 0.4277ms 0.2521ms 3.9671 KOps/s 3.9174 KOps/s $\color{#35bf28}+1.27\%$
test_vmap_mlp_speed_decorator[True-True] 0.7782ms 0.6603ms 1.5146 KOps/s 1.5094 KOps/s $\color{#35bf28}+0.34\%$
test_vmap_mlp_speed_decorator[True-False] 0.7862ms 0.6612ms 1.5124 KOps/s 1.4979 KOps/s $\color{#35bf28}+0.97\%$
test_vmap_mlp_speed_decorator[False-True] 0.6873ms 0.5799ms 1.7245 KOps/s 1.7208 KOps/s $\color{#35bf28}+0.22\%$
test_vmap_mlp_speed_decorator[False-False] 0.6803ms 0.5805ms 1.7226 KOps/s 1.7088 KOps/s $\color{#35bf28}+0.81\%$
test_vmap_transformer_speed_decorator[True-True] 19.0034ms 18.9060ms 52.8934 Ops/s 53.0225 Ops/s $\color{#d91a1a}-0.24\%$
test_vmap_transformer_speed_decorator[True-False] 18.9540ms 18.8768ms 52.9752 Ops/s 52.9012 Ops/s $\color{#35bf28}+0.14\%$
test_vmap_transformer_speed_decorator[False-True] 18.8195ms 18.7179ms 53.4248 Ops/s 53.3415 Ops/s $\color{#35bf28}+0.16\%$
test_vmap_transformer_speed_decorator[False-False] 18.8365ms 18.7691ms 53.2791 Ops/s 53.1122 Ops/s $\color{#35bf28}+0.31\%$
test_to_module_speed[True] 1.5216ms 1.0036ms 996.4160 Ops/s 1.0004 KOps/s $\color{#d91a1a}-0.40\%$
test_to_module_speed[False] 1.3961ms 0.9840ms 1.0163 KOps/s 1.0267 KOps/s $\color{#d91a1a}-1.01\%$
test_tc_init 68.4910μs 35.9929μs 27.7833 KOps/s 28.8008 KOps/s $\color{#d91a1a}-3.53\%$
test_tc_init_nested 0.1114ms 75.7604μs 13.1995 KOps/s 13.8037 KOps/s $\color{#d91a1a}-4.38\%$
test_tc_first_layer_tensor 4.0771μs 0.6670μs 1.4991 MOps/s 1.4900 MOps/s $\color{#35bf28}+0.61\%$
test_tc_first_layer_nontensor 17.1200μs 2.2372μs 446.9893 KOps/s 451.0545 KOps/s $\color{#d91a1a}-0.90\%$
test_tc_second_layer_tensor 9.0027μs 1.3645μs 732.8806 KOps/s 737.5249 KOps/s $\color{#d91a1a}-0.63\%$
test_tc_second_layer_nontensor 28.0800μs 2.9686μs 336.8642 KOps/s 342.5078 KOps/s $\color{#d91a1a}-1.65\%$
test_unbind 0.1904s 9.4685ms 105.6136 Ops/s 92.9038 Ops/s $\textbf{\color{#35bf28}+13.68\%}$
test_full_like 0.6592ms 0.5756ms 1.7372 KOps/s 1.7467 KOps/s $\color{#d91a1a}-0.54\%$
test_zeros_like 0.2779ms 0.1980ms 5.0511 KOps/s 5.0530 KOps/s $\color{#d91a1a}-0.04\%$
test_ones_like 0.2341ms 0.1978ms 5.0562 KOps/s 5.0567 KOps/s $-0.01\%$
test_clone 0.4437ms 0.4149ms 2.4105 KOps/s 2.4106 KOps/s $-0.01\%$
test_squeeze 86.7220μs 9.8818μs 101.1957 KOps/s 103.0430 KOps/s $\color{#d91a1a}-1.79\%$
test_unsqueeze 0.2206ms 73.2976μs 13.6430 KOps/s 13.1024 KOps/s $\color{#35bf28}+4.13\%$
test_split 0.4125ms 0.1593ms 6.2764 KOps/s 6.3708 KOps/s $\color{#d91a1a}-1.48\%$
test_permute 0.2184ms 0.1773ms 5.6386 KOps/s 5.6430 KOps/s $\color{#d91a1a}-0.08\%$
test_stack 1.2524ms 0.8560ms 1.1682 KOps/s 1.1740 KOps/s $\color{#d91a1a}-0.50\%$
test_cat 1.2559ms 1.2315ms 812.0459 Ops/s 812.0900 Ops/s $-0.01\%$

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Oct 2, 2024
ghstack-source-id: 71812497f1efb9d20f67a7561e74d5111c4cc3f0
Pull Request resolved: #1022
@vmoens vmoens added the enhancement New feature or request label Oct 16, 2024
@vmoens vmoens merged commit 628fbf8 into gh/vmoens/23/base Oct 16, 2024
51 of 55 checks passed
vmoens added a commit that referenced this pull request Oct 16, 2024
ghstack-source-id: 5f84ebc2a01e6dab26fe1d68d67bb166a295e885
Pull Request resolved: #1022
@vmoens vmoens deleted the gh/vmoens/23/head branch October 16, 2024 11:37