[Feature] Better list casting in TensorDict.from_any #1108

Merged: 3 commits merged on Nov 25, 2024

@vmoens (Contributor) commented Nov 24, 2024

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Nov 24, 2024
ghstack-source-id: 6c4991313366cb29d58cc34463422aa3ab80da38
Pull Request resolved: #1108
@facebook-github-bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Nov 24, 2024
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Nov 24, 2024
ghstack-source-id: f83f49735113d165537a2b9f92e8d0f9b8356187
Pull Request resolved: #1108
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Nov 25, 2024
ghstack-source-id: 427d19d5ef7c0d2779e064e64522fc0094a885af
Pull Request resolved: #1108
$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 229. Improved: $\large\color{#35bf28}33$. Worsened: $\large\color{#d91a1a}10$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 35.3420μs 10.1986μs 98.0525 KOps/s 92.2355 KOps/s $\textbf{\color{#35bf28}+6.31\%}$
test_plain_set_stack_nested 35.6810μs 10.2250μs 97.7996 KOps/s 91.2907 KOps/s $\textbf{\color{#35bf28}+7.13\%}$
test_plain_set_nested_inplace 48.7020μs 11.1020μs 90.0742 KOps/s 85.0640 KOps/s $\textbf{\color{#35bf28}+5.89\%}$
test_plain_set_stack_nested_inplace 38.0520μs 11.0919μs 90.1558 KOps/s 85.3993 KOps/s $\textbf{\color{#35bf28}+5.57\%}$
test_items 29.1610μs 2.9116μs 343.4591 KOps/s 339.9752 KOps/s $\color{#35bf28}+1.02\%$
test_items_nested 0.3747ms 0.3247ms 3.0802 KOps/s 3.0875 KOps/s $\color{#d91a1a}-0.24\%$
test_items_nested_locked 0.3629ms 0.3262ms 3.0657 KOps/s 3.0642 KOps/s $\color{#35bf28}+0.05\%$
test_items_nested_leaf 87.3850μs 58.1667μs 17.1920 KOps/s 17.2714 KOps/s $\color{#d91a1a}-0.46\%$
test_items_stack_nested 0.4131ms 0.3282ms 3.0472 KOps/s 3.0880 KOps/s $\color{#d91a1a}-1.32\%$
test_items_stack_nested_leaf 85.5850μs 59.3682μs 16.8440 KOps/s 16.9839 KOps/s $\color{#d91a1a}-0.82\%$
test_items_stack_nested_locked 0.3967ms 0.3289ms 3.0405 KOps/s 3.0702 KOps/s $\color{#d91a1a}-0.97\%$
test_keys 31.5510μs 3.4661μs 288.5113 KOps/s 288.3811 KOps/s $\color{#35bf28}+0.05\%$
test_keys_nested 0.1028ms 70.0410μs 14.2773 KOps/s 14.2170 KOps/s $\color{#35bf28}+0.42\%$
test_keys_nested_locked 0.7233ms 75.3023μs 13.2798 KOps/s 13.1823 KOps/s $\color{#35bf28}+0.74\%$
test_keys_nested_leaf 0.1002ms 61.2603μs 16.3238 KOps/s 16.2909 KOps/s $\color{#35bf28}+0.20\%$
test_keys_stack_nested 0.1026ms 70.8050μs 14.1233 KOps/s 14.2684 KOps/s $\color{#d91a1a}-1.02\%$
test_keys_stack_nested_leaf 86.6650μs 61.8136μs 16.1777 KOps/s 16.3370 KOps/s $\color{#d91a1a}-0.98\%$
test_keys_stack_nested_locked 0.1079ms 75.8844μs 13.1779 KOps/s 13.3491 KOps/s $\color{#d91a1a}-1.28\%$
test_values 7.1005μs 0.8466μs 1.1813 MOps/s 1.1873 MOps/s $\color{#d91a1a}-0.51\%$
test_values_nested 61.1730μs 31.2708μs 31.9787 KOps/s 32.1209 KOps/s $\color{#d91a1a}-0.44\%$
test_values_nested_locked 58.7730μs 32.8836μs 30.4103 KOps/s 30.3815 KOps/s $\color{#35bf28}+0.09\%$
test_values_nested_leaf 63.8130μs 33.5495μs 29.8067 KOps/s 30.0908 KOps/s $\color{#d91a1a}-0.94\%$
test_values_stack_nested 64.9240μs 31.7656μs 31.4806 KOps/s 31.8234 KOps/s $\color{#d91a1a}-1.08\%$
test_values_stack_nested_leaf 64.7430μs 34.1715μs 29.2641 KOps/s 29.6317 KOps/s $\color{#d91a1a}-1.24\%$
test_values_stack_nested_locked 60.9830μs 33.3780μs 29.9599 KOps/s 29.9812 KOps/s $\color{#d91a1a}-0.07\%$
test_membership 1.8121μs 0.5094μs 1.9632 MOps/s 1.9568 MOps/s $\color{#35bf28}+0.32\%$
test_membership_nested 21.3360μs 1.8953μs 527.6310 KOps/s 520.4857 KOps/s $\color{#35bf28}+1.37\%$
test_membership_nested_leaf 14.5960μs 1.8968μs 527.2140 KOps/s 520.0750 KOps/s $\color{#35bf28}+1.37\%$
test_membership_stacked_nested 27.7520μs 1.9953μs 501.1877 KOps/s 495.0602 KOps/s $\color{#35bf28}+1.24\%$
test_membership_stacked_nested_leaf 27.2020μs 1.9756μs 506.1808 KOps/s 494.5709 KOps/s $\color{#35bf28}+2.35\%$
test_membership_nested_last 28.7520μs 2.8441μs 351.6051 KOps/s 349.3813 KOps/s $\color{#35bf28}+0.64\%$
test_membership_nested_leaf_last 31.5610μs 2.8023μs 356.8478 KOps/s 346.9149 KOps/s $\color{#35bf28}+2.86\%$
test_membership_stacked_nested_last 39.8120μs 2.8036μs 356.6848 KOps/s 125.2757 KOps/s $\textbf{\color{#35bf28}+184.72\%}$
test_membership_stacked_nested_leaf_last 34.1720μs 2.8224μs 354.3114 KOps/s 126.9189 KOps/s $\textbf{\color{#35bf28}+179.16\%}$
test_nested_getleaf 24.9010μs 5.9991μs 166.6910 KOps/s 165.8766 KOps/s $\color{#35bf28}+0.49\%$
test_nested_get 37.6820μs 5.6719μs 176.3075 KOps/s 174.5797 KOps/s $\color{#35bf28}+0.99\%$
test_stacked_getleaf 35.7220μs 6.0013μs 166.6295 KOps/s 166.4058 KOps/s $\color{#35bf28}+0.13\%$
test_stacked_get 28.9520μs 5.7091μs 175.1603 KOps/s 176.2046 KOps/s $\color{#d91a1a}-0.59\%$
test_nested_getitemleaf 26.0920μs 6.0857μs 164.3187 KOps/s 164.7334 KOps/s $\color{#d91a1a}-0.25\%$
test_nested_getitem 32.6320μs 5.7984μs 172.4607 KOps/s 172.1268 KOps/s $\color{#35bf28}+0.19\%$
test_stacked_getitemleaf 36.5420μs 6.1401μs 162.8630 KOps/s 164.0233 KOps/s $\color{#d91a1a}-0.71\%$
test_stacked_getitem 40.6920μs 5.7960μs 172.5326 KOps/s 173.8296 KOps/s $\color{#d91a1a}-0.75\%$
test_lock_nested 9.4184ms 0.3673ms 2.7227 KOps/s 2.6739 KOps/s $\color{#35bf28}+1.82\%$
test_lock_stack_nested 0.3918ms 0.3327ms 3.0056 KOps/s 3.0126 KOps/s $\color{#d91a1a}-0.23\%$
test_unlock_nested 0.6043ms 0.3014ms 3.3178 KOps/s 3.2688 KOps/s $\color{#35bf28}+1.50\%$
test_unlock_stack_nested 0.3115ms 0.2710ms 3.6904 KOps/s 3.6798 KOps/s $\color{#35bf28}+0.29\%$
test_flatten_speed 0.1069ms 72.4973μs 13.7936 KOps/s 13.8275 KOps/s $\color{#d91a1a}-0.24\%$
test_unflatten_speed 0.3500ms 0.2938ms 3.4037 KOps/s 3.3668 KOps/s $\color{#35bf28}+1.10\%$
test_common_ops 1.6557ms 0.5552ms 1.8012 KOps/s 1.7019 KOps/s $\textbf{\color{#35bf28}+5.84\%}$
test_creation 95.6850μs 1.4774μs 676.8795 KOps/s 684.6274 KOps/s $\color{#d91a1a}-1.13\%$
test_creation_empty 58.0130μs 6.5957μs 151.6144 KOps/s 124.3860 KOps/s $\textbf{\color{#35bf28}+21.89\%}$
test_creation_nested_1 1.7737ms 8.0972μs 123.5000 KOps/s 104.4857 KOps/s $\textbf{\color{#35bf28}+18.20\%}$
test_creation_nested_2 44.0320μs 10.5975μs 94.3617 KOps/s 81.8739 KOps/s $\textbf{\color{#35bf28}+15.25\%}$
test_clone 0.1042ms 9.9370μs 100.6336 KOps/s 100.1039 KOps/s $\color{#35bf28}+0.53\%$
test_getitem[int] 1.2864ms 10.7026μs 93.4354 KOps/s 90.3272 KOps/s $\color{#35bf28}+3.44\%$
test_getitem[slice_int] 0.1080ms 20.2532μs 49.3750 KOps/s 49.3623 KOps/s $\color{#35bf28}+0.03\%$
test_getitem[range] 0.1330ms 36.6870μs 27.2576 KOps/s 27.2162 KOps/s $\color{#35bf28}+0.15\%$
test_getitem[tuple] 0.1114ms 17.8640μs 55.9786 KOps/s 55.3705 KOps/s $\color{#35bf28}+1.10\%$
test_getitem[list] 0.1332ms 32.3840μs 30.8794 KOps/s 30.6827 KOps/s $\color{#35bf28}+0.64\%$
test_setitem_dim[int] 38.0820μs 17.8542μs 56.0092 KOps/s 56.6444 KOps/s $\color{#d91a1a}-1.12\%$
test_setitem_dim[slice_int] 57.3920μs 36.4241μs 27.4544 KOps/s 27.9949 KOps/s $\color{#d91a1a}-1.93\%$
test_setitem_dim[range] 80.5640μs 52.3101μs 19.1168 KOps/s 19.1522 KOps/s $\color{#d91a1a}-0.18\%$
test_setitem_dim[tuple] 50.6430μs 30.4594μs 32.8306 KOps/s 32.3304 KOps/s $\color{#35bf28}+1.55\%$
test_setitem 0.1007ms 13.1854μs 75.8413 KOps/s 69.9914 KOps/s $\textbf{\color{#35bf28}+8.36\%}$
test_set 0.1025ms 13.0829μs 76.4357 KOps/s 71.9121 KOps/s $\textbf{\color{#35bf28}+6.29\%}$
test_set_shared 1.8165ms 0.1451ms 6.8940 KOps/s 6.7244 KOps/s $\color{#35bf28}+2.52\%$
test_update 0.2365ms 15.2052μs 65.7671 KOps/s 61.1225 KOps/s $\textbf{\color{#35bf28}+7.60\%}$
test_update_nested 0.1857ms 19.8664μs 50.3363 KOps/s 46.4240 KOps/s $\textbf{\color{#35bf28}+8.43\%}$
test_update__nested 0.7522ms 23.2302μs 43.0473 KOps/s 42.9144 KOps/s $\color{#35bf28}+0.31\%$
test_set_nested 0.1007ms 14.2446μs 70.2020 KOps/s 67.5618 KOps/s $\color{#35bf28}+3.91\%$
test_set_nested_new 0.1052ms 16.5695μs 60.3518 KOps/s 58.8050 KOps/s $\color{#35bf28}+2.63\%$
test_select 0.1177ms 28.5590μs 35.0153 KOps/s 34.5524 KOps/s $\color{#35bf28}+1.34\%$
test_select_nested 78.2640μs 42.2841μs 23.6496 KOps/s 23.5999 KOps/s $\color{#35bf28}+0.21\%$
test_exclude_nested 87.1640μs 59.9891μs 16.6697 KOps/s 16.6555 KOps/s $\color{#35bf28}+0.09\%$
test_empty[True] 0.3198ms 0.2575ms 3.8840 KOps/s 3.8731 KOps/s $\color{#35bf28}+0.28\%$
test_empty[False] 3.4212μs 0.7435μs 1.3450 MOps/s 1.3553 MOps/s $\color{#d91a1a}-0.76\%$
test_to 86.4950μs 54.7271μs 18.2725 KOps/s 18.2254 KOps/s $\color{#35bf28}+0.26\%$
test_to_nonblocking 76.4030μs 45.3248μs 22.0630 KOps/s 22.0440 KOps/s $\color{#35bf28}+0.09\%$
test_unbind_speed 0.2678ms 0.2295ms 4.3574 KOps/s 4.2464 KOps/s $\color{#35bf28}+2.61\%$
test_unbind_speed_stack0 0.3628ms 0.2329ms 4.2933 KOps/s 4.4015 KOps/s $\color{#d91a1a}-2.46\%$
test_unbind_speed_stack1 94.2187ms 0.6429ms 1.5554 KOps/s 1.5692 KOps/s $\color{#d91a1a}-0.88\%$
test_split 94.5341ms 1.5916ms 628.3006 Ops/s 624.3463 Ops/s $\color{#35bf28}+0.63\%$
test_chunk 97.3880ms 1.7357ms 576.1495 Ops/s 573.6328 Ops/s $\color{#35bf28}+0.44\%$
test_consolidate[False-None] 3.2095ms 2.6025ms 384.2478 Ops/s 378.9902 Ops/s $\color{#35bf28}+1.39\%$
test_consolidate[default-None] 1.7609ms 1.6683ms 599.3969 Ops/s 582.8712 Ops/s $\color{#35bf28}+2.84\%$
test_consolidate[reduce-overhead-None] 1.7956ms 1.6984ms 588.8012 Ops/s 576.4415 Ops/s $\color{#35bf28}+2.14\%$
test_consolidate_njt[False-None] 6.6870ms 6.3515ms 157.4437 Ops/s 149.3910 Ops/s $\textbf{\color{#35bf28}+5.39\%}$
test_to[False-False-None] 1.7558ms 1.6806ms 595.0322 Ops/s 600.9274 Ops/s $\color{#d91a1a}-0.98\%$
test_to[True-False-None] 1.3892ms 1.2564ms 795.9236 Ops/s 751.1028 Ops/s $\textbf{\color{#35bf28}+5.97\%}$
test_to[within-False-None] 4.2081ms 3.9427ms 253.6350 Ops/s 249.6503 Ops/s $\color{#35bf28}+1.60\%$
test_to[True-default-None] 5.2820ms 5.0593ms 197.6574 Ops/s 196.3579 Ops/s $\color{#35bf28}+0.66\%$
test_to_njt[False-False-None] 6.9085ms 6.7945ms 147.1778 Ops/s 144.5409 Ops/s $\color{#35bf28}+1.82\%$
test_to_njt[True-False-None] 5.4190ms 5.2802ms 189.3854 Ops/s 186.1183 Ops/s $\color{#35bf28}+1.76\%$
test_to_njt[within-False-None] 11.8403ms 11.6863ms 85.5703 Ops/s 84.6953 Ops/s $\color{#35bf28}+1.03\%$
test_creation[device0] 0.4686ms 77.9357μs 12.8311 KOps/s 12.8194 KOps/s $\color{#35bf28}+0.09\%$
test_creation_from_tensor 0.4638ms 81.2553μs 12.3069 KOps/s 12.0831 KOps/s $\color{#35bf28}+1.85\%$
test_add_one[memmap_tensor0] 0.2306ms 6.1920μs 161.4982 KOps/s 160.2376 KOps/s $\color{#35bf28}+0.79\%$
test_contiguous[memmap_tensor0] 4.8583μs 0.4073μs 2.4555 MOps/s 2.4204 MOps/s $\color{#35bf28}+1.45\%$
test_stack[memmap_tensor0] 68.9030μs 4.6011μs 217.3409 KOps/s 218.8325 KOps/s $\color{#d91a1a}-0.68\%$
test_memmaptd_index 1.7058ms 0.2413ms 4.1439 KOps/s 4.0249 KOps/s $\color{#35bf28}+2.96\%$
test_memmaptd_index_astensor 0.5496ms 0.2977ms 3.3590 KOps/s 3.2695 KOps/s $\color{#35bf28}+2.74\%$
test_memmaptd_index_op 0.9470ms 0.5342ms 1.8721 KOps/s 1.7693 KOps/s $\textbf{\color{#35bf28}+5.81\%}$
test_serialize_model 0.1316s 0.1308s 7.6428 Ops/s 7.7052 Ops/s $\color{#d91a1a}-0.81\%$
test_serialize_model_pickle 1.3480s 1.1901s 0.8403 Ops/s 0.8234 Ops/s $\color{#35bf28}+2.05\%$
test_serialize_weights 0.1311s 0.1301s 7.6838 Ops/s 7.7195 Ops/s $\color{#d91a1a}-0.46\%$
test_serialize_weights_returnearly 0.3101s 54.1039ms 18.4829 Ops/s 23.4816 Ops/s $\textbf{\color{#d91a1a}-21.29\%}$
test_serialize_weights_pickle 1.3906s 1.1923s 0.8387 Ops/s 0.8141 Ops/s $\color{#35bf28}+3.03\%$
test_reshape_pytree 0.4038ms 22.2950μs 44.8530 KOps/s 43.9165 KOps/s $\color{#35bf28}+2.13\%$
test_reshape_td 53.5330μs 26.8382μs 37.2603 KOps/s 37.3088 KOps/s $\color{#d91a1a}-0.13\%$
test_view_pytree 49.2730μs 22.0791μs 45.2917 KOps/s 43.5367 KOps/s $\color{#35bf28}+4.03\%$
test_view_td 0.4118ms 29.2539μs 34.1835 KOps/s 32.2514 KOps/s $\textbf{\color{#35bf28}+5.99\%}$
test_unbind_pytree 64.4730μs 27.6746μs 36.1343 KOps/s 35.3019 KOps/s $\color{#35bf28}+2.36\%$
test_unbind_td 0.7148ms 35.0231μs 28.5526 KOps/s 27.0778 KOps/s $\textbf{\color{#35bf28}+5.45\%}$
test_split_pytree 58.2530μs 30.0964μs 33.2265 KOps/s 32.4673 KOps/s $\color{#35bf28}+2.34\%$
test_split_td 0.8970ms 38.3520μs 26.0743 KOps/s 25.6561 KOps/s $\color{#35bf28}+1.63\%$
test_add_pytree 0.4085ms 32.7869μs 30.4999 KOps/s 30.5324 KOps/s $\color{#d91a1a}-0.11\%$
test_add_td 84.2340μs 43.5715μs 22.9508 KOps/s 22.6464 KOps/s $\color{#35bf28}+1.34\%$
test_compile_add_one_nested[tensordict-compile] 0.1721ms 0.1232ms 8.1185 KOps/s 8.1167 KOps/s $\color{#35bf28}+0.02\%$
test_compile_add_one_nested[tensordict-eager] 0.2143ms 0.1256ms 7.9635 KOps/s 8.0804 KOps/s $\color{#d91a1a}-1.45\%$
test_compile_add_one_nested[pytree-compile] 0.1585ms 99.5829μs 10.0419 KOps/s 10.3882 KOps/s $\color{#d91a1a}-3.33\%$
test_compile_add_one_nested[pytree-eager] 0.2178ms 0.1485ms 6.7351 KOps/s 6.7715 KOps/s $\color{#d91a1a}-0.54\%$
test_compile_copy_nested[tensordict-compile] 63.1540μs 24.0978μs 41.4976 KOps/s 46.7769 KOps/s $\textbf{\color{#d91a1a}-11.29\%}$
test_compile_copy_nested[tensordict-eager] 0.1457ms 27.3278μs 36.5928 KOps/s 36.2039 KOps/s $\color{#35bf28}+1.07\%$
test_compile_copy_nested[pytree-compile] 0.4334ms 65.1392μs 15.3517 KOps/s 15.4252 KOps/s $\color{#d91a1a}-0.48\%$
test_compile_copy_nested[pytree-eager] 0.1033ms 49.6135μs 20.1558 KOps/s 20.2506 KOps/s $\color{#d91a1a}-0.47\%$
test_compile_add_one_flat[tensordict-compile] 0.2191ms 0.1448ms 6.9068 KOps/s 6.9207 KOps/s $\color{#d91a1a}-0.20\%$
test_compile_add_one_flat[tensordict-eager] 0.3143ms 0.2068ms 4.8352 KOps/s 4.8964 KOps/s $\color{#d91a1a}-1.25\%$
test_compile_add_one_flat[tensorclass-compile] 0.1496ms 0.1013ms 9.8716 KOps/s 9.3782 KOps/s $\textbf{\color{#35bf28}+5.26\%}$
test_compile_add_one_flat[tensorclass-eager] 0.1205ms 53.1502μs 18.8146 KOps/s 19.0654 KOps/s $\color{#d91a1a}-1.32\%$
test_compile_add_one_flat[pytree-compile] 0.2026ms 0.1415ms 7.0663 KOps/s 7.0313 KOps/s $\color{#35bf28}+0.50\%$
test_compile_add_one_flat[pytree-eager] 0.6332ms 0.4737ms 2.1110 KOps/s 2.0582 KOps/s $\color{#35bf28}+2.57\%$
test_compile_add_self_flat[tensordict-eager] 0.4118ms 0.2487ms 4.0203 KOps/s 3.9849 KOps/s $\color{#35bf28}+0.89\%$
test_compile_add_self_flat[tensordict-compile] 0.2208ms 0.1520ms 6.5795 KOps/s 6.7147 KOps/s $\color{#d91a1a}-2.01\%$
test_compile_add_self_flat[tensorclass-eager] 0.1528ms 63.3335μs 15.7894 KOps/s 15.7079 KOps/s $\color{#35bf28}+0.52\%$
test_compile_add_self_flat[tensorclass-compile] 0.1469ms 0.1004ms 9.9578 KOps/s 9.5761 KOps/s $\color{#35bf28}+3.99\%$
test_compile_add_self_flat[pytree-eager] 0.5145ms 0.4048ms 2.4701 KOps/s 2.4476 KOps/s $\color{#35bf28}+0.92\%$
test_compile_add_self_flat[pytree-compile] 0.2005ms 0.1447ms 6.9132 KOps/s 7.3884 KOps/s $\textbf{\color{#d91a1a}-6.43\%}$
test_compile_copy_flat[tensordict-compile] 52.1420μs 20.1594μs 49.6047 KOps/s 56.4301 KOps/s $\textbf{\color{#d91a1a}-12.10\%}$
test_compile_copy_flat[tensordict-eager] 0.4286ms 27.6504μs 36.1658 KOps/s 36.5296 KOps/s $\color{#d91a1a}-1.00\%$
test_compile_copy_flat[pytree-compile] 0.4521ms 69.9824μs 14.2893 KOps/s 14.1130 KOps/s $\color{#35bf28}+1.25\%$
test_compile_copy_flat[pytree-eager] 0.4367ms 51.8852μs 19.2733 KOps/s 19.0572 KOps/s $\color{#35bf28}+1.13\%$
test_compile_assign_and_add[tensordict-compile] 1.6198ms 0.3915ms 2.5546 KOps/s 2.1674 KOps/s $\textbf{\color{#35bf28}+17.86\%}$
test_compile_assign_and_add[tensordict-eager] 3.0094ms 2.6054ms 383.8114 Ops/s 399.1623 Ops/s $\color{#d91a1a}-3.85\%$
test_compile_assign_and_add[pytree-compile] 1.5750ms 0.4309ms 2.3207 KOps/s 2.2007 KOps/s $\textbf{\color{#35bf28}+5.45\%}$
test_compile_assign_and_add[pytree-eager] 2.7791ms 2.6231ms 381.2308 Ops/s 384.0557 Ops/s $\color{#d91a1a}-0.74\%$
test_compile_indexing[tensor-tensordict-compile] 0.4049ms 0.1166ms 8.5765 KOps/s 8.5670 KOps/s $\color{#35bf28}+0.11\%$
test_compile_indexing[tensor-tensordict-eager] 0.5543ms 79.4848μs 12.5810 KOps/s 11.6938 KOps/s $\textbf{\color{#35bf28}+7.59\%}$
test_compile_indexing[tensor-tensorclass-compile] 0.1726ms 0.1115ms 8.9652 KOps/s 8.9880 KOps/s $\color{#d91a1a}-0.25\%$
test_compile_indexing[tensor-tensorclass-eager] 0.1247ms 71.5151μs 13.9831 KOps/s 14.8898 KOps/s $\textbf{\color{#d91a1a}-6.09\%}$
test_compile_indexing[tensor-pytree-compile] 0.1612ms 0.1067ms 9.3760 KOps/s 9.1065 KOps/s $\color{#35bf28}+2.96\%$
test_compile_indexing[tensor-pytree-eager] 0.1325ms 70.0543μs 14.2746 KOps/s 13.9744 KOps/s $\color{#35bf28}+2.15\%$
test_compile_indexing[slice-tensordict-compile] 0.1462ms 0.1012ms 9.8771 KOps/s 9.7811 KOps/s $\color{#35bf28}+0.98\%$
test_compile_indexing[slice-tensordict-eager] 0.1453ms 17.0673μs 58.5915 KOps/s 57.1308 KOps/s $\color{#35bf28}+2.56\%$
test_compile_indexing[slice-tensorclass-compile] 0.2121ms 0.1007ms 9.9291 KOps/s 9.8353 KOps/s $\color{#35bf28}+0.95\%$
test_compile_indexing[slice-tensorclass-eager] 52.6430μs 15.9142μs 62.8368 KOps/s 60.9139 KOps/s $\color{#35bf28}+3.16\%$
test_compile_indexing[slice-pytree-compile] 0.1566ms 0.1016ms 9.8439 KOps/s 9.7318 KOps/s $\color{#35bf28}+1.15\%$
test_compile_indexing[slice-pytree-eager] 50.9530μs 15.9170μs 62.8259 KOps/s 59.8242 KOps/s $\textbf{\color{#35bf28}+5.02\%}$
test_compile_indexing[int-tensordict-compile] 0.1538ms 0.1047ms 9.5479 KOps/s 9.3102 KOps/s $\color{#35bf28}+2.55\%$
test_compile_indexing[int-tensordict-eager] 0.5715ms 17.1423μs 58.3351 KOps/s 57.4872 KOps/s $\color{#35bf28}+1.47\%$
test_compile_indexing[int-tensorclass-compile] 0.1482ms 97.3122μs 10.2762 KOps/s 10.0093 KOps/s $\color{#35bf28}+2.67\%$
test_compile_indexing[int-tensorclass-eager] 0.1812ms 15.9879μs 62.5472 KOps/s 60.6034 KOps/s $\color{#35bf28}+3.21\%$
test_compile_indexing[int-pytree-compile] 0.1643ms 96.8544μs 10.3248 KOps/s 9.6982 KOps/s $\textbf{\color{#35bf28}+6.46\%}$
test_compile_indexing[int-pytree-eager] 49.7820μs 15.9650μs 62.6370 KOps/s 60.4070 KOps/s $\color{#35bf28}+3.69\%$
test_mod_add[eager] 92.2250μs 34.3966μs 29.0726 KOps/s 31.7176 KOps/s $\textbf{\color{#d91a1a}-8.34\%}$
test_mod_add[compile] 0.3696ms 85.7510μs 11.6617 KOps/s 12.8303 KOps/s $\textbf{\color{#d91a1a}-9.11\%}$
test_mod_add[compile-overhead] 0.3273ms 0.1695ms 5.9004 KOps/s 5.7992 KOps/s $\color{#35bf28}+1.75\%$
test_mod_wrap[eager] 0.3160ms 0.2378ms 4.2057 KOps/s 3.9895 KOps/s $\textbf{\color{#35bf28}+5.42\%}$
test_mod_wrap[compile] 0.3484ms 0.2840ms 3.5214 KOps/s 3.4555 KOps/s $\color{#35bf28}+1.91\%$
test_mod_wrap[compile-overhead] 7.1114ms 3.7673ms 265.4455 Ops/s 264.5247 Ops/s $\color{#35bf28}+0.35\%$
test_mod_wrap_and_backward[eager] 1.4476ms 1.3016ms 768.3121 Ops/s 714.8764 Ops/s $\textbf{\color{#35bf28}+7.47\%}$
test_mod_wrap_and_backward[compile] 1.4042ms 1.2627ms 791.9403 Ops/s 726.6895 Ops/s $\textbf{\color{#35bf28}+8.98\%}$
test_mod_wrap_and_backward[compile-overhead] 1.3920ms 0.9226ms 1.0839 KOps/s 966.4891 Ops/s $\textbf{\color{#35bf28}+12.15\%}$
test_seq_add[eager] 0.1566ms 96.7568μs 10.3352 KOps/s 10.2018 KOps/s $\color{#35bf28}+1.31\%$
test_seq_add[compile] 0.1457ms 88.4876μs 11.3010 KOps/s 11.3106 KOps/s $\color{#d91a1a}-0.09\%$
test_seq_add[compile-overhead] 0.1877ms 0.1284ms 7.7891 KOps/s 7.7454 KOps/s $\color{#35bf28}+0.56\%$
test_seq_wrap[eager] 0.5043ms 0.3808ms 2.6263 KOps/s 2.6279 KOps/s $\color{#d91a1a}-0.06\%$
test_seq_wrap[compile] 0.3788ms 0.3010ms 3.3228 KOps/s 3.2766 KOps/s $\color{#35bf28}+1.41\%$
test_seq_wrap[compile-overhead] 0.2774ms 0.2241ms 4.4621 KOps/s 4.4488 KOps/s $\color{#35bf28}+0.30\%$
test_func_call_runtime[False-eager] 1.4197ms 0.7115ms 1.4054 KOps/s 1.3766 KOps/s $\color{#35bf28}+2.10\%$
test_func_call_runtime[False-compile] 0.8171ms 0.7430ms 1.3459 KOps/s 1.3114 KOps/s $\color{#35bf28}+2.63\%$
test_func_call_runtime[False-compile-overhead] 0.4106ms 0.3634ms 2.7522 KOps/s 2.7482 KOps/s $\color{#35bf28}+0.14\%$
test_func_call_runtime[True-eager] 0.9668ms 0.8688ms 1.1510 KOps/s 1.1132 KOps/s $\color{#35bf28}+3.40\%$
test_func_call_runtime[True-compile] 0.8393ms 0.7657ms 1.3060 KOps/s 1.2776 KOps/s $\color{#35bf28}+2.22\%$
test_func_call_runtime[True-compile-overhead] 0.4397ms 0.3858ms 2.5920 KOps/s 2.6126 KOps/s $\color{#d91a1a}-0.79\%$
test_func_call_cm_runtime[False-eager] 0.8337ms 0.7135ms 1.4016 KOps/s 1.3809 KOps/s $\color{#35bf28}+1.50\%$
test_func_call_cm_runtime[False-compile] 0.8222ms 0.7461ms 1.3403 KOps/s 1.3039 KOps/s $\color{#35bf28}+2.79\%$
test_func_call_cm_runtime[False-compile-overhead] 0.4258ms 0.3633ms 2.7523 KOps/s 2.7368 KOps/s $\color{#35bf28}+0.56\%$
test_func_call_cm_runtime[True-eager] 1.0655ms 0.9631ms 1.0383 KOps/s 1.0070 KOps/s $\color{#35bf28}+3.11\%$
test_func_call_cm_runtime[True-compile] 0.8900ms 0.7949ms 1.2580 KOps/s 1.2243 KOps/s $\color{#35bf28}+2.76\%$
test_func_call_cm_runtime[True-compile-overhead] 0.5802ms 0.4127ms 2.4233 KOps/s 2.4260 KOps/s $\color{#d91a1a}-0.11\%$
test_vmap_func_call_cm_runtime[eager] 2.4923ms 2.0133ms 496.6971 Ops/s 496.5024 Ops/s $\color{#35bf28}+0.04\%$
test_vmap_func_call_cm_runtime[compile] 0.8734ms 0.8215ms 1.2173 KOps/s 1.2110 KOps/s $\color{#35bf28}+0.52\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.4607ms 0.4129ms 2.4220 KOps/s 2.3777 KOps/s $\color{#35bf28}+1.86\%$
test_distributed 3.2869ms 0.1761ms 5.6790 KOps/s 8.8502 KOps/s $\textbf{\color{#d91a1a}-35.83\%}$
test_tdmodule 48.6130μs 14.6634μs 68.1972 KOps/s 72.0893 KOps/s $\textbf{\color{#d91a1a}-5.40\%}$
test_tdmodule_dispatch 69.0440μs 28.3538μs 35.2686 KOps/s 34.7372 KOps/s $\color{#35bf28}+1.53\%$
test_tdseq 33.0620μs 14.5821μs 68.5771 KOps/s 64.7284 KOps/s $\textbf{\color{#35bf28}+5.95\%}$
test_tdseq_dispatch 57.5830μs 30.0506μs 33.2773 KOps/s 30.4144 KOps/s $\textbf{\color{#35bf28}+9.41\%}$
test_instantiation_functorch 1.6153ms 1.5357ms 651.1558 Ops/s 643.3343 Ops/s $\color{#35bf28}+1.22\%$
test_exec_functorch 0.2108ms 0.1415ms 7.0692 KOps/s 7.0814 KOps/s $\color{#d91a1a}-0.17\%$
test_exec_functional_call 0.1945ms 0.1333ms 7.5018 KOps/s 7.5376 KOps/s $\color{#d91a1a}-0.47\%$
test_exec_td_decorator 0.3621ms 0.1758ms 5.6894 KOps/s 5.5808 KOps/s $\color{#35bf28}+1.94\%$
test_vmap_mlp_speed_decorator[True-True] 0.7932ms 0.6527ms 1.5322 KOps/s 1.5155 KOps/s $\color{#35bf28}+1.10\%$
test_vmap_mlp_speed_decorator[True-False] 0.7621ms 0.6531ms 1.5311 KOps/s 1.5162 KOps/s $\color{#35bf28}+0.98\%$
test_vmap_mlp_speed_decorator[False-True] 0.7245ms 0.5724ms 1.7471 KOps/s 1.7412 KOps/s $\color{#35bf28}+0.34\%$
test_vmap_mlp_speed_decorator[False-False] 0.7105ms 0.5773ms 1.7322 KOps/s 1.7396 KOps/s $\color{#d91a1a}-0.42\%$
test_vmap_transformer_speed_decorator[True-True] 18.6805ms 18.6052ms 53.7485 Ops/s 53.7361 Ops/s $\color{#35bf28}+0.02\%$
test_vmap_transformer_speed_decorator[True-False] 19.0465ms 18.6209ms 53.7032 Ops/s 53.2531 Ops/s $\color{#35bf28}+0.85\%$
test_vmap_transformer_speed_decorator[False-True] 19.7172ms 18.4801ms 54.1123 Ops/s 53.8178 Ops/s $\color{#35bf28}+0.55\%$
test_vmap_transformer_speed_decorator[False-False] 18.5381ms 18.4786ms 54.1166 Ops/s 54.1104 Ops/s $\color{#35bf28}+0.01\%$
test_to_module_speed[True] 1.2738ms 0.9330ms 1.0718 KOps/s 1.0680 KOps/s $\color{#35bf28}+0.36\%$
test_to_module_speed[False] 1.4649ms 0.9153ms 1.0926 KOps/s 1.0759 KOps/s $\color{#35bf28}+1.55\%$
test_tc_init 61.4430μs 33.2197μs 30.1026 KOps/s 28.5424 KOps/s $\textbf{\color{#35bf28}+5.47\%}$
test_tc_init_nested 0.1660ms 68.6771μs 14.5609 KOps/s 13.4893 KOps/s $\textbf{\color{#35bf28}+7.94\%}$
test_tc_first_layer_tensor 8.6804μs 0.7073μs 1.4139 MOps/s 1.4093 MOps/s $\color{#35bf28}+0.33\%$
test_tc_first_layer_nontensor 37.7220μs 2.3388μs 427.5726 KOps/s 423.8412 KOps/s $\color{#35bf28}+0.88\%$
test_tc_second_layer_tensor 11.2173μs 1.4162μs 706.1146 KOps/s 689.2062 KOps/s $\color{#35bf28}+2.45\%$
test_tc_second_layer_nontensor 33.1720μs 3.0923μs 323.3861 KOps/s 324.9965 KOps/s $\color{#d91a1a}-0.50\%$
test_unbind 0.2292s 9.9510ms 100.4926 Ops/s 153.5423 Ops/s $\textbf{\color{#d91a1a}-34.55\%}$
test_full_like 9.6231ms 9.1841ms 108.8837 Ops/s 108.4763 Ops/s $\color{#35bf28}+0.38\%$
test_zeros_like 4.9663ms 4.3256ms 231.1800 Ops/s 235.8125 Ops/s $\color{#d91a1a}-1.96\%$
test_ones_like 4.4464ms 4.3333ms 230.7711 Ops/s 230.9275 Ops/s $\color{#d91a1a}-0.07\%$
test_clone 6.9491ms 6.3322ms 157.9240 Ops/s 155.8412 Ops/s $\color{#35bf28}+1.34\%$
test_squeeze 60.5030μs 9.5096μs 105.1566 KOps/s 107.6312 KOps/s $\color{#d91a1a}-2.30\%$
test_unsqueeze 0.1266ms 71.1931μs 14.0463 KOps/s 14.5741 KOps/s $\color{#d91a1a}-3.62\%$
test_split 0.3987ms 0.1571ms 6.3635 KOps/s 6.4680 KOps/s $\color{#d91a1a}-1.62\%$
test_permute 0.2333ms 0.1841ms 5.4312 KOps/s 5.3876 KOps/s $\color{#35bf28}+0.81\%$
test_stack 50.7966ms 50.3951ms 19.8432 Ops/s 19.6539 Ops/s $\color{#35bf28}+0.96\%$
test_cat 50.7289ms 50.3215ms 19.8722 Ops/s 19.5957 Ops/s $\color{#35bf28}+1.41\%$
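
For readers checking the table, the Change column appears to be the relative difference between the Ops column and the Ops on Repo HEAD column. The sketch below reproduces a few rows under that assumption. The helper names are hypothetical, and the 5% cutoff is inferred from which rows are bold (the 33 bold green and 10 bold red rows match the Improved and Worsened totals); it is not stated anywhere in the report.

```python
# Hypothetical helpers for reading the benchmark table; not part of the
# benchmark bot or of tensordict.

# Unit multipliers for the Ops columns as they appear in the table.
UNIT = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def to_ops(value, unit):
    """Convert a table entry such as (1.0839, 'KOps/s') to plain ops/s."""
    return value * UNIT[unit]


def pct_change(ops, head_ops):
    """Relative change of this PR versus repo HEAD, in percent."""
    return (ops - head_ops) / head_ops * 100.0


def classify(change, threshold=5.0):
    """Assumed rule: only changes beyond +/- threshold count in the summary."""
    if change > threshold:
        return "improved"
    if change < -threshold:
        return "worsened"
    return "unchanged"


# test_plain_set_nested: 98.0525 KOps/s vs 92.2355 KOps/s on HEAD
plain_set = pct_change(to_ops(98.0525, "KOps/s"), to_ops(92.2355, "KOps/s"))

# test_mod_wrap_and_backward[compile-overhead]: mixed units in the table
mixed = pct_change(to_ops(1.0839, "KOps/s"), to_ops(966.4891, "Ops/s"))

# test_distributed: 5.6790 KOps/s vs 8.8502 KOps/s on HEAD
distributed = pct_change(to_ops(5.6790, "KOps/s"), to_ops(8.8502, "KOps/s"))

print(round(plain_set, 2), classify(plain_set))
print(round(mixed, 2), classify(mixed))
print(round(distributed, 2), classify(distributed))
```

Rows whose two Ops entries use different units (KOps/s against Ops/s) need the conversion step before the ratio is taken.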

@vmoens vmoens merged commit c535391 into gh/vmoens/39/base Nov 25, 2024
38 of 50 checks passed
vmoens added a commit that referenced this pull request Nov 25, 2024
ghstack-source-id: 427d19d5ef7c0d2779e064e64522fc0094a885af
Pull Request resolved: #1108
@vmoens vmoens deleted the gh/vmoens/39/head branch November 25, 2024 08:53