[Performance] Better shared/memmap inheritance and faster exclude #621

Merged: 22 commits, Jan 17, 2024

@vmoens vmoens commented Jan 16, 2024

I'm grouping #620 and this PR.

The pitch: the tensordict constructor used to take _is_memmap and _is_shared arguments, which was messy. is_shared() should only return true when the tensordict is locked, but when a tensordict was created from another one, the shared attribute was passed along whereas the locked attribute was not.

To solve this, ops that do not modify the tensors will now explicitly pass on both the shared and locked attributes.
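To make the invariant concrete, here is a minimal sketch in plain Python. It is not the tensordict implementation: the class ToyTD and the methods mark_shared, keep and apply are hypothetical names made up for illustration. It only shows the idea described above, namely that an op which leaves the values untouched forwards the locked and shared flags together, while an op that produces new values starts from a clean state.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class ToyTD:
    """Toy stand-in for a tensordict (hypothetical, not the real API)."""

    data: Dict[str, Any] = field(default_factory=dict)
    locked: bool = False
    shared: bool = False  # only meaningful while the container is locked

    def is_shared(self) -> bool:
        # Invariant from the PR description: shared implies locked.
        return self.locked and self.shared

    def mark_shared(self) -> "ToyTD":
        # Assumption for this sketch: sharing also locks the container.
        self.locked = True
        self.shared = True
        return self

    def keep(self, *keys: str) -> "ToyTD":
        # Non-modifying op: the values are reused as-is, so both flags
        # are forwarded explicitly and together.
        subset = {k: self.data[k] for k in keys}
        return ToyTD(subset, locked=self.locked, shared=self.shared)

    def apply(self, fn: Callable[[Any], Any]) -> "ToyTD":
        # Modifying op: new values are created, so the result is
        # neither locked nor shared.
        return ToyTD({k: fn(v) for k, v in self.data.items()})


td = ToyTD({"a": 1, "b": 2}).mark_shared()
sub = td.keep("a")                    # inherits locked and shared
mapped = td.apply(lambda v: v + 1)    # inherits neither
print(sub.is_shared(), mapped.is_shared())
```

Forwarding the two flags as a pair is what avoids the inconsistent state described above, where a derived object reported itself as shared without being locked.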

I'm taking the opportunity of this refactoring to make exclude faster.
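For readers unfamiliar with the operation, the sketch below shows what an exclude-style op does on a plain nested dict: it returns a new container without the listed keys and leaves the input untouched. This is only an illustration under my own assumptions. The function name exclude_keys and the single-pass grouping of nested keys are hypothetical, and the PR description does not say how the real exclude was made faster.

```python
from typing import Any, Dict, List, Tuple, Union

Key = Union[str, Tuple[str, ...]]


def exclude_keys(data: Dict[str, Any], *keys: Key) -> Dict[str, Any]:
    """Return a new nested dict without `keys`; leaf values are not copied."""
    drop_here = set()                                  # keys removed at this level
    drop_below: Dict[str, List[Tuple[str, ...]]] = {}  # sub-paths grouped by first component
    for key in keys:
        path = (key,) if isinstance(key, str) else tuple(key)
        if not path:
            continue  # ignore empty paths
        if len(path) == 1:
            drop_here.add(path[0])
        else:
            drop_below.setdefault(path[0], []).append(path[1:])

    out: Dict[str, Any] = {}
    for name, value in data.items():
        if name in drop_here:
            continue
        if name in drop_below and isinstance(value, dict):
            # Recurse only into branches that actually lose a key.
            out[name] = exclude_keys(value, *drop_below[name])
        else:
            out[name] = value
    return out


nested = {"a": 1, "b": {"c": 2, "d": 3}}
print(exclude_keys(nested, "a", ("b", "c")))
```

Grouping the nested keys by their first component means each level of the input is walked once, regardless of how many keys are excluded.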

@facebook-github-bot facebook-github-bot added the CLA Signed label Jan 16, 2024

github-actions bot commented Jan 16, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 120. Improved: $\large\color{#35bf28}20$. Worsened: $\large\color{#d91a1a}15$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 33.7230μs 17.2756μs 57.8852 KOps/s 56.2116 KOps/s $\color{#35bf28}+2.98\%$
test_plain_set_stack_nested 0.1899ms 0.1464ms 6.8293 KOps/s 6.5546 KOps/s $\color{#35bf28}+4.19\%$
test_plain_set_nested_inplace 51.9170μs 19.6123μs 50.9885 KOps/s 49.2778 KOps/s $\color{#35bf28}+3.47\%$
test_plain_set_stack_nested_inplace 0.3189ms 0.1791ms 5.5833 KOps/s 5.3610 KOps/s $\color{#35bf28}+4.15\%$
test_items 23.2340μs 2.4247μs 412.4277 KOps/s 401.4646 KOps/s $\color{#35bf28}+2.73\%$
test_items_nested 0.5734ms 0.2719ms 3.6784 KOps/s 3.6140 KOps/s $\color{#35bf28}+1.78\%$
test_items_nested_locked 0.9238ms 0.2735ms 3.6568 KOps/s 3.6281 KOps/s $\color{#35bf28}+0.79\%$
test_items_nested_leaf 0.2946ms 0.1689ms 5.9191 KOps/s 5.8637 KOps/s $\color{#35bf28}+0.95\%$
test_items_stack_nested 1.5671ms 1.3089ms 763.9744 Ops/s 742.6625 Ops/s $\color{#35bf28}+2.87\%$
test_items_stack_nested_leaf 1.4383ms 1.1734ms 852.2036 Ops/s 844.6809 Ops/s $\color{#35bf28}+0.89\%$
test_items_stack_nested_locked 1.0971ms 0.8729ms 1.1456 KOps/s 1.1412 KOps/s $\color{#35bf28}+0.38\%$
test_keys 19.3560μs 3.9163μs 255.3463 KOps/s 258.5924 KOps/s $\color{#d91a1a}-1.26\%$
test_keys_nested 52.4530ms 0.1591ms 6.2864 KOps/s 6.6495 KOps/s $\textbf{\color{#d91a1a}-5.46\%}$
test_keys_nested_locked 0.2929ms 0.1546ms 6.4699 KOps/s 6.5482 KOps/s $\color{#d91a1a}-1.20\%$
test_keys_nested_leaf 0.2481ms 0.1305ms 7.6647 KOps/s 7.5624 KOps/s $\color{#35bf28}+1.35\%$
test_keys_stack_nested 1.5618ms 1.2735ms 785.2430 Ops/s 785.6123 Ops/s $\color{#d91a1a}-0.05\%$
test_keys_stack_nested_leaf 2.0776ms 1.2638ms 791.2765 Ops/s 789.5110 Ops/s $\color{#35bf28}+0.22\%$
test_keys_stack_nested_locked 1.2419ms 0.8057ms 1.2411 KOps/s 1.2482 KOps/s $\color{#d91a1a}-0.56\%$
test_values 5.1020μs 1.1849μs 843.9391 KOps/s 870.3621 KOps/s $\color{#d91a1a}-3.04\%$
test_values_nested 0.1395ms 52.7588μs 18.9542 KOps/s 18.8274 KOps/s $\color{#35bf28}+0.67\%$
test_values_nested_locked 0.1094ms 52.0230μs 19.2223 KOps/s 18.9843 KOps/s $\color{#35bf28}+1.25\%$
test_values_nested_leaf 95.1180μs 46.4029μs 21.5504 KOps/s 21.3308 KOps/s $\color{#35bf28}+1.03\%$
test_values_stack_nested 1.7411ms 1.0343ms 966.8724 Ops/s 953.3543 Ops/s $\color{#35bf28}+1.42\%$
test_values_stack_nested_leaf 1.1398ms 1.0161ms 984.1388 Ops/s 975.5522 Ops/s $\color{#35bf28}+0.88\%$
test_values_stack_nested_locked 0.9828ms 0.6069ms 1.6476 KOps/s 1.6435 KOps/s $\color{#35bf28}+0.25\%$
test_membership 19.9070μs 1.3477μs 742.0108 KOps/s 739.3346 KOps/s $\color{#35bf28}+0.36\%$
test_membership_nested 20.4090μs 3.4295μs 291.5873 KOps/s 342.9532 KOps/s $\textbf{\color{#d91a1a}-14.98\%}$
test_membership_nested_leaf 43.7720μs 3.4504μs 289.8219 KOps/s 346.3993 KOps/s $\textbf{\color{#d91a1a}-16.33\%}$
test_membership_stacked_nested 36.9090μs 11.8748μs 84.2117 KOps/s 84.1318 KOps/s $\color{#35bf28}+0.09\%$
test_membership_stacked_nested_leaf 44.3630μs 11.6179μs 86.0741 KOps/s 84.0245 KOps/s $\color{#35bf28}+2.44\%$
test_membership_nested_last 20.2870μs 6.7135μs 148.9543 KOps/s 164.8352 KOps/s $\textbf{\color{#d91a1a}-9.63\%}$
test_membership_nested_leaf_last 32.3000μs 6.6848μs 149.5936 KOps/s 163.4445 KOps/s $\textbf{\color{#d91a1a}-8.47\%}$
test_membership_stacked_nested_last 0.3008ms 0.1738ms 5.7539 KOps/s 5.8688 KOps/s $\color{#d91a1a}-1.96\%$
test_membership_stacked_nested_leaf_last 43.0900μs 13.6868μs 73.0630 KOps/s 69.0766 KOps/s $\textbf{\color{#35bf28}+5.77\%}$
test_nested_getleaf 36.7090μs 10.7370μs 93.1355 KOps/s 93.0946 KOps/s $\color{#35bf28}+0.04\%$
test_nested_get 31.8500μs 10.2042μs 97.9990 KOps/s 97.6606 KOps/s $\color{#35bf28}+0.35\%$
test_stacked_getleaf 0.8687ms 0.3923ms 2.5492 KOps/s 2.4608 KOps/s $\color{#35bf28}+3.59\%$
test_stacked_get 0.6728ms 0.3614ms 2.7667 KOps/s 2.5845 KOps/s $\textbf{\color{#35bf28}+7.05\%}$
test_nested_getitemleaf 28.4540μs 10.8561μs 92.1140 KOps/s 92.0218 KOps/s $\color{#35bf28}+0.10\%$
test_nested_getitem 33.1420μs 10.1847μs 98.1865 KOps/s 97.9986 KOps/s $\color{#35bf28}+0.19\%$
test_stacked_getitemleaf 0.6158ms 0.3939ms 2.5389 KOps/s 2.4378 KOps/s $\color{#35bf28}+4.15\%$
test_stacked_getitem 0.5738ms 0.3608ms 2.7714 KOps/s 2.6715 KOps/s $\color{#35bf28}+3.74\%$
test_lock_nested 1.2018ms 0.3899ms 2.5650 KOps/s 2.3654 KOps/s $\textbf{\color{#35bf28}+8.44\%}$
test_lock_stack_nested 77.8575ms 6.3635ms 157.1471 Ops/s 141.8294 Ops/s $\textbf{\color{#35bf28}+10.80\%}$
test_unlock_nested 62.7863ms 0.4538ms 2.2037 KOps/s 2.3430 KOps/s $\textbf{\color{#d91a1a}-5.94\%}$
test_unlock_stack_nested 78.6858ms 5.9490ms 168.0949 Ops/s 160.6693 Ops/s $\color{#35bf28}+4.62\%$
test_flatten_speed 0.7324ms 0.3691ms 2.7091 KOps/s 2.7016 KOps/s $\color{#35bf28}+0.28\%$
test_unflatten_speed 0.6600ms 0.4603ms 2.1724 KOps/s 2.1390 KOps/s $\color{#35bf28}+1.56\%$
test_common_ops 4.3370ms 0.6956ms 1.4376 KOps/s 1.4558 KOps/s $\color{#d91a1a}-1.25\%$
test_creation 57.0360μs 1.8813μs 531.5542 KOps/s 485.6039 KOps/s $\textbf{\color{#35bf28}+9.46\%}$
test_creation_empty 27.8120μs 10.5607μs 94.6905 KOps/s 88.4595 KOps/s $\textbf{\color{#35bf28}+7.04\%}$
test_creation_nested_1 36.5090μs 13.0661μs 76.5339 KOps/s 71.3004 KOps/s $\textbf{\color{#35bf28}+7.34\%}$
test_creation_nested_2 45.1540μs 16.2803μs 61.4239 KOps/s 56.6805 KOps/s $\textbf{\color{#35bf28}+8.37\%}$
test_clone 0.1576ms 13.1375μs 76.1178 KOps/s 81.6216 KOps/s $\textbf{\color{#d91a1a}-6.74\%}$
test_getitem[int] 45.3740μs 11.1388μs 89.7762 KOps/s 81.5066 KOps/s $\textbf{\color{#35bf28}+10.15\%}$
test_getitem[slice_int] 0.1002ms 22.8103μs 43.8399 KOps/s 41.6410 KOps/s $\textbf{\color{#35bf28}+5.28\%}$
test_getitem[range] 0.1011ms 41.3249μs 24.1985 KOps/s 23.2863 KOps/s $\color{#35bf28}+3.92\%$
test_getitem[tuple] 45.0440μs 18.1522μs 55.0896 KOps/s 51.6222 KOps/s $\textbf{\color{#35bf28}+6.72\%}$
test_getitem[list] 0.4577ms 36.6980μs 27.2495 KOps/s 26.2707 KOps/s $\color{#35bf28}+3.73\%$
test_setitem_dim[int] 63.1490μs 29.1940μs 34.2536 KOps/s 32.4049 KOps/s $\textbf{\color{#35bf28}+5.71\%}$
test_setitem_dim[slice_int] 77.8050μs 55.1545μs 18.1309 KOps/s 17.2717 KOps/s $\color{#35bf28}+4.97\%$
test_setitem_dim[range] 0.1421ms 74.3549μs 13.4490 KOps/s 13.3695 KOps/s $\color{#35bf28}+0.59\%$
test_setitem_dim[tuple] 79.3390μs 43.3761μs 23.0542 KOps/s 21.9372 KOps/s $\textbf{\color{#35bf28}+5.09\%}$
test_setitem 0.2025ms 20.1016μs 49.7473 KOps/s 52.1455 KOps/s $\color{#d91a1a}-4.60\%$
test_set 0.1812ms 19.0064μs 52.6138 KOps/s 53.4470 KOps/s $\color{#d91a1a}-1.56\%$
test_set_shared 1.8258ms 0.1421ms 7.0379 KOps/s 7.1884 KOps/s $\color{#d91a1a}-2.09\%$
test_update 0.1148ms 22.0467μs 45.3583 KOps/s 44.8595 KOps/s $\color{#35bf28}+1.11\%$
test_update_nested 0.1537ms 29.3322μs 34.0923 KOps/s 33.8568 KOps/s $\color{#35bf28}+0.70\%$
test_set_nested 0.1050ms 21.1200μs 47.3486 KOps/s 48.2239 KOps/s $\color{#d91a1a}-1.82\%$
test_set_nested_new 0.1083ms 24.8901μs 40.1766 KOps/s 39.6834 KOps/s $\color{#35bf28}+1.24\%$
test_select 81.7230μs 38.7187μs 25.8273 KOps/s 20.4383 KOps/s $\textbf{\color{#35bf28}+26.37\%}$
test_unbind_speed 0.3981ms 0.3146ms 3.1783 KOps/s 2.8746 KOps/s $\textbf{\color{#35bf28}+10.56\%}$
test_unbind_speed_stack0 66.4912ms 4.1692ms 239.8545 Ops/s 220.1980 Ops/s $\textbf{\color{#35bf28}+8.93\%}$
test_unbind_speed_stack1 7.7005μs 0.6615μs 1.5118 MOps/s 1.5757 MOps/s $\color{#d91a1a}-4.05\%$
test_split 62.7564ms 1.5842ms 631.2258 Ops/s 628.3689 Ops/s $\color{#35bf28}+0.45\%$
test_chunk 61.2733ms 1.5642ms 639.3165 Ops/s 589.3234 Ops/s $\textbf{\color{#35bf28}+8.48\%}$
test_creation[device0] 0.1995ms 99.7427μs 10.0258 KOps/s 10.0151 KOps/s $\color{#35bf28}+0.11\%$
test_creation_from_tensor 3.1489ms 81.8615μs 12.2158 KOps/s 12.4228 KOps/s $\color{#d91a1a}-1.67\%$
test_add_one[memmap_tensor0] 0.5710ms 5.2385μs 190.8943 KOps/s 192.3301 KOps/s $\color{#d91a1a}-0.75\%$
test_contiguous[memmap_tensor0] 10.4190μs 0.6414μs 1.5591 MOps/s 1.6141 MOps/s $\color{#d91a1a}-3.41\%$
test_stack[memmap_tensor0] 0.1477ms 3.5799μs 279.3409 KOps/s 287.7785 KOps/s $\color{#d91a1a}-2.93\%$
test_memmaptd_index 1.2042ms 0.2201ms 4.5425 KOps/s 5.1051 KOps/s $\textbf{\color{#d91a1a}-11.02\%}$
test_memmaptd_index_astensor 0.6828ms 0.2813ms 3.5553 KOps/s 3.8500 KOps/s $\textbf{\color{#d91a1a}-7.65\%}$
test_memmaptd_index_op 1.2892ms 0.5786ms 1.7283 KOps/s 1.8205 KOps/s $\textbf{\color{#d91a1a}-5.06\%}$
test_serialize_model 0.1750s 0.1102s 9.0710 Ops/s 8.8718 Ops/s $\color{#35bf28}+2.25\%$
test_serialize_model_pickle 0.4506s 0.3793s 2.6363 Ops/s 2.6378 Ops/s $\color{#d91a1a}-0.06\%$
test_serialize_weights 0.1677s 0.1060s 9.4301 Ops/s 10.0517 Ops/s $\textbf{\color{#d91a1a}-6.18\%}$
test_serialize_weights_returnearly 0.3128s 0.1506s 6.6409 Ops/s 7.2666 Ops/s $\textbf{\color{#d91a1a}-8.61\%}$
test_serialize_weights_pickle 0.8026s 0.4982s 2.0074 Ops/s 2.4125 Ops/s $\textbf{\color{#d91a1a}-16.79\%}$
test_serialize_weights_filesystem 0.1768s 0.1021s 9.7948 Ops/s 10.9039 Ops/s $\textbf{\color{#d91a1a}-10.17\%}$
test_serialize_model_filesystem 0.1008s 93.3495ms 10.7124 Ops/s 10.5553 Ops/s $\color{#35bf28}+1.49\%$
test_reshape_pytree 63.1680μs 23.2093μs 43.0863 KOps/s 42.0653 KOps/s $\color{#35bf28}+2.43\%$
test_reshape_td 0.1154ms 29.8637μs 33.4854 KOps/s 31.8421 KOps/s $\textbf{\color{#35bf28}+5.16\%}$
test_view_pytree 98.8660μs 23.3353μs 42.8535 KOps/s 42.7545 KOps/s $\color{#35bf28}+0.23\%$
test_view_td 27.2010μs 4.8689μs 205.3835 KOps/s 204.5043 KOps/s $\color{#35bf28}+0.43\%$
test_unbind_pytree 61.0140μs 26.4367μs 37.8263 KOps/s 37.8246 KOps/s $+0.00\%$
test_unbind_td 0.1038ms 50.3968μs 19.8425 KOps/s 17.9742 KOps/s $\textbf{\color{#35bf28}+10.39\%}$
test_split_pytree 63.1480μs 26.6622μs 37.5062 KOps/s 38.1406 KOps/s $\color{#d91a1a}-1.66\%$
test_split_td 0.5534ms 41.0886μs 24.3377 KOps/s 23.1055 KOps/s $\textbf{\color{#35bf28}+5.33\%}$
test_add_pytree 82.1140μs 32.6273μs 30.6492 KOps/s 31.2864 KOps/s $\color{#d91a1a}-2.04\%$
test_add_td 0.1267ms 52.0578μs 19.2094 KOps/s 19.8636 KOps/s $\color{#d91a1a}-3.29\%$
test_distributed 0.2346ms 98.5878μs 10.1432 KOps/s 10.1147 KOps/s $\color{#35bf28}+0.28\%$
test_tdmodule 0.1130ms 21.6789μs 46.1278 KOps/s 44.3105 KOps/s $\color{#35bf28}+4.10\%$
test_tdmodule_dispatch 0.1987ms 39.9771μs 25.0143 KOps/s 23.9987 KOps/s $\color{#35bf28}+4.23\%$
test_tdseq 41.9480μs 25.5231μs 39.1803 KOps/s 39.4003 KOps/s $\color{#d91a1a}-0.56\%$
test_tdseq_dispatch 0.1539ms 44.7297μs 22.3565 KOps/s 22.0580 KOps/s $\color{#35bf28}+1.35\%$
test_instantiation_functorch 1.5029ms 1.3011ms 768.5671 Ops/s 775.2295 Ops/s $\color{#d91a1a}-0.86\%$
test_instantiation_td 70.6808ms 1.0858ms 920.9792 Ops/s 998.0756 Ops/s $\textbf{\color{#d91a1a}-7.72\%}$
test_exec_functorch 0.2190ms 0.1547ms 6.4641 KOps/s 6.3455 KOps/s $\color{#35bf28}+1.87\%$
test_exec_functional_call 0.2250ms 0.1415ms 7.0668 KOps/s 6.8722 KOps/s $\color{#35bf28}+2.83\%$
test_exec_td 0.2314ms 0.1412ms 7.0799 KOps/s 7.0035 KOps/s $\color{#35bf28}+1.09\%$
test_exec_td_decorator 0.6636ms 0.1740ms 5.7465 KOps/s 5.5832 KOps/s $\color{#35bf28}+2.93\%$
test_vmap_mlp_speed[True-True] 1.2637ms 0.8862ms 1.1284 KOps/s 1.1292 KOps/s $\color{#d91a1a}-0.07\%$
test_vmap_mlp_speed[True-False] 0.6670ms 0.4783ms 2.0906 KOps/s 2.1199 KOps/s $\color{#d91a1a}-1.38\%$
test_vmap_mlp_speed[False-True] 1.1581ms 0.7666ms 1.3045 KOps/s 1.3096 KOps/s $\color{#d91a1a}-0.39\%$
test_vmap_mlp_speed[False-False] 0.5957ms 0.3916ms 2.5537 KOps/s 2.5765 KOps/s $\color{#d91a1a}-0.88\%$
test_vmap_mlp_speed_decorator[True-True] 3.1197ms 2.4125ms 414.5066 Ops/s 410.0839 Ops/s $\color{#35bf28}+1.08\%$
test_vmap_mlp_speed_decorator[True-False] 0.9418ms 0.5304ms 1.8853 KOps/s 1.8961 KOps/s $\color{#d91a1a}-0.57\%$
test_vmap_mlp_speed_decorator[False-True] 2.6049ms 1.9696ms 507.7240 Ops/s 506.1305 Ops/s $\color{#35bf28}+0.31\%$
test_vmap_mlp_speed_decorator[False-False] 0.8721ms 0.4036ms 2.4775 KOps/s 2.4722 KOps/s $\color{#35bf28}+0.21\%$
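
The Change column is consistent with the relative difference between the Ops column (this PR) and the Ops on Repo HEAD column. The bot does not state the formula, so the snippet below is an assumption checked against two rows of the table above; the function name relative_change is made up for this sketch.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percentage change in throughput of the PR relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


# Values copied from the CPU table above (both in KOps/s).
print(f"test_select: {relative_change(25.8273, 20.4383):+.2f}%")
print(f"test_membership_nested: {relative_change(291.5873, 342.9532):+.2f}%")
```

Both operands must be in the same unit before comparing, since the table mixes Ops/s, KOps/s and MOps/s across rows.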


github-actions bot commented Jan 16, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 128. Improved: $\large\color{#35bf28}30$. Worsened: $\large\color{#d91a1a}9$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.4144ms 12.6029μs 79.3469 KOps/s 69.0448 KOps/s $\textbf{\color{#35bf28}+14.92\%}$
test_plain_set_stack_nested 0.1628ms 0.1161ms 8.6149 KOps/s 8.4854 KOps/s $\color{#35bf28}+1.53\%$
test_plain_set_nested_inplace 32.4710μs 13.9156μs 71.8616 KOps/s 64.0723 KOps/s $\textbf{\color{#35bf28}+12.16\%}$
test_plain_set_stack_nested_inplace 0.1801ms 0.1447ms 6.9090 KOps/s 6.8258 KOps/s $\color{#35bf28}+1.22\%$
test_items 25.8110μs 4.6912μs 213.1648 KOps/s 211.2107 KOps/s $\color{#35bf28}+0.93\%$
test_items_nested 0.3949ms 0.3405ms 2.9369 KOps/s 2.9730 KOps/s $\color{#d91a1a}-1.21\%$
test_items_nested_locked 0.3843ms 0.3429ms 2.9166 KOps/s 2.9252 KOps/s $\color{#d91a1a}-0.29\%$
test_items_nested_leaf 0.2491ms 0.2006ms 4.9840 KOps/s 5.0115 KOps/s $\color{#d91a1a}-0.55\%$
test_items_stack_nested 1.3530ms 1.2897ms 775.3552 Ops/s 766.4193 Ops/s $\color{#35bf28}+1.17\%$
test_items_stack_nested_leaf 1.2297ms 1.1274ms 886.9807 Ops/s 881.2178 Ops/s $\color{#35bf28}+0.65\%$
test_items_stack_nested_locked 1.9300ms 0.9076ms 1.1018 KOps/s 1.1235 KOps/s $\color{#d91a1a}-1.93\%$
test_keys 28.3410μs 4.5387μs 220.3287 KOps/s 219.8091 KOps/s $\color{#35bf28}+0.24\%$
test_keys_nested 0.4463ms 94.7971μs 10.5488 KOps/s 10.7018 KOps/s $\color{#d91a1a}-1.43\%$
test_keys_nested_locked 0.1255ms 97.8469μs 10.2200 KOps/s 10.7735 KOps/s $\textbf{\color{#d91a1a}-5.14\%}$
test_keys_nested_leaf 0.1807ms 78.1870μs 12.7898 KOps/s 12.9631 KOps/s $\color{#d91a1a}-1.34\%$
test_keys_stack_nested 1.2670ms 1.1462ms 872.4627 Ops/s 872.2632 Ops/s $\color{#35bf28}+0.02\%$
test_keys_stack_nested_leaf 1.1769ms 1.1090ms 901.7373 Ops/s 882.1237 Ops/s $\color{#35bf28}+2.22\%$
test_keys_stack_nested_locked 0.7924ms 0.7136ms 1.4013 KOps/s 1.3866 KOps/s $\color{#35bf28}+1.06\%$
test_values 8.3703μs 1.8949μs 527.7231 KOps/s 533.7246 KOps/s $\color{#d91a1a}-1.12\%$
test_values_nested 66.8330μs 45.3372μs 22.0569 KOps/s 22.1026 KOps/s $\color{#d91a1a}-0.21\%$
test_values_nested_locked 86.2540μs 47.6077μs 21.0050 KOps/s 21.0466 KOps/s $\color{#d91a1a}-0.20\%$
test_values_nested_leaf 61.9620μs 39.4384μs 25.3560 KOps/s 25.4083 KOps/s $\color{#d91a1a}-0.21\%$
test_values_stack_nested 1.0190ms 0.9544ms 1.0478 KOps/s 1.0514 KOps/s $\color{#d91a1a}-0.34\%$
test_values_stack_nested_leaf 1.0265ms 0.9489ms 1.0539 KOps/s 1.0663 KOps/s $\color{#d91a1a}-1.17\%$
test_values_stack_nested_locked 0.6229ms 0.5751ms 1.7389 KOps/s 1.7126 KOps/s $\color{#35bf28}+1.54\%$
test_membership 5.0282μs 0.9531μs 1.0492 MOps/s 1.0569 MOps/s $\color{#d91a1a}-0.73\%$
test_membership_nested 27.8620μs 2.8888μs 346.1605 KOps/s 429.1854 KOps/s $\textbf{\color{#d91a1a}-19.34\%}$
test_membership_nested_leaf 21.1600μs 2.8795μs 347.2767 KOps/s 450.3711 KOps/s $\textbf{\color{#d91a1a}-22.89\%}$
test_membership_stacked_nested 45.5720μs 11.1172μs 89.9506 KOps/s 90.5400 KOps/s $\color{#d91a1a}-0.65\%$
test_membership_stacked_nested_leaf 23.8420μs 11.1409μs 89.7594 KOps/s 90.9280 KOps/s $\color{#d91a1a}-1.29\%$
test_membership_nested_last 32.0110μs 5.3233μs 187.8523 KOps/s 215.1527 KOps/s $\textbf{\color{#d91a1a}-12.69\%}$
test_membership_nested_leaf_last 37.5210μs 5.3169μs 188.0807 KOps/s 213.8689 KOps/s $\textbf{\color{#d91a1a}-12.06\%}$
test_membership_stacked_nested_last 0.1757ms 0.1435ms 6.9681 KOps/s 7.3791 KOps/s $\textbf{\color{#d91a1a}-5.57\%}$
test_membership_stacked_nested_leaf_last 53.1120μs 13.0022μs 76.9099 KOps/s 76.5587 KOps/s $\color{#35bf28}+0.46\%$
test_nested_getleaf 34.8310μs 8.4032μs 119.0024 KOps/s 120.3866 KOps/s $\color{#d91a1a}-1.15\%$
test_nested_get 22.8410μs 7.9451μs 125.8638 KOps/s 127.2493 KOps/s $\color{#d91a1a}-1.09\%$
test_stacked_getleaf 0.3750ms 0.3224ms 3.1013 KOps/s 3.1255 KOps/s $\color{#d91a1a}-0.78\%$
test_stacked_get 0.3492ms 0.2920ms 3.4245 KOps/s 3.4566 KOps/s $\color{#d91a1a}-0.93\%$
test_nested_getitemleaf 30.0420μs 8.4347μs 118.5581 KOps/s 119.6205 KOps/s $\color{#d91a1a}-0.89\%$
test_nested_getitem 29.4720μs 7.9808μs 125.3006 KOps/s 125.9383 KOps/s $\color{#d91a1a}-0.51\%$
test_stacked_getitemleaf 0.3657ms 0.3240ms 3.0867 KOps/s 3.1111 KOps/s $\color{#d91a1a}-0.79\%$
test_stacked_getitem 0.3978ms 0.2897ms 3.4522 KOps/s 3.4187 KOps/s $\color{#35bf28}+0.98\%$
test_lock_nested 0.8728ms 0.3951ms 2.5309 KOps/s 2.4089 KOps/s $\textbf{\color{#35bf28}+5.07\%}$
test_lock_stack_nested 83.8241ms 6.3064ms 158.5692 Ops/s 153.2298 Ops/s $\color{#35bf28}+3.48\%$
test_unlock_nested 1.0073ms 0.3972ms 2.5175 KOps/s 2.4018 KOps/s $\color{#35bf28}+4.82\%$
test_unlock_stack_nested 82.9870ms 6.7153ms 148.9139 Ops/s 142.5844 Ops/s $\color{#35bf28}+4.44\%$
test_flatten_speed 0.4563ms 0.2653ms 3.7689 KOps/s 3.7650 KOps/s $\color{#35bf28}+0.10\%$
test_unflatten_speed 0.4237ms 0.3691ms 2.7095 KOps/s 2.7883 KOps/s $\color{#d91a1a}-2.82\%$
test_common_ops 1.0297ms 0.5565ms 1.7969 KOps/s 1.5274 KOps/s $\textbf{\color{#35bf28}+17.64\%}$
test_creation 23.8110μs 1.5816μs 632.2647 KOps/s 611.9270 KOps/s $\color{#35bf28}+3.32\%$
test_creation_empty 29.9920μs 6.4035μs 156.1638 KOps/s 102.2265 KOps/s $\textbf{\color{#35bf28}+52.76\%}$
test_creation_nested_1 43.1120μs 8.1323μs 122.9658 KOps/s 84.2265 KOps/s $\textbf{\color{#35bf28}+45.99\%}$
test_creation_nested_2 26.0510μs 10.4992μs 95.2455 KOps/s 70.3813 KOps/s $\textbf{\color{#35bf28}+35.33\%}$
test_clone 0.1067ms 13.2482μs 75.4817 KOps/s 76.7503 KOps/s $\color{#d91a1a}-1.65\%$
test_getitem[int] 28.8310μs 10.7836μs 92.7332 KOps/s 88.9601 KOps/s $\color{#35bf28}+4.24\%$
test_getitem[slice_int] 46.1220μs 22.5554μs 44.3353 KOps/s 44.8127 KOps/s $\color{#d91a1a}-1.07\%$
test_getitem[range] 66.3730μs 36.5150μs 27.3860 KOps/s 27.3568 KOps/s $\color{#35bf28}+0.11\%$
test_getitem[tuple] 48.8820μs 18.9963μs 52.6419 KOps/s 52.7551 KOps/s $\color{#d91a1a}-0.21\%$
test_getitem[list] 0.3670ms 34.7525μs 28.7749 KOps/s 29.1854 KOps/s $\color{#d91a1a}-1.41\%$
test_setitem_dim[int] 42.6920μs 25.5101μs 39.2002 KOps/s 35.0825 KOps/s $\textbf{\color{#35bf28}+11.74\%}$
test_setitem_dim[slice_int] 70.5940μs 45.7663μs 21.8501 KOps/s 20.8380 KOps/s $\color{#35bf28}+4.86\%$
test_setitem_dim[range] 76.0830μs 57.5467μs 17.3772 KOps/s 16.0867 KOps/s $\textbf{\color{#35bf28}+8.02\%}$
test_setitem_dim[tuple] 61.5020μs 38.1796μs 26.1920 KOps/s 23.4463 KOps/s $\textbf{\color{#35bf28}+11.71\%}$
test_setitem 0.1068ms 16.7271μs 59.7831 KOps/s 55.1327 KOps/s $\textbf{\color{#35bf28}+8.43\%}$
test_set 0.1071ms 16.2662μs 61.4772 KOps/s 56.6492 KOps/s $\textbf{\color{#35bf28}+8.52\%}$
test_set_shared 2.9020ms 0.1022ms 9.7856 KOps/s 9.8392 KOps/s $\color{#d91a1a}-0.55\%$
test_update 97.6540μs 17.8178μs 56.1236 KOps/s 45.9201 KOps/s $\textbf{\color{#35bf28}+22.22\%}$
test_update_nested 0.1037ms 23.9216μs 41.8032 KOps/s 36.5329 KOps/s $\textbf{\color{#35bf28}+14.43\%}$
test_set_nested 95.3540μs 17.3658μs 57.5844 KOps/s 52.6310 KOps/s $\textbf{\color{#35bf28}+9.41\%}$
test_set_nested_new 98.9850μs 20.3475μs 49.1461 KOps/s 45.4053 KOps/s $\textbf{\color{#35bf28}+8.24\%}$
test_select 0.1130ms 33.4783μs 29.8701 KOps/s 21.0434 KOps/s $\textbf{\color{#35bf28}+41.95\%}$
test_to 74.4630μs 56.4906μs 17.7021 KOps/s 17.1849 KOps/s $\color{#35bf28}+3.01\%$
test_to_nonblocking 71.4940μs 33.9900μs 29.4204 KOps/s 27.5377 KOps/s $\textbf{\color{#35bf28}+6.84\%}$
test_unbind_speed 0.3775ms 0.3187ms 3.1377 KOps/s 3.0498 KOps/s $\color{#35bf28}+2.88\%$
test_unbind_speed_stack0 80.3481ms 3.7136ms 269.2800 Ops/s 256.7950 Ops/s $\color{#35bf28}+4.86\%$
test_unbind_speed_stack1 3.6081μs 0.5348μs 1.8698 MOps/s 1.8929 MOps/s $\color{#d91a1a}-1.22\%$
test_split 1.8512ms 1.5637ms 639.5239 Ops/s 566.7137 Ops/s $\textbf{\color{#35bf28}+12.85\%}$
test_chunk 74.0810ms 1.6828ms 594.2602 Ops/s 620.5868 Ops/s $\color{#d91a1a}-4.24\%$
test_creation[device0] 0.1311ms 70.8344μs 14.1174 KOps/s 13.0977 KOps/s $\textbf{\color{#35bf28}+7.79\%}$
test_creation_from_tensor 0.1310ms 53.0791μs 18.8398 KOps/s 17.3905 KOps/s $\textbf{\color{#35bf28}+8.33\%}$
test_add_one[memmap_tensor0] 0.2028ms 6.2406μs 160.2418 KOps/s 158.4085 KOps/s $\color{#35bf28}+1.16\%$
test_contiguous[memmap_tensor0] 10.8710μs 0.6434μs 1.5542 MOps/s 1.5546 MOps/s $\color{#d91a1a}-0.02\%$
test_stack[memmap_tensor0] 35.1420μs 4.3267μs 231.1209 KOps/s 231.2569 KOps/s $\color{#d91a1a}-0.06\%$
test_memmaptd_index 1.0033ms 0.2599ms 3.8472 KOps/s 4.0820 KOps/s $\textbf{\color{#d91a1a}-5.75\%}$
test_memmaptd_index_astensor 0.5709ms 0.3162ms 3.1628 KOps/s 3.3095 KOps/s $\color{#d91a1a}-4.43\%$
test_memmaptd_index_op 0.8539ms 0.5555ms 1.8002 KOps/s 1.6943 KOps/s $\textbf{\color{#35bf28}+6.26\%}$
test_serialize_model 0.1657s 96.6512ms 10.3465 Ops/s 9.7812 Ops/s $\textbf{\color{#35bf28}+5.78\%}$
test_serialize_model_pickle 1.3505s 1.2376s 0.8080 Ops/s 0.8083 Ops/s $\color{#d91a1a}-0.03\%$
test_serialize_weights 0.1693s 95.0087ms 10.5253 Ops/s 10.1586 Ops/s $\color{#35bf28}+3.61\%$
test_serialize_weights_returnearly 0.2712s 77.2043ms 12.9526 Ops/s 13.1193 Ops/s $\color{#d91a1a}-1.27\%$
test_serialize_weights_pickle 1.3839s 1.2453s 0.8030 Ops/s 0.8082 Ops/s $\color{#d91a1a}-0.64\%$
test_reshape_pytree 58.1930μs 24.7870μs 40.3438 KOps/s 40.8832 KOps/s $\color{#d91a1a}-1.32\%$
test_reshape_td 53.1120μs 28.8854μs 34.6195 KOps/s 33.9569 KOps/s $\color{#35bf28}+1.95\%$
test_view_pytree 54.3220μs 24.5471μs 40.7380 KOps/s 41.1844 KOps/s $\color{#d91a1a}-1.08\%$
test_view_td 27.7310μs 4.2691μs 234.2419 KOps/s 239.9033 KOps/s $\color{#d91a1a}-2.36\%$
test_unbind_pytree 0.1994ms 30.4560μs 32.8343 KOps/s 33.3571 KOps/s $\color{#d91a1a}-1.57\%$
test_unbind_td 0.1818ms 50.9637μs 19.6218 KOps/s 18.9450 KOps/s $\color{#35bf28}+3.57\%$
test_split_pytree 57.3330μs 28.7190μs 34.8202 KOps/s 35.1554 KOps/s $\color{#d91a1a}-0.95\%$
test_split_td 0.7104ms 40.4245μs 24.7375 KOps/s 23.9907 KOps/s $\color{#35bf28}+3.11\%$
test_add_pytree 61.0620μs 34.9333μs 28.6260 KOps/s 29.4181 KOps/s $\color{#d91a1a}-2.69\%$
test_add_td 88.9240μs 44.5493μs 22.4471 KOps/s 20.6938 KOps/s $\textbf{\color{#35bf28}+8.47\%}$
test_distributed 5.9354ms 91.7978μs 10.8935 KOps/s 13.1925 KOps/s $\textbf{\color{#d91a1a}-17.43\%}$
test_tdmodule 0.1042ms 16.7462μs 59.7151 KOps/s 50.7017 KOps/s $\textbf{\color{#35bf28}+17.78\%}$
test_tdmodule_dispatch 0.2228ms 31.7406μs 31.5054 KOps/s 26.9391 KOps/s $\textbf{\color{#35bf28}+16.95\%}$
test_tdseq 29.8620μs 19.3174μs 51.7669 KOps/s 46.1734 KOps/s $\textbf{\color{#35bf28}+12.11\%}$
test_tdseq_dispatch 50.1130μs 34.2919μs 29.1614 KOps/s 25.2576 KOps/s $\textbf{\color{#35bf28}+15.46\%}$
test_instantiation_functorch 1.7951ms 1.6763ms 596.5441 Ops/s 597.6452 Ops/s $\color{#d91a1a}-0.18\%$
test_instantiation_td 1.7843ms 1.1654ms 858.0438 Ops/s 859.5510 Ops/s $\color{#d91a1a}-0.18\%$
test_exec_functorch 0.1944ms 0.1583ms 6.3177 KOps/s 6.3071 KOps/s $\color{#35bf28}+0.17\%$
test_exec_functional_call 0.1894ms 0.1556ms 6.4261 KOps/s 6.5256 KOps/s $\color{#d91a1a}-1.52\%$
test_exec_td 0.1772ms 0.1432ms 6.9853 KOps/s 7.0182 KOps/s $\color{#d91a1a}-0.47\%$
test_exec_td_decorator 0.7786ms 0.1786ms 5.5981 KOps/s 5.4667 KOps/s $\color{#35bf28}+2.40\%$
test_vmap_mlp_speed[True-True] 1.1544ms 1.0902ms 917.2496 Ops/s 921.8875 Ops/s $\color{#d91a1a}-0.50\%$
test_vmap_mlp_speed[True-False] 0.6794ms 0.6364ms 1.5712 KOps/s 1.4995 KOps/s $\color{#35bf28}+4.78\%$
test_vmap_mlp_speed[False-True] 1.0523ms 1.0027ms 997.3132 Ops/s 959.0447 Ops/s $\color{#35bf28}+3.99\%$
test_vmap_mlp_speed[False-False] 0.6157ms 0.5697ms 1.7553 KOps/s 1.6907 KOps/s $\color{#35bf28}+3.82\%$
test_vmap_mlp_speed_decorator[True-True] 3.2486ms 2.4589ms 406.6797 Ops/s 393.9603 Ops/s $\color{#35bf28}+3.23\%$
test_vmap_mlp_speed_decorator[True-False] 1.1017ms 0.6953ms 1.4382 KOps/s 1.4019 KOps/s $\color{#35bf28}+2.58\%$
test_vmap_mlp_speed_decorator[False-True] 2.4272ms 2.0664ms 483.9413 Ops/s 475.8640 Ops/s $\color{#35bf28}+1.70\%$
test_vmap_mlp_speed_decorator[False-False] 0.9833ms 0.5926ms 1.6876 KOps/s 1.6825 KOps/s $\color{#35bf28}+0.30\%$
test_vmap_transformer_speed[True-True] 12.8205ms 12.4046ms 80.6153 Ops/s 82.4066 Ops/s $\color{#d91a1a}-2.17\%$
test_vmap_transformer_speed[True-False] 8.2704ms 7.9800ms 125.3135 Ops/s 126.7819 Ops/s $\color{#d91a1a}-1.16\%$
test_vmap_transformer_speed[False-True] 12.6294ms 12.2604ms 81.5637 Ops/s 83.7817 Ops/s $\color{#d91a1a}-2.65\%$
test_vmap_transformer_speed[False-False] 8.1712ms 7.9173ms 126.3052 Ops/s 125.4636 Ops/s $\color{#35bf28}+0.67\%$
test_vmap_transformer_speed_decorator[True-True] 75.4749ms 74.4945ms 13.4238 Ops/s 12.2242 Ops/s $\textbf{\color{#35bf28}+9.81\%}$
test_vmap_transformer_speed_decorator[True-False] 20.5643ms 18.8867ms 52.9472 Ops/s 53.0377 Ops/s $\color{#d91a1a}-0.17\%$
test_vmap_transformer_speed_decorator[False-True] 0.1621s 73.2386ms 13.6540 Ops/s 14.6890 Ops/s $\textbf{\color{#d91a1a}-7.05\%}$
test_vmap_transformer_speed_decorator[False-False] 20.3933ms 18.5782ms 53.8265 Ops/s 49.4055 Ops/s $\textbf{\color{#35bf28}+8.95\%}$

@vmoens vmoens changed the title from "[Performance] Faster exclude" to "[Performance] Better shared/memmap inheritance and faster exclude" Jan 16, 2024
@vmoens vmoens added the bug label Jan 17, 2024
@vmoens vmoens merged commit 99eff33 into main Jan 17, 2024
44 of 45 checks passed
@vmoens vmoens deleted the faster-exclude branch January 17, 2024 13:37