[Performance] Better shared/memmap inheritance and faster exclude #621
Merged
facebook-github-bot added the CLA Signed label on Jan 16, 2024.
Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 33.7230μs | 17.2756μs | 57.8852 KOps/s | 56.2116 KOps/s | |
test_plain_set_stack_nested | 0.1899ms | 0.1464ms | 6.8293 KOps/s | 6.5546 KOps/s | |
test_plain_set_nested_inplace | 51.9170μs | 19.6123μs | 50.9885 KOps/s | 49.2778 KOps/s | |
test_plain_set_stack_nested_inplace | 0.3189ms | 0.1791ms | 5.5833 KOps/s | 5.3610 KOps/s | |
test_items | 23.2340μs | 2.4247μs | 412.4277 KOps/s | 401.4646 KOps/s | |
test_items_nested | 0.5734ms | 0.2719ms | 3.6784 KOps/s | 3.6140 KOps/s | |
test_items_nested_locked | 0.9238ms | 0.2735ms | 3.6568 KOps/s | 3.6281 KOps/s | |
test_items_nested_leaf | 0.2946ms | 0.1689ms | 5.9191 KOps/s | 5.8637 KOps/s | |
test_items_stack_nested | 1.5671ms | 1.3089ms | 763.9744 Ops/s | 742.6625 Ops/s | |
test_items_stack_nested_leaf | 1.4383ms | 1.1734ms | 852.2036 Ops/s | 844.6809 Ops/s | |
test_items_stack_nested_locked | 1.0971ms | 0.8729ms | 1.1456 KOps/s | 1.1412 KOps/s | |
test_keys | 19.3560μs | 3.9163μs | 255.3463 KOps/s | 258.5924 KOps/s | |
test_keys_nested | 52.4530ms | 0.1591ms | 6.2864 KOps/s | 6.6495 KOps/s | |
test_keys_nested_locked | 0.2929ms | 0.1546ms | 6.4699 KOps/s | 6.5482 KOps/s | |
test_keys_nested_leaf | 0.2481ms | 0.1305ms | 7.6647 KOps/s | 7.5624 KOps/s | |
test_keys_stack_nested | 1.5618ms | 1.2735ms | 785.2430 Ops/s | 785.6123 Ops/s | |
test_keys_stack_nested_leaf | 2.0776ms | 1.2638ms | 791.2765 Ops/s | 789.5110 Ops/s | |
test_keys_stack_nested_locked | 1.2419ms | 0.8057ms | 1.2411 KOps/s | 1.2482 KOps/s | |
test_values | 5.1020μs | 1.1849μs | 843.9391 KOps/s | 870.3621 KOps/s | |
test_values_nested | 0.1395ms | 52.7588μs | 18.9542 KOps/s | 18.8274 KOps/s | |
test_values_nested_locked | 0.1094ms | 52.0230μs | 19.2223 KOps/s | 18.9843 KOps/s | |
test_values_nested_leaf | 95.1180μs | 46.4029μs | 21.5504 KOps/s | 21.3308 KOps/s | |
test_values_stack_nested | 1.7411ms | 1.0343ms | 966.8724 Ops/s | 953.3543 Ops/s | |
test_values_stack_nested_leaf | 1.1398ms | 1.0161ms | 984.1388 Ops/s | 975.5522 Ops/s | |
test_values_stack_nested_locked | 0.9828ms | 0.6069ms | 1.6476 KOps/s | 1.6435 KOps/s | |
test_membership | 19.9070μs | 1.3477μs | 742.0108 KOps/s | 739.3346 KOps/s | |
test_membership_nested | 20.4090μs | 3.4295μs | 291.5873 KOps/s | 342.9532 KOps/s | |
test_membership_nested_leaf | 43.7720μs | 3.4504μs | 289.8219 KOps/s | 346.3993 KOps/s | |
test_membership_stacked_nested | 36.9090μs | 11.8748μs | 84.2117 KOps/s | 84.1318 KOps/s | |
test_membership_stacked_nested_leaf | 44.3630μs | 11.6179μs | 86.0741 KOps/s | 84.0245 KOps/s | |
test_membership_nested_last | 20.2870μs | 6.7135μs | 148.9543 KOps/s | 164.8352 KOps/s | |
test_membership_nested_leaf_last | 32.3000μs | 6.6848μs | 149.5936 KOps/s | 163.4445 KOps/s | |
test_membership_stacked_nested_last | 0.3008ms | 0.1738ms | 5.7539 KOps/s | 5.8688 KOps/s | |
test_membership_stacked_nested_leaf_last | 43.0900μs | 13.6868μs | 73.0630 KOps/s | 69.0766 KOps/s | |
test_nested_getleaf | 36.7090μs | 10.7370μs | 93.1355 KOps/s | 93.0946 KOps/s | |
test_nested_get | 31.8500μs | 10.2042μs | 97.9990 KOps/s | 97.6606 KOps/s | |
test_stacked_getleaf | 0.8687ms | 0.3923ms | 2.5492 KOps/s | 2.4608 KOps/s | |
test_stacked_get | 0.6728ms | 0.3614ms | 2.7667 KOps/s | 2.5845 KOps/s | |
test_nested_getitemleaf | 28.4540μs | 10.8561μs | 92.1140 KOps/s | 92.0218 KOps/s | |
test_nested_getitem | 33.1420μs | 10.1847μs | 98.1865 KOps/s | 97.9986 KOps/s | |
test_stacked_getitemleaf | 0.6158ms | 0.3939ms | 2.5389 KOps/s | 2.4378 KOps/s | |
test_stacked_getitem | 0.5738ms | 0.3608ms | 2.7714 KOps/s | 2.6715 KOps/s | |
test_lock_nested | 1.2018ms | 0.3899ms | 2.5650 KOps/s | 2.3654 KOps/s | |
test_lock_stack_nested | 77.8575ms | 6.3635ms | 157.1471 Ops/s | 141.8294 Ops/s | |
test_unlock_nested | 62.7863ms | 0.4538ms | 2.2037 KOps/s | 2.3430 KOps/s | |
test_unlock_stack_nested | 78.6858ms | 5.9490ms | 168.0949 Ops/s | 160.6693 Ops/s | |
test_flatten_speed | 0.7324ms | 0.3691ms | 2.7091 KOps/s | 2.7016 KOps/s | |
test_unflatten_speed | 0.6600ms | 0.4603ms | 2.1724 KOps/s | 2.1390 KOps/s | |
test_common_ops | 4.3370ms | 0.6956ms | 1.4376 KOps/s | 1.4558 KOps/s | |
test_creation | 57.0360μs | 1.8813μs | 531.5542 KOps/s | 485.6039 KOps/s | |
test_creation_empty | 27.8120μs | 10.5607μs | 94.6905 KOps/s | 88.4595 KOps/s | |
test_creation_nested_1 | 36.5090μs | 13.0661μs | 76.5339 KOps/s | 71.3004 KOps/s | |
test_creation_nested_2 | 45.1540μs | 16.2803μs | 61.4239 KOps/s | 56.6805 KOps/s | |
test_clone | 0.1576ms | 13.1375μs | 76.1178 KOps/s | 81.6216 KOps/s | |
test_getitem[int] | 45.3740μs | 11.1388μs | 89.7762 KOps/s | 81.5066 KOps/s | |
test_getitem[slice_int] | 0.1002ms | 22.8103μs | 43.8399 KOps/s | 41.6410 KOps/s | |
test_getitem[range] | 0.1011ms | 41.3249μs | 24.1985 KOps/s | 23.2863 KOps/s | |
test_getitem[tuple] | 45.0440μs | 18.1522μs | 55.0896 KOps/s | 51.6222 KOps/s | |
test_getitem[list] | 0.4577ms | 36.6980μs | 27.2495 KOps/s | 26.2707 KOps/s | |
test_setitem_dim[int] | 63.1490μs | 29.1940μs | 34.2536 KOps/s | 32.4049 KOps/s | |
test_setitem_dim[slice_int] | 77.8050μs | 55.1545μs | 18.1309 KOps/s | 17.2717 KOps/s | |
test_setitem_dim[range] | 0.1421ms | 74.3549μs | 13.4490 KOps/s | 13.3695 KOps/s | |
test_setitem_dim[tuple] | 79.3390μs | 43.3761μs | 23.0542 KOps/s | 21.9372 KOps/s | |
test_setitem | 0.2025ms | 20.1016μs | 49.7473 KOps/s | 52.1455 KOps/s | |
test_set | 0.1812ms | 19.0064μs | 52.6138 KOps/s | 53.4470 KOps/s | |
test_set_shared | 1.8258ms | 0.1421ms | 7.0379 KOps/s | 7.1884 KOps/s | |
test_update | 0.1148ms | 22.0467μs | 45.3583 KOps/s | 44.8595 KOps/s | |
test_update_nested | 0.1537ms | 29.3322μs | 34.0923 KOps/s | 33.8568 KOps/s | |
test_set_nested | 0.1050ms | 21.1200μs | 47.3486 KOps/s | 48.2239 KOps/s | |
test_set_nested_new | 0.1083ms | 24.8901μs | 40.1766 KOps/s | 39.6834 KOps/s | |
test_select | 81.7230μs | 38.7187μs | 25.8273 KOps/s | 20.4383 KOps/s | |
test_unbind_speed | 0.3981ms | 0.3146ms | 3.1783 KOps/s | 2.8746 KOps/s | |
test_unbind_speed_stack0 | 66.4912ms | 4.1692ms | 239.8545 Ops/s | 220.1980 Ops/s | |
test_unbind_speed_stack1 | 7.7005μs | 0.6615μs | 1.5118 MOps/s | 1.5757 MOps/s | |
test_split | 62.7564ms | 1.5842ms | 631.2258 Ops/s | 628.3689 Ops/s | |
test_chunk | 61.2733ms | 1.5642ms | 639.3165 Ops/s | 589.3234 Ops/s | |
test_creation[device0] | 0.1995ms | 99.7427μs | 10.0258 KOps/s | 10.0151 KOps/s | |
test_creation_from_tensor | 3.1489ms | 81.8615μs | 12.2158 KOps/s | 12.4228 KOps/s | |
test_add_one[memmap_tensor0] | 0.5710ms | 5.2385μs | 190.8943 KOps/s | 192.3301 KOps/s | |
test_contiguous[memmap_tensor0] | 10.4190μs | 0.6414μs | 1.5591 MOps/s | 1.6141 MOps/s | |
test_stack[memmap_tensor0] | 0.1477ms | 3.5799μs | 279.3409 KOps/s | 287.7785 KOps/s | |
test_memmaptd_index | 1.2042ms | 0.2201ms | 4.5425 KOps/s | 5.1051 KOps/s | |
test_memmaptd_index_astensor | 0.6828ms | 0.2813ms | 3.5553 KOps/s | 3.8500 KOps/s | |
test_memmaptd_index_op | 1.2892ms | 0.5786ms | 1.7283 KOps/s | 1.8205 KOps/s | |
test_serialize_model | 0.1750s | 0.1102s | 9.0710 Ops/s | 8.8718 Ops/s | |
test_serialize_model_pickle | 0.4506s | 0.3793s | 2.6363 Ops/s | 2.6378 Ops/s | |
test_serialize_weights | 0.1677s | 0.1060s | 9.4301 Ops/s | 10.0517 Ops/s | |
test_serialize_weights_returnearly | 0.3128s | 0.1506s | 6.6409 Ops/s | 7.2666 Ops/s | |
test_serialize_weights_pickle | 0.8026s | 0.4982s | 2.0074 Ops/s | 2.4125 Ops/s | |
test_serialize_weights_filesystem | 0.1768s | 0.1021s | 9.7948 Ops/s | 10.9039 Ops/s | |
test_serialize_model_filesystem | 0.1008s | 93.3495ms | 10.7124 Ops/s | 10.5553 Ops/s | |
test_reshape_pytree | 63.1680μs | 23.2093μs | 43.0863 KOps/s | 42.0653 KOps/s | |
test_reshape_td | 0.1154ms | 29.8637μs | 33.4854 KOps/s | 31.8421 KOps/s | |
test_view_pytree | 98.8660μs | 23.3353μs | 42.8535 KOps/s | 42.7545 KOps/s | |
test_view_td | 27.2010μs | 4.8689μs | 205.3835 KOps/s | 204.5043 KOps/s | |
test_unbind_pytree | 61.0140μs | 26.4367μs | 37.8263 KOps/s | 37.8246 KOps/s | |
test_unbind_td | 0.1038ms | 50.3968μs | 19.8425 KOps/s | 17.9742 KOps/s | |
test_split_pytree | 63.1480μs | 26.6622μs | 37.5062 KOps/s | 38.1406 KOps/s | |
test_split_td | 0.5534ms | 41.0886μs | 24.3377 KOps/s | 23.1055 KOps/s | |
test_add_pytree | 82.1140μs | 32.6273μs | 30.6492 KOps/s | 31.2864 KOps/s | |
test_add_td | 0.1267ms | 52.0578μs | 19.2094 KOps/s | 19.8636 KOps/s | |
test_distributed | 0.2346ms | 98.5878μs | 10.1432 KOps/s | 10.1147 KOps/s | |
test_tdmodule | 0.1130ms | 21.6789μs | 46.1278 KOps/s | 44.3105 KOps/s | |
test_tdmodule_dispatch | 0.1987ms | 39.9771μs | 25.0143 KOps/s | 23.9987 KOps/s | |
test_tdseq | 41.9480μs | 25.5231μs | 39.1803 KOps/s | 39.4003 KOps/s | |
test_tdseq_dispatch | 0.1539ms | 44.7297μs | 22.3565 KOps/s | 22.0580 KOps/s | |
test_instantiation_functorch | 1.5029ms | 1.3011ms | 768.5671 Ops/s | 775.2295 Ops/s | |
test_instantiation_td | 70.6808ms | 1.0858ms | 920.9792 Ops/s | 998.0756 Ops/s | |
test_exec_functorch | 0.2190ms | 0.1547ms | 6.4641 KOps/s | 6.3455 KOps/s | |
test_exec_functional_call | 0.2250ms | 0.1415ms | 7.0668 KOps/s | 6.8722 KOps/s | |
test_exec_td | 0.2314ms | 0.1412ms | 7.0799 KOps/s | 7.0035 KOps/s | |
test_exec_td_decorator | 0.6636ms | 0.1740ms | 5.7465 KOps/s | 5.5832 KOps/s | |
test_vmap_mlp_speed[True-True] | 1.2637ms | 0.8862ms | 1.1284 KOps/s | 1.1292 KOps/s | |
test_vmap_mlp_speed[True-False] | 0.6670ms | 0.4783ms | 2.0906 KOps/s | 2.1199 KOps/s | |
test_vmap_mlp_speed[False-True] | 1.1581ms | 0.7666ms | 1.3045 KOps/s | 1.3096 KOps/s | |
test_vmap_mlp_speed[False-False] | 0.5957ms | 0.3916ms | 2.5537 KOps/s | 2.5765 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 3.1197ms | 2.4125ms | 414.5066 Ops/s | 410.0839 Ops/s | |
test_vmap_mlp_speed_decorator[True-False] | 0.9418ms | 0.5304ms | 1.8853 KOps/s | 1.8961 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 2.6049ms | 1.9696ms | 507.7240 Ops/s | 506.1305 Ops/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.8721ms | 0.4036ms | 2.4775 KOps/s | 2.4722 KOps/s |

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 0.4144ms | 12.6029μs | 79.3469 KOps/s | 69.0448 KOps/s | |
test_plain_set_stack_nested | 0.1628ms | 0.1161ms | 8.6149 KOps/s | 8.4854 KOps/s | |
test_plain_set_nested_inplace | 32.4710μs | 13.9156μs | 71.8616 KOps/s | 64.0723 KOps/s | |
test_plain_set_stack_nested_inplace | 0.1801ms | 0.1447ms | 6.9090 KOps/s | 6.8258 KOps/s | |
test_items | 25.8110μs | 4.6912μs | 213.1648 KOps/s | 211.2107 KOps/s | |
test_items_nested | 0.3949ms | 0.3405ms | 2.9369 KOps/s | 2.9730 KOps/s | |
test_items_nested_locked | 0.3843ms | 0.3429ms | 2.9166 KOps/s | 2.9252 KOps/s | |
test_items_nested_leaf | 0.2491ms | 0.2006ms | 4.9840 KOps/s | 5.0115 KOps/s | |
test_items_stack_nested | 1.3530ms | 1.2897ms | 775.3552 Ops/s | 766.4193 Ops/s | |
test_items_stack_nested_leaf | 1.2297ms | 1.1274ms | 886.9807 Ops/s | 881.2178 Ops/s | |
test_items_stack_nested_locked | 1.9300ms | 0.9076ms | 1.1018 KOps/s | 1.1235 KOps/s | |
test_keys | 28.3410μs | 4.5387μs | 220.3287 KOps/s | 219.8091 KOps/s | |
test_keys_nested | 0.4463ms | 94.7971μs | 10.5488 KOps/s | 10.7018 KOps/s | |
test_keys_nested_locked | 0.1255ms | 97.8469μs | 10.2200 KOps/s | 10.7735 KOps/s | |
test_keys_nested_leaf | 0.1807ms | 78.1870μs | 12.7898 KOps/s | 12.9631 KOps/s | |
test_keys_stack_nested | 1.2670ms | 1.1462ms | 872.4627 Ops/s | 872.2632 Ops/s | |
test_keys_stack_nested_leaf | 1.1769ms | 1.1090ms | 901.7373 Ops/s | 882.1237 Ops/s | |
test_keys_stack_nested_locked | 0.7924ms | 0.7136ms | 1.4013 KOps/s | 1.3866 KOps/s | |
test_values | 8.3703μs | 1.8949μs | 527.7231 KOps/s | 533.7246 KOps/s | |
test_values_nested | 66.8330μs | 45.3372μs | 22.0569 KOps/s | 22.1026 KOps/s | |
test_values_nested_locked | 86.2540μs | 47.6077μs | 21.0050 KOps/s | 21.0466 KOps/s | |
test_values_nested_leaf | 61.9620μs | 39.4384μs | 25.3560 KOps/s | 25.4083 KOps/s | |
test_values_stack_nested | 1.0190ms | 0.9544ms | 1.0478 KOps/s | 1.0514 KOps/s | |
test_values_stack_nested_leaf | 1.0265ms | 0.9489ms | 1.0539 KOps/s | 1.0663 KOps/s | |
test_values_stack_nested_locked | 0.6229ms | 0.5751ms | 1.7389 KOps/s | 1.7126 KOps/s | |
test_membership | 5.0282μs | 0.9531μs | 1.0492 MOps/s | 1.0569 MOps/s | |
test_membership_nested | 27.8620μs | 2.8888μs | 346.1605 KOps/s | 429.1854 KOps/s | |
test_membership_nested_leaf | 21.1600μs | 2.8795μs | 347.2767 KOps/s | 450.3711 KOps/s | |
test_membership_stacked_nested | 45.5720μs | 11.1172μs | 89.9506 KOps/s | 90.5400 KOps/s | |
test_membership_stacked_nested_leaf | 23.8420μs | 11.1409μs | 89.7594 KOps/s | 90.9280 KOps/s | |
test_membership_nested_last | 32.0110μs | 5.3233μs | 187.8523 KOps/s | 215.1527 KOps/s | |
test_membership_nested_leaf_last | 37.5210μs | 5.3169μs | 188.0807 KOps/s | 213.8689 KOps/s | |
test_membership_stacked_nested_last | 0.1757ms | 0.1435ms | 6.9681 KOps/s | 7.3791 KOps/s | |
test_membership_stacked_nested_leaf_last | 53.1120μs | 13.0022μs | 76.9099 KOps/s | 76.5587 KOps/s | |
test_nested_getleaf | 34.8310μs | 8.4032μs | 119.0024 KOps/s | 120.3866 KOps/s | |
test_nested_get | 22.8410μs | 7.9451μs | 125.8638 KOps/s | 127.2493 KOps/s | |
test_stacked_getleaf | 0.3750ms | 0.3224ms | 3.1013 KOps/s | 3.1255 KOps/s | |
test_stacked_get | 0.3492ms | 0.2920ms | 3.4245 KOps/s | 3.4566 KOps/s | |
test_nested_getitemleaf | 30.0420μs | 8.4347μs | 118.5581 KOps/s | 119.6205 KOps/s | |
test_nested_getitem | 29.4720μs | 7.9808μs | 125.3006 KOps/s | 125.9383 KOps/s | |
test_stacked_getitemleaf | 0.3657ms | 0.3240ms | 3.0867 KOps/s | 3.1111 KOps/s | |
test_stacked_getitem | 0.3978ms | 0.2897ms | 3.4522 KOps/s | 3.4187 KOps/s | |
test_lock_nested | 0.8728ms | 0.3951ms | 2.5309 KOps/s | 2.4089 KOps/s | |
test_lock_stack_nested | 83.8241ms | 6.3064ms | 158.5692 Ops/s | 153.2298 Ops/s | |
test_unlock_nested | 1.0073ms | 0.3972ms | 2.5175 KOps/s | 2.4018 KOps/s | |
test_unlock_stack_nested | 82.9870ms | 6.7153ms | 148.9139 Ops/s | 142.5844 Ops/s | |
test_flatten_speed | 0.4563ms | 0.2653ms | 3.7689 KOps/s | 3.7650 KOps/s | |
test_unflatten_speed | 0.4237ms | 0.3691ms | 2.7095 KOps/s | 2.7883 KOps/s | |
test_common_ops | 1.0297ms | 0.5565ms | 1.7969 KOps/s | 1.5274 KOps/s | |
test_creation | 23.8110μs | 1.5816μs | 632.2647 KOps/s | 611.9270 KOps/s | |
test_creation_empty | 29.9920μs | 6.4035μs | 156.1638 KOps/s | 102.2265 KOps/s | |
test_creation_nested_1 | 43.1120μs | 8.1323μs | 122.9658 KOps/s | 84.2265 KOps/s | |
test_creation_nested_2 | 26.0510μs | 10.4992μs | 95.2455 KOps/s | 70.3813 KOps/s | |
test_clone | 0.1067ms | 13.2482μs | 75.4817 KOps/s | 76.7503 KOps/s | |
test_getitem[int] | 28.8310μs | 10.7836μs | 92.7332 KOps/s | 88.9601 KOps/s | |
test_getitem[slice_int] | 46.1220μs | 22.5554μs | 44.3353 KOps/s | 44.8127 KOps/s | |
test_getitem[range] | 66.3730μs | 36.5150μs | 27.3860 KOps/s | 27.3568 KOps/s | |
test_getitem[tuple] | 48.8820μs | 18.9963μs | 52.6419 KOps/s | 52.7551 KOps/s | |
test_getitem[list] | 0.3670ms | 34.7525μs | 28.7749 KOps/s | 29.1854 KOps/s | |
test_setitem_dim[int] | 42.6920μs | 25.5101μs | 39.2002 KOps/s | 35.0825 KOps/s | |
test_setitem_dim[slice_int] | 70.5940μs | 45.7663μs | 21.8501 KOps/s | 20.8380 KOps/s | |
test_setitem_dim[range] | 76.0830μs | 57.5467μs | 17.3772 KOps/s | 16.0867 KOps/s | |
test_setitem_dim[tuple] | 61.5020μs | 38.1796μs | 26.1920 KOps/s | 23.4463 KOps/s | |
test_setitem | 0.1068ms | 16.7271μs | 59.7831 KOps/s | 55.1327 KOps/s | |
test_set | 0.1071ms | 16.2662μs | 61.4772 KOps/s | 56.6492 KOps/s | |
test_set_shared | 2.9020ms | 0.1022ms | 9.7856 KOps/s | 9.8392 KOps/s | |
test_update | 97.6540μs | 17.8178μs | 56.1236 KOps/s | 45.9201 KOps/s | |
test_update_nested | 0.1037ms | 23.9216μs | 41.8032 KOps/s | 36.5329 KOps/s | |
test_set_nested | 95.3540μs | 17.3658μs | 57.5844 KOps/s | 52.6310 KOps/s | |
test_set_nested_new | 98.9850μs | 20.3475μs | 49.1461 KOps/s | 45.4053 KOps/s | |
test_select | 0.1130ms | 33.4783μs | 29.8701 KOps/s | 21.0434 KOps/s | |
test_to | 74.4630μs | 56.4906μs | 17.7021 KOps/s | 17.1849 KOps/s | |
test_to_nonblocking | 71.4940μs | 33.9900μs | 29.4204 KOps/s | 27.5377 KOps/s | |
test_unbind_speed | 0.3775ms | 0.3187ms | 3.1377 KOps/s | 3.0498 KOps/s | |
test_unbind_speed_stack0 | 80.3481ms | 3.7136ms | 269.2800 Ops/s | 256.7950 Ops/s | |
test_unbind_speed_stack1 | 3.6081μs | 0.5348μs | 1.8698 MOps/s | 1.8929 MOps/s | |
test_split | 1.8512ms | 1.5637ms | 639.5239 Ops/s | 566.7137 Ops/s | |
test_chunk | 74.0810ms | 1.6828ms | 594.2602 Ops/s | 620.5868 Ops/s | |
test_creation[device0] | 0.1311ms | 70.8344μs | 14.1174 KOps/s | 13.0977 KOps/s | |
test_creation_from_tensor | 0.1310ms | 53.0791μs | 18.8398 KOps/s | 17.3905 KOps/s | |
test_add_one[memmap_tensor0] | 0.2028ms | 6.2406μs | 160.2418 KOps/s | 158.4085 KOps/s | |
test_contiguous[memmap_tensor0] | 10.8710μs | 0.6434μs | 1.5542 MOps/s | 1.5546 MOps/s | |
test_stack[memmap_tensor0] | 35.1420μs | 4.3267μs | 231.1209 KOps/s | 231.2569 KOps/s | |
test_memmaptd_index | 1.0033ms | 0.2599ms | 3.8472 KOps/s | 4.0820 KOps/s | |
test_memmaptd_index_astensor | 0.5709ms | 0.3162ms | 3.1628 KOps/s | 3.3095 KOps/s | |
test_memmaptd_index_op | 0.8539ms | 0.5555ms | 1.8002 KOps/s | 1.6943 KOps/s | |
test_serialize_model | 0.1657s | 96.6512ms | 10.3465 Ops/s | 9.7812 Ops/s | |
test_serialize_model_pickle | 1.3505s | 1.2376s | 0.8080 Ops/s | 0.8083 Ops/s | |
test_serialize_weights | 0.1693s | 95.0087ms | 10.5253 Ops/s | 10.1586 Ops/s | |
test_serialize_weights_returnearly | 0.2712s | 77.2043ms | 12.9526 Ops/s | 13.1193 Ops/s | |
test_serialize_weights_pickle | 1.3839s | 1.2453s | 0.8030 Ops/s | 0.8082 Ops/s | |
test_reshape_pytree | 58.1930μs | 24.7870μs | 40.3438 KOps/s | 40.8832 KOps/s | |
test_reshape_td | 53.1120μs | 28.8854μs | 34.6195 KOps/s | 33.9569 KOps/s | |
test_view_pytree | 54.3220μs | 24.5471μs | 40.7380 KOps/s | 41.1844 KOps/s | |
test_view_td | 27.7310μs | 4.2691μs | 234.2419 KOps/s | 239.9033 KOps/s | |
test_unbind_pytree | 0.1994ms | 30.4560μs | 32.8343 KOps/s | 33.3571 KOps/s | |
test_unbind_td | 0.1818ms | 50.9637μs | 19.6218 KOps/s | 18.9450 KOps/s | |
test_split_pytree | 57.3330μs | 28.7190μs | 34.8202 KOps/s | 35.1554 KOps/s | |
test_split_td | 0.7104ms | 40.4245μs | 24.7375 KOps/s | 23.9907 KOps/s | |
test_add_pytree | 61.0620μs | 34.9333μs | 28.6260 KOps/s | 29.4181 KOps/s | |
test_add_td | 88.9240μs | 44.5493μs | 22.4471 KOps/s | 20.6938 KOps/s | |
test_distributed | 5.9354ms | 91.7978μs | 10.8935 KOps/s | 13.1925 KOps/s | |
test_tdmodule | 0.1042ms | 16.7462μs | 59.7151 KOps/s | 50.7017 KOps/s | |
test_tdmodule_dispatch | 0.2228ms | 31.7406μs | 31.5054 KOps/s | 26.9391 KOps/s | |
test_tdseq | 29.8620μs | 19.3174μs | 51.7669 KOps/s | 46.1734 KOps/s | |
test_tdseq_dispatch | 50.1130μs | 34.2919μs | 29.1614 KOps/s | 25.2576 KOps/s | |
test_instantiation_functorch | 1.7951ms | 1.6763ms | 596.5441 Ops/s | 597.6452 Ops/s | |
test_instantiation_td | 1.7843ms | 1.1654ms | 858.0438 Ops/s | 859.5510 Ops/s | |
test_exec_functorch | 0.1944ms | 0.1583ms | 6.3177 KOps/s | 6.3071 KOps/s | |
test_exec_functional_call | 0.1894ms | 0.1556ms | 6.4261 KOps/s | 6.5256 KOps/s | |
test_exec_td | 0.1772ms | 0.1432ms | 6.9853 KOps/s | 7.0182 KOps/s | |
test_exec_td_decorator | 0.7786ms | 0.1786ms | 5.5981 KOps/s | 5.4667 KOps/s | |
test_vmap_mlp_speed[True-True] | 1.1544ms | 1.0902ms | 917.2496 Ops/s | 921.8875 Ops/s | |
test_vmap_mlp_speed[True-False] | 0.6794ms | 0.6364ms | 1.5712 KOps/s | 1.4995 KOps/s | |
test_vmap_mlp_speed[False-True] | 1.0523ms | 1.0027ms | 997.3132 Ops/s | 959.0447 Ops/s | |
test_vmap_mlp_speed[False-False] | 0.6157ms | 0.5697ms | 1.7553 KOps/s | 1.6907 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 3.2486ms | 2.4589ms | 406.6797 Ops/s | 393.9603 Ops/s | |
test_vmap_mlp_speed_decorator[True-False] | 1.1017ms | 0.6953ms | 1.4382 KOps/s | 1.4019 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 2.4272ms | 2.0664ms | 483.9413 Ops/s | 475.8640 Ops/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.9833ms | 0.5926ms | 1.6876 KOps/s | 1.6825 KOps/s | |
test_vmap_transformer_speed[True-True] | 12.8205ms | 12.4046ms | 80.6153 Ops/s | 82.4066 Ops/s | |
test_vmap_transformer_speed[True-False] | 8.2704ms | 7.9800ms | 125.3135 Ops/s | 126.7819 Ops/s | |
test_vmap_transformer_speed[False-True] | 12.6294ms | 12.2604ms | 81.5637 Ops/s | 83.7817 Ops/s | |
test_vmap_transformer_speed[False-False] | 8.1712ms | 7.9173ms | 126.3052 Ops/s | 125.4636 Ops/s | |
test_vmap_transformer_speed_decorator[True-True] | 75.4749ms | 74.4945ms | 13.4238 Ops/s | 12.2242 Ops/s | |
test_vmap_transformer_speed_decorator[True-False] | 20.5643ms | 18.8867ms | 52.9472 Ops/s | 53.0377 Ops/s | |
test_vmap_transformer_speed_decorator[False-True] | 0.1621s | 73.2386ms | 13.6540 Ops/s | 14.6890 Ops/s | |
test_vmap_transformer_speed_decorator[False-False] | 20.3933ms | 18.5782ms | 53.8265 Ops/s | 49.4055 Ops/s |
vmoens changed the title from "[Performance] Faster exclude" to "[Performance] Better shared/memmap inheritance and faster exclude" on Jan 16, 2024.
Labels: bug, CLA Signed
I'm grouping #620 and this PR.

The pitch is that we used to have `_is_memmap` and `_is_shared` args in the tensordict constructor, but that is messy: `is_shared()` should only be true when the tensordict is locked, yet when a td was created from another one, the locked attribute wasn't passed whereas the shared attribute was.

To solve this, we will make sure that ops that do not modify the tensors explicitly pass both the shared and locked attributes.

I'm taking the opportunity of this refactoring to make `exclude` faster.
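
To illustrate the inheritance rule described above, here is a minimal pure-Python sketch. It is not the tensordict implementation: `ToyDict`, its fields, and the way `exclude` rebuilds the result are all hypothetical stand-ins, assumed only to show a non-modifying op passing the shared and locked flags together so that `is_shared()` cannot be true on an unlocked object.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDict:
    """Hypothetical stand-in for a tensordict; not the library's real class."""

    data: dict = field(default_factory=dict)
    _is_shared: bool = False
    _is_locked: bool = False

    def share_memory_(self):
        # Assumption: sharing implies locking, so both flags flip together.
        self._is_shared = True
        self._is_locked = True
        return self

    def is_shared(self):
        # Invariant from the description: shared is only meaningful when locked.
        return self._is_shared and self._is_locked

    def exclude(self, *keys):
        # Non-modifying op: values are reused as-is, so the result explicitly
        # inherits both flags instead of only the shared one.
        drop = set(keys)
        kept = {k: v for k, v in self.data.items() if k not in drop}
        return ToyDict(kept, _is_shared=self._is_shared, _is_locked=self._is_locked)


td = ToyDict({"a": 1, "b": 2, "c": 3}).share_memory_()
sub = td.exclude("b")
print(sorted(sub.data), sub.is_shared(), sub._is_locked)
```

Passing the two flags as a pair in every non-modifying op is what keeps the result consistent; the bug described above corresponds to forwarding `_is_shared` while leaving `_is_locked` at its default.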