[BugFix] Sync cuda only if initialized #767

Merged
merged 1 commit into from
Apr 30, 2024

Conversation

vmoens
Contributor

@vmoens vmoens commented Apr 30, 2024

No description provided.

@facebook-github-bot facebook-github-bot added the CLA Signed label Apr 30, 2024
@vmoens vmoens added the bug label Apr 30, 2024
@vmoens vmoens merged commit 0c72dd7 into main Apr 30, 2024
21 of 28 checks passed
@vmoens vmoens deleted the sync-only-if-init branch April 30, 2024 09:06

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 127. Improved: $\large\color{#35bf28}14$. Worsened: $\large\color{#d91a1a}10$.

Expand to view detailed results
Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 51.3460μs 17.5902μs 56.8498 KOps/s 59.1602 KOps/s $\color{#d91a1a}-3.91\%$
test_plain_set_stack_nested 60.2720μs 17.8178μs 56.1237 KOps/s 58.5934 KOps/s $\color{#d91a1a}-4.21\%$
test_plain_set_nested_inplace 79.9200μs 20.1602μs 49.6026 KOps/s 51.7059 KOps/s $\color{#d91a1a}-4.07\%$
test_plain_set_stack_nested_inplace 66.8250μs 19.9491μs 50.1275 KOps/s 51.8456 KOps/s $\color{#d91a1a}-3.31\%$
test_items 14.3660μs 2.7144μs 368.4059 KOps/s 386.9707 KOps/s $\color{#d91a1a}-4.80\%$
test_items_nested 0.8325ms 0.2658ms 3.7624 KOps/s 3.7301 KOps/s $\color{#35bf28}+0.87\%$
test_items_nested_locked 0.5580ms 0.2635ms 3.7944 KOps/s 3.6584 KOps/s $\color{#35bf28}+3.72\%$
test_items_nested_leaf 0.1604ms 76.7687μs 13.0261 KOps/s 12.8469 KOps/s $\color{#35bf28}+1.39\%$
test_items_stack_nested 0.4526ms 0.2631ms 3.8015 KOps/s 3.7275 KOps/s $\color{#35bf28}+1.98\%$
test_items_stack_nested_leaf 0.5344ms 79.9748μs 12.5039 KOps/s 12.8925 KOps/s $\color{#d91a1a}-3.01\%$
test_items_stack_nested_locked 0.4093ms 0.2634ms 3.7967 KOps/s 3.7091 KOps/s $\color{#35bf28}+2.36\%$
test_keys 29.4950μs 3.9614μs 252.4364 KOps/s 258.9075 KOps/s $\color{#d91a1a}-2.50\%$
test_keys_nested 0.2620ms 0.1397ms 7.1596 KOps/s 7.2182 KOps/s $\color{#d91a1a}-0.81\%$
test_keys_nested_locked 2.5352ms 0.1433ms 6.9769 KOps/s 6.9100 KOps/s $\color{#35bf28}+0.97\%$
test_keys_nested_leaf 0.2077ms 0.1243ms 8.0422 KOps/s 8.4206 KOps/s $\color{#d91a1a}-4.49\%$
test_keys_stack_nested 0.2612ms 0.1392ms 7.1832 KOps/s 7.2497 KOps/s $\color{#d91a1a}-0.92\%$
test_keys_stack_nested_leaf 0.1999ms 0.1171ms 8.5366 KOps/s 8.4373 KOps/s $\color{#35bf28}+1.18\%$
test_keys_stack_nested_locked 0.2659ms 0.1443ms 6.9283 KOps/s 6.9704 KOps/s $\color{#d91a1a}-0.61\%$
test_values 10.9078μs 1.1533μs 867.0551 KOps/s 867.5608 KOps/s $\color{#d91a1a}-0.06\%$
test_values_nested 93.3550μs 50.1960μs 19.9219 KOps/s 19.2481 KOps/s $\color{#35bf28}+3.50\%$
test_values_nested_locked 0.1199ms 50.3084μs 19.8774 KOps/s 19.3296 KOps/s $\color{#35bf28}+2.83\%$
test_values_nested_leaf 84.0660μs 45.3785μs 22.0369 KOps/s 21.3802 KOps/s $\color{#35bf28}+3.07\%$
test_values_stack_nested 99.6020μs 50.3565μs 19.8584 KOps/s 19.4871 KOps/s $\color{#35bf28}+1.91\%$
test_values_stack_nested_leaf 96.2400μs 45.7002μs 21.8817 KOps/s 21.4426 KOps/s $\color{#35bf28}+2.05\%$
test_values_stack_nested_locked 0.1025ms 50.4478μs 19.8225 KOps/s 19.5070 KOps/s $\color{#35bf28}+1.62\%$
test_membership 16.1700μs 1.3268μs 753.6913 KOps/s 741.0437 KOps/s $\color{#35bf28}+1.71\%$
test_membership_nested 35.0050μs 3.3887μs 295.0964 KOps/s 288.4450 KOps/s $\color{#35bf28}+2.31\%$
test_membership_nested_leaf 47.3420μs 3.4141μs 292.9024 KOps/s 283.3889 KOps/s $\color{#35bf28}+3.36\%$
test_membership_stacked_nested 42.5400μs 3.3816μs 295.7195 KOps/s 292.1257 KOps/s $\color{#35bf28}+1.23\%$
test_membership_stacked_nested_leaf 55.4830μs 3.4315μs 291.4174 KOps/s 290.5896 KOps/s $\color{#35bf28}+0.28\%$
test_membership_nested_last 25.4080μs 4.1016μs 243.8088 KOps/s 239.2836 KOps/s $\color{#35bf28}+1.89\%$
test_membership_nested_leaf_last 38.7730μs 4.1377μs 241.6828 KOps/s 237.5970 KOps/s $\color{#35bf28}+1.72\%$
test_membership_stacked_nested_last 22.4620μs 4.1058μs 243.5604 KOps/s 231.5966 KOps/s $\textbf{\color{#35bf28}+5.17\%}$
test_membership_stacked_nested_leaf_last 47.1680μs 4.1456μs 241.2189 KOps/s 240.2639 KOps/s $\color{#35bf28}+0.40\%$
test_nested_getleaf 53.5600μs 10.7445μs 93.0712 KOps/s 92.6895 KOps/s $\color{#35bf28}+0.41\%$
test_nested_get 51.3350μs 10.3167μs 96.9300 KOps/s 97.3031 KOps/s $\color{#d91a1a}-0.38\%$
test_stacked_getleaf 31.3180μs 10.7220μs 93.2661 KOps/s 91.7729 KOps/s $\color{#35bf28}+1.63\%$
test_stacked_get 51.8770μs 10.1525μs 98.4982 KOps/s 97.4394 KOps/s $\color{#35bf28}+1.09\%$
test_nested_getitemleaf 34.1740μs 11.1318μs 89.8324 KOps/s 85.5681 KOps/s $\color{#35bf28}+4.98\%$
test_nested_getitem 52.4480μs 10.2063μs 97.9786 KOps/s 93.2217 KOps/s $\textbf{\color{#35bf28}+5.10\%}$
test_stacked_getitemleaf 54.1410μs 10.9186μs 91.5870 KOps/s 85.8679 KOps/s $\textbf{\color{#35bf28}+6.66\%}$
test_stacked_getitem 51.4960μs 10.1876μs 98.1589 KOps/s 93.7947 KOps/s $\color{#35bf28}+4.65\%$
test_lock_nested 50.7226ms 0.4096ms 2.4413 KOps/s 2.8112 KOps/s $\textbf{\color{#d91a1a}-13.16\%}$
test_lock_stack_nested 0.4337ms 0.3112ms 3.2139 KOps/s 3.1258 KOps/s $\color{#35bf28}+2.82\%$
test_unlock_nested 0.9028ms 0.3507ms 2.8516 KOps/s 2.4447 KOps/s $\textbf{\color{#35bf28}+16.64\%}$
test_unlock_stack_nested 0.5180ms 0.3197ms 3.1283 KOps/s 3.0557 KOps/s $\color{#35bf28}+2.38\%$
test_flatten_speed 0.2131ms 95.2863μs 10.4947 KOps/s 10.2598 KOps/s $\color{#35bf28}+2.29\%$
test_unflatten_speed 0.6887ms 0.4142ms 2.4142 KOps/s 2.3796 KOps/s $\color{#35bf28}+1.46\%$
test_common_ops 1.5337ms 0.7503ms 1.3327 KOps/s 1.3691 KOps/s $\color{#d91a1a}-2.65\%$
test_creation 70.8120μs 1.8981μs 526.8305 KOps/s 515.4778 KOps/s $\color{#35bf28}+2.20\%$
test_creation_empty 38.0200μs 11.8530μs 84.3671 KOps/s 97.5822 KOps/s $\textbf{\color{#d91a1a}-13.54\%}$
test_creation_nested_1 62.8070μs 14.4362μs 69.2704 KOps/s 78.0829 KOps/s $\textbf{\color{#d91a1a}-11.29\%}$
test_creation_nested_2 88.4250μs 17.7949μs 56.1959 KOps/s 60.7620 KOps/s $\textbf{\color{#d91a1a}-7.51\%}$
test_clone 0.1208ms 13.5011μs 74.0679 KOps/s 74.6047 KOps/s $\color{#d91a1a}-0.72\%$
test_getitem[int] 29.6060μs 11.6951μs 85.5060 KOps/s 83.8444 KOps/s $\color{#35bf28}+1.98\%$
test_getitem[slice_int] 94.6100μs 22.7199μs 44.0144 KOps/s 41.9315 KOps/s $\color{#35bf28}+4.97\%$
test_getitem[range] 80.7010μs 58.7554μs 17.0197 KOps/s 16.1083 KOps/s $\textbf{\color{#35bf28}+5.66\%}$
test_getitem[tuple] 78.4060μs 18.6917μs 53.4998 KOps/s 50.3216 KOps/s $\textbf{\color{#35bf28}+6.32\%}$
test_getitem[list] 0.1018ms 40.3404μs 24.7891 KOps/s 24.1511 KOps/s $\color{#35bf28}+2.64\%$
test_setitem_dim[int] 52.4080μs 34.7614μs 28.7675 KOps/s 27.3647 KOps/s $\textbf{\color{#35bf28}+5.13\%}$
test_setitem_dim[slice_int] 0.1065ms 62.7724μs 15.9306 KOps/s 15.5368 KOps/s $\color{#35bf28}+2.53\%$
test_setitem_dim[range] 0.1174ms 84.6696μs 11.8106 KOps/s 11.8245 KOps/s $\color{#d91a1a}-0.12\%$
test_setitem_dim[tuple] 95.2180μs 52.0605μs 19.2084 KOps/s 19.1217 KOps/s $\color{#35bf28}+0.45\%$
test_setitem 72.1750μs 21.5136μs 46.4823 KOps/s 49.1063 KOps/s $\textbf{\color{#d91a1a}-5.34\%}$
test_set 77.1240μs 20.5549μs 48.6501 KOps/s 50.7804 KOps/s $\color{#d91a1a}-4.20\%$
test_set_shared 2.7877ms 0.1432ms 6.9846 KOps/s 7.0510 KOps/s $\color{#d91a1a}-0.94\%$
test_update 0.1235ms 23.3399μs 42.8450 KOps/s 47.3442 KOps/s $\textbf{\color{#d91a1a}-9.50\%}$
test_update_nested 0.1186ms 31.8294μs 31.4175 KOps/s 33.0725 KOps/s $\textbf{\color{#d91a1a}-5.00\%}$
test_update__nested 70.3520μs 24.5922μs 40.6633 KOps/s 39.4629 KOps/s $\color{#35bf28}+3.04\%$
test_set_nested 0.1246ms 22.6045μs 44.2390 KOps/s 46.4292 KOps/s $\color{#d91a1a}-4.72\%$
test_set_nested_new 0.1294ms 26.5008μs 37.7347 KOps/s 38.8582 KOps/s $\color{#d91a1a}-2.89\%$
test_select 0.1566ms 40.7334μs 24.5499 KOps/s 24.4026 KOps/s $\color{#35bf28}+0.60\%$
test_select_nested 4.9209ms 59.6746μs 16.7575 KOps/s 16.2626 KOps/s $\color{#35bf28}+3.04\%$
test_exclude_nested 0.1821ms 0.1185ms 8.4383 KOps/s 8.2956 KOps/s $\color{#35bf28}+1.72\%$
test_empty[True] 0.5802ms 0.3892ms 2.5691 KOps/s 2.5280 KOps/s $\color{#35bf28}+1.63\%$
test_empty[False] 11.2176μs 1.0524μs 950.2217 KOps/s 918.6940 KOps/s $\color{#35bf28}+3.43\%$
test_unbind_speed 0.4953ms 0.2557ms 3.9114 KOps/s 3.8106 KOps/s $\color{#35bf28}+2.64\%$
test_unbind_speed_stack0 0.4150ms 0.2535ms 3.9440 KOps/s 3.8892 KOps/s $\color{#35bf28}+1.41\%$
test_unbind_speed_stack1 65.6232ms 0.7239ms 1.3813 KOps/s 1.2717 KOps/s $\textbf{\color{#35bf28}+8.62\%}$
test_split 64.8744ms 1.5915ms 628.3466 Ops/s 638.2248 Ops/s $\color{#d91a1a}-1.55\%$
test_chunk 68.0651ms 1.5967ms 626.2971 Ops/s 590.7942 Ops/s $\textbf{\color{#35bf28}+6.01\%}$
test_creation[device0] 0.2431ms 0.1045ms 9.5659 KOps/s 9.6265 KOps/s $\color{#d91a1a}-0.63\%$
test_creation_from_tensor 3.5599ms 83.0619μs 12.0392 KOps/s 11.9752 KOps/s $\color{#35bf28}+0.53\%$
test_add_one[memmap_tensor0] 0.1108ms 5.5116μs 181.4350 KOps/s 182.4363 KOps/s $\color{#d91a1a}-0.55\%$
test_contiguous[memmap_tensor0] 12.3830μs 0.6344μs 1.5763 MOps/s 1.5826 MOps/s $\color{#d91a1a}-0.40\%$
test_stack[memmap_tensor0] 26.8100μs 3.6289μs 275.5643 KOps/s 279.3413 KOps/s $\color{#d91a1a}-1.35\%$
test_memmaptd_index 1.0684ms 0.2369ms 4.2217 KOps/s 4.0702 KOps/s $\color{#35bf28}+3.72\%$
test_memmaptd_index_astensor 0.7564ms 0.3162ms 3.1630 KOps/s 3.0637 KOps/s $\color{#35bf28}+3.24\%$
test_memmaptd_index_op 1.0359ms 0.6112ms 1.6363 KOps/s 1.6375 KOps/s $\color{#d91a1a}-0.07\%$
test_serialize_model 0.1092s 0.1026s 9.7498 Ops/s 9.0720 Ops/s $\textbf{\color{#35bf28}+7.47\%}$
test_serialize_model_pickle 0.4492s 0.3774s 2.6495 Ops/s 2.5841 Ops/s $\color{#35bf28}+2.53\%$
test_serialize_weights 0.1639s 0.1085s 9.2163 Ops/s 9.1932 Ops/s $\color{#35bf28}+0.25\%$
test_serialize_weights_returnearly 0.1959s 0.1300s 7.6895 Ops/s 7.8823 Ops/s $\color{#d91a1a}-2.45\%$
test_serialize_weights_pickle 1.0403s 0.5855s 1.7079 Ops/s 2.4440 Ops/s $\textbf{\color{#d91a1a}-30.12\%}$
test_serialize_weights_filesystem 0.1606s 97.4265ms 10.2642 Ops/s 10.4443 Ops/s $\color{#d91a1a}-1.72\%$
test_serialize_model_filesystem 0.1587s 99.7421ms 10.0259 Ops/s 10.2932 Ops/s $\color{#d91a1a}-2.60\%$
test_reshape_pytree 55.8340μs 25.1231μs 39.8041 KOps/s 37.8913 KOps/s $\textbf{\color{#35bf28}+5.05\%}$
test_reshape_td 0.1021ms 33.4331μs 29.9105 KOps/s 30.1468 KOps/s $\color{#d91a1a}-0.78\%$
test_view_pytree 85.2590μs 24.8933μs 40.1714 KOps/s 37.4267 KOps/s $\textbf{\color{#35bf28}+7.33\%}$
test_view_td 88.0350μs 37.1383μs 26.9264 KOps/s 26.4547 KOps/s $\color{#35bf28}+1.78\%$
test_unbind_pytree 89.0760μs 29.2336μs 34.2072 KOps/s 32.9465 KOps/s $\color{#35bf28}+3.83\%$
test_unbind_td 0.4338ms 38.1794μs 26.1921 KOps/s 26.0581 KOps/s $\color{#35bf28}+0.51\%$
test_split_pytree 65.0010μs 28.8446μs 34.6685 KOps/s 32.8383 KOps/s $\textbf{\color{#35bf28}+5.57\%}$
test_split_td 0.1395ms 40.2698μs 24.8325 KOps/s 23.8393 KOps/s $\color{#35bf28}+4.17\%$
test_add_pytree 86.1910μs 34.6904μs 28.8265 KOps/s 28.0174 KOps/s $\color{#35bf28}+2.89\%$
test_add_td 0.1215ms 54.7270μs 18.2725 KOps/s 17.6494 KOps/s $\color{#35bf28}+3.53\%$
test_distributed 0.2148ms 98.3830μs 10.1644 KOps/s 9.8461 KOps/s $\color{#35bf28}+3.23\%$
test_tdmodule 44.4230μs 18.2911μs 54.6715 KOps/s 56.7570 KOps/s $\color{#d91a1a}-3.67\%$
test_tdmodule_dispatch 63.6390μs 36.6912μs 27.2545 KOps/s 29.4070 KOps/s $\textbf{\color{#d91a1a}-7.32\%}$
test_tdseq 51.6260μs 21.6714μs 46.1438 KOps/s 48.0649 KOps/s $\color{#d91a1a}-4.00\%$
test_tdseq_dispatch 79.2180μs 44.3015μs 22.5726 KOps/s 24.5834 KOps/s $\textbf{\color{#d91a1a}-8.18\%}$
test_instantiation_functorch 2.3318ms 1.2857ms 777.7943 Ops/s 749.2025 Ops/s $\color{#35bf28}+3.82\%$
test_instantiation_td 1.5963ms 1.0207ms 979.7271 Ops/s 910.1248 Ops/s $\textbf{\color{#35bf28}+7.65\%}$
test_exec_functorch 0.2972ms 0.1610ms 6.2104 KOps/s 6.1197 KOps/s $\color{#35bf28}+1.48\%$
test_exec_functional_call 0.2899ms 0.1499ms 6.6715 KOps/s 6.6117 KOps/s $\color{#35bf28}+0.91\%$
test_exec_td 0.2453ms 0.1469ms 6.8094 KOps/s 6.8668 KOps/s $\color{#d91a1a}-0.84\%$
test_exec_td_decorator 0.5602ms 0.2219ms 4.5074 KOps/s 4.4745 KOps/s $\color{#35bf28}+0.73\%$
test_vmap_mlp_speed[True-True] 0.7688ms 0.4928ms 2.0291 KOps/s 2.0400 KOps/s $\color{#d91a1a}-0.54\%$
test_vmap_mlp_speed[True-False] 0.5882ms 0.4851ms 2.0614 KOps/s 2.0833 KOps/s $\color{#d91a1a}-1.05\%$
test_vmap_mlp_speed[False-True] 0.5630ms 0.3952ms 2.5307 KOps/s 2.5743 KOps/s $\color{#d91a1a}-1.70\%$
test_vmap_mlp_speed[False-False] 0.7099ms 0.3945ms 2.5349 KOps/s 2.5663 KOps/s $\color{#d91a1a}-1.22\%$
test_vmap_mlp_speed_decorator[True-True] 1.0779ms 0.5535ms 1.8066 KOps/s 1.8142 KOps/s $\color{#d91a1a}-0.41\%$
test_vmap_mlp_speed_decorator[True-False] 0.8679ms 0.5538ms 1.8058 KOps/s 1.8153 KOps/s $\color{#d91a1a}-0.52\%$
test_vmap_mlp_speed_decorator[False-True] 0.7247ms 0.4515ms 2.2149 KOps/s 2.2135 KOps/s $\color{#35bf28}+0.06\%$
test_vmap_mlp_speed_decorator[False-False] 0.6995ms 0.4522ms 2.2112 KOps/s 2.2175 KOps/s $\color{#d91a1a}-0.28\%$
test_to_module_speed[True] 2.5198ms 1.6661ms 600.1983 Ops/s 592.7601 Ops/s $\color{#35bf28}+1.25\%$
test_to_module_speed[False] 2.2313ms 1.6362ms 611.1741 Ops/s 603.9089 Ops/s $\color{#35bf28}+1.20\%$
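
The Change column is consistent with the relative difference between the Ops column (this PR) and the Ops on Repo HEAD column. A minimal sketch of that arithmetic follows, using three rows copied from the table. The function names and the 5% cutoff are assumptions for illustration: the report does not state its threshold, but the bolded entries all sit beyond roughly ±5%, and their counts match the summary (14 improved, 10 worsened).

```python
def percent_change(ops: float, ops_head: float) -> float:
    """Relative throughput change of the PR versus repo HEAD, in percent."""
    return (ops - ops_head) / ops_head * 100.0


# Assumed cutoff, inferred from which rows are bolded in the table;
# the benchmark report itself does not state it.
THRESHOLD = 5.0


def classify(change: float, threshold: float = THRESHOLD) -> str:
    """Label a change as improved, worsened, or unchanged (hypothetical helper)."""
    if change >= threshold:
        return "improved"
    if change <= -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) pairs copied from the table; both values in a
# row share the same unit, so the ratio is unit-free.
rows = {
    "test_plain_set_nested": (56.8498, 59.1602),
    "test_unlock_nested": (2.8516, 2.4447),
    "test_serialize_weights_pickle": (1.7079, 2.4440),
}

for name, (ops, head) in rows.items():
    change = percent_change(ops, head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Under this reading, a negative value means the PR branch ran fewer operations per second than HEAD. Max and Mean are timings, so they move in the opposite direction to Ops.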
