[BugFix] Compatibility with missing _global_parameter_registration_hooks #574

Merged
merged 1 commit into from
Nov 24, 2023

@vmoens (Contributor) commented Nov 24, 2023

Older PyTorch versions lack _global_parameter_registration_hooks.
This PR makes it optional, so the code no longer fails when it is missing.
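
The diff itself is not shown on this page, so the following is only a minimal sketch of one common way to make a private attribute optional. The helper names `get_optional_attr` and `has_parameter_registration_hooks` are hypothetical, and the module path `torch.nn.modules.module` is an assumption about where the attribute lives in recent PyTorch releases.

```python
import importlib


def get_optional_attr(module_name, attr_name, default=None):
    """Return `module_name.attr_name` if both exist, otherwise `default`.

    Hypothetical helper: it tolerates a missing module as well as a
    missing attribute, so callers never hit ImportError/AttributeError.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return default
    return getattr(module, attr_name, default)


# Assumption: recent PyTorch versions expose this private registry in
# torch.nn.modules.module. On older versions (or without PyTorch
# installed) the lookup simply yields None.
_hooks = get_optional_attr(
    "torch.nn.modules.module", "_global_parameter_registration_hooks"
)


def has_parameter_registration_hooks():
    """True only if the registry exists and at least one hook is registered."""
    return bool(_hooks)
```

Code that relies on the hooks can then branch on `has_parameter_registration_hooks()` and fall back to the hook-free path on older versions.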

@facebook-github-bot added the CLA Signed label Nov 24, 2023
@vmoens marked this pull request as ready for review November 24, 2023 11:17
@vmoens added the bug label and removed the CLA Signed label Nov 24, 2023
@vmoens merged commit 597ca61 into main Nov 24, 2023
26 of 33 checks passed
@vmoens deleted the fix-_global_parameter_registration_hooks branch November 24, 2023 11:17

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 127. Improved: $\large\color{#35bf28}3$. Worsened: $\large\color{#d91a1a}4$.

Expand to view detailed results
| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 0.5198ms | 12.6416μs | 79.1038 KOps/s | 78.4386 KOps/s | $\color{#35bf28}+0.85\%$ |
| test_plain_set_stack_nested | 0.1417ms | 0.1152ms | 8.6843 KOps/s | 8.4056 KOps/s | $\color{#35bf28}+3.32\%$ |
| test_plain_set_nested_inplace | 0.2005ms | 14.9622μs | 66.8351 KOps/s | 65.6029 KOps/s | $\color{#35bf28}+1.88\%$ |
| test_plain_set_stack_nested_inplace | 0.1691ms | 0.1411ms | 7.0853 KOps/s | 7.0284 KOps/s | $\color{#35bf28}+0.81\%$ |
| test_items | 0.1796ms | 4.7050μs | 212.5391 KOps/s | 213.6628 KOps/s | $\color{#d91a1a}-0.53\%$ |
| test_items_nested | 0.5184ms | 0.3408ms | 2.9344 KOps/s | 2.9320 KOps/s | $\color{#35bf28}+0.08\%$ |
| test_items_nested_locked | 0.5157ms | 0.3403ms | 2.9385 KOps/s | 2.9162 KOps/s | $\color{#35bf28}+0.77\%$ |
| test_items_nested_leaf | 0.3547ms | 0.2012ms | 4.9708 KOps/s | 4.9820 KOps/s | $\color{#d91a1a}-0.22\%$ |
| test_items_stack_nested | 1.6790ms | 1.4887ms | 671.7122 Ops/s | 668.6964 Ops/s | $\color{#35bf28}+0.45\%$ |
| test_items_stack_nested_leaf | 1.4937ms | 1.3108ms | 762.8733 Ops/s | 760.3058 Ops/s | $\color{#35bf28}+0.34\%$ |
| test_items_stack_nested_locked | 1.8318ms | 0.8092ms | 1.2357 KOps/s | 1.2324 KOps/s | $\color{#35bf28}+0.27\%$ |
| test_keys | 0.1823ms | 4.5754μs | 218.5580 KOps/s | 219.5301 KOps/s | $\color{#d91a1a}-0.44\%$ |
| test_keys_nested | 0.5005ms | 90.2406μs | 11.0815 KOps/s | 11.0373 KOps/s | $\color{#35bf28}+0.40\%$ |
| test_keys_nested_locked | 0.2709ms | 89.7287μs | 11.1447 KOps/s | 11.1710 KOps/s | $\color{#d91a1a}-0.24\%$ |
| test_keys_nested_leaf | 41.8027ms | 87.1270μs | 11.4775 KOps/s | 12.2462 KOps/s | $\textbf{\color{#d91a1a}-6.28\%}$ |
| test_keys_stack_nested | 1.4878ms | 1.3050ms | 766.2995 Ops/s | 769.9375 Ops/s | $\color{#d91a1a}-0.47\%$ |
| test_keys_stack_nested_leaf | 1.4702ms | 1.3006ms | 768.8978 Ops/s | 773.3941 Ops/s | $\color{#d91a1a}-0.58\%$ |
| test_keys_stack_nested_locked | 0.7909ms | 0.6102ms | 1.6389 KOps/s | 1.6436 KOps/s | $\color{#d91a1a}-0.29\%$ |
| test_values | 8.5003μs | 1.8860μs | 530.2153 KOps/s | 524.4637 KOps/s | $\color{#35bf28}+1.10\%$ |
| test_values_nested | 0.2214ms | 43.6017μs | 22.9349 KOps/s | 23.1274 KOps/s | $\color{#d91a1a}-0.83\%$ |
| test_values_nested_locked | 0.1957ms | 43.3717μs | 23.0565 KOps/s | 23.0476 KOps/s | $\color{#35bf28}+0.04\%$ |
| test_values_nested_leaf | 0.2160ms | 37.5149μs | 26.6561 KOps/s | 26.6743 KOps/s | $\color{#d91a1a}-0.07\%$ |
| test_values_stack_nested | 1.3318ms | 1.1406ms | 876.7620 Ops/s | 882.6390 Ops/s | $\color{#d91a1a}-0.67\%$ |
| test_values_stack_nested_leaf | 1.1618ms | 1.1256ms | 888.4055 Ops/s | 893.6457 Ops/s | $\color{#d91a1a}-0.59\%$ |
| test_values_stack_nested_locked | 0.6655ms | 0.4856ms | 2.0592 KOps/s | 2.0734 KOps/s | $\color{#d91a1a}-0.68\%$ |
| test_membership | 35.6746μs | 0.9434μs | 1.0599 MOps/s | 1.0612 MOps/s | $\color{#d91a1a}-0.12\%$ |
| test_membership_nested | 32.7190μs | 2.2132μs | 451.8416 KOps/s | 450.8737 KOps/s | $\color{#35bf28}+0.21\%$ |
| test_membership_nested_leaf | 17.0300μs | 2.1287μs | 469.7717 KOps/s | 472.1704 KOps/s | $\color{#d91a1a}-0.51\%$ |
| test_membership_stacked_nested | 32.8800μs | 10.8645μs | 92.0428 KOps/s | 92.3435 KOps/s | $\color{#d91a1a}-0.33\%$ |
| test_membership_stacked_nested_leaf | 0.2240ms | 10.9839μs | 91.0421 KOps/s | 93.2058 KOps/s | $\color{#d91a1a}-2.32\%$ |
| test_membership_nested_last | 21.1700μs | 4.6168μs | 216.5991 KOps/s | 215.6721 KOps/s | $\color{#35bf28}+0.43\%$ |
| test_membership_nested_leaf_last | 0.1957ms | 4.6151μs | 216.6821 KOps/s | 218.1447 KOps/s | $\color{#d91a1a}-0.67\%$ |
| test_membership_stacked_nested_last | 0.3241ms | 0.1350ms | 7.4095 KOps/s | 7.5359 KOps/s | $\color{#d91a1a}-1.68\%$ |
| test_membership_stacked_nested_leaf_last | 25.8110μs | 12.6893μs | 78.8066 KOps/s | 79.7279 KOps/s | $\color{#d91a1a}-1.16\%$ |
| test_nested_getleaf | 0.1934ms | 8.4075μs | 118.9413 KOps/s | 119.2424 KOps/s | $\color{#d91a1a}-0.25\%$ |
| test_nested_get | 0.1848ms | 7.9457μs | 125.8544 KOps/s | 125.9941 KOps/s | $\color{#d91a1a}-0.11\%$ |
| test_stacked_getleaf | 0.7303ms | 0.5648ms | 1.7705 KOps/s | 1.7011 KOps/s | $\color{#35bf28}+4.08\%$ |
| test_stacked_get | 0.7107ms | 0.5305ms | 1.8849 KOps/s | 1.7920 KOps/s | $\textbf{\color{#35bf28}+5.19\%}$ |
| test_nested_getitemleaf | 0.1839ms | 8.4986μs | 117.6660 KOps/s | 118.7978 KOps/s | $\color{#d91a1a}-0.95\%$ |
| test_nested_getitem | 0.1855ms | 7.9912μs | 125.1375 KOps/s | 125.6311 KOps/s | $\color{#d91a1a}-0.39\%$ |
| test_stacked_getitemleaf | 0.7650ms | 0.5702ms | 1.7538 KOps/s | 1.7232 KOps/s | $\color{#35bf28}+1.78\%$ |
| test_stacked_getitem | 0.7222ms | 0.5430ms | 1.8415 KOps/s | 1.8162 KOps/s | $\color{#35bf28}+1.39\%$ |
| test_lock_nested | 4.3363ms | 0.4583ms | 2.1822 KOps/s | 2.1515 KOps/s | $\color{#35bf28}+1.43\%$ |
| test_lock_stack_nested | 68.3795ms | 6.5426ms | 152.8433 Ops/s | 150.2901 Ops/s | $\color{#35bf28}+1.70\%$ |
| test_unlock_nested | 1.3060ms | 0.4371ms | 2.2879 KOps/s | 2.0157 KOps/s | $\textbf{\color{#35bf28}+13.50\%}$ |
| test_unlock_stack_nested | 63.4438ms | 7.2589ms | 137.7623 Ops/s | 137.1656 Ops/s | $\color{#35bf28}+0.44\%$ |
| test_flatten_speed | 0.5197ms | 0.1861ms | 5.3742 KOps/s | 5.3658 KOps/s | $\color{#35bf28}+0.16\%$ |
| test_unflatten_speed | 0.5360ms | 0.3599ms | 2.7785 KOps/s | 2.7639 KOps/s | $\color{#35bf28}+0.53\%$ |
| test_common_ops | 1.0166ms | 0.6012ms | 1.6633 KOps/s | 1.6251 KOps/s | $\color{#35bf28}+2.35\%$ |
| test_creation | 14.5200μs | 1.9684μs | 508.0200 KOps/s | 512.5480 KOps/s | $\color{#d91a1a}-0.88\%$ |
| test_creation_empty | 22.6500μs | 6.7251μs | 148.6963 KOps/s | 144.0743 KOps/s | $\color{#35bf28}+3.21\%$ |
| test_creation_nested_1 | 0.1983ms | 9.0895μs | 110.0166 KOps/s | 107.0242 KOps/s | $\color{#35bf28}+2.80\%$ |
| test_creation_nested_2 | 30.6100μs | 11.5740μs | 86.4007 KOps/s | 83.3431 KOps/s | $\color{#35bf28}+3.67\%$ |
| test_clone | 88.7110μs | 14.5740μs | 68.6152 KOps/s | 70.2371 KOps/s | $\color{#d91a1a}-2.31\%$ |
| test_getitem[int] | 0.1989ms | 12.1023μs | 82.6287 KOps/s | 82.9242 KOps/s | $\color{#d91a1a}-0.36\%$ |
| test_getitem[slice_int] | 41.5810μs | 23.6245μs | 42.3290 KOps/s | 43.7384 KOps/s | $\color{#d91a1a}-3.22\%$ |
| test_getitem[range] | 78.1610μs | 40.3842μs | 24.7622 KOps/s | 26.0346 KOps/s | $\color{#d91a1a}-4.89\%$ |
| test_getitem[tuple] | 66.2010μs | 20.8135μs | 48.0457 KOps/s | 49.5540 KOps/s | $\color{#d91a1a}-3.04\%$ |
| test_getitem[list] | 0.2342ms | 36.7095μs | 27.2409 KOps/s | 27.9280 KOps/s | $\color{#d91a1a}-2.46\%$ |
| test_setitem_dim[int] | 44.9500μs | 25.9965μs | 38.4668 KOps/s | 38.1771 KOps/s | $\color{#35bf28}+0.76\%$ |
| test_setitem_dim[slice_int] | 63.4610μs | 46.3730μs | 21.5643 KOps/s | 21.2868 KOps/s | $\color{#35bf28}+1.30\%$ |
| test_setitem_dim[range] | 84.3710μs | 62.9649μs | 15.8819 KOps/s | 15.7140 KOps/s | $\color{#35bf28}+1.07\%$ |
| test_setitem_dim[tuple] | 56.7610μs | 38.7559μs | 25.8026 KOps/s | 25.2198 KOps/s | $\color{#35bf28}+2.31\%$ |
| test_setitem | 0.2059ms | 18.1426μs | 55.1188 KOps/s | 55.1356 KOps/s | $\color{#d91a1a}-0.03\%$ |
| test_set | 91.1220μs | 17.0578μs | 58.6243 KOps/s | 57.0899 KOps/s | $\color{#35bf28}+2.69\%$ |
| test_set_shared | 2.8256ms | 0.1020ms | 9.8077 KOps/s | 9.9925 KOps/s | $\color{#d91a1a}-1.85\%$ |
| test_update | 87.1110μs | 21.2613μs | 47.0338 KOps/s | 45.1582 KOps/s | $\color{#35bf28}+4.15\%$ |
| test_update_nested | 0.5704ms | 30.6519μs | 32.6244 KOps/s | 32.4301 KOps/s | $\color{#35bf28}+0.60\%$ |
| test_set_nested | 0.2236ms | 18.7457μs | 53.3456 KOps/s | 52.5018 KOps/s | $\color{#35bf28}+1.61\%$ |
| test_set_nested_new | 92.0220μs | 23.0464μs | 43.3907 KOps/s | 42.6659 KOps/s | $\color{#35bf28}+1.70\%$ |
| test_select | 0.2482ms | 44.8918μs | 22.2758 KOps/s | 21.3222 KOps/s | $\color{#35bf28}+4.47\%$ |
| test_to | 73.5110μs | 52.6674μs | 18.9871 KOps/s | 19.1915 KOps/s | $\color{#d91a1a}-1.07\%$ |
| test_to_nonblocking | 59.5800μs | 35.3159μs | 28.3158 KOps/s | 28.5787 KOps/s | $\color{#d91a1a}-0.92\%$ |
| test_unbind_speed | 0.5413ms | 0.3499ms | 2.8582 KOps/s | 2.8309 KOps/s | $\color{#35bf28}+0.97\%$ |
| test_unbind_speed_stack0 | 60.7660ms | 5.1640ms | 193.6489 Ops/s | 193.0033 Ops/s | $\color{#35bf28}+0.33\%$ |
| test_unbind_speed_stack1 | 1.9480μs | 0.5331μs | 1.8758 MOps/s | 1.9294 MOps/s | $\color{#d91a1a}-2.78\%$ |
| test_split | 52.9918ms | 1.8045ms | 554.1703 Ops/s | 547.2410 Ops/s | $\color{#35bf28}+1.27\%$ |
| test_chunk | 52.8646ms | 1.7819ms | 561.1868 Ops/s | 552.1597 Ops/s | $\color{#35bf28}+1.63\%$ |
| test_creation[device0] | 0.5091ms | 0.3080ms | 3.2469 KOps/s | 3.2323 KOps/s | $\color{#35bf28}+0.45\%$ |
| test_creation[device1] | 0.8047ms | 0.3120ms | 3.2051 KOps/s | 3.2059 KOps/s | $\color{#d91a1a}-0.02\%$ |
| test_creation_from_tensor | 56.7722ms | 0.3651ms | 2.7393 KOps/s | 2.9576 KOps/s | $\textbf{\color{#d91a1a}-7.38\%}$ |
| test_add_one[memmap_tensor0] | 0.1058ms | 23.8469μs | 41.9341 KOps/s | 40.9699 KOps/s | $\color{#35bf28}+2.35\%$ |
| test_add_one[memmap_tensor1] | 0.2033ms | 73.6041μs | 13.5862 KOps/s | 13.6071 KOps/s | $\color{#d91a1a}-0.15\%$ |
| test_contiguous[memmap_tensor0] | 0.1945ms | 5.7109μs | 175.1043 KOps/s | 168.5631 KOps/s | $\color{#35bf28}+3.88\%$ |
| test_contiguous[memmap_tensor1] | 46.0700μs | 21.8361μs | 45.7958 KOps/s | 45.8333 KOps/s | $\color{#d91a1a}-0.08\%$ |
| test_stack[memmap_tensor0] | 50.2600μs | 19.5961μs | 51.0307 KOps/s | 50.9024 KOps/s | $\color{#35bf28}+0.25\%$ |
| test_stack[memmap_tensor1] | 0.1546ms | 74.1718μs | 13.4822 KOps/s | 13.4302 KOps/s | $\color{#35bf28}+0.39\%$ |
| test_memmaptd_index | 0.4085ms | 0.2206ms | 4.5333 KOps/s | 4.5190 KOps/s | $\color{#35bf28}+0.32\%$ |
| test_memmaptd_index_astensor | 0.2990ms | 0.2776ms | 3.6017 KOps/s | 3.6289 KOps/s | $\color{#d91a1a}-0.75\%$ |
| test_memmaptd_index_op | 0.7017ms | 0.5422ms | 1.8443 KOps/s | 1.8420 KOps/s | $\color{#35bf28}+0.12\%$ |
| test_reshape_pytree | 0.2015ms | 20.9298μs | 47.7788 KOps/s | 47.7095 KOps/s | $\color{#35bf28}+0.15\%$ |
| test_reshape_td | 56.8210μs | 29.8639μs | 33.4853 KOps/s | 33.2570 KOps/s | $\color{#35bf28}+0.69\%$ |
| test_view_pytree | 37.5310μs | 20.6658μs | 48.3891 KOps/s | 48.0801 KOps/s | $\color{#35bf28}+0.64\%$ |
| test_view_td | 0.1829ms | 4.0915μs | 244.4090 KOps/s | 246.5590 KOps/s | $\color{#d91a1a}-0.87\%$ |
| test_unbind_pytree | 50.7510μs | 25.7596μs | 38.8204 KOps/s | 39.1136 KOps/s | $\color{#d91a1a}-0.75\%$ |
| test_unbind_td | 0.2362ms | 56.1667μs | 17.8041 KOps/s | 17.8351 KOps/s | $\color{#d91a1a}-0.17\%$ |
| test_split_pytree | 46.9510μs | 24.2013μs | 41.3201 KOps/s | 41.5096 KOps/s | $\color{#d91a1a}-0.46\%$ |
| test_split_td | 73.4310μs | 44.3809μs | 22.5322 KOps/s | 22.7853 KOps/s | $\color{#d91a1a}-1.11\%$ |
| test_add_pytree | 0.2177ms | 32.3008μs | 30.9590 KOps/s | 30.6372 KOps/s | $\color{#35bf28}+1.05\%$ |
| test_add_td | 74.2510μs | 43.2661μs | 23.1128 KOps/s | 22.6666 KOps/s | $\color{#35bf28}+1.97\%$ |
| test_distributed | 22.2210μs | 5.7398μs | 174.2233 KOps/s | 181.4335 KOps/s | $\color{#d91a1a}-3.97\%$ |
| test_tdmodule | 31.2400μs | 16.4920μs | 60.6355 KOps/s | 58.8798 KOps/s | $\color{#35bf28}+2.98\%$ |
| test_tdmodule_dispatch | 0.2572ms | 32.4294μs | 30.8362 KOps/s | 30.5767 KOps/s | $\color{#35bf28}+0.85\%$ |
| test_tdseq | 0.1223ms | 19.7425μs | 50.6523 KOps/s | 49.1723 KOps/s | $\color{#35bf28}+3.01\%$ |
| test_tdseq_dispatch | 61.0720μs | 34.9953μs | 28.5753 KOps/s | 27.6471 KOps/s | $\color{#35bf28}+3.36\%$ |
| test_instantiation_functorch | 1.8641ms | 1.6800ms | 595.2553 Ops/s | 595.4418 Ops/s | $\color{#d91a1a}-0.03\%$ |
| test_instantiation_td | 1.8775ms | 1.1760ms | 850.3135 Ops/s | 845.0308 Ops/s | $\color{#35bf28}+0.63\%$ |
| test_exec_functorch | 0.1998ms | 0.1586ms | 6.3057 KOps/s | 6.2790 KOps/s | $\color{#35bf28}+0.42\%$ |
| test_exec_functional_call | 0.3445ms | 0.1595ms | 6.2710 KOps/s | 6.3548 KOps/s | $\color{#d91a1a}-1.32\%$ |
| test_exec_td | 0.1905ms | 0.1500ms | 6.6658 KOps/s | 6.8405 KOps/s | $\color{#d91a1a}-2.55\%$ |
| test_exec_td_decorator | 1.0071ms | 0.2259ms | 4.4271 KOps/s | 4.5844 KOps/s | $\color{#d91a1a}-3.43\%$ |
| test_vmap_mlp_speed[True-True] | 1.2979ms | 1.0915ms | 916.1732 Ops/s | 914.6575 Ops/s | $\color{#35bf28}+0.17\%$ |
| test_vmap_mlp_speed[True-False] | 0.8201ms | 0.6180ms | 1.6182 KOps/s | 1.6087 KOps/s | $\color{#35bf28}+0.59\%$ |
| test_vmap_mlp_speed[False-True] | 1.0636ms | 1.0010ms | 998.9649 Ops/s | 989.1547 Ops/s | $\color{#35bf28}+0.99\%$ |
| test_vmap_mlp_speed[False-False] | 0.7448ms | 0.5512ms | 1.8143 KOps/s | 1.8142 KOps/s | $+0.01\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 2.6045ms | 1.8053ms | 553.9227 Ops/s | 550.2826 Ops/s | $\color{#35bf28}+0.66\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 1.2169ms | 0.6921ms | 1.4450 KOps/s | 1.4390 KOps/s | $\color{#35bf28}+0.41\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 2.1038ms | 1.6219ms | 616.5757 Ops/s | 609.8945 Ops/s | $\color{#35bf28}+1.10\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.9947ms | 0.5875ms | 1.7020 KOps/s | 1.6907 KOps/s | $\color{#35bf28}+0.67\%$ |
| test_vmap_transformer_speed[True-True] | 12.9259ms | 12.7540ms | 78.4066 Ops/s | 77.8699 Ops/s | $\color{#35bf28}+0.69\%$ |
| test_vmap_transformer_speed[True-False] | 9.4135ms | 8.3269ms | 120.0923 Ops/s | 119.9144 Ops/s | $\color{#35bf28}+0.15\%$ |
| test_vmap_transformer_speed[False-True] | 12.8437ms | 12.6442ms | 79.0879 Ops/s | 78.7098 Ops/s | $\color{#35bf28}+0.48\%$ |
| test_vmap_transformer_speed[False-False] | 8.3924ms | 8.2217ms | 121.6288 Ops/s | 120.8547 Ops/s | $\color{#35bf28}+0.64\%$ |
| test_vmap_transformer_speed_decorator[True-True] | 44.4728ms | 43.4252ms | 23.0281 Ops/s | 22.8199 Ops/s | $\color{#35bf28}+0.91\%$ |
| test_vmap_transformer_speed_decorator[True-False] | 96.4458ms | 21.9883ms | 45.4787 Ops/s | 48.7437 Ops/s | $\textbf{\color{#d91a1a}-6.70\%}$ |
| test_vmap_transformer_speed_decorator[False-True] | 43.9470ms | 42.9193ms | 23.2996 Ops/s | 21.4506 Ops/s | $\textbf{\color{#35bf28}+8.62\%}$ |
| test_vmap_transformer_speed_decorator[False-False] | 0.1005s | 21.5826ms | 46.3336 Ops/s | 49.7008 Ops/s | $\textbf{\color{#d91a1a}-6.77\%}$ |
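
The columns above appear to be related by simple arithmetic: Ops looks like the reciprocal of Mean, and Change looks like the relative difference between Ops and Ops on Repo HEAD. The sketch below checks that reading against the first row. The function names are hypothetical, and the formulas are inferred from the numbers, not taken from the benchmark tooling.

```python
def ops_per_second(mean_seconds):
    """Throughput implied by a mean run time, in operations per second."""
    return 1.0 / mean_seconds


def relative_change_percent(ops_pr, ops_head):
    """Relative throughput change of the PR versus repo HEAD, in percent."""
    return (ops_pr - ops_head) / ops_head * 100.0


# First row: test_plain_set_nested
mean_s = 12.6416e-6   # Mean column, 12.6416 microseconds
ops_pr = 79.1038e3    # Ops column, 79.1038 KOps/s
ops_head = 78.4386e3  # Ops on Repo HEAD column, 78.4386 KOps/s

print(f"implied ops/s: {ops_per_second(mean_s):.1f}")
print(f"change: {relative_change_percent(ops_pr, ops_head):+.2f}%")
```

Under this reading, a positive Change means the PR branch ran faster than HEAD. Most differences in the table are small enough that they may simply be run-to-run noise.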
