[CI] Fix benchmark on gpu #1706

Merged
merged 10 commits into from
Nov 20, 2023

Conversation

vmoens
Contributor

@vmoens vmoens commented Nov 20, 2023


pytorch-bot bot commented Nov 20, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/1706

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (5 Unrelated Failures)

As of commit 23a4786 with merge base 5cac16a:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label Nov 20, 2023
@vmoens vmoens added the CI label Nov 20, 2023

github-actions bot commented Nov 20, 2023

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 89. Improved: $\large\color{#35bf28}27$. Worsened: $\large\color{#d91a1a}3$.

Expand to view detailed results
| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_single | 62.5154ms | 61.7410ms | 16.1967 Ops/s | 14.9557 Ops/s | $\textbf{\color{#35bf28}+8.30\%}$ |
| test_sync | 44.5157ms | 35.4200ms | 28.2326 Ops/s | 29.5509 Ops/s | $\color{#d91a1a}-4.46\%$ |
| test_async | 73.0377ms | 33.6547ms | 29.7135 Ops/s | 30.3100 Ops/s | $\color{#d91a1a}-1.97\%$ |
| test_simple | 0.4896s | 0.4386s | 2.2802 Ops/s | 2.2172 Ops/s | $\color{#35bf28}+2.84\%$ |
| test_transformed | 0.6392s | 0.6017s | 1.6621 Ops/s | 1.5919 Ops/s | $\color{#35bf28}+4.41\%$ |
| test_serial | 1.2945s | 1.2507s | 0.7996 Ops/s | 0.7738 Ops/s | $\color{#35bf28}+3.33\%$ |
| test_parallel | 1.2887s | 1.2357s | 0.8092 Ops/s | 0.7905 Ops/s | $\color{#35bf28}+2.37\%$ |
| test_step_mdp_speed[True-True-True-True-True] | 0.1702ms | 24.0983μs | 41.4968 KOps/s | 40.0450 KOps/s | $\color{#35bf28}+3.63\%$ |
| test_step_mdp_speed[True-True-True-True-False] | 62.8180μs | 14.3811μs | 69.5355 KOps/s | 67.0055 KOps/s | $\color{#35bf28}+3.78\%$ |
| test_step_mdp_speed[True-True-True-False-True] | 46.5260μs | 15.0312μs | 66.5285 KOps/s | 64.1555 KOps/s | $\color{#35bf28}+3.70\%$ |
| test_step_mdp_speed[True-True-True-False-False] | 33.2320μs | 8.8500μs | 112.9944 KOps/s | 110.8568 KOps/s | $\color{#35bf28}+1.93\%$ |
| test_step_mdp_speed[True-True-False-True-True] | 53.3990μs | 25.6097μs | 39.0477 KOps/s | 38.0386 KOps/s | $\color{#35bf28}+2.65\%$ |
| test_step_mdp_speed[True-True-False-True-False] | 40.5350μs | 15.8203μs | 63.2100 KOps/s | 61.5316 KOps/s | $\color{#35bf28}+2.73\%$ |
| test_step_mdp_speed[True-True-False-False-True] | 69.6090μs | 16.2228μs | 61.6415 KOps/s | 58.8020 KOps/s | $\color{#35bf28}+4.83\%$ |
| test_step_mdp_speed[True-True-False-False-False] | 36.0470μs | 10.1671μs | 98.3563 KOps/s | 95.3764 KOps/s | $\color{#35bf28}+3.12\%$ |
| test_step_mdp_speed[True-False-True-True-True] | 96.4290μs | 26.8511μs | 37.2424 KOps/s | 35.5418 KOps/s | $\color{#35bf28}+4.78\%$ |
| test_step_mdp_speed[True-False-True-True-False] | 44.1620μs | 17.2902μs | 57.8363 KOps/s | 56.3148 KOps/s | $\color{#35bf28}+2.70\%$ |
| test_step_mdp_speed[True-False-True-False-True] | 45.1240μs | 16.3805μs | 61.0481 KOps/s | 59.4257 KOps/s | $\color{#35bf28}+2.73\%$ |
| test_step_mdp_speed[True-False-True-False-False] | 26.1190μs | 10.2080μs | 97.9626 KOps/s | 95.1095 KOps/s | $\color{#35bf28}+3.00\%$ |
| test_step_mdp_speed[True-False-False-True-True] | 86.7100μs | 28.0404μs | 35.6628 KOps/s | 34.0904 KOps/s | $\color{#35bf28}+4.61\%$ |
| test_step_mdp_speed[True-False-False-True-False] | 52.4580μs | 18.5346μs | 53.9530 KOps/s | 52.6150 KOps/s | $\color{#35bf28}+2.54\%$ |
| test_step_mdp_speed[True-False-False-False-True] | 44.3530μs | 17.4503μs | 57.3057 KOps/s | 54.8669 KOps/s | $\color{#35bf28}+4.44\%$ |
| test_step_mdp_speed[True-False-False-False-False] | 34.2440μs | 11.5211μs | 86.7976 KOps/s | 84.8262 KOps/s | $\color{#35bf28}+2.32\%$ |
| test_step_mdp_speed[False-True-True-True-True] | 57.2360μs | 26.9698μs | 37.0785 KOps/s | 35.6064 KOps/s | $\color{#35bf28}+4.13\%$ |
| test_step_mdp_speed[False-True-True-True-False] | 54.0800μs | 17.2101μs | 58.1055 KOps/s | 55.9718 KOps/s | $\color{#35bf28}+3.81\%$ |
| test_step_mdp_speed[False-True-True-False-True] | 84.6870μs | 18.5325μs | 53.9593 KOps/s | 51.6884 KOps/s | $\color{#35bf28}+4.39\%$ |
| test_step_mdp_speed[False-True-True-False-False] | 70.7710μs | 11.6596μs | 85.7664 KOps/s | 83.1341 KOps/s | $\color{#35bf28}+3.17\%$ |
| test_step_mdp_speed[False-True-False-True-True] | 61.3540μs | 28.1914μs | 35.4718 KOps/s | 34.2482 KOps/s | $\color{#35bf28}+3.57\%$ |
| test_step_mdp_speed[False-True-False-True-False] | 53.7400μs | 18.6355μs | 53.6610 KOps/s | 52.5981 KOps/s | $\color{#35bf28}+2.02\%$ |
| test_step_mdp_speed[False-True-False-False-True] | 44.6830μs | 19.8059μs | 50.4900 KOps/s | 49.1444 KOps/s | $\color{#35bf28}+2.74\%$ |
| test_step_mdp_speed[False-True-False-False-False] | 0.3210ms | 13.7372μs | 72.7950 KOps/s | 76.0637 KOps/s | $\color{#d91a1a}-4.30\%$ |
| test_step_mdp_speed[False-False-True-True-True] | 74.5190μs | 29.8410μs | 33.5109 KOps/s | 32.0556 KOps/s | $\color{#35bf28}+4.54\%$ |
| test_step_mdp_speed[False-False-True-True-False] | 48.5600μs | 19.9080μs | 50.2310 KOps/s | 48.6874 KOps/s | $\color{#35bf28}+3.17\%$ |
| test_step_mdp_speed[False-False-True-False-True] | 61.8450μs | 20.3750μs | 49.0798 KOps/s | 48.0878 KOps/s | $\color{#35bf28}+2.06\%$ |
| test_step_mdp_speed[False-False-True-False-False] | 37.1090μs | 12.7018μs | 78.7288 KOps/s | 75.5423 KOps/s | $\color{#35bf28}+4.22\%$ |
| test_step_mdp_speed[False-False-False-True-True] | 73.3360μs | 31.1979μs | 32.0535 KOps/s | 31.3141 KOps/s | $\color{#35bf28}+2.36\%$ |
| test_step_mdp_speed[False-False-False-True-False] | 51.9070μs | 21.0533μs | 47.4986 KOps/s | 46.4969 KOps/s | $\color{#35bf28}+2.15\%$ |
| test_step_mdp_speed[False-False-False-False-True] | 67.5050μs | 21.2559μs | 47.0457 KOps/s | 46.4217 KOps/s | $\color{#35bf28}+1.34\%$ |
| test_step_mdp_speed[False-False-False-False-False] | 0.1969ms | 13.9537μs | 71.6658 KOps/s | 70.5812 KOps/s | $\color{#35bf28}+1.54\%$ |
| test_values[generalized_advantage_estimate-True-True] | 12.9128ms | 12.1522ms | 82.2898 Ops/s | 83.3635 Ops/s | $\color{#d91a1a}-1.29\%$ |
| test_values[vec_generalized_advantage_estimate-True-True] | 35.3532ms | 27.1657ms | 36.8112 Ops/s | 36.6605 Ops/s | $\color{#35bf28}+0.41\%$ |
| test_values[td0_return_estimate-False-False] | 0.2718ms | 0.1819ms | 5.4976 KOps/s | 5.2817 KOps/s | $\color{#35bf28}+4.09\%$ |
| test_values[td1_return_estimate-False-False] | 26.4060ms | 26.1301ms | 38.2700 Ops/s | 38.6078 Ops/s | $\color{#d91a1a}-0.87\%$ |
| test_values[vec_td1_return_estimate-False-False] | 97.1792ms | 29.1177ms | 34.3433 Ops/s | 36.8657 Ops/s | $\textbf{\color{#d91a1a}-6.84\%}$ |
| test_values[td_lambda_return_estimate-True-False] | 39.4624ms | 36.5543ms | 27.3565 Ops/s | 27.6783 Ops/s | $\color{#d91a1a}-1.16\%$ |
| test_values[vec_td_lambda_return_estimate-True-False] | 34.2728ms | 27.0792ms | 36.9287 Ops/s | 37.1063 Ops/s | $\color{#d91a1a}-0.48\%$ |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 8.2525ms | 8.1437ms | 122.7946 Ops/s | 124.5832 Ops/s | $\color{#d91a1a}-1.44\%$ |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 10.5656ms | 1.9344ms | 516.9462 Ops/s | 542.0698 Ops/s | $\color{#d91a1a}-4.63\%$ |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.5214ms | 0.4322ms | 2.3139 KOps/s | 2.2801 KOps/s | $\color{#35bf28}+1.48\%$ |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 47.6006ms | 40.4534ms | 24.7198 Ops/s | 24.3991 Ops/s | $\color{#35bf28}+1.31\%$ |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 10.5770ms | 2.5708ms | 388.9790 Ops/s | 390.0140 Ops/s | $\color{#d91a1a}-0.27\%$ |
| test_dqn_speed | 9.9387ms | 1.6770ms | 596.2907 Ops/s | 578.2314 Ops/s | $\color{#35bf28}+3.12\%$ |
| test_ddpg_speed | 10.8118ms | 3.0500ms | 327.8639 Ops/s | 289.4127 Ops/s | $\textbf{\color{#35bf28}+13.29\%}$ |
| test_sac_speed | 16.4800ms | 8.5519ms | 116.9337 Ops/s | 114.4677 Ops/s | $\color{#35bf28}+2.15\%$ |
| test_redq_speed | 24.3172ms | 16.5042ms | 60.5905 Ops/s | 59.2271 Ops/s | $\color{#35bf28}+2.30\%$ |
| test_redq_deprec_speed | 35.3961ms | 14.9413ms | 66.9285 Ops/s | 59.3933 Ops/s | $\textbf{\color{#35bf28}+12.69\%}$ |
| test_td3_speed | 9.7218ms | 8.7924ms | 113.7344 Ops/s | 104.5309 Ops/s | $\textbf{\color{#35bf28}+8.80\%}$ |
| test_cql_speed | 43.4538ms | 36.1255ms | 27.6813 Ops/s | 25.3722 Ops/s | $\textbf{\color{#35bf28}+9.10\%}$ |
| test_a2c_speed | 15.9539ms | 7.9957ms | 125.0674 Ops/s | 112.9589 Ops/s | $\textbf{\color{#35bf28}+10.72\%}$ |
| test_ppo_speed | 16.4317ms | 8.3071ms | 120.3786 Ops/s | 108.0167 Ops/s | $\textbf{\color{#35bf28}+11.44\%}$ |
| test_reinforce_speed | 14.9191ms | 7.0443ms | 141.9594 Ops/s | 130.3705 Ops/s | $\textbf{\color{#35bf28}+8.89\%}$ |
| test_iql_speed | 39.9873ms | 32.4778ms | 30.7903 Ops/s | 29.8206 Ops/s | $\color{#35bf28}+3.25\%$ |
| test_sample_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 2.4703ms | 1.8594ms | 537.7989 Ops/s | 503.1183 Ops/s | $\textbf{\color{#35bf28}+6.89\%}$ |
| test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 3.2850ms | 1.9732ms | 506.7809 Ops/s | 469.8956 Ops/s | $\textbf{\color{#35bf28}+7.85\%}$ |
| test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.1230s | 2.2306ms | 448.3161 Ops/s | 468.2426 Ops/s | $\color{#d91a1a}-4.26\%$ |
| test_sample_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 4.3002ms | 1.9261ms | 519.1858 Ops/s | 494.7636 Ops/s | $\color{#35bf28}+4.94\%$ |
| test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 3.1092ms | 1.9819ms | 504.5649 Ops/s | 472.5865 Ops/s | $\textbf{\color{#35bf28}+6.77\%}$ |
| test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 2.9752ms | 1.9950ms | 501.2628 Ops/s | 468.9024 Ops/s | $\textbf{\color{#35bf28}+6.90\%}$ |
| test_sample_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 2.1505ms | 1.8551ms | 539.0618 Ops/s | 496.4476 Ops/s | $\textbf{\color{#35bf28}+8.58\%}$ |
| test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.1256s | 2.2626ms | 441.9652 Ops/s | 471.4341 Ops/s | $\textbf{\color{#d91a1a}-6.25\%}$ |
| test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 2.8490ms | 2.0059ms | 498.5206 Ops/s | 472.1447 Ops/s | $\textbf{\color{#35bf28}+5.59\%}$ |
| test_iterate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 2.5096ms | 1.8710ms | 534.4707 Ops/s | 492.8659 Ops/s | $\textbf{\color{#35bf28}+8.44\%}$ |
| test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 4.2355ms | 2.0005ms | 499.8649 Ops/s | 417.7806 Ops/s | $\textbf{\color{#35bf28}+19.65\%}$ |
| test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.1236s | 2.2709ms | 440.3569 Ops/s | 465.9573 Ops/s | $\textbf{\color{#d91a1a}-5.49\%}$ |
| test_iterate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.5132ms | 1.8557ms | 538.8662 Ops/s | 495.2967 Ops/s | $\textbf{\color{#35bf28}+8.80\%}$ |
| test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 3.1125ms | 1.9938ms | 501.5481 Ops/s | 469.4124 Ops/s | $\textbf{\color{#35bf28}+6.85\%}$ |
| test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 3.2073ms | 2.0037ms | 499.0801 Ops/s | 395.3341 Ops/s | $\textbf{\color{#35bf28}+26.24\%}$ |
| test_iterate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 2.1744ms | 1.8520ms | 539.9480 Ops/s | 489.1325 Ops/s | $\textbf{\color{#35bf28}+10.39\%}$ |
| test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.1256s | 2.2470ms | 445.0297 Ops/s | 464.7674 Ops/s | $\color{#d91a1a}-4.25\%$ |
| test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 3.6229ms | 2.0184ms | 495.4379 Ops/s | 461.0278 Ops/s | $\textbf{\color{#35bf28}+7.46\%}$ |
| test_populate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.2371s | 21.6611ms | 46.1658 Ops/s | 39.7333 Ops/s | $\textbf{\color{#35bf28}+16.19\%}$ |
| test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 0.1157s | 21.0858ms | 47.4253 Ops/s | 49.1334 Ops/s | $\color{#d91a1a}-3.48\%$ |
| test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 0.1178s | 19.0842ms | 52.3993 Ops/s | 44.5266 Ops/s | $\textbf{\color{#35bf28}+17.68\%}$ |
| test_populate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.1155s | 21.0711ms | 47.4584 Ops/s | 49.7535 Ops/s | $\color{#d91a1a}-4.61\%$ |
| test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 0.1147s | 21.0324ms | 47.5458 Ops/s | 44.3263 Ops/s | $\textbf{\color{#35bf28}+7.26\%}$ |
| test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 0.1145s | 19.0630ms | 52.4576 Ops/s | 49.1923 Ops/s | $\textbf{\color{#35bf28}+6.64\%}$ |
| test_populate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.1161s | 21.2388ms | 47.0837 Ops/s | 43.5306 Ops/s | $\textbf{\color{#35bf28}+8.16\%}$ |
| test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 0.1178s | 19.4094ms | 51.5215 Ops/s | 48.9973 Ops/s | $\textbf{\color{#35bf28}+5.15\%}$ |
| test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 0.1193s | 21.4223ms | 46.6803 Ops/s | 44.6638 Ops/s | $\color{#35bf28}+4.51\%$ |
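
The derived columns in the results above can be reproduced from the raw timings. The sketch below is not the benchmark workflow's own code. It assumes that Ops is the reciprocal of the mean time, that Change is the relative difference against the Repo HEAD value, and that a row counts as improved or worsened only when it moves by more than 5%. The 5% cut-off and all function names are assumptions; the cut-off was chosen because it is consistent with the bolded rows and with the reported totals of 27 improved and 3 worsened.

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean run time (assumed: Ops = 1 / Mean)."""
    return 1.0 / mean_seconds


def relative_change(ops: float, ops_head: float) -> float:
    """Percent change of this PR's Ops relative to the Repo HEAD Ops."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a row; the 5% threshold is an assumption, not taken from the CI config."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


# Values copied from three rows of the results above: (name, Ops, Ops on Repo HEAD).
rows = [
    ("test_single", 16.1967, 14.9557),
    ("test_sync", 28.2326, 29.5509),
    ("test_values[vec_td1_return_estimate-False-False]", 34.3433, 36.8657),
]

for name, ops, ops_head in rows:
    change = relative_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Because the comparison is a ratio of two noisy measurements, small changes such as the ones within a few percent here are best read as within run-to-run variance rather than as real regressions.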

@vmoens vmoens marked this pull request as ready for review November 20, 2023 15:49
@vmoens vmoens merged commit c2edf35 into main Nov 20, 2023
55 of 60 checks passed
@vmoens vmoens deleted the fix-gpu-bench branch November 20, 2023 16:54
Labels
CI: Has to do with CI setup (e.g. wheels & builds, tests...)
CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.