[Performance] Faster target update using foreach #2046

Merged: vmoens merged 2 commits into main from faster-target-update on Oct 11, 2024

@vmoens (Contributor) commented Mar 28, 2024

No description provided.
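The PR body gives no description beyond its title. As a hedged illustration only, the sketch below assumes that "target update" means a soft (Polyak) update of target-network parameters, `target <- (1 - tau) * target + tau * source`, and that "foreach" refers to PyTorch's multi-tensor `torch._foreach_*` ops (private APIs). The function names and `tau` are hypothetical; this is not the code merged in this PR.

```python
import torch
from torch import nn


def soft_update_loop(source: nn.Module, target: nn.Module, tau: float) -> None:
    """Hypothetical per-parameter soft update: target <- (1 - tau) * target + tau * source."""
    with torch.no_grad():
        for p_src, p_tgt in zip(source.parameters(), target.parameters()):
            # Two in-place ops (and, on GPU, two kernel launches) per parameter tensor.
            p_tgt.mul_(1.0 - tau).add_(p_src, alpha=tau)


def soft_update_foreach(source: nn.Module, target: nn.Module, tau: float) -> None:
    """Same update, written with PyTorch's private multi-tensor (foreach) ops."""
    src = list(source.parameters())
    tgt = list(target.parameters())
    with torch.no_grad():
        # Each call handles the whole list of tensors at once; on CUDA this can
        # group many tensors per kernel launch, while on CPU it generally falls
        # back to a per-tensor loop.
        torch._foreach_mul_(tgt, 1.0 - tau)
        torch._foreach_add_(tgt, src, alpha=tau)


if __name__ == "__main__":
    torch.manual_seed(0)
    source = nn.Linear(4, 3)
    target_a = nn.Linear(4, 3)
    target_b = nn.Linear(4, 3)
    target_b.load_state_dict(target_a.state_dict())  # identical starting targets
    soft_update_loop(source, target_a, tau=0.05)
    soft_update_foreach(source, target_b, tau=0.05)
    for a, b in zip(target_a.parameters(), target_b.parameters()):
        assert torch.allclose(a, b)  # both versions compute the same update
```

Both functions compute the same arithmetic; the foreach version only changes how many separate operations are dispatched, which is where any speedup would come from.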

pytorch-bot (bot) commented Mar 28, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2046

Note: Links to docs will display an error until the docs builds have been completed.

❌ 22 New Failures, 4 Unrelated Failures

As of commit 3e3a528 with merge base d0e4c04:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Mar 28, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 91. Improved: $\large\color{#35bf28}4$. Worsened: $\large\color{#d91a1a}6$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_single | 53.1385ms | 52.4446ms | 19.0677 Ops/s | 18.0239 Ops/s | $\textbf{\color{#35bf28}+5.79\%}$ |
| test_sync | 40.3967ms | 32.1841ms | 31.0713 Ops/s | 34.3531 Ops/s | $\textbf{\color{#d91a1a}-9.55\%}$ |
| test_async | 48.9088ms | 26.9624ms | 37.0887 Ops/s | 38.4238 Ops/s | $\color{#d91a1a}-3.47\%$ |
| test_simple | 0.3922s | 0.3425s | 2.9198 Ops/s | 3.0218 Ops/s | $\color{#d91a1a}-3.37\%$ |
| test_transformed | 0.5261s | 0.4794s | 2.0861 Ops/s | 2.0968 Ops/s | $\color{#d91a1a}-0.51\%$ |
| test_serial | 1.2494s | 1.2006s | 0.8329 Ops/s | 0.8351 Ops/s | $\color{#d91a1a}-0.25\%$ |
| test_parallel | 1.0529s | 1.0020s | 0.9981 Ops/s | 0.9909 Ops/s | $\color{#35bf28}+0.73\%$ |
| test_step_mdp_speed[True-True-True-True-True] | 0.1395ms | 21.4656μs | 46.5861 KOps/s | 46.4131 KOps/s | $\color{#35bf28}+0.37\%$ |
| test_step_mdp_speed[True-True-True-True-False] | 32.5700μs | 13.0431μs | 76.6692 KOps/s | 76.1973 KOps/s | $\color{#35bf28}+0.62\%$ |
| test_step_mdp_speed[True-True-True-False-True] | 40.1850μs | 12.5013μs | 79.9917 KOps/s | 79.0123 KOps/s | $\color{#35bf28}+1.24\%$ |
| test_step_mdp_speed[True-True-True-False-False] | 26.6700μs | 7.7836μs | 128.4753 KOps/s | 129.0293 KOps/s | $\color{#d91a1a}-0.43\%$ |
| test_step_mdp_speed[True-True-False-True-True] | 52.0070μs | 22.7834μs | 43.8915 KOps/s | 43.5487 KOps/s | $\color{#35bf28}+0.79\%$ |
| test_step_mdp_speed[True-True-False-True-False] | 43.4910μs | 14.2236μs | 70.3059 KOps/s | 68.0605 KOps/s | $\color{#35bf28}+3.30\%$ |
| test_step_mdp_speed[True-True-False-False-True] | 49.9830μs | 13.7829μs | 72.5536 KOps/s | 71.8001 KOps/s | $\color{#35bf28}+1.05\%$ |
| test_step_mdp_speed[True-True-False-False-False] | 37.1900μs | 8.8654μs | 112.7985 KOps/s | 110.8573 KOps/s | $\color{#35bf28}+1.75\%$ |
| test_step_mdp_speed[True-False-True-True-True] | 50.3440μs | 23.8578μs | 41.9150 KOps/s | 40.7026 KOps/s | $\color{#35bf28}+2.98\%$ |
| test_step_mdp_speed[True-False-True-True-False] | 56.7760μs | 15.5050μs | 64.4954 KOps/s | 63.6136 KOps/s | $\color{#35bf28}+1.39\%$ |
| test_step_mdp_speed[True-False-True-False-True] | 49.8220μs | 13.7258μs | 72.8554 KOps/s | 71.0920 KOps/s | $\color{#35bf28}+2.48\%$ |
| test_step_mdp_speed[True-False-True-False-False] | 52.6480μs | 8.8853μs | 112.5453 KOps/s | 109.8930 KOps/s | $\color{#35bf28}+2.41\%$ |
| test_step_mdp_speed[True-False-False-True-True] | 54.6010μs | 24.8475μs | 40.2455 KOps/s | 38.8535 KOps/s | $\color{#35bf28}+3.58\%$ |
| test_step_mdp_speed[True-False-False-True-False] | 41.4770μs | 16.6272μs | 60.1424 KOps/s | 59.0320 KOps/s | $\color{#35bf28}+1.88\%$ |
| test_step_mdp_speed[True-False-False-False-True] | 45.6850μs | 14.7247μs | 67.9130 KOps/s | 66.1507 KOps/s | $\color{#35bf28}+2.66\%$ |
| test_step_mdp_speed[True-False-False-False-False] | 39.3130μs | 9.9209μs | 100.7973 KOps/s | 98.3061 KOps/s | $\color{#35bf28}+2.53\%$ |
| test_step_mdp_speed[False-True-True-True-True] | 66.6950μs | 23.7818μs | 42.0489 KOps/s | 41.0460 KOps/s | $\color{#35bf28}+2.44\%$ |
| test_step_mdp_speed[False-True-True-True-False] | 56.8660μs | 15.5785μs | 64.1911 KOps/s | 63.3707 KOps/s | $\color{#35bf28}+1.29\%$ |
| test_step_mdp_speed[False-True-True-False-True] | 41.3970μs | 15.9710μs | 62.6137 KOps/s | 61.8382 KOps/s | $\color{#35bf28}+1.25\%$ |
| test_step_mdp_speed[False-True-True-False-False] | 35.5160μs | 10.0932μs | 99.0768 KOps/s | 98.5065 KOps/s | $\color{#35bf28}+0.58\%$ |
| test_step_mdp_speed[False-True-False-True-True] | 36.9190μs | 25.4126μs | 39.3506 KOps/s | 39.1449 KOps/s | $\color{#35bf28}+0.53\%$ |
| test_step_mdp_speed[False-True-False-True-False] | 48.8010μs | 16.5756μs | 60.3296 KOps/s | 59.1002 KOps/s | $\color{#35bf28}+2.08\%$ |
| test_step_mdp_speed[False-True-False-False-True] | 50.8450μs | 16.9095μs | 59.1384 KOps/s | 58.1689 KOps/s | $\color{#35bf28}+1.67\%$ |
| test_step_mdp_speed[False-True-False-False-False] | 41.0760μs | 11.2491μs | 88.8963 KOps/s | 87.1728 KOps/s | $\color{#35bf28}+1.98\%$ |
| test_step_mdp_speed[False-False-True-True-True] | 55.3330μs | 26.0258μs | 38.4233 KOps/s | 37.7050 KOps/s | $\color{#35bf28}+1.91\%$ |
| test_step_mdp_speed[False-False-True-True-False] | 49.2120μs | 17.8819μs | 55.9225 KOps/s | 54.7735 KOps/s | $\color{#35bf28}+2.10\%$ |
| test_step_mdp_speed[False-False-True-False-True] | 43.7010μs | 17.0718μs | 58.5761 KOps/s | 57.7762 KOps/s | $\color{#35bf28}+1.38\%$ |
| test_step_mdp_speed[False-False-True-False-False] | 45.8250μs | 11.2019μs | 89.2703 KOps/s | 87.7343 KOps/s | $\color{#35bf28}+1.75\%$ |
| test_step_mdp_speed[False-False-False-True-True] | 58.8790μs | 27.1232μs | 36.8688 KOps/s | 35.9400 KOps/s | $\color{#35bf28}+2.58\%$ |
| test_step_mdp_speed[False-False-False-True-False] | 40.0940μs | 18.8334μs | 53.0973 KOps/s | 51.4752 KOps/s | $\color{#35bf28}+3.15\%$ |
| test_step_mdp_speed[False-False-False-False-True] | 51.9660μs | 18.1240μs | 55.1756 KOps/s | 54.7720 KOps/s | $\color{#35bf28}+0.74\%$ |
| test_step_mdp_speed[False-False-False-False-False] | 32.9810μs | 12.2769μs | 81.4537 KOps/s | 79.9488 KOps/s | $\color{#35bf28}+1.88\%$ |
| test_values[generalized_advantage_estimate-True-True] | 10.3263ms | 9.1490ms | 109.3018 Ops/s | 109.9491 Ops/s | $\color{#d91a1a}-0.59\%$ |
| test_values[vec_generalized_advantage_estimate-True-True] | 36.3825ms | 34.9155ms | 28.6406 Ops/s | 30.0625 Ops/s | $\color{#d91a1a}-4.73\%$ |
| test_values[td0_return_estimate-False-False] | 0.2253ms | 0.1646ms | 6.0739 KOps/s | 5.5663 KOps/s | $\textbf{\color{#35bf28}+9.12\%}$ |
| test_values[td1_return_estimate-False-False] | 26.3460ms | 22.5867ms | 44.2737 Ops/s | 44.8227 Ops/s | $\color{#d91a1a}-1.22\%$ |
| test_values[vec_td1_return_estimate-False-False] | 42.4738ms | 35.4829ms | 28.1826 Ops/s | 30.0103 Ops/s | $\textbf{\color{#d91a1a}-6.09\%}$ |
| test_values[td_lambda_return_estimate-True-False] | 36.8184ms | 32.7850ms | 30.5018 Ops/s | 31.2057 Ops/s | $\color{#d91a1a}-2.26\%$ |
| test_values[vec_td_lambda_return_estimate-True-False] | 37.0594ms | 35.0481ms | 28.5322 Ops/s | 30.0398 Ops/s | $\textbf{\color{#d91a1a}-5.02\%}$ |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 10.6050ms | 7.9250ms | 126.1831 Ops/s | 125.0904 Ops/s | $\color{#35bf28}+0.87\%$ |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 2.3168ms | 1.9322ms | 517.5342 Ops/s | 489.2996 Ops/s | $\textbf{\color{#35bf28}+5.77\%}$ |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.4295ms | 0.3508ms | 2.8508 KOps/s | 2.8974 KOps/s | $\color{#d91a1a}-1.61\%$ |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 50.9064ms | 47.1599ms | 21.2045 Ops/s | 21.3715 Ops/s | $\color{#d91a1a}-0.78\%$ |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 3.6362ms | 3.0310ms | 329.9254 Ops/s | 329.8149 Ops/s | $\color{#35bf28}+0.03\%$ |
| test_dqn_speed | 3.4886ms | 1.3398ms | 746.3812 Ops/s | 737.0429 Ops/s | $\color{#35bf28}+1.27\%$ |
| test_ddpg_speed | 3.8059ms | 2.6851ms | 372.4204 Ops/s | 369.6439 Ops/s | $\color{#35bf28}+0.75\%$ |
| test_sac_speed | 8.8151ms | 8.1397ms | 122.8539 Ops/s | 121.6031 Ops/s | $\color{#35bf28}+1.03\%$ |
| test_redq_speed | 14.8758ms | 13.1098ms | 76.2791 Ops/s | 77.3227 Ops/s | $\color{#d91a1a}-1.35\%$ |
| test_redq_deprec_speed | 14.7556ms | 12.9926ms | 76.9670 Ops/s | 77.3236 Ops/s | $\color{#d91a1a}-0.46\%$ |
| test_td3_speed | 8.3950ms | 8.0159ms | 124.7526 Ops/s | 121.9160 Ops/s | $\color{#35bf28}+2.33\%$ |
| test_cql_speed | 0.1085s | 39.0537ms | 25.6058 Ops/s | 27.6042 Ops/s | $\textbf{\color{#d91a1a}-7.24\%}$ |
| test_a2c_speed | 9.2476ms | 7.3716ms | 135.6550 Ops/s | 135.2346 Ops/s | $\color{#35bf28}+0.31\%$ |
| test_ppo_speed | 8.3902ms | 7.6286ms | 131.0853 Ops/s | 130.4328 Ops/s | $\color{#35bf28}+0.50\%$ |
| test_reinforce_speed | 7.7323ms | 6.5298ms | 153.1433 Ops/s | 152.9155 Ops/s | $\color{#35bf28}+0.15\%$ |
| test_iql_speed | 33.3654ms | 32.3306ms | 30.9304 Ops/s | 30.6742 Ops/s | $\color{#35bf28}+0.84\%$ |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 2.3956ms | 2.1509ms | 464.9158 Ops/s | 460.9318 Ops/s | $\color{#35bf28}+0.86\%$ |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.9277ms | 0.4962ms | 2.0154 KOps/s | 2.0160 KOps/s | $\color{#d91a1a}-0.03\%$ |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.7884ms | 0.4726ms | 2.1159 KOps/s | 2.1182 KOps/s | $\color{#d91a1a}-0.11\%$ |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 2.5200ms | 2.1756ms | 459.6352 Ops/s | 460.0248 Ops/s | $\color{#d91a1a}-0.08\%$ |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.9808ms | 0.4896ms | 2.0425 KOps/s | 2.0161 KOps/s | $\color{#35bf28}+1.31\%$ |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.7870ms | 0.4670ms | 2.1412 KOps/s | 2.1234 KOps/s | $\color{#35bf28}+0.84\%$ |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.5284ms | 1.2054ms | 829.5751 Ops/s | 820.3815 Ops/s | $\color{#35bf28}+1.12\%$ |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.6894ms | 1.1467ms | 872.0757 Ops/s | 868.1263 Ops/s | $\color{#35bf28}+0.45\%$ |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 2.6745ms | 2.2550ms | 443.4535 Ops/s | 422.8924 Ops/s | $\color{#35bf28}+4.86\%$ |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 91.2298ms | 0.6812ms | 1.4680 KOps/s | 1.5971 KOps/s | $\textbf{\color{#d91a1a}-8.09\%}$ |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8374ms | 0.5853ms | 1.7085 KOps/s | 1.7038 KOps/s | $\color{#35bf28}+0.28\%$ |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 2.3708ms | 2.1839ms | 457.8949 Ops/s | 465.4652 Ops/s | $\color{#d91a1a}-1.63\%$ |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.0324ms | 0.5013ms | 1.9947 KOps/s | 2.0047 KOps/s | $\color{#d91a1a}-0.49\%$ |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.8224ms | 0.4740ms | 2.1098 KOps/s | 2.1140 KOps/s | $\color{#d91a1a}-0.20\%$ |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 2.4233ms | 2.1550ms | 464.0368 Ops/s | 462.7662 Ops/s | $\color{#35bf28}+0.27\%$ |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.7538ms | 0.4920ms | 2.0323 KOps/s | 2.0000 KOps/s | $\color{#35bf28}+1.62\%$ |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 3.7804ms | 0.4700ms | 2.1278 KOps/s | 2.1235 KOps/s | $\color{#35bf28}+0.20\%$ |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.5265ms | 2.3362ms | 428.0411 Ops/s | 429.0883 Ops/s | $\color{#d91a1a}-0.24\%$ |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.7159ms | 0.6094ms | 1.6409 KOps/s | 1.6285 KOps/s | $\color{#35bf28}+0.76\%$ |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8354ms | 0.5877ms | 1.7015 KOps/s | 1.6890 KOps/s | $\color{#35bf28}+0.74\%$ |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1029s | 7.2155ms | 138.5898 Ops/s | 139.7666 Ops/s | $\color{#d91a1a}-0.84\%$ |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 14.1108ms | 11.8695ms | 84.2497 Ops/s | 84.0385 Ops/s | $\color{#35bf28}+0.25\%$ |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 1.5444ms | 1.0465ms | 955.5605 Ops/s | 958.1894 Ops/s | $\color{#d91a1a}-0.27\%$ |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 87.9180ms | 5.2798ms | 189.4018 Ops/s | 139.9130 Ops/s | $\textbf{\color{#35bf28}+35.37\%}$ |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 14.1584ms | 11.8178ms | 84.6180 Ops/s | 83.2143 Ops/s | $\color{#35bf28}+1.69\%$ |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 1.6354ms | 1.0495ms | 952.8692 Ops/s | 919.0567 Ops/s | $\color{#35bf28}+3.68\%$ |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 88.6943ms | 7.2722ms | 137.5093 Ops/s | 175.1356 Ops/s | $\textbf{\color{#d91a1a}-21.48\%}$ |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 19.2351ms | 12.1697ms | 82.1711 Ops/s | 81.3940 Ops/s | $\color{#35bf28}+0.95\%$ |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 4.0607ms | 1.4040ms | 712.2452 Ops/s | 743.3058 Ops/s | $\color{#d91a1a}-4.18\%$ |
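The Change column appears to be the relative throughput difference between this PR and the repo HEAD, i.e. Change = (Ops / Ops on Repo HEAD - 1) × 100%. That formula is inferred from the table rather than documented by the benchmark bot; the sketch below recomputes three CPU rows from their rounded Ops values.

```python
def percent_change(ops: float, ops_head: float) -> float:
    """Relative throughput change versus repo HEAD, in percent (inferred formula)."""
    return (ops / ops_head - 1.0) * 100.0


# (Ops, Ops on Repo HEAD) pairs copied from the CPU table above.
cpu_rows = {
    "test_single": (19.0677, 18.0239),
    "test_sync": (31.0713, 34.3531),
    "test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400]": (
        189.4018,
        139.9130,
    ),
}

for name, (ops, head) in cpu_rows.items():
    # Format with an explicit sign and two decimals, like the Change column.
    print(f"{name}: {percent_change(ops, head):+.2f}%")
```

The printed values agree with the table's +5.79%, -9.55% and +35.37%, which is what suggests the column compares mean throughput (Ops) rather than Max latency.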

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 94. Improved: $\large\color{#35bf28}10$. Worsened: $\large\color{#d91a1a}3$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_single | 0.1030s | 0.1007s | 9.9288 Ops/s | 9.4905 Ops/s | $\color{#35bf28}+4.62\%$ |
| test_sync | 87.0383ms | 86.6073ms | 11.5464 Ops/s | 11.4630 Ops/s | $\color{#35bf28}+0.73\%$ |
| test_async | 0.1600s | 70.7523ms | 14.1338 Ops/s | 11.8707 Ops/s | $\textbf{\color{#35bf28}+19.06\%}$ |
| test_single_pixels | 0.1111s | 0.1109s | 9.0179 Ops/s | 9.0697 Ops/s | $\color{#d91a1a}-0.57\%$ |
| test_sync_pixels | 69.3222ms | 67.1603ms | 14.8897 Ops/s | 15.0631 Ops/s | $\color{#d91a1a}-1.15\%$ |
| test_async_pixels | 0.1210s | 55.2807ms | 18.0895 Ops/s | 17.9114 Ops/s | $\color{#35bf28}+0.99\%$ |
| test_simple | 0.6773s | 0.6725s | 1.4870 Ops/s | 1.4563 Ops/s | $\color{#35bf28}+2.11\%$ |
| test_transformed | 0.9006s | 0.8981s | 1.1134 Ops/s | 1.1068 Ops/s | $\color{#35bf28}+0.60\%$ |
| test_serial | 2.1663s | 2.1018s | 0.4758 Ops/s | 0.4727 Ops/s | $\color{#35bf28}+0.65\%$ |
| test_parallel | 1.8632s | 1.8070s | 0.5534 Ops/s | 0.5665 Ops/s | $\color{#d91a1a}-2.31\%$ |
| test_step_mdp_speed[True-True-True-True-True] | 85.3910μs | 32.2800μs | 30.9789 KOps/s | 29.8103 KOps/s | $\color{#35bf28}+3.92\%$ |
| test_step_mdp_speed[True-True-True-True-False] | 45.3210μs | 19.4870μs | 51.3163 KOps/s | 49.7331 KOps/s | $\color{#35bf28}+3.18\%$ |
| test_step_mdp_speed[True-True-True-False-True] | 45.3210μs | 18.2513μs | 54.7907 KOps/s | 52.7310 KOps/s | $\color{#35bf28}+3.91\%$ |
| test_step_mdp_speed[True-True-True-False-False] | 33.0700μs | 11.1204μs | 89.9248 KOps/s | 87.4629 KOps/s | $\color{#35bf28}+2.81\%$ |
| test_step_mdp_speed[True-True-False-True-True] | 63.6810μs | 33.6745μs | 29.6961 KOps/s | 29.0401 KOps/s | $\color{#35bf28}+2.26\%$ |
| test_step_mdp_speed[True-True-False-True-False] | 45.5110μs | 21.1834μs | 47.2069 KOps/s | 46.0117 KOps/s | $\color{#35bf28}+2.60\%$ |
| test_step_mdp_speed[True-True-False-False-True] | 44.2700μs | 19.8874μs | 50.2831 KOps/s | 48.8114 KOps/s | $\color{#35bf28}+3.02\%$ |
| test_step_mdp_speed[True-True-False-False-False] | 34.4400μs | 12.8213μs | 77.9954 KOps/s | 75.6891 KOps/s | $\color{#35bf28}+3.05\%$ |
| test_step_mdp_speed[True-False-True-True-True] | 66.9010μs | 35.1778μs | 28.4270 KOps/s | 27.2569 KOps/s | $\color{#35bf28}+4.29\%$ |
| test_step_mdp_speed[True-False-True-True-False] | 91.1410μs | 22.8034μs | 43.8531 KOps/s | 42.2882 KOps/s | $\color{#35bf28}+3.70\%$ |
| test_step_mdp_speed[True-False-True-False-True] | 45.6010μs | 19.7461μs | 50.6430 KOps/s | 48.6166 KOps/s | $\color{#35bf28}+4.17\%$ |
| test_step_mdp_speed[True-False-True-False-False] | 38.4410μs | 12.8296μs | 77.9448 KOps/s | 74.2185 KOps/s | $\textbf{\color{#35bf28}+5.02\%}$ |
| test_step_mdp_speed[True-False-False-True-True] | 77.1310μs | 36.9915μs | 27.0332 KOps/s | 25.2880 KOps/s | $\textbf{\color{#35bf28}+6.90\%}$ |
| test_step_mdp_speed[True-False-False-True-False] | 51.8810μs | 24.5540μs | 40.7266 KOps/s | 39.2290 KOps/s | $\color{#35bf28}+3.82\%$ |
| test_step_mdp_speed[True-False-False-False-True] | 47.6210μs | 21.7599μs | 45.9560 KOps/s | 44.1437 KOps/s | $\color{#35bf28}+4.11\%$ |
| test_step_mdp_speed[True-False-False-False-False] | 38.2410μs | 14.5888μs | 68.5456 KOps/s | 65.3699 KOps/s | $\color{#35bf28}+4.86\%$ |
| test_step_mdp_speed[False-True-True-True-True] | 62.3310μs | 35.1476μs | 28.4514 KOps/s | 26.9673 KOps/s | $\textbf{\color{#35bf28}+5.50\%}$ |
| test_step_mdp_speed[False-True-True-True-False] | 48.9610μs | 22.7862μs | 43.8862 KOps/s | 41.8486 KOps/s | $\color{#35bf28}+4.87\%$ |
| test_step_mdp_speed[False-True-True-False-True] | 48.2510μs | 23.5973μs | 42.3777 KOps/s | 40.3334 KOps/s | $\textbf{\color{#35bf28}+5.07\%}$ |
| test_step_mdp_speed[False-True-True-False-False] | 41.0710μs | 14.7128μs | 67.9681 KOps/s | 66.0462 KOps/s | $\color{#35bf28}+2.91\%$ |
| test_step_mdp_speed[False-True-False-True-True] | 67.5320μs | 37.4546μs | 26.6990 KOps/s | 25.1049 KOps/s | $\textbf{\color{#35bf28}+6.35\%}$ |
| test_step_mdp_speed[False-True-False-True-False] | 60.1210μs | 24.9450μs | 40.0882 KOps/s | 39.0789 KOps/s | $\color{#35bf28}+2.58\%$ |
| test_step_mdp_speed[False-True-False-False-True] | 51.2810μs | 25.2282μs | 39.6382 KOps/s | 38.0535 KOps/s | $\color{#35bf28}+4.16\%$ |
| test_step_mdp_speed[False-True-False-False-False] | 39.0210μs | 16.2989μs | 61.3539 KOps/s | 58.5539 KOps/s | $\color{#35bf28}+4.78\%$ |
| test_step_mdp_speed[False-False-True-True-True] | 68.3610μs | 38.6111μs | 25.8993 KOps/s | 24.1870 KOps/s | $\textbf{\color{#35bf28}+7.08\%}$ |
| test_step_mdp_speed[False-False-True-True-False] | 67.8920μs | 26.5484μs | 37.6670 KOps/s | 36.4607 KOps/s | $\color{#35bf28}+3.31\%$ |
| test_step_mdp_speed[False-False-True-False-True] | 70.0510μs | 25.2451μs | 39.6116 KOps/s | 37.9913 KOps/s | $\color{#35bf28}+4.27\%$ |
| test_step_mdp_speed[False-False-True-False-False] | 36.6810μs | 16.3241μs | 61.2591 KOps/s | 58.7573 KOps/s | $\color{#35bf28}+4.26\%$ |
| test_step_mdp_speed[False-False-False-True-True] | 73.8920μs | 40.2557μs | 24.8412 KOps/s | 23.7630 KOps/s | $\color{#35bf28}+4.54\%$ |
| test_step_mdp_speed[False-False-False-True-False] | 65.6610μs | 28.2059μs | 35.4536 KOps/s | 34.5068 KOps/s | $\color{#35bf28}+2.74\%$ |
| test_step_mdp_speed[False-False-False-False-True] | 50.9610μs | 26.6940μs | 37.4616 KOps/s | 36.1311 KOps/s | $\color{#35bf28}+3.68\%$ |
| test_step_mdp_speed[False-False-False-False-False] | 36.4510μs | 17.8561μs | 56.0034 KOps/s | 53.7435 KOps/s | $\color{#35bf28}+4.20\%$ |
| test_values[generalized_advantage_estimate-True-True] | 25.0772ms | 24.5919ms | 40.6637 Ops/s | 41.5530 Ops/s | $\color{#d91a1a}-2.14\%$ |
| test_values[vec_generalized_advantage_estimate-True-True] | 84.8004ms | 3.2666ms | 306.1318 Ops/s | 313.4472 Ops/s | $\color{#d91a1a}-2.33\%$ |
| test_values[td0_return_estimate-False-False] | 91.0120μs | 65.2314μs | 15.3300 KOps/s | 15.4385 KOps/s | $\color{#d91a1a}-0.70\%$ |
| test_values[td1_return_estimate-False-False] | 55.1476ms | 53.3858ms | 18.7316 Ops/s | 19.0109 Ops/s | $\color{#d91a1a}-1.47\%$ |
| test_values[vec_td1_return_estimate-False-False] | 2.1521ms | 1.7688ms | 565.3634 Ops/s | 565.8735 Ops/s | $\color{#d91a1a}-0.09\%$ |
| test_values[td_lambda_return_estimate-True-False] | 87.3315ms | 84.8701ms | 11.7827 Ops/s | 11.8376 Ops/s | $\color{#d91a1a}-0.46\%$ |
| test_values[vec_td_lambda_return_estimate-True-False] | 2.1102ms | 1.7622ms | 567.4802 Ops/s | 567.7792 Ops/s | $\color{#d91a1a}-0.05\%$ |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 24.5796ms | 23.3605ms | 42.8072 Ops/s | 42.5567 Ops/s | $\color{#35bf28}+0.59\%$ |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 0.9085ms | 0.7222ms | 1.3846 KOps/s | 1.4151 KOps/s | $\color{#d91a1a}-2.15\%$ |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.7073ms | 0.6510ms | 1.5360 KOps/s | 1.5321 KOps/s | $\color{#35bf28}+0.26\%$ |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.5612ms | 1.4546ms | 687.4579 Ops/s | 686.1073 Ops/s | $\color{#35bf28}+0.20\%$ |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.9522ms | 0.6734ms | 1.4851 KOps/s | 1.4846 KOps/s | $\color{#35bf28}+0.03\%$ |
| test_dqn_speed | 1.9113ms | 1.4633ms | 683.3882 Ops/s | 695.4252 Ops/s | $\color{#d91a1a}-1.73\%$ |
| test_ddpg_speed | 3.0299ms | 2.7478ms | 363.9334 Ops/s | 364.4297 Ops/s | $\color{#d91a1a}-0.14\%$ |
| test_sac_speed | 8.6969ms | 8.1512ms | 122.6807 Ops/s | 124.5326 Ops/s | $\color{#d91a1a}-1.49\%$ |
| test_redq_speed | 11.5367ms | 10.4302ms | 95.8753 Ops/s | 96.5625 Ops/s | $\color{#d91a1a}-0.71\%$ |
| test_redq_deprec_speed | 11.8321ms | 11.1544ms | 89.6506 Ops/s | 90.5105 Ops/s | $\color{#d91a1a}-0.95\%$ |
| test_td3_speed | 8.2962ms | 8.0766ms | 123.8141 Ops/s | 123.9169 Ops/s | $\color{#d91a1a}-0.08\%$ |
| test_cql_speed | 26.4065ms | 25.6876ms | 38.9294 Ops/s | 38.9262 Ops/s | $+0.01\%$ |
| test_a2c_speed | 5.9202ms | 5.6677ms | 176.4388 Ops/s | 177.5600 Ops/s | $\color{#d91a1a}-0.63\%$ |
| test_ppo_speed | 6.1636ms | 5.8997ms | 169.4997 Ops/s | 167.6345 Ops/s | $\color{#35bf28}+1.11\%$ |
| test_reinforce_speed | 4.8083ms | 4.5837ms | 218.1649 Ops/s | 221.0318 Ops/s | $\color{#d91a1a}-1.30\%$ |
| test_iql_speed | 20.3345ms | 19.7208ms | 50.7079 Ops/s | 50.9447 Ops/s | $\color{#d91a1a}-0.46\%$ |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.0392ms | 2.9232ms | 342.0954 Ops/s | 344.0826 Ops/s | $\color{#d91a1a}-0.58\%$ |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.0222ms | 0.5489ms | 1.8218 KOps/s | 1.8401 KOps/s | $\color{#d91a1a}-0.99\%$ |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.7587ms | 0.5287ms | 1.8916 KOps/s | 1.9147 KOps/s | $\color{#d91a1a}-1.21\%$ |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.1607ms | 2.9653ms | 337.2358 Ops/s | 341.4032 Ops/s | $\color{#d91a1a}-1.22\%$ |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.1280ms | 0.5420ms | 1.8450 KOps/s | 1.8627 KOps/s | $\color{#d91a1a}-0.95\%$ |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.7894ms | 0.5179ms | 1.9309 KOps/s | 1.9346 KOps/s | $\color{#d91a1a}-0.19\%$ |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 4.1193ms | 1.5143ms | 660.3862 Ops/s | 694.6649 Ops/s | $\color{#d91a1a}-4.93\%$ |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.8956ms | 1.4472ms | 691.0133 Ops/s | 731.1086 Ops/s | $\textbf{\color{#d91a1a}-5.48\%}$ |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.1284ms | 3.0548ms | 327.3497 Ops/s | 328.7827 Ops/s | $\color{#d91a1a}-0.44\%$ |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.7655ms | 0.6734ms | 1.4849 KOps/s | 1.3042 KOps/s | $\textbf{\color{#35bf28}+13.85\%}$ |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8122ms | 0.6491ms | 1.5406 KOps/s | 1.5404 KOps/s | $\color{#35bf28}+0.01\%$ |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 2.9895ms | 2.9016ms | 344.6347 Ops/s | 343.2964 Ops/s | $\color{#35bf28}+0.39\%$ |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.7410ms | 0.5481ms | 1.8246 KOps/s | 1.8498 KOps/s | $\color{#d91a1a}-1.36\%$ |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.7254ms | 0.5318ms | 1.8805 KOps/s | 1.9303 KOps/s | $\color{#d91a1a}-2.58\%$ |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.1494ms | 2.9611ms | 337.7067 Ops/s | 345.3948 Ops/s | $\color{#d91a1a}-2.23\%$ |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.2195ms | 0.5420ms | 1.8449 KOps/s | 1.8684 KOps/s | $\color{#d91a1a}-1.26\%$ |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6721ms | 0.5197ms | 1.9244 KOps/s | 1.9551 KOps/s | $\color{#d91a1a}-1.57\%$ |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.1547ms | 3.0532ms | 327.5270 Ops/s | 328.6835 Ops/s | $\color{#d91a1a}-0.35\%$ |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.9278ms | 0.6803ms | 1.4699 KOps/s | 1.4919 KOps/s | $\color{#d91a1a}-1.48\%$ |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 4.3968ms | 0.6629ms | 1.5084 KOps/s | 1.5468 KOps/s | $\color{#d91a1a}-2.48\%$ |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1099s | 9.0097ms | 110.9916 Ops/s | 108.0853 Ops/s | $\color{#35bf28}+2.69\%$ |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 17.3097ms | 14.8365ms | 67.4015 Ops/s | 67.4491 Ops/s | $\color{#d91a1a}-0.07\%$ |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 1.2230ms | 1.0932ms | 914.7538 Ops/s | 784.4209 Ops/s | $\textbf{\color{#35bf28}+16.62\%}$ |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.1037s | 8.8180ms | 113.4042 Ops/s | 148.2367 Ops/s | $\textbf{\color{#d91a1a}-23.50\%}$ |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 17.0489ms | 14.8590ms | 67.2993 Ops/s | 67.2262 Ops/s | $\color{#35bf28}+0.11\%$ |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 2.3813ms | 1.2022ms | 831.8146 Ops/s | 796.8339 Ops/s | $\color{#35bf28}+4.39\%$ |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.1010s | 7.1483ms | 139.8942 Ops/s | 109.2933 Ops/s | $\textbf{\color{#35bf28}+28.00\%}$ |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 17.1227ms | 15.1050ms | 66.2031 Ops/s | 65.7106 Ops/s | $\color{#35bf28}+0.75\%$ |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 7.8550ms | 1.6598ms | 602.4862 Ops/s | 678.9314 Ops/s | $\textbf{\color{#d91a1a}-11.26\%}$ |
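The rows shown in bold, which are the ones counted in the Improved/Worsened totals, are consistent with a ±5% cutoff: in the GPU table +5.02% and -5.48% are bold while +4.86% and -4.93% are not, and counting the bold rows gives 10 improved and 3 worsened, matching the summary line (the same cutoff also reproduces the CPU totals of 4 and 6). The cutoff is an inference from the tables, not a documented setting of the bot, and `THRESHOLD_PCT` is a hypothetical name; the sketch below classifies those four boundary rows.

```python
THRESHOLD_PCT = 5.0  # hypothetical name; cutoff inferred from which rows are bold


def classify(change_pct: float, threshold: float = THRESHOLD_PCT) -> str:
    """Label a benchmark change as improved, worsened, or within the noise band."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


# Boundary rows copied from the GPU table: (name, Change in percent).
gpu_rows = [
    ("test_step_mdp_speed[True-False-True-False-False]", 5.02),
    ("test_step_mdp_speed[True-False-False-False-False]", 4.86),
    ("test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000]", -4.93),
    ("test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000]", -5.48),
]

for name, change in gpu_rows:
    print(f"{name}: {change:+.2f}% -> {classify(change)}")
```

Whether a change of exactly ±5.00% would count is not determinable from these tables, since no row falls on the boundary; the strict comparison here is an arbitrary choice.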

@vmoens force-pushed the faster-target-update branch from c94910e to 8aac458 on October 11, 2024 15:06
@vmoens added the performance label Oct 11, 2024
@vmoens merged commit 56cc525 into main Oct 11, 2024
5 of 11 checks passed
@vmoens deleted the faster-target-update branch October 17, 2024 13:58

Labels: CLA Signed, performance
2 participants