[Performance] Minor improvements to step_and_maybe_reset in batched envs #1807

Merged: 3 commits merged on Jan 16, 2024

@vmoens (Contributor) commented Jan 16, 2024

No description provided.

pytorch-bot bot commented Jan 16, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/1807

Note: Links to docs will display an error until the docs builds have been completed.

⏳ 1 Pending, 3 Unrelated Failures

As of commit 6ba3c85 with merge base 3d7e49c:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Jan 16, 2024

github-actions bot commented Jan 16, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 89. Improved: $\large\color{#35bf28}3$. Worsened: $\large\color{#d91a1a}10$.

Columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change
test_single 63.9152ms 62.9908ms 15.8753 Ops/s 15.1212 Ops/s $\color{#35bf28}+4.99\%$
test_sync 56.0164ms 39.3707ms 25.3996 Ops/s 27.4694 Ops/s $\textbf{\color{#d91a1a}-7.53\%}$
test_async 98.1210ms 34.1165ms 29.3113 Ops/s 29.3916 Ops/s $\color{#d91a1a}-0.27\%$
test_simple 0.4970s 0.4377s 2.2846 Ops/s 2.2411 Ops/s $\color{#35bf28}+1.94\%$
test_transformed 0.6633s 0.6152s 1.6254 Ops/s 1.6179 Ops/s $\color{#35bf28}+0.46\%$
test_serial 1.4135s 1.3485s 0.7416 Ops/s 0.7063 Ops/s $\color{#35bf28}+5.00\%$
test_parallel 1.3381s 1.2724s 0.7859 Ops/s 0.7586 Ops/s $\color{#35bf28}+3.60\%$
test_step_mdp_speed[True-True-True-True-True] 0.1718ms 22.0895μs 45.2704 KOps/s 45.3579 KOps/s $\color{#d91a1a}-0.19\%$
test_step_mdp_speed[True-True-True-True-False] 40.4160μs 13.4504μs 74.3470 KOps/s 75.5643 KOps/s $\color{#d91a1a}-1.61\%$
test_step_mdp_speed[True-True-True-False-True] 33.4530μs 13.0518μs 76.6180 KOps/s 77.0190 KOps/s $\color{#d91a1a}-0.52\%$
test_step_mdp_speed[True-True-True-False-False] 32.6110μs 8.0565μs 124.1230 KOps/s 128.4810 KOps/s $\color{#d91a1a}-3.39\%$
test_step_mdp_speed[True-True-False-True-True] 54.1420μs 23.5295μs 42.4998 KOps/s 42.2180 KOps/s $\color{#35bf28}+0.67\%$
test_step_mdp_speed[True-True-False-True-False] 48.1910μs 14.8405μs 67.3831 KOps/s 67.3093 KOps/s $\color{#35bf28}+0.11\%$
test_step_mdp_speed[True-True-False-False-True] 32.4510μs 14.4961μs 68.9841 KOps/s 69.5636 KOps/s $\color{#d91a1a}-0.83\%$
test_step_mdp_speed[True-True-False-False-False] 33.2730μs 9.3269μs 107.2162 KOps/s 109.2014 KOps/s $\color{#d91a1a}-1.82\%$
test_step_mdp_speed[True-False-True-True-True] 47.8300μs 25.0976μs 39.8444 KOps/s 40.0175 KOps/s $\color{#d91a1a}-0.43\%$
test_step_mdp_speed[True-False-True-True-False] 70.8130μs 16.1198μs 62.0355 KOps/s 61.8902 KOps/s $\color{#35bf28}+0.23\%$
test_step_mdp_speed[True-False-True-False-True] 41.3180μs 14.3269μs 69.7986 KOps/s 69.0176 KOps/s $\color{#35bf28}+1.13\%$
test_step_mdp_speed[True-False-True-False-False] 23.1040μs 9.2714μs 107.8591 KOps/s 109.3479 KOps/s $\color{#d91a1a}-1.36\%$
test_step_mdp_speed[True-False-False-True-True] 65.4120μs 26.3230μs 37.9896 KOps/s 38.8597 KOps/s $\color{#d91a1a}-2.24\%$
test_step_mdp_speed[True-False-False-True-False] 50.9650μs 17.3304μs 57.7020 KOps/s 58.4371 KOps/s $\color{#d91a1a}-1.26\%$
test_step_mdp_speed[True-False-False-False-True] 47.9900μs 15.4331μs 64.7957 KOps/s 64.3543 KOps/s $\color{#35bf28}+0.69\%$
test_step_mdp_speed[True-False-False-False-False] 45.3450μs 10.5179μs 95.0758 KOps/s 96.9731 KOps/s $\color{#d91a1a}-1.96\%$
test_step_mdp_speed[False-True-True-True-True] 0.1169ms 24.9517μs 40.0774 KOps/s 37.8611 KOps/s $\textbf{\color{#35bf28}+5.85\%}$
test_step_mdp_speed[False-True-True-True-False] 66.1530μs 16.1758μs 61.8208 KOps/s 59.3138 KOps/s $\color{#35bf28}+4.23\%$
test_step_mdp_speed[False-True-True-False-True] 41.2270μs 16.7887μs 59.5640 KOps/s 60.4413 KOps/s $\color{#d91a1a}-1.45\%$
test_step_mdp_speed[False-True-True-False-False] 39.8850μs 10.5665μs 94.6391 KOps/s 96.3671 KOps/s $\color{#d91a1a}-1.79\%$
test_step_mdp_speed[False-True-False-True-True] 69.7100μs 25.9269μs 38.5699 KOps/s 38.0792 KOps/s $\color{#35bf28}+1.29\%$
test_step_mdp_speed[False-True-False-True-False] 49.0820μs 17.3653μs 57.5863 KOps/s 58.1388 KOps/s $\color{#d91a1a}-0.95\%$
test_step_mdp_speed[False-True-False-False-True] 41.1770μs 17.9177μs 55.8106 KOps/s 56.0061 KOps/s $\color{#d91a1a}-0.35\%$
test_step_mdp_speed[False-True-False-False-False] 33.4720μs 11.6743μs 85.6579 KOps/s 85.7198 KOps/s $\color{#d91a1a}-0.07\%$
test_step_mdp_speed[False-False-True-True-True] 59.8020μs 27.5163μs 36.3421 KOps/s 36.1080 KOps/s $\color{#35bf28}+0.65\%$
test_step_mdp_speed[False-False-True-True-False] 62.7680μs 18.7389μs 53.3650 KOps/s 53.0926 KOps/s $\color{#35bf28}+0.51\%$
test_step_mdp_speed[False-False-True-False-True] 56.7760μs 17.8932μs 55.8873 KOps/s 54.9082 KOps/s $\color{#35bf28}+1.78\%$
test_step_mdp_speed[False-False-True-False-False] 29.2850μs 11.8050μs 84.7102 KOps/s 84.7530 KOps/s $\color{#d91a1a}-0.05\%$
test_step_mdp_speed[False-False-False-True-True] 0.1056ms 28.5154μs 35.0688 KOps/s 35.0370 KOps/s $\color{#35bf28}+0.09\%$
test_step_mdp_speed[False-False-False-True-False] 54.4210μs 19.9268μs 50.1837 KOps/s 50.6287 KOps/s $\color{#d91a1a}-0.88\%$
test_step_mdp_speed[False-False-False-False-True] 49.8140μs 18.8144μs 53.1508 KOps/s 52.4778 KOps/s $\color{#35bf28}+1.28\%$
test_step_mdp_speed[False-False-False-False-False] 39.6940μs 12.7823μs 78.2332 KOps/s 77.9398 KOps/s $\color{#35bf28}+0.38\%$
test_values[generalized_advantage_estimate-True-True] 12.5204ms 11.8820ms 84.1607 Ops/s 82.4419 Ops/s $\color{#35bf28}+2.08\%$
test_values[vec_generalized_advantage_estimate-True-True] 35.0684ms 27.0910ms 36.9126 Ops/s 36.6057 Ops/s $\color{#35bf28}+0.84\%$
test_values[td0_return_estimate-False-False] 0.2432ms 0.1805ms 5.5391 KOps/s 5.7048 KOps/s $\color{#d91a1a}-2.91\%$
test_values[td1_return_estimate-False-False] 25.1403ms 24.9127ms 40.1402 Ops/s 38.5856 Ops/s $\color{#35bf28}+4.03\%$
test_values[vec_td1_return_estimate-False-False] 35.1975ms 27.4797ms 36.3906 Ops/s 32.3297 Ops/s $\textbf{\color{#35bf28}+12.56\%}$
test_values[td_lambda_return_estimate-True-False] 35.3985ms 34.9812ms 28.5868 Ops/s 27.7086 Ops/s $\color{#35bf28}+3.17\%$
test_values[vec_td_lambda_return_estimate-True-False] 36.9679ms 27.4770ms 36.3940 Ops/s 36.5363 Ops/s $\color{#d91a1a}-0.39\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 8.0873ms 7.8679ms 127.0990 Ops/s 121.8558 Ops/s $\color{#35bf28}+4.30\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 2.3119ms 1.9199ms 520.8637 Ops/s 514.6368 Ops/s $\color{#35bf28}+1.21\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 9.5685ms 0.4283ms 2.3346 KOps/s 2.3264 KOps/s $\color{#35bf28}+0.35\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 48.0291ms 39.1320ms 25.5546 Ops/s 26.0190 Ops/s $\color{#d91a1a}-1.78\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 11.3014ms 2.6328ms 379.8217 Ops/s 378.6601 Ops/s $\color{#35bf28}+0.31\%$
test_dqn_speed 18.0578ms 7.7095ms 129.7105 Ops/s 118.4531 Ops/s $\textbf{\color{#35bf28}+9.50\%}$
test_ddpg_speed 22.5488ms 14.5848ms 68.5647 Ops/s 65.5957 Ops/s $\color{#35bf28}+4.53\%$
test_sac_speed 35.9385ms 29.4172ms 33.9937 Ops/s 33.2678 Ops/s $\color{#35bf28}+2.18\%$
test_redq_speed 50.3302ms 46.9836ms 21.2840 Ops/s 21.5700 Ops/s $\color{#d91a1a}-1.33\%$
test_redq_deprec_speed 27.2073ms 25.6884ms 38.9280 Ops/s 38.6662 Ops/s $\color{#35bf28}+0.68\%$
test_td3_speed 30.4294ms 20.4464ms 48.9083 Ops/s 48.2521 Ops/s $\color{#35bf28}+1.36\%$
test_cql_speed 91.6567ms 87.8187ms 11.3871 Ops/s 11.2436 Ops/s $\color{#35bf28}+1.28\%$
test_a2c_speed 29.4090ms 27.0472ms 36.9724 Ops/s 36.5002 Ops/s $\color{#35bf28}+1.29\%$
test_ppo_speed 36.9635ms 27.9684ms 35.7546 Ops/s 36.3943 Ops/s $\color{#d91a1a}-1.76\%$
test_reinforce_speed 31.4043ms 26.3275ms 37.9830 Ops/s 37.9172 Ops/s $\color{#35bf28}+0.17\%$
test_iql_speed 69.5717ms 64.2593ms 15.5619 Ops/s 15.6205 Ops/s $\color{#d91a1a}-0.38\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 1.8549ms 1.5114ms 661.6343 Ops/s 721.3631 Ops/s $\textbf{\color{#d91a1a}-8.28\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 8.8265ms 0.5270ms 1.8976 KOps/s 1.8778 KOps/s $\color{#35bf28}+1.06\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 8.7894ms 0.5154ms 1.9403 KOps/s 1.9460 KOps/s $\color{#d91a1a}-0.29\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 2.0237ms 1.4651ms 682.5420 Ops/s 738.5651 Ops/s $\textbf{\color{#d91a1a}-7.59\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 8.8731ms 0.5204ms 1.9218 KOps/s 1.9237 KOps/s $\color{#d91a1a}-0.10\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 2.7266ms 0.5012ms 1.9952 KOps/s 2.0118 KOps/s $\color{#d91a1a}-0.82\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 2.4686ms 1.6809ms 594.9208 Ops/s 637.3991 Ops/s $\textbf{\color{#d91a1a}-6.66\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.8764ms 0.6520ms 1.5336 KOps/s 1.5398 KOps/s $\color{#d91a1a}-0.40\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 8.9043ms 0.6465ms 1.5469 KOps/s 1.5521 KOps/s $\color{#d91a1a}-0.34\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 2.1982ms 1.5029ms 665.3882 Ops/s 709.1735 Ops/s $\textbf{\color{#d91a1a}-6.17\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.6252ms 0.5174ms 1.9327 KOps/s 1.8846 KOps/s $\color{#35bf28}+2.55\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7115ms 0.5049ms 1.9805 KOps/s 1.9651 KOps/s $\color{#35bf28}+0.78\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 1.6334ms 1.4656ms 682.3220 Ops/s 725.1670 Ops/s $\textbf{\color{#d91a1a}-5.91\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.6261ms 0.5150ms 1.9419 KOps/s 1.8873 KOps/s $\color{#35bf28}+2.90\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 8.7501ms 0.5141ms 1.9453 KOps/s 2.0021 KOps/s $\color{#d91a1a}-2.84\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 2.6257ms 1.7371ms 575.6663 Ops/s 632.2609 Ops/s $\textbf{\color{#d91a1a}-8.95\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.8530ms 0.6574ms 1.5211 KOps/s 1.5116 KOps/s $\color{#35bf28}+0.62\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 11.1851ms 0.6652ms 1.5034 KOps/s 1.5633 KOps/s $\color{#d91a1a}-3.83\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.1296s 20.2419ms 49.4024 Ops/s 58.7754 Ops/s $\textbf{\color{#d91a1a}-15.95\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 19.1477ms 13.7929ms 72.5009 Ops/s 73.5584 Ops/s $\color{#d91a1a}-1.44\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 15.9930ms 3.5258ms 283.6209 Ops/s 305.3620 Ops/s $\textbf{\color{#d91a1a}-7.12\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.1165s 17.7246ms 56.4188 Ops/s 58.4469 Ops/s $\color{#d91a1a}-3.47\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 15.9580ms 13.7385ms 72.7880 Ops/s 73.4186 Ops/s $\color{#d91a1a}-0.86\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 4.5733ms 3.2594ms 306.8091 Ops/s 305.3517 Ops/s $\color{#35bf28}+0.48\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.1224s 18.2215ms 54.8803 Ops/s 58.1224 Ops/s $\textbf{\color{#d91a1a}-5.58\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 16.1390ms 14.0083ms 71.3862 Ops/s 72.1933 Ops/s $\color{#d91a1a}-1.12\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 6.1468ms 3.5538ms 281.3891 Ops/s 271.4022 Ops/s $\color{#35bf28}+3.68\%$
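
The Change column appears to be the relative difference between the Ops and Ops on Repo HEAD columns, and the bold entries (the ones that add up to the Improved and Worsened totals) all exceed 5% in magnitude. A minimal sketch of that reading follows. The function names and the 5% cutoff are assumptions inferred from the rows above, not taken from the benchmark workflow itself.

```python
def relative_change(ops: float, ops_head: float) -> float:
    """Percent change of this PR's throughput relative to repo HEAD."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold_pct: float = 5.0) -> str:
    """Label a change. threshold_pct is an assumed cutoff, not a documented one."""
    if change_pct > threshold_pct:
        return "improved"
    if change_pct < -threshold_pct:
        return "worsened"
    return "neutral"


# Values copied from the test_sync row: 25.3996 Ops/s vs 27.4694 Ops/s on HEAD.
change = relative_change(25.3996, 27.4694)
print(f"{change:+.2f}% -> {classify(change)}")
```

Under this reading, a row such as test_single (+4.99%) stays below the cutoff and is not counted as improved, which is consistent with the totals reported at the top of the comment.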

github-actions bot commented Jan 16, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 92. Improved: $\large\color{#35bf28}7$. Worsened: $\large\color{#d91a1a}1$.

Columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change
test_single 0.1151s 0.1144s 8.7391 Ops/s 8.5162 Ops/s $\color{#35bf28}+2.62\%$
test_sync 0.1733s 0.1034s 9.6675 Ops/s 9.7200 Ops/s $\color{#d91a1a}-0.54\%$
test_async 0.2445s 91.4248ms 10.9380 Ops/s 10.9213 Ops/s $\color{#35bf28}+0.15\%$
test_single_pixels 0.1402s 0.1395s 7.1682 Ops/s 7.1229 Ops/s $\color{#35bf28}+0.64\%$
test_sync_pixels 82.1640ms 76.6023ms 13.0544 Ops/s 12.7278 Ops/s $\color{#35bf28}+2.57\%$
test_async_pixels 0.1481s 72.8528ms 13.7263 Ops/s 13.3416 Ops/s $\color{#35bf28}+2.88\%$
test_simple 0.8940s 0.8313s 1.2030 Ops/s 1.2172 Ops/s $\color{#d91a1a}-1.17\%$
test_transformed 1.1321s 1.0658s 0.9383 Ops/s 0.9277 Ops/s $\color{#35bf28}+1.14\%$
test_serial 2.3827s 2.3247s 0.4302 Ops/s 0.4264 Ops/s $\color{#35bf28}+0.87\%$
test_parallel 1.9473s 1.8568s 0.5385 Ops/s 0.5199 Ops/s $\color{#35bf28}+3.59\%$
test_step_mdp_speed[True-True-True-True-True] 93.4720μs 33.2630μs 30.0634 KOps/s 29.7825 KOps/s $\color{#35bf28}+0.94\%$
test_step_mdp_speed[True-True-True-True-False] 40.2510μs 19.6253μs 50.9547 KOps/s 51.0734 KOps/s $\color{#d91a1a}-0.23\%$
test_step_mdp_speed[True-True-True-False-True] 38.4610μs 18.7720μs 53.2709 KOps/s 52.5865 KOps/s $\color{#35bf28}+1.30\%$
test_step_mdp_speed[True-True-True-False-False] 36.8710μs 11.2430μs 88.9444 KOps/s 87.9172 KOps/s $\color{#35bf28}+1.17\%$
test_step_mdp_speed[True-True-False-True-True] 57.1510μs 34.8608μs 28.6855 KOps/s 28.1700 KOps/s $\color{#35bf28}+1.83\%$
test_step_mdp_speed[True-True-False-True-False] 65.5310μs 21.4267μs 46.6707 KOps/s 46.0765 KOps/s $\color{#35bf28}+1.29\%$
test_step_mdp_speed[True-True-False-False-True] 47.6710μs 20.5737μs 48.6058 KOps/s 48.5042 KOps/s $\color{#35bf28}+0.21\%$
test_step_mdp_speed[True-True-False-False-False] 28.6910μs 13.2058μs 75.7242 KOps/s 74.9475 KOps/s $\color{#35bf28}+1.04\%$
test_step_mdp_speed[True-False-True-True-True] 61.2110μs 37.0624μs 26.9815 KOps/s 26.6039 KOps/s $\color{#35bf28}+1.42\%$
test_step_mdp_speed[True-False-True-True-False] 49.8610μs 23.2754μs 42.9639 KOps/s 41.9130 KOps/s $\color{#35bf28}+2.51\%$
test_step_mdp_speed[True-False-True-False-True] 0.1066ms 21.0004μs 47.6181 KOps/s 47.7207 KOps/s $\color{#d91a1a}-0.22\%$
test_step_mdp_speed[True-False-True-False-False] 66.6410μs 13.1007μs 76.3317 KOps/s 75.1159 KOps/s $\color{#35bf28}+1.62\%$
test_step_mdp_speed[True-False-False-True-True] 66.8910μs 38.9133μs 25.6982 KOps/s 25.6088 KOps/s $\color{#35bf28}+0.35\%$
test_step_mdp_speed[True-False-False-True-False] 44.2710μs 25.6414μs 38.9994 KOps/s 39.1824 KOps/s $\color{#d91a1a}-0.47\%$
test_step_mdp_speed[True-False-False-False-True] 48.1010μs 22.6043μs 44.2394 KOps/s 43.7317 KOps/s $\color{#35bf28}+1.16\%$
test_step_mdp_speed[True-False-False-False-False] 30.0600μs 15.0097μs 66.6236 KOps/s 66.0830 KOps/s $\color{#35bf28}+0.82\%$
test_step_mdp_speed[False-True-True-True-True] 62.3810μs 37.3265μs 26.7906 KOps/s 26.4747 KOps/s $\color{#35bf28}+1.19\%$
test_step_mdp_speed[False-True-True-True-False] 43.4310μs 23.7683μs 42.0729 KOps/s 42.2681 KOps/s $\color{#d91a1a}-0.46\%$
test_step_mdp_speed[False-True-True-False-True] 43.7610μs 25.8589μs 38.6714 KOps/s 39.2771 KOps/s $\color{#d91a1a}-1.54\%$
test_step_mdp_speed[False-True-True-False-False] 33.5500μs 14.9223μs 67.0136 KOps/s 66.1309 KOps/s $\color{#35bf28}+1.33\%$
test_step_mdp_speed[False-True-False-True-True] 79.1520μs 38.7307μs 25.8193 KOps/s 25.5240 KOps/s $\color{#35bf28}+1.16\%$
test_step_mdp_speed[False-True-False-True-False] 42.5610μs 25.5921μs 39.0746 KOps/s 39.1604 KOps/s $\color{#d91a1a}-0.22\%$
test_step_mdp_speed[False-True-False-False-True] 44.7410μs 27.0875μs 36.9174 KOps/s 36.2890 KOps/s $\color{#35bf28}+1.73\%$
test_step_mdp_speed[False-True-False-False-False] 58.6810μs 16.9445μs 59.0163 KOps/s 59.0464 KOps/s $\color{#d91a1a}-0.05\%$
test_step_mdp_speed[False-False-True-True-True] 67.5420μs 40.6907μs 24.5756 KOps/s 24.3037 KOps/s $\color{#35bf28}+1.12\%$
test_step_mdp_speed[False-False-True-True-False] 53.7910μs 27.3204μs 36.6027 KOps/s 36.1561 KOps/s $\color{#35bf28}+1.24\%$
test_step_mdp_speed[False-False-True-False-True] 45.8210μs 27.2826μs 36.6533 KOps/s 36.9297 KOps/s $\color{#d91a1a}-0.75\%$
test_step_mdp_speed[False-False-True-False-False] 33.6290μs 16.8690μs 59.2804 KOps/s 58.9068 KOps/s $\color{#35bf28}+0.63\%$
test_step_mdp_speed[False-False-False-True-True] 68.5110μs 42.9859μs 23.2634 KOps/s 23.5327 KOps/s $\color{#d91a1a}-1.14\%$
test_step_mdp_speed[False-False-False-True-False] 48.6300μs 29.4435μs 33.9633 KOps/s 34.1988 KOps/s $\color{#d91a1a}-0.69\%$
test_step_mdp_speed[False-False-False-False-True] 57.1220μs 28.2469μs 35.4021 KOps/s 35.3453 KOps/s $\color{#35bf28}+0.16\%$
test_step_mdp_speed[False-False-False-False-False] 42.3810μs 18.6438μs 53.6370 KOps/s 53.2952 KOps/s $\color{#35bf28}+0.64\%$
test_values[generalized_advantage_estimate-True-True] 23.8751ms 23.3359ms 42.8525 Ops/s 40.1834 Ops/s $\textbf{\color{#35bf28}+6.64\%}$
test_values[vec_generalized_advantage_estimate-True-True] 88.4258ms 3.3142ms 301.7295 Ops/s 301.3235 Ops/s $\color{#35bf28}+0.13\%$
test_values[td0_return_estimate-False-False] 91.6610μs 60.6476μs 16.4887 KOps/s 16.3041 KOps/s $\color{#35bf28}+1.13\%$
test_values[td1_return_estimate-False-False] 50.6963ms 50.0407ms 19.9837 Ops/s 18.4629 Ops/s $\textbf{\color{#35bf28}+8.24\%}$
test_values[vec_td1_return_estimate-False-False] 1.9780ms 1.7374ms 575.5814 Ops/s 565.1371 Ops/s $\color{#35bf28}+1.85\%$
test_values[td_lambda_return_estimate-True-False] 82.8892ms 80.7322ms 12.3866 Ops/s 12.2495 Ops/s $\color{#35bf28}+1.12\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.9918ms 1.7277ms 578.7946 Ops/s 574.0583 Ops/s $\color{#35bf28}+0.83\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 22.3012ms 21.6864ms 46.1118 Ops/s 43.2948 Ops/s $\textbf{\color{#35bf28}+6.51\%}$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 0.8351ms 0.6842ms 1.4616 KOps/s 1.3613 KOps/s $\textbf{\color{#35bf28}+7.37\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7195ms 0.6326ms 1.5808 KOps/s 1.5704 KOps/s $\color{#35bf28}+0.66\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.4937ms 1.4405ms 694.2207 Ops/s 688.2112 Ops/s $\color{#35bf28}+0.87\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.9057ms 0.6528ms 1.5318 KOps/s 1.4357 KOps/s $\textbf{\color{#35bf28}+6.69\%}$
test_dqn_speed 14.0265ms 7.3021ms 136.9468 Ops/s 132.7510 Ops/s $\color{#35bf28}+3.16\%$
test_ddpg_speed 14.9856ms 14.2109ms 70.3685 Ops/s 67.7270 Ops/s $\color{#35bf28}+3.90\%$
test_sac_speed 29.2192ms 28.4786ms 35.1140 Ops/s 30.9936 Ops/s $\textbf{\color{#35bf28}+13.29\%}$
test_redq_speed 48.5343ms 47.6494ms 20.9866 Ops/s 20.2519 Ops/s $\color{#35bf28}+3.63\%$
test_redq_deprec_speed 24.7433ms 23.5788ms 42.4111 Ops/s 41.0181 Ops/s $\color{#35bf28}+3.40\%$
test_td3_speed 28.6351ms 19.2991ms 51.8160 Ops/s 49.9684 Ops/s $\color{#35bf28}+3.70\%$
test_cql_speed 82.2356ms 81.3049ms 12.2994 Ops/s 11.7831 Ops/s $\color{#35bf28}+4.38\%$
test_a2c_speed 27.4745ms 26.1570ms 38.2307 Ops/s 36.9817 Ops/s $\color{#35bf28}+3.38\%$
test_ppo_speed 27.7873ms 26.5121ms 37.7186 Ops/s 36.5922 Ops/s $\color{#35bf28}+3.08\%$
test_reinforce_speed 26.2688ms 25.4240ms 39.3329 Ops/s 38.0649 Ops/s $\color{#35bf28}+3.33\%$
test_iql_speed 57.5249ms 56.3739ms 17.7387 Ops/s 17.2191 Ops/s $\color{#35bf28}+3.02\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 2.1745ms 1.8105ms 552.3207 Ops/s 541.1532 Ops/s $\color{#35bf28}+2.06\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.9473ms 0.8312ms 1.2031 KOps/s 1.1887 KOps/s $\color{#35bf28}+1.21\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 1.0066ms 0.8206ms 1.2186 KOps/s 1.2066 KOps/s $\color{#35bf28}+0.99\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 2.3847ms 1.7898ms 558.7282 Ops/s 553.5186 Ops/s $\color{#35bf28}+0.94\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.0434ms 0.8218ms 1.2168 KOps/s 1.2062 KOps/s $\color{#35bf28}+0.88\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.9443ms 0.8105ms 1.2338 KOps/s 1.2221 KOps/s $\color{#35bf28}+0.96\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 2.9518ms 2.0693ms 483.2620 Ops/s 476.3162 Ops/s $\color{#35bf28}+1.46\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.1457ms 0.9472ms 1.0558 KOps/s 1.0418 KOps/s $\color{#35bf28}+1.35\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 1.1077ms 0.9398ms 1.0640 KOps/s 1.0509 KOps/s $\color{#35bf28}+1.25\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 2.1919ms 1.8136ms 551.3834 Ops/s 542.6766 Ops/s $\color{#35bf28}+1.60\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.9794ms 0.8324ms 1.2014 KOps/s 1.1892 KOps/s $\color{#35bf28}+1.02\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.1143s 0.9476ms 1.0553 KOps/s 1.2028 KOps/s $\textbf{\color{#d91a1a}-12.27\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 2.4233ms 1.7923ms 557.9456 Ops/s 545.2023 Ops/s $\color{#35bf28}+2.34\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.0318ms 0.8225ms 1.2158 KOps/s 1.2044 KOps/s $\color{#35bf28}+0.95\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.9540ms 0.8133ms 1.2295 KOps/s 1.2191 KOps/s $\color{#35bf28}+0.86\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 3.0190ms 2.0768ms 481.4999 Ops/s 478.7630 Ops/s $\color{#35bf28}+0.57\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.1459ms 0.9508ms 1.0517 KOps/s 1.0390 KOps/s $\color{#35bf28}+1.23\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 1.1302ms 0.9438ms 1.0596 KOps/s 1.0492 KOps/s $\color{#35bf28}+1.00\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.1217s 17.9140ms 55.8223 Ops/s 53.0909 Ops/s $\textbf{\color{#35bf28}+5.14\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 18.7014ms 13.6323ms 73.3553 Ops/s 70.7254 Ops/s $\color{#35bf28}+3.72\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 5.6979ms 3.2492ms 307.7667 Ops/s 296.1958 Ops/s $\color{#35bf28}+3.91\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.1229s 17.9769ms 55.6270 Ops/s 54.3223 Ops/s $\color{#35bf28}+2.40\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 16.0059ms 13.5705ms 73.6892 Ops/s 70.8529 Ops/s $\color{#35bf28}+4.00\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 5.7493ms 3.2553ms 307.1939 Ops/s 298.2148 Ops/s $\color{#35bf28}+3.01\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.1244s 18.1903ms 54.9742 Ops/s 53.5936 Ops/s $\color{#35bf28}+2.58\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 15.9455ms 13.7461ms 72.7480 Ops/s 69.7971 Ops/s $\color{#35bf28}+4.23\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 5.6752ms 3.4332ms 291.2707 Ops/s 281.5518 Ops/s $\color{#35bf28}+3.45\%$
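
The Ops column looks like the reciprocal of the Mean column (operations per second equals one over the mean time per call). The sketch below checks that relationship on values from the table. The helper names `parse_duration` and `ops_per_second` are hypothetical and exist only for this illustration.

```python
# Multipliers to seconds for the unit suffixes used in the Mean column.
# Both the Greek mu and the micro sign are accepted for microseconds.
_UNITS = {"ms": 1e-3, "μs": 1e-6, "µs": 1e-6, "s": 1.0}


def parse_duration(text: str) -> float:
    """Convert a string such as '28.4786ms' to seconds."""
    # Check the two-character suffixes before the bare 's'.
    for suffix in ("ms", "μs", "µs", "s"):
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * _UNITS[suffix]
    raise ValueError(f"unrecognized duration: {text!r}")


def ops_per_second(mean: str) -> float:
    """Throughput implied by a mean time per call."""
    return 1.0 / parse_duration(mean)


# test_sac_speed row: Mean 28.4786ms, reported as 35.1140 Ops/s.
print(f"{ops_per_second('28.4786ms'):.4f} Ops/s")
```

Small mismatches are expected for rows whose Mean is printed with few significant digits, since the table rounds each column independently.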

@vmoens marked this pull request as draft January 16, 2024 11:00
@vmoens changed the title from "[Performance, WIP] faster step_mdp" to "[Performance, WIP] Minor improvements to step_and_maybe_reset in batched envs" Jan 16, 2024
@vmoens added the performance label (Performance issue or suggestion for improvement) Jan 16, 2024
@vmoens marked this pull request as ready for review January 16, 2024 14:51
@vmoens changed the title from "[Performance, WIP] Minor improvements to step_and_maybe_reset in batched envs" to "[Performance] Minor improvements to step_and_maybe_reset in batched envs" Jan 16, 2024
@vmoens merged commit c11713a into main Jan 16, 2024
60 of 63 checks passed
@vmoens deleted the faster-stepmdp branch January 16, 2024 14:51
Labels: CLA Signed, performance