clear replay buffer after trajectory collection #425

sidhantls · 2020-12-03T19:13:54Z

What does this PR do?

Clears the replay buffer (the states, actions, and qvals lists) after each training batch.

The num_batch_episodes parameter in the REINFORCE model controls the number of episodes to roll out in each training batch. However, since the buffer is never cleared, each training batch contains every previous episode instead of just the latest num_batch_episodes. As a consequence, training time also becomes abnormally long: all trajectories collected since the start accumulate in the training dataset, instead of only the num_batch_episodes trajectories sampled under the current policy.
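A minimal sketch of the effect described above, assuming simplified list-based buffers. The names collect_batches, fake_episode, and the batch_* lists are hypothetical illustrations, not the pl_bolts implementation. Each training step records how many transitions it would train on, with and without clearing the lists afterwards:

```python
from typing import List, Tuple


def fake_episode(length: int) -> Tuple[List[int], List[int], List[float]]:
    """Hypothetical stand-in for an environment rollout: (states, actions, qvals)."""
    states = list(range(length))
    actions = [0] * length
    qvals = [1.0] * length
    return states, actions, qvals


def collect_batches(num_batches: int, num_batch_episodes: int,
                    episode_len: int, clear_buffer: bool) -> List[int]:
    """Return the number of transitions each training step would see."""
    batch_states: List[int] = []
    batch_actions: List[int] = []
    batch_qvals: List[float] = []
    batch_sizes: List[int] = []
    for _ in range(num_batches):
        # Roll out num_batch_episodes episodes under the current policy.
        for _ in range(num_batch_episodes):
            states, actions, qvals = fake_episode(episode_len)
            batch_states.extend(states)
            batch_actions.extend(actions)
            batch_qvals.extend(qvals)
        # A real model would compute the policy-gradient loss on the batch here.
        batch_sizes.append(len(batch_states))
        if clear_buffer:
            # The fix: drop trajectories sampled under older policies.
            batch_states.clear()
            batch_actions.clear()
            batch_qvals.clear()
    return batch_sizes


print(collect_batches(4, 2, 10, clear_buffer=False))  # [20, 40, 60, 80]
print(collect_batches(4, 2, 10, clear_buffer=True))   # [20, 20, 20, 20]
```

Without clearing, the batch grows linearly with the number of training steps; with clearing, every step trains on exactly num_batch_episodes episodes collected under the current policy.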

The reference script cited in reinforce_model.py also clears the replay buffer, in the same way as this fix.
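For context, in REINFORCE the qvals stored for an episode are usually the discounted returns from each step onward. A small sketch of that computation, assuming a discount factor gamma; the function name discounted_returns is hypothetical, and the pl_bolts helper may be structured differently:

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Reward-to-go G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns: List[float] = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()  # restore chronological order
    return returns


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```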

Fixes #399

Before submitting

Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
Did you read the contributor guideline, Pull Request section?
Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?
Did you verify new and existing tests pass locally with your changes?
If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Is this pull request ready for review? (if not, please submit in draft mode)

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

codecov · 2020-12-03T19:15:58Z

Codecov Report

Merging #425 (2c60d3d) into master (7c2e651) will decrease coverage by 0.09%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master     #425      +/-   ##
==========================================
- Coverage   81.09%   80.99%   -0.10%     
==========================================
  Files         100      100              
  Lines        5722     5725       +3     
==========================================
- Hits         4640     4637       -3     
- Misses       1082     1088       +6
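The header reports a 0.09% decrease while the table shows -0.10%. A quick arithmetic check, assuming Codecov's default of rounding coverage down to two decimals, suggests both figures follow from the hit and line counts above: one plausible reading is that the header rounds the unrounded difference, while the table subtracts the already-rounded percentages.

```python
import math


def round_down(value: float, precision: int = 2) -> float:
    """Truncate toward negative infinity at `precision` decimals (assumed Codecov default)."""
    factor = 10 ** precision
    return math.floor(value * factor) / factor


def coverage(hits: int, lines: int) -> float:
    """Line coverage as a percentage."""
    return hits / lines * 100


base = coverage(4640, 5722)  # master (7c2e651), about 81.0905%
head = coverage(4637, 5725)  # PR head (2c60d3d), about 80.9956%

print(round_down(base))                                # 81.09
print(round_down(head))                                # 80.99
print(round_down(abs(head - base)))                    # 0.09, as in the header
print(round(round_down(head) - round_down(base), 2))   # -0.1, as in the table
```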

Flag	Coverage Δ
cpu	`25.22% <0.00%> (-0.02%)`	⬇️
pytest	`25.22% <0.00%> (-0.02%)`	⬇️
unittests	`80.36% <0.00%> (-0.10%)`	⬇️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
pl_bolts/models/rl/reinforce_model.py	`87.90% <0.00%> (-2.18%)`	⬇️
...l_bolts/models/rl/vanilla_policy_gradient_model.py	`91.96% <0.00%> (-2.68%)`	⬇️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7c2e651...2c60d3d.

Borda

lgtm

akihironitta

@sid-sundrani LGTM. Thank you for your contribution!

@akihironitta

* Add DCGAN module * Undo black on conf.py * Add tests for DCGAN * Fix flake8 and codefactor * Add types and small refactoring * Make image sampler callback work * Upgrade DQN to use .log (#404) * Upgrade DQN to use .log * remove unused * pep8 * fixed other dqn * fix loss test case for batch size variation (#402) * Decouple DataModules from Models - CPCV2 (#386) * Decouple dms from CPCV2 * Update tests * Add docstrings, fix import, and update changelog * Update transforms * bugfix: batch_size parameter for DataModules remaining (#344) * bugfix: batch_size for DataModules remaining * Update sklearn datamodule tests * Fix default_transforms. Keep internal for every data module * fix typo on binary_mnist_datamodule thanks @akihironitta Co-authored-by: Akihiro Nitta Co-authored-by: Akihiro Nitta * Fix a typo/copy paste error (#415) * Just a Typo (#413) missing a ' at the end of dataset='stl10 * Remove unused arguments (#418) * tests: Use cached datasets in LitMNIST and the doctests (#414) * Use cached datasets * Use cached datasets in doctests * clear replay buffer after trajectory (#425) * stale: update label * bugfix: Add missing imports to pl_bolts/__init__.py (#430) * Add missing imports * Add missing imports * Apply isort * Fix CIFAR num_samples (#432) * Add static type checker mypy to the tests and pre-commit hooks (#433) * Add mypy check to GitHub Actions * Run mypy on pl_bolts only * Add mypy check to pre-commit * Add an empty line at the end of files * Update mypy config * Update mypy config * Update mypy config * show Co-authored-by: Jirka Borovec * missing logo * Add type annotations to pl_bolts/__init__.py (#435) * Run mypy on pl_bolts only * Update mypy config * Add type hints to pl_bolts/__init__.py * mypy Co-authored-by: Jirka Borovec * skip hanging (#437) * Option to normalize latent interpolation images (#438) * add option to normalize latent interpolation images * linspace * update Co-authored-by: ananyahjha93 * 0.2.6rc1 * Warnings fix (#449) * Revert "Merge pull request #1 from ganprad/warnings_fix" This reverts commit 7c5aaf0. * Fixes warning related np.integer in SklearnDataModule Fixes this warning: "DeprecationWarning: Converting `np.integer` or `np.signedinteger` to a dtype is deprecated. The current result is `np.dtype(np.int_)` which is not strictly correct. Note that the result depends on the system. To ensure stable results use may want to use `np.int64` or `np.int32`" * Refactor datamodules/datasets (#338) * Remove try: ... except: ... * Fix experience_source * Fix imagenet * Fix kitti * Fix sklearn * Fix vocdetection * Fix typo * Remove duplicate * Fix by flake8 * Add optional packages availability vars * binary_mnist * Use pl_bolts._SKLEARN_AVAILABLE * Apply isort * cifar10 * mnist * cityscapes * fashion mnist * ssl_imagenet * stl10 * cifar10 * dummy * fix city * fix stl10 * fix mnist * ssl_amdim * remove unused DataLoader and fix docs * use from ... import ... * fix pragma: no cover * Fix forward reference in annotations * binmnist * Same order as imports * Move vars from __init__ to utils/__init__ * Remove vars from __init__ * Update vars * Apply isort * update min requirements - PL 1.1.1 (#448) * update min requirements * rc0 * imports * isort * flake8 * 1.1.1 * flake8 * docs * Add missing optional packages to `requirements/*.txt` (#450) * Import matplotlib at the top * Add missing optional packages * Update wandb * Add mypy to requirements * update Isort (#457) * Adding flags to datamodules (#388) * Adding flags to datamodules * Finishing up changes * Fixing syntax error * More syntax errors * More * Adding drop_last flag to sklearn test * Adding drop_last flag to sklearn test * Updating doc for reflect drop_last=False * Adding flags to datamodules * Finishing up changes * Fixing syntax error * More syntax errors * More * Adding drop_last flag to sklearn test * Adding drop_last flag to sklearn test * Updating doc for reflect drop_last=False * Cleaning up parameters and docstring * Fixing syntax error * Fixing documentation * Hardcoding shuffle=False for val and test * Add DCGAN module * Small fixes * Remove DataModules * Update docs * Update docs * Update torchvision import * Import gym as optional package to build docs successfully (#458) * Import gym as optional package * Fix import * Apply isort * bugfix: batch_size parameter for DataModules remaining (#344) * bugfix: batch_size for DataModules remaining * Update sklearn datamodule tests * Fix default_transforms. Keep internal for every data module * fix typo on binary_mnist_datamodule thanks @akihironitta Co-authored-by: Akihiro Nitta Co-authored-by: Akihiro Nitta * Option to normalize latent interpolation images (#438) * add option to normalize latent interpolation images * linspace * update Co-authored-by: ananyahjha93 * update min requirements - PL 1.1.1 (#448) * update min requirements * rc0 * imports * isort * flake8 * 1.1.1 * flake8 * docs * Apply suggestions from code review * Apply suggestions from code review * Add docs * Use LSUN instead of CIFAR10 * Update TensorboardGenerativeModelImageSampler * Update docs with lsun * Update test * Revert TensorboardGenerativeModelImageSampler changes * Remove ModelCheckpoint callback and nrow=5 arg * Apply suggestions from code review * Fix test_dcgan * Apply yapf * Apply suggestions from code review Co-authored-by: Teddy Koker Co-authored-by: Sidhant Sundrani Co-authored-by: Akihiro Nitta Co-authored-by: Héctor Laria Co-authored-by: Bartol Karuza Co-authored-by: Happy Sugar Life Co-authored-by: Jirka Borovec Co-authored-by: Jirka Borovec Co-authored-by: ananyahjha93 Co-authored-by: Pradeep Ganesan Co-authored-by: Brian Ko Co-authored-by: Christoph Clement

clear replay buffer after trajectory

2c60d3d

akihironitta added the fix and model labels Dec 4, 2020

Borda requested review from akihironitta and ananyahjha93 December 5, 2020 00:49

Borda approved these changes Dec 5, 2020

akihironitta approved these changes Dec 5, 2020

akihironitta merged commit cbeb143 into Lightning-Universe:master Dec 7, 2020

chris-clem pushed a commit to chris-clem/pytorch-lightning-bolts that referenced this pull request Dec 9, 2020

clear replay buffer after trajectory (Lightning-Universe#425)

2ee653a

chris-clem pushed a commit to chris-clem/pytorch-lightning-bolts that referenced this pull request Dec 16, 2020

clear replay buffer after trajectory (Lightning-Universe#425)

427b596

Borda added this to the v0.3 milestone Jan 18, 2021
