[Algorithm] Td3 #684

BY571 · 2022-11-17T15:52:45Z

Description

Adds the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm.
This PR includes:

Example script
Adding TD3Loss objective class
Adding make_td3_loss helper function
Adding make_td3_actor helper function
Creating TD3Model data class
Adapting the AdditiveGaussianWrapper to accept mean and std for the normal distribution
Adding TD3MlpQNet Model
Tested algorithm for convergence
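
For readers unfamiliar with TD3, the target computation that an objective class like TD3Loss has to provide can be sketched as follows. This is a minimal plain-Python illustration for a single transition with a scalar action, not the code added in this PR: the function names (`td3_target`, `should_update_actor`), the callables passed in, and the default hyperparameters are assumptions chosen for the example.

```python
import random


def td3_target(reward, done, next_state, target_actor, target_q1, target_q2,
               gamma=0.99, policy_noise=0.2, noise_clip=0.5,
               action_low=-1.0, action_high=1.0, rng=random):
    """Bootstrap target for one transition (hypothetical helper, scalar action)."""
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = rng.gauss(0.0, policy_noise)
    noise = max(-noise_clip, min(noise_clip, noise))
    next_action = target_actor(next_state) + noise
    # Keep the perturbed action inside the valid action range.
    next_action = max(action_low, min(action_high, next_action))
    # Clipped double-Q: take the smaller of the two target critic estimates.
    next_q = min(target_q1(next_state, next_action),
                 target_q2(next_state, next_action))
    # Do not bootstrap past a terminal state.
    return reward + gamma * (1.0 - float(done)) * next_q


def should_update_actor(step, policy_delay=2):
    """Delayed policy updates: the actor is updated every `policy_delay` critic steps."""
    return step % policy_delay == 0


if __name__ == "__main__":
    # Toy stand-ins for the target networks (assumptions for illustration only).
    actor = lambda s: 0.5 * s
    q1 = lambda s, a: s + a
    q2 = lambda s, a: s + a + 1.0
    y = td3_target(1.0, False, 1.0, actor, q1, q2, gamma=0.9, policy_noise=0.0)
    print(y)
```

Both critics are regressed towards the same target, and taking the minimum of the two target critics is what counteracts the overestimation bias of a single critic.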

Performance looks good at the beginning of training but becomes very unstable towards the end. I tried several other parameter settings but have not been able to resolve the instabilities so far.

Motivation and Context

This PR closes an open issue: close #18.

I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Remove all that do not apply:

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds core functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)
Example (update in the folder of examples)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

I have read the CONTRIBUTION guide (required)
My change requires a change to the documentation.
I have updated the tests accordingly (required for a bug fix or a new feature).
I have updated the documentation accordingly.

facebook-github-bot · 2022-11-17T16:56:04Z

Hi @BY571!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

facebook-github-bot · 2022-11-18T13:42:22Z

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

vmoens · 2022-11-22T15:03:23Z

For the formatting, refer to instructions :)

vmoens

I think you have committed a log file :)
Let me know if there's anything I can do to help -- I can do a review even if the code is still a draft if you'd like

BY571 · 2022-11-24T08:23:01Z

> I think you have committed a log file :) Let me know if there's anything I can do to help -- I can do a review even if the code is still a draft if you'd like

Sure, you can review it already! Just trying to fix the last formatting issue.

codecov · 2022-11-24T09:47:07Z

Codecov Report

Merging #684 (5d1b995) into main (37da520) will increase coverage by 3.68%.
The diff coverage is 96.51%.

@@            Coverage Diff             @@
##             main     #684      +/-   ##
==========================================
+ Coverage   85.15%   88.83%   +3.68%     
==========================================
  Files         123      124       +1     
  Lines       21167    21364     +197     
==========================================
+ Hits        18024    18978     +954     
+ Misses       3143     2386     -757

Flag	Coverage Δ
habitat-gpu	`24.74% <20.83%> (-0.02%)`	⬇️
linux-brax	`29.31% <20.83%> (-0.05%)`	⬇️
linux-cpu	`85.32% <95.02%> (?)`
linux-gpu	`86.30% <96.51%> (+61.43%)`	⬆️
linux-jumanji	`30.08% <20.83%> (-0.06%)`	⬇️
linux-outdeps-gpu	`72.47% <95.02%> (+0.21%)`	⬆️
linux-stable-cpu	`85.18% <95.02%> (?)`
linux-stable-gpu	`85.95% <95.02%> (?)`
linux_examples-gpu	`42.59% <25.00%> (-0.11%)`	⬇️
macos-cpu	`85.08% <95.02%> (?)`
olddeps-gpu	`75.49% <20.39%> (-0.54%)`	⬇️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
torchrl/objectives/utils.py	`83.44% <ø> (ø)`
torchrl/modules/tensordict_module/exploration.py	`77.77% <83.33%> (+0.31%)`	⬆️
test/test_cost.py	`96.32% <96.89%> (+0.11%)`	⬆️
torchrl/objectives/td3.py	`96.92% <96.92%> (ø)`
torchrl/objectives/__init__.py	`100.00% <100.00%> (ø)`
torchrl/envs/common.py	`83.12% <0.00%> (+0.25%)`	⬆️
torchrl/modules/distributions/continuous.py	`85.65% <0.00%> (+0.39%)`	⬆️
torchrl/data/tensor_specs.py	`84.25% <0.00%> (+1.10%)`	⬆️
examples/dreamer/dreamer_utils.py	`78.53% <0.00%> (+1.69%)`	⬆️
... and 17 more

vmoens

LGTM thanks for this!

BY571 added 9 commits November 17, 2022 10:14

update td3

e8d17fa

solve merge conflicsts and fix next observation vector problem in ini…

75e50c1

…t collection

update default config for new tests

a6c375d

add exploration sigma check

fc8f3cf

simplifiy td3 objective

29a169d

set default td3 config 2step return

e37ed00

fix adapt config

a348460

take off not used imports

ecd0229

take off not used imports

afbe84e

fix typing error

9e3c3ba

vmoens added the "new algo" label (New algorithm request or PR) Nov 18, 2022

vmoens changed the title from "Td3" to "[Algorithm] Td3" Nov 18, 2022

BY571 added 2 commits November 18, 2022 14:06

update td3 model description, td3 config and objective actorloss problem

ee6ef03

fix next_ get_stants_random_rollout bug

31255c8

facebook-github-bot added the "CLA Signed" label (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.) Nov 18, 2022

BY571 added 11 commits November 18, 2022 14:51

flake8 fixes

e155dd3

formatting changes

30a677b

fixes flake8

22813f5

update format

604e072

format fix

29ebe01

fix

b70b96b

formatting fix

7239e11

format fixes

608dd53

formatting fix

cdfc04b

test fmt skip

b08136f

take off skip ufmt

9fb91d6

vmoens reviewed Nov 23, 2022

remove logging file

101b2e6

formatting fix

dea49c4

BY571 added 12 commits December 2, 2022 10:10

update config order, td error, qloss sum

5690548

merge main into branch

6d60d75

take off td3 helper functions

a01b64b

update td3 objective

4f1c0e0

Merge branch 'main' into td3

fc8fdd3

cpu test verified

dd72e06

Merge branch 'main' into td3

76b1e19

fix flake8

50e3f06

update objectives and test

7839273

update cfg

2eec084

td3_loss_test policy_delay update

f7c315d

take off not needed actorcritic wrapper

5d1b995

BY571 marked this pull request as ready for review January 5, 2023 07:24

vmoens mentioned this pull request Jan 11, 2023

[DO NOT CLOSE] Call for contributions #509

vmoens approved these changes Jan 17, 2023

vmoens merged commit a9cbd44 into pytorch:main Jan 17, 2023
