polish(nyz): polish dqn and ppo comments #732
Conversation
""" | ||
# Data preprocessing operations, such as stack data, cpu to cuda device | ||
data = default_preprocess_learn( |
Perhaps the data preprocessing here could be explained in more detail, i.e. which operations are performed inside.
The detailed comments inside will be updated later.
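In reply to the question above, a minimal sketch of what such a preprocessing step typically covers: stacking a list of transition dicts into batched tensors and moving them to the training device. This is an illustration of the idea only, not DI-engine's actual `default_preprocess_learn`; the function name and field names below are assumptions.

```python
import torch
from typing import Dict, List


def preprocess_learn_sketch(data: List[Dict[str, torch.Tensor]], device: str = 'cpu') -> Dict[str, torch.Tensor]:
    """Illustrative only: stack per-transition fields into batched tensors and move them to ``device``."""
    batch = {}
    for key in data[0].keys():
        # Stack each field (obs, action, reward, ...) along a new batch dimension,
        # then transfer the batched tensor to the target device (e.g. cpu -> cuda).
        batch[key] = torch.stack([torch.as_tensor(d[key]) for d in data], dim=0).to(device)
    return batch


# Hypothetical usage with two toy transitions.
transitions = [
    {'obs': torch.randn(4), 'action': torch.tensor(1), 'reward': torch.tensor(0.5)},
    {'obs': torch.randn(4), 'action': torch.tensor(0), 'reward': torch.tensor(-0.1)},
]
batch = preprocess_learn_sketch(transitions, device='cuda' if torch.cuda.is_available() else 'cpu')
print({k: v.shape for k, v in batch.items()})
```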
R2D2 proposes that several tricks should be used to improve upon DRQN,
namely some recurrent experience replay tricks such as burn-in.
R2D2 proposes that several tricks should be used to improve upon DRQN, namely some recurrent experience replay \
tricks and the burn-in mechanism for off-policy training.
The R2D2 policy class is inspired by the paper "Recurrent Experience Replay in Distributed Reinforcement Learning". R2D2 suggests the incorporation of several enhancements over DRQN, specifically the application of novel recurrent experience replay strategies and the implementation of a burn-in mechanism for off-policy training.
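The burn-in mechanism mentioned here can be illustrated with a short, hedged sketch using a generic PyTorch LSTM (not DI-engine's R2D2 code): the first part of each stored sequence is replayed without gradients purely to warm up the recurrent hidden state, and only the remainder contributes to the loss. The sequence lengths and dimensions below are made up.

```python
import torch
import torch.nn as nn

# Made-up dimensions for illustration.
seq_len, burn_in, batch_size, obs_dim, hidden_dim = 40, 20, 8, 16, 32
rnn = nn.LSTM(obs_dim, hidden_dim)
obs_seq = torch.randn(seq_len, batch_size, obs_dim)  # one replayed sequence (time-major)

with torch.no_grad():
    # Burn-in phase: refresh the recurrent state, which may be stale in the replay buffer.
    _, hidden_state = rnn(obs_seq[:burn_in])

# Training phase: gradients only flow through the post-burn-in segment.
train_output, _ = rnn(obs_seq[burn_in:], hidden_state)
print(train_output.shape)  # torch.Size([20, 8, 32])
```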
- data (:obj:`List[Dict[str, Any]`): The trajectory data(a list of transition), each element is the same \
  format as the return value of ``self._process_transition`` method.
- transitions (:obj:`List[Dict[str, Any]`): The trajectory data (a list of transition), each element is \
  the same format as the return value of ``self._process_transition`` method.
Returns:
The trajectory data, which is a list of transitions. Each element is in the same format as the return value of the `self._process_transition` method.
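For concreteness, one element of that trajectory list might look like the hypothetical dict below; the exact keys returned by `_process_transition` depend on the concrete policy, so treat the field names as assumptions.

```python
import torch

# Hypothetical single transition; real keys depend on the concrete policy's _process_transition.
transition = {
    'obs': torch.randn(4),        # observation at time t
    'next_obs': torch.randn(4),   # observation at time t + 1
    'action': torch.tensor(1),    # action taken at time t
    'reward': torch.tensor(0.5),  # scalar reward
    'done': False,                # episode-termination flag
}
trajectory = [transition]  # get_train_sample-style methods receive a list of such dicts
```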
And the user can customize this data processing procedure by overriding these two methods and the collector \
itself.
- samples (:obj:`List[Dict[str, Any]]`): The processed train samples, each element is the similar format \
  as input transitions, but may contain more data for training, such as nstep reward and target obs.
"""
The processed training samples. Each element is similar in format to the input transitions, but may contain additional data for training, such as n-step reward and target observations.
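As a rough illustration of the "n-step reward" extra field mentioned above, the sketch below computes a discounted n-step return over a reward window; it is a generic example with assumed values, not DI-engine's actual n-step helper.

```python
import torch


def nstep_return_sketch(rewards: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    # Discounted n-step return: sum_k gamma^k * r_{t+k} over the given window.
    discounts = gamma ** torch.arange(rewards.shape[0], dtype=rewards.dtype)
    return (discounts * rewards).sum()


# Hypothetical 3-step reward window collected after time t.
rewards = torch.tensor([1.0, 0.0, 0.5])
print(nstep_return_sketch(rewards))  # 1.0 + 0.99 ** 2 * 0.5 ≈ 1.49
```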
Description
Related Issue
TODO
Check List