2D_guide_env

A simple 2D guidance environment for RL.

Background

The agent's 2D coordinate is (x, y), and θ = arctan(y/x) is the included angle between the agent's travel direction and the positive x-axis.

Target : Guide the agent to a designated point.

Action : { velocity | angular velocity (palstance) }

State : { agent position | target position | relative position }
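
As a rough sketch of how these pieces fit together, the snippet below shows a minimal gym-style environment with the state and action spaces described above. The class name, world size, step length, and termination radius are assumptions for illustration, not the repository's actual values.

```python
import math

import gym
import numpy as np
from gym import spaces


class Guide2DEnv(gym.Env):
    """Illustrative 2D guidance environment (not the repository's actual code)."""

    def __init__(self, world_size=10.0, max_steps=200, dt=0.1):
        super().__init__()
        self.world_size, self.max_steps, self.dt = world_size, max_steps, dt
        # Action: { velocity | angular velocity }, both bounded.
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
        # State: { agent position | target position | relative position }.
        self.observation_space = spaces.Box(
            low=-2 * world_size, high=2 * world_size, shape=(6,), dtype=np.float32)

    def reset(self):
        self.agent = np.random.uniform(-self.world_size, self.world_size, size=2)
        self.target = np.random.uniform(-self.world_size, self.world_size, size=2)
        self.theta = np.random.uniform(-math.pi, math.pi)   # heading vs. the +x axis
        self.prev_dist = float(np.linalg.norm(self.target - self.agent))
        self.steps = 0
        return self._obs()

    def _obs(self):
        return np.concatenate(
            [self.agent, self.target, self.target - self.agent]).astype(np.float32)

    def step(self, action):
        velocity, palstance = float(action[0]), float(action[1])
        self.theta += palstance * self.dt                   # integrate angular velocity
        self.agent = self.agent + velocity * self.dt * np.array(
            [math.cos(self.theta), math.sin(self.theta)])
        self.steps += 1
        dist = float(np.linalg.norm(self.target - self.agent))
        reward = -0.01 * (dist - self.prev_dist)            # shaping term F (see below)
        self.prev_dist = dist
        done = dist < 0.5 or self.steps >= self.max_steps
        return self._obs(), reward, done, {}
```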

Algorithm

Continuous-PPO with tricks shown in this work.
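
For reference, the heart of continuous-action PPO is the clipped surrogate objective. A minimal numpy sketch is shown below; `clip_eps = 0.2` is a common default, not necessarily the value used in this repository.

```python
import numpy as np


def ppo_clip_objective(log_prob_new, log_prob_old, advantage, clip_eps=0.2):
    """Clipped surrogate objective of PPO, to be maximized over the policy."""
    ratio = np.exp(log_prob_new - log_prob_old)                   # pi_new / pi_old
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    return np.minimum(unclipped, clipped).mean()                  # pessimistic bound
```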

Tricks List

Trick 1—Advantage Normalization.

Trick 2—State Normalization.

Trick 4—Reward Scaling.

Trick 5—Policy Entropy.

Trick 6—Learning Rate Decay.

Trick 7—Gradient clip.

Trick 8—Orthogonal Initialization.

Trick 9—Adam Optimizer Epsilon Parameter.

Trick 10—Tanh Activation Function.
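
Two of these tricks are simple enough to sketch directly. The snippet below illustrates advantage normalization (Trick 1) and reward scaling by a running standard deviation of the discounted return (Trick 4); the hyperparameters are illustrative and may differ from the ones used here.

```python
import numpy as np


def normalize_advantage(adv, eps=1e-8):
    """Trick 1: normalize the advantage batch to zero mean and unit variance."""
    adv = np.asarray(adv, dtype=np.float64)
    return (adv - adv.mean()) / (adv.std() + eps)


class RewardScaler:
    """Trick 4: divide each reward by a running std of the discounted return."""

    def __init__(self, gamma=0.99, eps=1e-8):
        self.gamma, self.eps = gamma, eps
        self.ret = 0.0
        self.returns = []                      # history kept only for this sketch

    def __call__(self, reward):
        self.ret = self.gamma * self.ret + reward
        self.returns.append(self.ret)
        std = np.std(self.returns) if len(self.returns) > 1 else 1.0
        return reward / (std + self.eps)
```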

Tips

A Beta distribution must be used instead of a Gaussian one to prevent the agent from sampling too much at the edges of the action space.

A sample of actions drawn from the Beta distribution looks like this:
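
In case the figure does not render, the snippet below reproduces the idea: Beta samples live on a bounded support and are rescaled into the action range, so no probability mass piles up at the clipped edges the way Gaussian samples do. The α, β values and action bounds are placeholders; in PPO they come from the policy network.

```python
import matplotlib.pyplot as plt
import numpy as np

# Placeholder shape parameters; the policy network outputs alpha and beta in practice.
alpha, beta = 2.0, 2.0
low, high = -1.0, 1.0                                 # assumed action bounds

samples = np.random.beta(alpha, beta, size=10000)     # Beta support is [0, 1]
actions = low + (high - low) * samples                # rescale into [low, high]

plt.hist(actions, bins=50)
plt.title("Actions sampled from a Beta(2, 2) policy")
plt.show()
```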

Effect of Reward Function

Two reward terms are combined in this environment: R = T + α * F (a code sketch follows the list below)

  1. F = -0.01 * (current distance - previous distance)
  2. T = 100 (target reached); -2 (out of space); -1 (max episode length reached)
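
A direct transcription of this reward into code could look like the following; the α value and the boolean flag names are assumptions for illustration.

```python
def compute_reward(dist_now, dist_prev, reached_target, out_of_space, max_episode,
                   alpha=1.0):
    """Sketch of R = T + alpha * F; alpha and the flag names are assumptions."""
    F = -0.01 * (dist_now - dist_prev)    # positive when the agent moves closer
    if reached_target:
        T = 100.0
    elif out_of_space:
        T = -2.0
    elif max_episode:
        T = -1.0
    else:
        T = 0.0
    return T + alpha * F
```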

The terminal reward affects the agent

Ten evaluation runs were performed every 5e3 steps; the results show the differences:

Terminal Reward = 50:

Terminal Reward = 80:

Terminal Reward = 90:

Terminal Reward = 100:

Test

A set of 10 evaluation runs was performed. The results are shown in /test_img

For example:

Tips

Besides the PPO arguments, the normalization arguments (mean and std) must also be used at test time.
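
A possible way to reuse those statistics at test time is sketched below; the file names are hypothetical and the repository stores its own normalization state.

```python
import numpy as np

# Hypothetical file names; the repository stores its own normalization statistics.
state_mean = np.load("state_mean.npy")
state_std = np.load("state_std.npy")


def normalize_state(obs, eps=1e-8):
    """Apply the training-time state normalization to a test-time observation."""
    return (obs - state_mean) / (state_std + eps)
```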

Requirements

numpy

matplotlib

math

gym