add ddpg #62
base: master
Conversation
Make sure you set a name on every TensorFlow placeholder and operation so the graph is readable. I'm not sure if this implementation is serializable right now, so please ensure you can run DDPG with run_experiment_lite (add a separate launcher for this). See more comments in the code.
sandbox/rocky/tf/algos/ddpg/ddpg.py
Outdated
total_reward += reward
if terminal:
    self._report_total_reward(total_reward, episode_step)
    print("epoch %d, total reward %lf\n" % (episode_step,
Use rllab's logger for this (logger.record_tabular('key', value)) and log more performance measures than just the reward, e.g.:
- actor loss
- critic loss
- max and average predicted Q value
- etc. (maybe you can think of more things that could help us understand the performance)
See the sketch below the list for what this could look like.
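A sketch of the logging inside the training loop; the function arguments (epoch, episode_returns, actor_loss, critic_loss, q_values) stand in for whatever the implementation actually computes:

import numpy as np
from rllab.misc import logger

def log_diagnostics(epoch, episode_returns, actor_loss, critic_loss, q_values):
    # Record one row of diagnostics per epoch; dump_tabular writes it out.
    logger.record_tabular('Epoch', epoch)
    logger.record_tabular('AverageReturn', np.mean(episode_returns))
    logger.record_tabular('ActorLoss', actor_loss)
    logger.record_tabular('CriticLoss', critic_loss)
    logger.record_tabular('MaxQ', np.max(q_values))
    logger.record_tabular('AverageQ', np.mean(q_values))
    logger.dump_tabular()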
sandbox/rocky/tf/algos/ddpg/ddpg.py
Outdated
self._target_critic.name = 'target_critic'

# replay buffer
self._replay_buffer = ReplayBuffer(1e6)
The size of the replay buffer should be user-definable
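For example (a sketch; the parameter name replay_buffer_size is illustrative and ReplayBuffer is the class from this PR):

class DDPG:
    def __init__(self, env, replay_buffer_size=int(1e6)):
        # Let the caller choose the buffer capacity instead of hard-coding 1e6.
        self._replay_buffer = ReplayBuffer(replay_buffer_size)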
sandbox/rocky/tf/algos/ddpg/ddpg.py
Outdated
env,
gamma=0.99,
tau=0.001,
observation_range=(-5, 5),
observation_range isn't used anywhere (and shouldn't be needed).
sandbox/rocky/tf/algos/ddpg/ddpg.py
Outdated
self._replay_buffer = ReplayBuffer(1e6)

if log_dir:
    self._summary_writer = tf.summary.FileWriter(
Use rllab's logger
sandbox/rocky/tf/algos/ddpg/ddpg.py
Outdated
critic_l2_weight_decay=0.01,
action_noise=None,
plot=False,
check_point_dir=None,
Use logger.save_itr_params, which takes care of checkpoint folders etc.
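A rough sketch of saving per-iteration snapshots this way (assuming the actor, critic and env objects are picklable; the dict keys are illustrative):

from rllab.misc import logger

def save_snapshot(itr, actor, critic, env):
    # logger.save_itr_params handles the snapshot directory and file naming.
    params = dict(itr=itr, policy=actor, qf=critic, env=env)
    logger.save_itr_params(itr, params)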
I am not sure what this function is for yet. The checkpoint folder is used by TensorFlow; it seems we don't need to save anything other than the graph.
self._train_op = self._optimizer.apply_gradients(
    zip(grads, self.trainable_vars))

def _build_net(self, state):
Add support for a user-defined number and size of hidden layers, e.g. hidden_layers=[64, 64].
sandbox/rocky/tf/algos/ddpg/ddpg.py
Outdated
self._train_net()
self.save_session(episode_step)

def load_session(self):
We don't need this code if everything is serializable. Make sure you can run your implementation with run_experiment_lite.
sandbox/rocky/tf/algos/ddpg/ddpg.py
Outdated
action_range=(-1, 1),
actor_lr=1e-4,
critic_lr=1e-3,
reward_scale=1,
reward_scale should be called discount (in accordance with the other implementations). A value of 1 seems strange; are you sure it isn't < 1, e.g. 0.99?
In the paper and in the baselines implementation, this value is set to 1.
sandbox/rocky/tf/algos/ddpg/ddpg.py
Outdated
gamma=0.99,
tau=0.001,
observation_range=(-5, 5),
action_range=(-1, 1),
Assume that the algorithm always gets a normalized environment, where action_range is always in [-1, 1]. We need to implement normalization for gym Envs anyway, as described in #64.
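For reference, a rough sketch of how a launcher could obtain such an environment with rllab's existing wrappers (the exact wrapper for the tf sandbox may end up differing once #64 lands):

from rllab.envs.gym_env import GymEnv
from rllab.envs.normalized_env import normalize

# normalize() rescales actions into [-1, 1], so the algorithm itself
# no longer needs an explicit action_range argument.
env = normalize(GymEnv('Pendulum-v0'))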
soft_updates.append(
    tf.assign(target_var, (1. - tau) * target_var + tau * var))
assert len(soft_updates) == len(vars)
self._update_paras = tf.group(*soft_updates)
Make sure you set a meaningful name on every TensorFlow operation
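For example, something along these lines (a sketch; the chosen names are illustrative):

import tensorflow as tf

def build_soft_updates(vars, target_vars, tau):
    # Name each assignment and the grouped op so they are easy to find
    # when inspecting the graph or in TensorBoard.
    soft_updates = []
    for var, target_var in zip(vars, target_vars):
        soft_updates.append(
            tf.assign(target_var,
                      (1. - tau) * target_var + tau * var,
                      name='soft_update_' + var.op.name.replace('/', '_')))
    return tf.group(*soft_updates, name='soft_update_targets')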
def _fc(x,
        output_dim,
        weight_initializer=tf.contrib.layers.xavier_initializer(uniform=False),
Initialize the weights based on the seed from get_seed().
It seems the weight initializer does not use a seed.
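A sketch of wiring the experiment seed into the initializer (xavier_initializer accepts a seed argument; this assumes the launcher has already set the seed so ext.get_seed() returns it):

import tensorflow as tf
from rllab.misc import ext

# With the seed supplied, weight initialization becomes reproducible
# across runs of the same experiment.
weight_initializer = tf.contrib.layers.xavier_initializer(
    uniform=False, seed=ext.get_seed())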
examples/ddpg_gym_tf_pendulum.py
Outdated
RANDOM_SEED = 1234

np.random.seed(RANDOM_SEED)
tf.set_random_seed(RANDOM_SEED)
Remove these lines and call set_seed(seed) instead.
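i.e. something like this (a sketch):

from rllab.misc import ext

# A single call seeds the Python and numpy RNGs (and whatever backend
# RNGs rllab knows about) instead of seeding each library by hand.
ext.set_seed(1234)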
examples/ddpg_gym_tf_pendulum.py
Outdated
tf.set_random_seed(RANDOM_SEED)

env = gym.make('Pendulum-v0')
env.seed(RANDOM_SEED)
use get_seed()
examples/ddpg_gym_tf_pendulum.py
Outdated
env.seed(RANDOM_SEED)

action_dim = env.action_space.shape[-1]
action_noise = OrnsteinUhlenbeckActionNoise(
the noise should also be based on a seed
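One possible way to do that (a sketch; the mu, sigma and rng parameters on OrnsteinUhlenbeckActionNoise are illustrative, and the constructor in this PR may not accept an rng yet):

import numpy as np
from rllab.misc import ext

# Hypothetical: drive the noise process from an explicitly seeded
# RandomState so exploration noise is reproducible across runs.
rng = np.random.RandomState(ext.get_seed())
action_noise = OrnsteinUhlenbeckActionNoise(
    mu=np.zeros(action_dim), sigma=0.2 * np.ones(action_dim), rng=rng)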
@@ -64,12 +65,12 @@ def _build_net(self, state):
     with tf.variable_scope(self.name) as scope:

         with tf.variable_scope('fc1'):
-            fc1_out = _fc(state, 64)
+            fc1_out = _fc(state, self._hidden_layers[0])
There could be more than 2 hidden layers, so you should create fc layers dynamically based on the array size.
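A minimal sketch of building the stack dynamically (assuming the existing _fc helper; whether the activation lives inside or outside _fc depends on the rest of the implementation):

# One fully connected layer per entry in self._hidden_layers,
# e.g. hidden_layers=[400, 300] or [64, 64, 64].
out = state
for i, size in enumerate(self._hidden_layers):
    with tf.variable_scope('fc%d' % (i + 1)):
        out = tf.nn.relu(_fc(out, size))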
@@ -166,13 +168,13 @@ def _build_net(self, state, action, reuse=False):
         scope.reuse_variables()

         with tf.variable_scope('fc1'):
-            fc1_out = _fc(state, 64)
+            fc1_out = _fc(state, self._hidden_layers[0])
There could be more than 2 hidden layers, so you should create fc layers dynamically based on the array size.
examples/ddpg_gym_tf_pendulum.py
Outdated
 env = gym.make('Pendulum-v0')
-env.seed(RANDOM_SEED)
+env.seed(ext.get_seed())

 action_dim = env.action_space.shape[-1]
 action_noise = OrnsteinUhlenbeckActionNoise(
Please add another launcher script which runs your code via run_experiment_lite. It should work like this:
def run_task(*_):
    # initialize actor, critic, noise, env, etc.
    env = ...
    algo = DDPG(env=env, ...)
    algo.train()

run_experiment_lite(
    run_task,
    n_parallel=20,
    plot=False,
)
We need to make sure your DDPG implementation is serializable.
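For reference, the usual rllab pattern (a sketch, not the final code) is to inherit from Serializable and capture the constructor arguments so run_experiment_lite can pickle and re-create the object:

from rllab.core.serializable import Serializable

class DDPG(Serializable):
    def __init__(self, env, actor, critic):
        # Record the constructor arguments so the instance can be pickled
        # and re-instantiated inside run_experiment_lite.
        Serializable.quick_init(self, locals())
        self._env = env
        self._actor = actor
        self._critic = critic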
add ddpg implementation to master branch