tensorboard for variational inference #598
Conversation
Add loss as a scalar and gradients and weights as histograms to the summary
Great work so far! You should track the norm of all the gradients too.
Also, see the Travis failure about adhering to PEP 8.
var_list.update(get_variables(qz, collection=trainables))

for x, qx in six.iteritems(self.data):
  if isinstance(x, RandomVariable) and \
      not isinstance(qx, RandomVariable):
    var_list.update(get_variables(x, collection=trainables))
    data_var_list.update(get_variables(x, collection=trainables))

var_list = list(var_list)
To make this more efficient, I think you can rewrite var_list to be the concatenation of latent_var_list and data_var_list.
trainables = tf.trainable_variables()
for z, qz in six.iteritems(self.latent_vars):
  if isinstance(z, RandomVariable):
    var_list.update(get_variables(z, collection=trainables))
    latent_var_list.update(get_variables(z, collection=trainables))
  var_list.update(get_variables(qz, collection=trainables))

for x, qx in six.iteritems(self.data):
A good practice is to instantiate variables closest to where they're used, e.g., instantiate data_var_list above this line.
if self.logging:
  # TODO: when var_list is not None
  tf.summary.scalar("loss", self.loss)
  for var in latent_var_list:
These two double for loops could be made more efficient and readable. For example:
for grad, var in grads_and_vars:
  if var in latent_var_list:
    name = "variational"
  elif var in data_var_list:
    name = "model"
  else:
    name = ""
  with tf.name_scope(name):
    tf.summary.histogram("parameter_" + var.name, var)
    tf.summary.histogram("gradient_" + var.name, grad)
This also solves your TODO above.
Some other tips I added to the above: 1. Use "variational" over "inference", as they're variational parameters in general; "inference" is specific to inference networks in VAEs. 2. Use name scopes if you notice you're using a prefix everywhere.
Also, is there a reason you used CamelCase?
No particular reason for CamelCase. I tried both and decided to stick with it just for the aesthetics. Using name scopes is a better idea though.
On running initialize(), TensorFlow prints these warnings for all variables/gradients:
INFO:tensorflow:Summary name parameter_dense_1/bias:0 is illegal; using parameter_dense_1/bias_0 instead.
INFO:tensorflow:Summary name gradient_gradients/inference_4608532368/0/dense_1/BiasAdd_grad/BiasAddGrad:0 is illegal; using gradient_gradients/inference_4608532368/0/dense_1/BiasAdd_grad/BiasAddGrad_0 instead.
Should we do something about it?
Yes, try iterating them with _0, _1, etc. for each unique name?
Also, can you check that the variables are given their appropriate summary names? parameter_dense_1/bias:0 sounds like a model parameter, but it's not in the model name scope. Same for the second line.
Should I simply replace ':' with '_'?
The names include the scope when they are displayed on TensorBoard; the full names aren't printed there.
But as seen from the screenshot, there is one other problem: not all variables are grouped under scope 'model'. TensorFlow uses 'model_1' when tf.name_scope(scope_name) is called again.
Should I add all the variables and gradients from one scope in one go? Something like:
with tf.name_scope('model'):
  for grad, var in grads_and_vars:
    if var in data_var_list:
      tf.summary.histogram("parameter_" + var.name, var)
      tf.summary.histogram("gradient_" + grad.name, grad)
and similarly for inference.
To your two questions: yes and yes.
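The two fixes agreed on above (replace the ':' that TensorFlow forbids in summary names, and disambiguate repeated names with numeric suffixes) can be sketched in plain Python. The helper name and the seen_counts bookkeeping here are hypothetical, not part of Edward:

```python
def summary_name(raw_name, seen_counts):
    """Sanitize a variable name for use as a summary tag.

    Replaces ':' (illegal in TF summary names) with '_' and appends
    a numeric suffix when the same sanitized name was seen before.
    seen_counts is a dict carried across calls.
    """
    base = raw_name.replace(':', '_')
    count = seen_counts.get(base, 0)
    seen_counts[base] = count + 1
    # First occurrence keeps the base name; repeats get _1, _2, ...
    return base if count == 0 else "%s_%d" % (base, count)

seen = {}
print(summary_name("dense_1/bias:0", seen))  # dense_1/bias_0
print(summary_name("dense_1/bias:0", seen))  # dense_1/bias_0_1
```

This mirrors what TensorFlow itself does when it prints "Summary name ... is illegal; using ... instead", but doing it up front keeps the warnings out of the logs.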
tf.summary.histogram("ModelWeight_" + var.name, var)
tf.summary.histogram("ModelGradient_" + var.name, grad_var_tuple[0])

self.summarize = tf.summary.merge_all()
You define self.summarize here, but it's also defined when using the parent method's initialize (Inference).
You're right that we need to merge the summaries after they're defined and not before. It sounds like we need to decide where to properly put the merge summary op; not sure if it's as simple as moving the call to the parent method after VariationalInference finishes its initialize instead of before.
Yes, it doesn't work when self.summarize is defined before the summary ops are defined.
I tried moving the call to the parent's initialize() to the end of the function. But we use variables like self.logging which are initialized in the parent call. Here it will be easy to modify, but that may not be the case for every class that inherits from Inference.
Alternatively, we could call summary.merge_all() in each of the child classes and remove it from the parent. For now, we can let it be defined in both until we modify the other classes.
Maybe it would even make sense to have the summary writer object outside the inference class? Or else how does it work with composed inferences (http://edwardlib.org/api/inference-compositionality)?
@mariru Good point. It sounds like self.summary shouldn't merge all summaries in the TensorFlow graph, but only summaries added inside its class. That way self.summary is localized and thus compatible with compositions.
I think summary.merge_all() will merge all the summaries in the graph. We could instead use tf.merge(), which takes as input a list of summary buffers. Something like:
self.summarize = tf.merge(summary_objects)
tf.merge requires storing all the summary objects. However, we build summaries in various methods of the Inference class, so this is not very TensorFlow-y.
We can use a unique graph collection for each inference; then we can merge all summaries within that collection. For example:
# summaries are added like this
summary_key = 'summaries_' + str(id(inference))
tf.summary.scalar(..., collections=[summary_key])
tf.summary.scalar(..., collections=[summary_key])

# when forming the merge summary
summary_key = 'summaries_' + str(id(inference))
self.summarize = tf.summary.merge_all(summary_key)
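Why this keeps composed inferences independent can be seen with a plain-Python stand-in for graph collections; the class and registry below are hypothetical, purely to illustrate the per-instance key derived from id():

```python
class ToySummaryRegistry:
    """Stand-in for TF graph collections, keyed per inference instance."""
    collections = {}  # shared "graph": key -> list of summary names

    def add_summary(self, name):
        # Tag the summary with a key unique to this instance.
        key = 'summaries_' + str(id(self))
        self.collections.setdefault(key, []).append(name)

    def merge(self):
        # Merge only this instance's summaries, not everything in the graph.
        key = 'summaries_' + str(id(self))
        return list(self.collections.get(key, []))

a, b = ToySummaryRegistry(), ToySummaryRegistry()
a.add_summary("loss")
b.add_summary("loss_d")
print(a.merge())  # ['loss']
print(b.merge())  # ['loss_d']
```

Each instance's merge sees only its own collection, which is exactly what composed inferences need: two inference objects sharing one graph no longer clobber each other's merged summary op.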
Bump. It would be great to get this merged in before you start working on the Monte Carlo / more specific variational method stuff.
Also removes redundancy in updating var_list twice
edward/inferences/gan_inference.py (outdated)
@@ -101,6 +101,12 @@ def initialize(self, optimizer=None, optimizer_d=None,
    self.train_d = optimizer_d.apply_gradients(grads_and_vars_d,
                                               global_step=global_step_d)

    if self.logging:
      summary_key = 'summaries_' + str(id(self))
      tf.summary.scalar('loss_fake', self.loss_d, collections=[summary_key])
Can you rename these to 'loss_discriminative' and 'loss_generative' respectively?
Done.
Are you including the summary ops for the VI losses here too?
Yes.
@akshaykhatri639 I'm not sure if I follow: do you mean they will be in this PR?
tf.summary.scalar("gradient_norm_" +
                  grad.name.replace(':', '_'),
                  tf.norm(grad), collections=[summary_key])
# replace ':' with '_' because tf does not allow ':' in var names in summaries
@akshaykhatri639 Really looking forward to this feature!
I tried it using MAP inference, but the naming turns out weird.
I would replace grad.name.replace(':', '_') with var.name.replace(':', '_') everywhere, because the gradients are named after the operations. By prefacing it with gradient_, it will still be clear from the name that it's a gradient.
I have been using tf.name_scope('model') for the actual model. Maybe rename to tf.name_scope('training') for logging? But that's minor; I can just rename the scope for my model.
- naming of gradients is messy: see comment above
- tracking gradients is super useful, but it might be even more useful to look at gradients*learning_rate or (parameters - gradients*learning_rate)/parameters. Especially with adaptive learning rates, when the user doesn't know the current learning rate, it is impossible to know from the histograms whether the gradient is in the right ballpark.
- when training blows up and some parameters become NaN or inf (e.g. because the learning rate is too large), I get an error message that is hard to decipher:
InvalidArgumentError (see above for traceback): Nan in summary histogram for: model_1/parameter_model/embeddings/alpha_0
[[Node: model_1/parameter_model/embeddings/alpha_0 = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](model_1/parameter_model/embeddings/alpha_0/tag, model/embeddings/alpha/read)]]
Maybe there's a nicer way to catch this? I guess that's more an idiosyncrasy of TensorFlow than of Edward. But still, catching it in a more readable way would be nice.
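The second bullet's relative-update diagnostic is cheap to compute; here is a minimal pure-Python sketch, assuming plain SGD (the function name is hypothetical, and adaptive optimizers like Adam rescale the step internally, so there it is only a rough indicator):

```python
def relative_update(params, grads, learning_rate):
    """Per-element relative update size |lr * grad / param|.

    Assumes a plain SGD step param -= lr * grad. Values around 1e-3
    are a commonly cited healthy ballpark; values near 1 or above
    suggest the learning rate is too large for those parameters.
    """
    return [abs(learning_rate * g / p) if p != 0 else float('inf')
            for p, g in zip(params, grads)]

# Both parameters move by 0.25% of their magnitude per step.
print(relative_update([2.0, -4.0], [0.5, 1.0], 0.01))
```

In the PR's setting, the same quantity could be logged as a histogram next to the raw gradient histograms, giving a learning-rate-independent view of training health.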
- You are right about the names of the summary ops. I will make the required changes.
- Won't this become a bit complicated? Each optimizer modifies weights in a slightly different way and may store one or more cache/momentum matrices per gradient. Should we implement it differently for each optimizer? Otherwise, the values in the summary still won't be close to the values the weights were actually updated with.
- Should I check whether any values become NaN before writing the summary ops?
- There's a self.debug member in Edward inferences for checking whether certain ops blow up. I agree with Maja that it would be nice if we could somehow raise a more informative error (but without running a check every iteration just to raise that error).
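The kind of readable failure discussed here can be sketched in plain Python; the helper name and dict-of-lists input are hypothetical stand-ins for fetched tensor values, and in Edward such a check would presumably sit behind self.debug rather than run every iteration:

```python
import math

def check_finite(named_values):
    """Raise a readable error naming the first tensor with bad values.

    named_values: dict mapping a summary name to a flat list of floats,
    e.g. the fetched parameter/gradient values about to be summarized.
    """
    for name, values in named_values.items():
        bad = [v for v in values if math.isnan(v) or math.isinf(v)]
        if bad:
            raise ValueError(
                "Non-finite value %r in '%s'; "
                "consider lowering the learning rate." % (bad[0], name))

check_finite({"model/weights": [0.1, -0.2]})  # passes silently
```

Compared to TensorFlow's raw InvalidArgumentError, the exception message points directly at the offending summary name and the likely cause.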
Merging this now. I'll personally make changes that revise this slightly in another PR.