Write custom loss function code to match Yang's loss function #39

Closed
mwinton opened this issue Nov 6, 2018 · 4 comments
mwinton commented Nov 6, 2018

No description provided.

mwinton added the models label Nov 6, 2018
mwinton commented Nov 6, 2018

https://github.com/zcyang/imageqa-san/blob/master/src/san_att_conv_twolayer_theano.py#L369-L372

# Pick each sample's predicted probability at its true label index.
prob_y = prob[T.arange(prob.shape[0]), label]
pred_label = T.argmax(prob, axis=1)
# sum or mean?
# (negative mean log-likelihood over the batch)
cost = -T.mean(T.log(prob_y))

prob is the output of the final softmax over the 1000 candidate answers, so prob_y is the array of each sample's predicted probability at its true label (one value per sample in the batch). Then they take the negative mean of the log of this array.
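
For concreteness, here is a minimal NumPy sketch of the same indexing trick and negative mean log (toy numbers, not from the repo):

import numpy as np

# Toy batch: 3 samples, 4 candidate answers (softmax output).
prob = np.array([[0.70, 0.10, 0.10, 0.10],
                 [0.20, 0.50, 0.20, 0.10],
                 [0.25, 0.25, 0.25, 0.25]])
label = np.array([0, 1, 3])  # true answer index per sample

# Same indexing trick as Yang's code: each sample's probability
# at its true label.
prob_y = prob[np.arange(prob.shape[0]), label]   # -> [0.70, 0.50, 0.25]

# Negative mean log-probability over the batch.
cost = -np.mean(np.log(prob_y))
print(cost)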

mwinton added the P1 label Nov 9, 2018
mwinton added this to the Build Initial Model milestone Nov 9, 2018
mwinton removed the P1 label Nov 11, 2018
mwinton self-assigned this Nov 13, 2018
mwinton commented Nov 16, 2018

Yang's code is actually a shortcut to categorical cross-entropy loss. The first line of code above -- the lookup of prob at the label's index -- basically eliminates all the terms from the softmax output that would be zeroed out by the true labels when you multiply [0|1] * log(predicted probability).

Then T.mean() is the same as tf.reduce_mean(), just taking the mean over the batch dimension (i.e. over axis=0). From what I read, this can be helpful for normalizing the loss by batch size when dealing with batches of different sizes.

In the Keras code, categorical cross-entropy does a tf.reduce_sum(), but that is the sum of p log q over all classes for a particular sample, reducing to one loss number per sample (i.e. over axis=-1). That function doesn't show how Keras/TF handle the batch dimension, but presumably they take a mean there too.
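
To make the correspondence explicit, here is a minimal sketch of a custom Keras loss matching -log prob_y, assuming one-hot y_true (the name yang_loss is just illustrative; Keras's built-in categorical_crossentropy computes the same per-sample value):

from keras import backend as K

def yang_loss(y_true, y_pred):
    # Clip to avoid log(0), then sum y_true * log(y_pred) over the class
    # axis; with one-hot y_true only the true label's term survives,
    # which is exactly -log prob_y for each sample.
    y_pred = K.clip(y_pred, K.epsilon(), 1.0)
    return -K.sum(y_true * K.log(y_pred), axis=-1)

# model.compile(optimizer='adam', loss=yang_loss, metrics=['accuracy'])

Keras then takes a (weighted) mean of the per-sample values over the batch, which matches the T.mean() above.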

mwinton commented Nov 16, 2018

Confirmed that Keras is taking averages over batches:

"For training loss, keras does a running average over the batches. For validation loss, a conventional average over all the batches in validation data is performed. The training accuracy is the average of the accuracy values for each batch of training data during training."

keras-team/keras#10426

mwinton closed this as completed Nov 16, 2018
chjatala commented

Confirmed that Keras is taking averages over batches:

"For training loss, keras does a running average over the batches. For validation loss, a conventional average over all the batches in validation data is performed. The training accuracy is the average of the accuracy values for each batch of training data during training."

keras-team/keras#10426

"running average over the batches": what are the parameters for the running average?

Is it simple moving average? In that case, how many previous batches average is calculated?
Or is it cumulative moving average? In that case, is it computed from the start of training till current training step, or over a single epoch?
Or is it exponential moving average? What is discount factor?
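
(Not an answer from the Keras source -- just a sketch of what two of these candidates would look like over a stream of per-batch losses; all numbers and the alpha value below are made up.)

batch_losses = [1.2, 0.9, 0.8, 0.7]
batch_sizes = [32, 32, 32, 16]

# Cumulative (sample-weighted) moving average from the start of the epoch.
total, seen = 0.0, 0
for loss, n in zip(batch_losses, batch_sizes):
    total += loss * n
    seen += n
    print('cumulative average:', total / seen)

# Exponential moving average with a hypothetical discount factor alpha.
alpha, ema = 0.9, None
for loss in batch_losses:
    ema = loss if ema is None else alpha * ema + (1 - alpha) * loss
    print('exponential average:', ema)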
