Loss Functions and Metrics
CNTK contains a number of common predefined loss functions (training criteria) and metrics (evaluation criteria). In addition, custom loss functions and metrics can be defined as BrainScript expressions.
Computes the cross-entropy between two probability distributions.
CrossEntropy (y, p)
CrossEntropyWithSoftmax (y, z)
- y: labels (one-hot), or more generally, reference distribution. Must sum up to 1.
- p (for CrossEntropy()): posterior probability distribution to score against the reference. Must sum up to 1.
- z (for CrossEntropyWithSoftmax()): input to a Softmax operation to compute the posterior probability distribution to score against the reference.
This operation computes the cross-entropy between y and p, which is defined as:
ce = -sum_i y_i log p_i
with i iterating over all elements of y and p. For CrossEntropyWithSoftmax(), p is computed from the input parameter z as
p = Softmax (z)
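The following NumPy sketch (illustration only, not CNTK/BrainScript code; the arrays y and z and the helper softmax are made up for the example) shows both forms computing the same value:

```python
import numpy as np

def softmax(z):
    # Numerically stable Softmax over a vector.
    e = np.exp(z - z.max())
    return e / e.sum()

# One-hot reference distribution y (class 2 of 4) and unnormalized scores z.
y = np.array([0.0, 0.0, 1.0, 0.0])
z = np.array([1.0, 2.0, 3.0, 0.5])

# CrossEntropy (y, p): p must already be a distribution that sums to 1.
p = softmax(z)
ce = -np.sum(y * np.log(p))

# CrossEntropyWithSoftmax (y, z): same value, computed from the logits z directly.
ce_ws = -np.sum(y * (z - np.log(np.sum(np.exp(z)))))

assert np.isclose(ce, ce_ws)
```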
These functions compute the cross-entropy of two probability distributions, which is the most common training criterion (loss function), where y is a one-hot-represented categorical label. CrossEntropyWithSoftmax() is an optimization that takes advantage of the fact that for one-hot input, the full Softmax distribution is not needed. Instead of a normalized probability, it accepts as its input the argument to the Softmax operation, which is the same as a non-normalized version of log Softmax, also known as "logit". This is the recommended way in CNTK to compute the cross-entropy criterion.
The function's result is undefined if the distributions y and p do not sum up to 1 (for CrossEntropyWithSoftmax(), only the constraint on y applies, since p is produced by the Softmax). Specifically, these functions cannot be used for multi-class labels where y contains more than one 1. In that case, consider using Sigmoid() with the Logistic() loss.
CrossEntropyWithSoftmax() is currently a CNTK primitive which has limitations. A more flexible, recommended alternative is to define it manually as:
CrossEntropyWithSoftmax (y, z) = ReduceLogSum (z) - TransposeTimes (y, z)
To compute the cross-entropy with sparse labels (e.g. read using Input{..., sparse=true}), the alternative form above must be used.
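The equivalence of this manual form with the definition above can be checked numerically. The NumPy sketch below (illustrative only; the variable names mirror ReduceLogSum and TransposeTimes but are not CNTK API) does so for a single column vector:

```python
import numpy as np

y = np.array([0.0, 1.0, 0.0])     # one-hot label
z = np.array([0.2, 1.5, -0.7])    # unnormalized scores

# ReduceLogSum (z): log of the sum of exp over all elements of z.
reduce_log_sum = np.log(np.sum(np.exp(z)))

# TransposeTimes (y, z): inner product y^T z; for one-hot y this picks out z at the label.
transpose_times = y @ z

manual_ce = reduce_log_sum - transpose_times

# Same value as -log Softmax(z) at the labeled position.
p = np.exp(z) / np.sum(np.exp(z))
assert np.isclose(manual_ce, -np.log(p[np.argmax(y)]))
```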
To compute CrossEntropyWithSoftmax() over tensors of rank>1, e.g. where the task is to determine a location on a 2D grid, yet another alternative form must be used:
CrossEntropyWithSoftmax (y, z, axis=None) = ReduceLogSum (z, axis=axis) - ReduceSum (y .* z, axis=axis)
This form also allows the Softmax operation to be applied along a specific axis only. For example, if the inputs and labels have the shape [10 x 20], and the Softmax should be computed over each of the 20 columns independently, specify axis=1.
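The per-column computation that this expresses can be sketched in NumPy as follows (illustrative only; here NumPy's axis=0 plays the role of CNTK's axis=1, i.e. the first dimension of the [10 x 20] shape, which is an assumed mapping rather than part of the original text):

```python
import numpy as np

rows, cols = 10, 20
rng = np.random.default_rng(0)

# Unnormalized scores of shape [10 x 20] and one-hot labels with exactly one 1 per column.
z = rng.normal(size=(rows, cols))
y = np.zeros((rows, cols))
y[rng.integers(rows, size=cols), np.arange(cols)] = 1.0

# ReduceLogSum (z, axis=axis): log-sum-exp over the chosen axis.
reduce_log_sum = np.log(np.sum(np.exp(z), axis=0))

# ReduceSum (y .* z, axis=axis): elementwise product, then sum over the same axis.
reduce_sum = np.sum(y * z, axis=0)

# One cross-entropy value per column; their sum is the overall criterion.
ce_per_column = reduce_log_sum - reduce_sum
print(ce_per_column.shape)   # (20,)
```

A complete usage example in a network description, first with dense labels: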
labels = Input {9000}
...
z = W * h + b
ce = CrossEntropyWithSoftmax (labels, z)
criterionNodes = (ce)
The same with sparse labels:
labels = Input {9000, sparse=true}
...
z = W * h + b
ce = ReduceLogSum (z) - TransposeTimes (labels, z)
criterionNodes = (ce)
Computes the error rate for prediction of categorical labels.
ErrorPrediction (y, z)
- y: categorical labels
- z: vector of prediction scores, e.g. log probabilities
Returns 0 if the maximum value of z is at a position where y has a 1, and 1 otherwise; averaged over many samples, this yields the error rate.
This function accepts a vector of posterior probabilities, logits, or other matching scores, where each element represents the matching score of a class or category. The function determines whether the highest-scoring class is equal to the class indicated by the labels y. The same criterion can also be written manually in terms of Hardmax(), e.g.:
ErrorPrediction_new (L, z, tag='') = Minus (BS.Constants.One, TransposeTimes (L, Hardmax (z)), tag=tag)
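As a NumPy sketch of what this criterion measures (illustration only, not the CNTK implementation; the minibatch values are made up), the per-sample error indicator and its average can be computed as:

```python
import numpy as np

# One-hot labels and prediction scores for a small minibatch of 4 samples,
# 3 classes each (one column per sample).
L = np.array([[1, 0, 0, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
z = np.array([[ 2.0, 0.1, 0.3, 0.2],
              [ 0.5, 1.2, 0.1, 0.4],
              [-1.0, 0.3, 0.9, 1.1]], dtype=float)

# Hardmax (z): 1 at the position of the per-column maximum, 0 elsewhere.
hard = (z == z.max(axis=0, keepdims=True)).astype(float)

# 1 - L^T Hardmax(z): 0 where the top-scoring class matches the label, 1 otherwise.
errors = 1.0 - np.sum(L * hard, axis=0)

print(errors)         # per-sample error indicators, here [0. 0. 0. 1.]
print(errors.mean())  # error rate over the minibatch
```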