Loss Functions and Metrics
CNTK contains a number of common predefined loss functions (training criteria) and metrics (evaluation criteria). In addition, custom loss functions and metrics can be defined as BrainScript expressions.
Computes the cross-entropy between two probability distributions.
CrossEntropy (y, p)
CrossEntropyWithSoftmax (y, z)
- y: labels (one-hot), or more generally, reference distribution. Must sum up to 1.
- p (for CrossEntropy()): posterior probability distribution to score against the reference. Must sum up to 1.
- z (for CrossEntropyWithSoftmax()): input to a Softmax operation to compute the posterior probability distribution to score against the reference.
This operation computes the cross-entropy between y and p, which is defined as:
ce = -sum_i y_i log p_i
with i iterating over all elements of y and p. For CrossEntropyWithSoftmax(), p is computed from the input parameter z as
p = Softmax (z)
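The following NumPy sketch (illustration only, not CNTK/BrainScript code; the arrays y and z and the helper softmax are made up for the example) shows both forms computing the same value:

```python
import numpy as np

def softmax(z):
    # Numerically stable Softmax over a vector.
    e = np.exp(z - z.max())
    return e / e.sum()

# One-hot reference distribution y (class 2 of 4) and unnormalized scores z.
y = np.array([0.0, 0.0, 1.0, 0.0])
z = np.array([1.0, 2.0, 3.0, 0.5])

# CrossEntropy (y, p): p must already be a distribution that sums to 1.
p = softmax(z)
ce = -np.sum(y * np.log(p))

# CrossEntropyWithSoftmax (y, z): same value, computed from the logits z directly.
ce_ws = -np.sum(y * (z - np.log(np.sum(np.exp(z)))))

assert np.isclose(ce, ce_ws)
```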
These functions compute the cross-entropy of two probability distributions, which is the most common training criterion (loss function), where y is a one-hot-represented categorical label. CrossEntropyWithSoftmax() is an optimization that takes advantage of the fact that for one-hot input, the full Softmax distribution is not needed. Instead of a normalized probability, it accepts as its input the argument to the Softmax operation, which is the same as a non-normalized version of log Softmax, also known as "logit". This is the recommended way in CNTK to compute the cross-entropy criterion.
The function's result is undefined if the distributions y and p do not sum up to 1 (for CrossEntropyWithSoftmax(), only the constraint on y applies, since p is produced by the Softmax). Specifically, these functions cannot be used for multi-class labels where y contains more than one 1. In that case, consider using Sigmoid() with the Logistic() loss.
CrossEntropyWithSoftmax() is currently a CNTK primitive which has limitations. A more flexible, recommended alternative is to define it manually as:
CrossEntropyWithSoftmax (y, z) = ReduceLogSum (z) - TransposeTimes (y, z)
To compute the cross-entropy with sparse labels (e.g. read using Input{..., sparse=true}), the alternative form above must be used.
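The equivalence of this manual form with the definition above can be checked numerically. The NumPy sketch below (illustrative only; the variable names mirror ReduceLogSum and TransposeTimes but are not CNTK API) does so for a single column vector:

```python
import numpy as np

y = np.array([0.0, 1.0, 0.0])     # one-hot label
z = np.array([0.2, 1.5, -0.7])    # unnormalized scores

# ReduceLogSum (z): log of the sum of exp over all elements of z.
reduce_log_sum = np.log(np.sum(np.exp(z)))

# TransposeTimes (y, z): inner product y^T z; for one-hot y this picks out z at the label.
transpose_times = y @ z

manual_ce = reduce_log_sum - transpose_times

# Same value as -log Softmax(z) at the labeled position.
p = np.exp(z) / np.sum(np.exp(z))
assert np.isclose(manual_ce, -np.log(p[np.argmax(y)]))
```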
To compute CrossEntropyWithSoftmax() over tensors of rank>1, e.g. where the task is to determine a location on a 2D grid, yet another alternative form must be used:
CrossEntropyWithSoftmax (y, z, axis=None) = ReduceLogSum (z, axis=axis) - ReduceSum (y .* z, axis=axis)
This form also allows the Softmax operation to be applied along a specific axis only. For example, if the inputs and labels have the shape [10 x 20], and the Softmax should be computed over each of the 20 columns independently, specify axis=1.
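The per-column computation that this expresses can be sketched in NumPy as follows (illustrative only; here NumPy's axis=0 plays the role of CNTK's axis=1, i.e. the first dimension of the [10 x 20] shape, which is an assumed mapping rather than part of the original text):

```python
import numpy as np

rows, cols = 10, 20
rng = np.random.default_rng(0)

# Unnormalized scores of shape [10 x 20] and one-hot labels with exactly one 1 per column.
z = rng.normal(size=(rows, cols))
y = np.zeros((rows, cols))
y[rng.integers(rows, size=cols), np.arange(cols)] = 1.0

# ReduceLogSum (z, axis=axis): log-sum-exp over the chosen axis.
reduce_log_sum = np.log(np.sum(np.exp(z), axis=0))

# ReduceSum (y .* z, axis=axis): elementwise product, then sum over the same axis.
reduce_sum = np.sum(y * z, axis=0)

# One cross-entropy value per column; their sum is the overall criterion.
ce_per_column = reduce_log_sum - reduce_sum
print(ce_per_column.shape)   # (20,)
```

A complete usage example in a network description, first with dense labels: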
labels = Input {9000}
...
z = W * h + b
ce = CrossEntropyWithSoftmax (labels, z)
criterionNodes = (ce)
The same with sparse labels:
labels = Input {9000, sparse=true}
...
z = W * h + b
ce = ReduceLogSum (z) - TransposeTimes (labels, z)
criterionNodes = (ce)
Computes the error rate for prediction of categorical labels.
ErrorPrediction (y, z)
- y: categorical labels
- z: vector of prediction scores, e.g. log probabilities
Returns 0 if the maximum value of z is at a position where y has a 1, and 1 otherwise; averaged over many samples, this yields the error rate.
This function accepts a vector of posterior probabilities, logits, or other matching scores, where each element represents the matching score of a class or category. The function determines whether the highest-scoring class is equal to the class indicated by the labels y. The same criterion can also be written manually in terms of Hardmax(), e.g.:
ErrorPrediction_new (L, z, tag='') = Minus (BS.Constants.One, TransposeTimes (L, Hardmax (z)), tag=tag)
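As a NumPy sketch of what this criterion measures (illustration only, not the CNTK implementation; the minibatch values are made up), the per-sample error indicator and its average can be computed as:

```python
import numpy as np

# One-hot labels and prediction scores for a small minibatch of 4 samples,
# 3 classes each (one column per sample).
L = np.array([[1, 0, 0, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
z = np.array([[ 2.0, 0.1, 0.3, 0.2],
              [ 0.5, 1.2, 0.1, 0.4],
              [-1.0, 0.3, 0.9, 1.1]], dtype=float)

# Hardmax (z): 1 at the position of the per-column maximum, 0 elsewhere.
hard = (z == z.max(axis=0, keepdims=True)).astype(float)

# 1 - L^T Hardmax(z): 0 where the top-scoring class matches the label, 1 otherwise.
errors = 1.0 - np.sum(L * hard, axis=0)

print(errors)         # per-sample error indicators, here [0. 0. 0. 1.]
print(errors.mean())  # error rate over the minibatch
```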