Given $N$ possible classes for the output $y$, the probability that $y$ belongs to class $j$ is computed through the softmax activation function as:

$$a_j = \frac{e^{z_j}}{\sum_{k=1}^{N} e^{z_k}}$$

where $z_j$ is the $j$-th element of the raw output vector $\mathbf{z}$ (the logits).
```python
import numpy as np

def softmax(z):
    """Softmax converts a vector of values to a probability distribution.

    Args:
      z (ndarray (N,)) : input data, N features
    Returns:
      a (ndarray (N,)) : softmax of z
    """
    e_z = np.exp(z)
    a = e_z / e_z.sum(axis=0)
    return a
```
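As a quick sanity check, the function can be applied to a small logit vector (the values below are illustrative): the outputs are positive, sum to 1, and the largest logit receives the largest probability.

```python
import numpy as np

def softmax(z):
    e_z = np.exp(z)
    return e_z / e_z.sum(axis=0)

z = np.array([1.0, 2.0, 3.0])   # example logits
a = softmax(z)
print(a)          # largest logit -> largest probability
print(a.sum())    # probabilities sum to 1
```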
When we compute the cross-entropy loss $L = -\log a_j$ for the target class $j$, we use the intermediate value $a_j$. However, substituting this term directly into the loss function,

$$L = -\log\frac{e^{z_j}}{\sum_{k=1}^{N} e^{z_k}},$$

lets the computation be rearranged internally and leads to a more numerically accurate output layer:
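The gain in accuracy comes from the log-sum-exp trick: shifting the logits by their maximum before exponentiating prevents overflow without changing the result. Below is a minimal NumPy sketch of this idea; the helper names are illustrative, not from the original text.

```python
import numpy as np

def stable_log_softmax(z):
    # Shift by the max logit so np.exp never overflows;
    # subtracting a constant does not change the softmax result.
    shifted = z - np.max(z)
    return shifted - np.log(np.sum(np.exp(shifted)))

def cross_entropy_from_logits(z, target):
    # -log(softmax(z)[target]), computed without ever forming
    # the intermediate probabilities a_j explicitly.
    return -stable_log_softmax(z)[target]

# Naive softmax would overflow on logits this large,
# but the shifted computation stays finite.
z = np.array([1000.0, 1001.0, 1002.0])
print(cross_entropy_from_logits(z, 2))
```

This is the same rearrangement that deep-learning frameworks apply when the loss is computed directly from logits instead of from the softmax output.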