Next, we extend the idea of NLL directly to multi-class classification with $K$ classes. Here the training label is represented as a one-hot vector $\mathbf{y} = [y_1, \dots, y_K]^T$, where $y_k = 1$ if the example belongs to class $k$ and $y_k = 0$ otherwise.
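To make the one-hot label concrete, here is a minimal Python sketch of how it enters a multi-class NLL. It assumes the model outputs a probability vector $\mathbf{p} = [p_1, \dots, p_K]^T$ (for example from a softmax) and uses the standard multi-class form $-\sum_{k=1}^{K} y_k \log p_k$. The function names `one_hot` and `multiclass_nll` are hypothetical, and classes are indexed from 0 in the code rather than from 1 as in the text. Because $\mathbf{y}$ is one-hot, the sum reduces to $-\log p_k$ for the true class $k$.

```python
import math


def one_hot(label: int, num_classes: int) -> list[float]:
    """Hypothetical helper: y[k] = 1 for the true class k and 0 otherwise (0-indexed)."""
    if not 0 <= label < num_classes:
        raise ValueError("label must be in [0, num_classes)")
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


def multiclass_nll(y: list[float], p: list[float]) -> float:
    """Hypothetical helper: NLL of one example, -sum_k y_k * log(p_k).

    Assumes p is a valid probability vector (e.g. a softmax output).
    Terms with y_k == 0 contribute nothing, so they are skipped; this also
    avoids evaluating log(0) when some p_k is exactly zero.
    """
    return -sum(yk * math.log(pk) for yk, pk in zip(y, p) if yk != 0)


# Usage: true class 2 out of K = 4, model assigns it probability 0.6.
y = one_hot(2, 4)
p = [0.1, 0.2, 0.6, 0.1]
loss = multiclass_nll(y, p)  # equals -log(0.6)
```

Only the probability assigned to the true class affects the loss; pushing $p_k$ toward 1 drives the NLL toward 0.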
Conditional Independence