2 Matching Annotations
  1. May 2025
    1. Next, we extend the idea of NLL directly to multi-class classification with K classes, where the training label is represented with what is called a one-hot vector y = [y_1, …, y_K]^T, where y_k = 1 if the example is of class k and y_k = 0 otherwise.

      Conditional Independence
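
      The quoted passage describes the multi-class NLL with one-hot targets. The sketch below is not from the source; the function names `one_hot` and `nll` and the toy probabilities are illustrative. It shows how the one-hot encoding makes the sum over classes pick out -log p(true class | x) for each example.

      ```python
      import numpy as np

      def one_hot(labels, num_classes):
          """Encode integer labels as one-hot vectors: y_k = 1 for the true class k, 0 otherwise."""
          y = np.zeros((len(labels), num_classes))
          y[np.arange(len(labels)), labels] = 1.0
          return y

      def nll(probs, y):
          """Average negative log-likelihood for predicted probabilities `probs` (N x K)
          and one-hot targets `y` (N x K). Since y_k is nonzero only at the true class,
          the inner sum reduces to -log p(true class | x)."""
          eps = 1e-12  # guard against log(0)
          return -np.sum(y * np.log(probs + eps), axis=1).mean()

      # Toy example: 3 classes, 2 examples
      labels = np.array([0, 2])
      probs = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.3, 0.6]])
      print(nll(probs, one_hot(labels, num_classes=3)))  # mean of -log 0.7 and -log 0.6
      ```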

    2. In fact, for different problem settings, we might prefer to pick a different prediction threshold. The field of decision theory considers how to make this choice. For example, if the consequences of predicting +1 when the answer should be −1 are much worse than the consequences of predicting −1 when the answer should be +1, then we might set the prediction threshold to be greater than 0.5.

      The choice of threshold will depend on the problem and on domain knowledge.
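
      The passage ties the threshold choice to decision theory. Below is a minimal sketch, not from the source; the cost names `c_fp`/`c_fn` and the 9:1 cost ratio are assumptions for illustration. It shows how asymmetric misclassification costs push the threshold on p(y=+1 | x) above 0.5.

      ```python
      import numpy as np

      def optimal_threshold(c_fp, c_fn):
          """Threshold on p(y=+1 | x) that minimizes expected cost:
          predict +1 when c_fn * p >= c_fp * (1 - p), i.e. p >= c_fp / (c_fp + c_fn)."""
          return c_fp / (c_fp + c_fn)

      def decide(p_pos, c_fp, c_fn):
          """Return +1 or -1 for predicted probabilities p(y=+1 | x)."""
          return np.where(p_pos >= optimal_threshold(c_fp, c_fn), 1, -1)

      # If predicting +1 when the answer should be -1 is 9x as costly as the
      # reverse error, the threshold rises from 0.5 to 0.9.
      print(optimal_threshold(c_fp=9.0, c_fn=1.0))              # 0.9
      print(decide(np.array([0.6, 0.95]), c_fp=9.0, c_fn=1.0))  # [-1  1]
      ```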