- Oct 2015
full loss function as coming from a Gaussian prior over the weight matrix \(W\), where instead of MLE we are performing the Maximum a posteriori (MAP) estimation. We mention these interpretations to help your intuitions, but the full details of this derivation are beyond the scope of this class.
Can anyone provide resources where I can find this derivation? In particular, the derivation for the regularization term \(R(W)\) coming from a Gaussian prior on \(W\).
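For what it's worth, here is a rough sketch of how the standard argument usually goes. It is not taken from the course notes, and it rests on assumptions I am adding: the data loss is a negative log-likelihood \(-\log p(D \mid W)\) (as with the softmax/cross-entropy loss), \(D\) denotes the training data, and every entry of \(W\) has an independent zero-mean Gaussian prior with variance \(\sigma^2\). The symbols \(D\) and \(\sigma\) are mine.

```latex
\begin{aligned}
W^{*} &= \arg\max_{W}\; p(W \mid D)
       = \arg\max_{W}\; p(D \mid W)\, p(W) \\
      &= \arg\min_{W}\; \bigl[ -\log p(D \mid W) - \log p(W) \bigr] \\[4pt]
p(W)  &= \prod_{k,l} \frac{1}{\sqrt{2\pi\sigma^{2}}}
         \exp\!\left( -\frac{W_{k,l}^{2}}{2\sigma^{2}} \right) \\[4pt]
-\log p(W) &= \frac{1}{2\sigma^{2}} \sum_{k,l} W_{k,l}^{2} + \text{const} \\[4pt]
W^{*} &= \arg\min_{W}\; \Bigl[ -\log p(D \mid W) + \lambda\, R(W) \Bigr],
\qquad R(W) = \sum_{k,l} W_{k,l}^{2},
\quad \lambda = \frac{1}{2\sigma^{2}}
\end{aligned}
```

Under these assumptions the L2 penalty is just the negative log of the Gaussian prior, with a smaller prior variance giving a larger \(\lambda\). If the data loss is averaged over \(N\) examples rather than summed, the same steps would give \(\lambda = 1/(2N\sigma^{2}\)) instead. I would still appreciate a pointer to a proper reference.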