4 Matching Annotations
  1. Aug 2017
    1. The takeaway is that you should not be using smaller networks because you are afraid of overfitting. Instead, you should use as big of a neural network as your computational budget allows, and use other regularization techniques to control overfitting

      What about the rule of thumb that you should have roughly 5-10 times as many data points as weights in order not to overfit? (A sketch of the "big network + regularization" approach follows below.)
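
      A minimal PyTorch sketch of that advice (my own illustration, not from the quoted notes; the data, layer sizes, and hyperparameters are made up): a deliberately over-parameterized network whose overfitting is controlled with L2 regularization (`weight_decay`) instead of by shrinking the network.

      ```python
      import torch
      import torch.nn as nn

      # Tiny synthetic dataset: 100 points, 10 features (hypothetical).
      X = torch.randn(100, 10)
      y = torch.randn(100, 1)

      # Deliberately over-parameterized: far more weights than data points.
      model = nn.Sequential(
          nn.Linear(10, 512),
          nn.ReLU(),
          nn.Linear(512, 1),
      )

      # Control overfitting with L2 regularization (weight_decay) rather than
      # by making the network smaller.
      optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-3)
      loss_fn = nn.MSELoss()

      for step in range(200):
          optimizer.zero_grad()
          loss = loss_fn(model(X), y)
          loss.backward()
          optimizer.step()
      ```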

    1. KL divergence between the two distributions (a measure of distance)

      Note that it is not a true metric, since it is not symmetric (a quick numeric check is sketched below):

      $$D_{KL}(p \| q) \ne D_{KL}(q \| p)$$
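
      A quick numeric check of the asymmetry (my own sketch; the two distributions are made up for illustration). `scipy.stats.entropy(p, q)` computes $D_{KL}(p \| q) = \sum_i p_i \log(p_i / q_i)$ with natural logs:

      ```python
      import numpy as np
      from scipy.stats import entropy

      # Two discrete distributions over the same three outcomes.
      p = np.array([0.9, 0.05, 0.05])
      q = np.array([1/3, 1/3, 1/3])

      print(entropy(p, q))  # D_KL(p || q) ~ 0.704
      print(entropy(q, p))  # D_KL(q || p) ~ 0.934 -- not equal, so KL is not symmetric
      ```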

    1. new set of images that it has never seen before

      Is there any other assumption made about the "new set" of images, e.g., that it comes from the same distribution as the training set?