2 Matching Annotations
  1. Aug 2017
    1. The takeaway is that you should not be using smaller networks because you are afraid of overfitting. Instead, you should use as big of a neural network as your computational budget allows, and use other regularization techniques to control overfitting

      What about the rule of thumb stating that you should have roughly 5-10 times as many data points as weights in order to not overfit? (See the quick sketch below.)
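
      A quick back-of-the-envelope sketch of what that heuristic would imply; the layer sizes here are made up, just to make the arithmetic concrete:

      ```python
      # Hypothetical fully-connected net: input -> hidden -> output.
      layer_sizes = [784, 100, 10]
      # Weights plus biases for each layer-to-layer connection.
      params = sum(n_in * n_out + n_out
                   for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
      print(params)                   # 79510 parameters
      print(5 * params, 10 * params)  # 397550 to 795100 examples per the rule of thumb
      ```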

  2. Feb 2017
    1. SVM only cares that the difference is at least 10

      The margin seems to be manually set by the creator in the loss function. In the sample code, the margin is 1, so each incorrect class has to score lower than the correct class by at least 1.

      How is this margin determined? It seems like one would have to know the magnitude of the scores beforehand.

      Diving deeper, is the scoring magnitude always the same if the parameters are normalized by their average and scaled to be between 0 and 1? (or -1 and 1... not sure of the correct scaling implementation)

      Coming back to the topic: is this 'minimum margin', or delta, a tunable parameter? (See the sketch after these questions.)

      What effects do we see on the model by adjusting this parameter?

      What are the best- and worst-case scenarios of playing with this parameter?
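
      For concreteness, here is a minimal sketch of the multiclass SVM (hinge) loss, L_i = sum over j != y_i of max(0, s_j - s_{y_i} + delta), with the margin exposed as a parameter. The delta of 10 matches the quoted "at least 10"; the score values are illustrative, not outputs of a real model:

      ```python
      import numpy as np

      def svm_loss(scores, correct_class, delta=1.0):
          """Multiclass SVM (hinge) loss for one example, with delta as a knob."""
          margins = np.maximum(0, scores - scores[correct_class] + delta)
          margins[correct_class] = 0  # the correct class never contributes loss
          return np.sum(margins)

      # With scores [13, -7, 11], correct class 0, and delta = 10:
      # max(0, -7 - 13 + 10) + max(0, 11 - 13 + 10) = 0 + 8 = 8
      scores = np.array([13.0, -7.0, 11.0])
      print(svm_loss(scores, correct_class=0, delta=10.0))  # 8.0
      ```

      One partial answer, assuming these are the CS231n linear-classification notes: they argue delta can safely be fixed at 1.0, because the weights W can stretch or shrink the score differences arbitrarily, so delta effectively trades off against the regularization strength rather than acting as an independent hyperparameter.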