7 Matching Annotations
  1. Jul 2016
    1. predict_op = tf.argmax(py_x, 1)

      Goal 1: minimizing the probability of a mistake (argmax does this). Goal 2: minimizing the expected loss (training against a cost function does this). See: http://classes.engr.oregonstate.edu/eecs/spring2013/cs534/notes/Logistic-Regression-4.pdf
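      A minimal NumPy sketch of the two goals (the logits and label here are hypothetical, not from the quoted code): argmax over class scores gives the prediction, while cross-entropy is the training objective.

      import numpy as np

      def softmax(z):
          z = z - z.max(axis=1, keepdims=True)  # stabilize before exponentiating
          e = np.exp(z)
          return e / e.sum(axis=1, keepdims=True)

      logits = np.array([[2.0, 0.5, -1.0]])  # hypothetical py_x for one example
      probs = softmax(logits)

      # Goal 1: minimize the probability of a mistake -> predict the argmax class
      pred = probs.argmax(axis=1)

      # Goal 2: minimize the expected loss -> train against cross-entropy
      label = np.array([0])  # hypothetical ground-truth class
      xent = -np.log(probs[np.arange(len(label)), label])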

    1. The combination features are crucial in linear models because they introduce more dimensions to the input, transforming it into a space where the data-points are closer to being linearly separable

      kernel trick
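      A small sketch of the same idea (XOR is a standard illustration, not taken from the quoted paper; assumes scikit-learn): adding the combination feature x1*x2 makes XOR separable for a linear model.

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      # XOR is not linearly separable in the raw 2-D input space
      X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
      y = np.array([0, 1, 1, 0])
      print(LogisticRegression().fit(X, y).score(X, y))  # at most 0.75

      # The combination feature x1*x2 lifts the points into 3-D, where a
      # linear decision boundary does separate the classes
      X3 = np.hstack([X, (X[:, 0] * X[:, 1])[:, None]])
      print(LogisticRegression(C=1e6, max_iter=1000).fit(X3, y).score(X3, y))  # 1.0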

    2. presenting features as dense vectors is an integral part of the neural network framework,

      But why do people use one-hot encoding, e.g. in character-based CNNs (the "text from scratch" paper)?
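      One way to see the relationship (using a hypothetical 4-symbol vocabulary for illustration): a dense embedding lookup is just a one-hot vector multiplied by a learned matrix, so one-hot input feeding a first linear layer is equivalent to an embedding table.

      import numpy as np

      vocab = ["a", "b", "c", "d"]
      idx = {ch: i for i, ch in enumerate(vocab)}

      # One-hot: sparse, dimensionality = |vocab|, no notion of similarity
      one_hot = np.eye(len(vocab))[idx["b"]]

      # Dense: each symbol maps to a low-dimensional learned vector
      E = np.random.randn(len(vocab), 3)  # embedding table (trained in practice)
      dense = E[idx["b"]]                 # table lookup

      assert np.allclose(dense, one_hot @ E)  # lookup == one-hot times matrix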

    1. the LSTM are additive with respect to time, alleviating the gradient vanishing problem. Gradient exploding is still an issue, though in practice simple optimization strategies (such as gradient clipping) work well

      How is this problem of vanishing or exploding gradients related to the eigenvalues of the W operator? Is there any research on this?
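      A quick numerical illustration of that eigenvalue connection (random W, sizes chosen arbitrarily): backprop through T steps of a vanilla RNN repeatedly multiplies the gradient by W-transpose, so it shrinks or blows up according to W's spectral radius; clipping is the simple strategy the quote mentions.

      import numpy as np

      rng = np.random.default_rng(0)
      W = 0.1 * rng.standard_normal((50, 50))  # spectral radius < 1 here
      g = rng.standard_normal(50)
      for _ in range(100):       # 100 "time steps" of backprop
          g = W.T @ g
      print(np.linalg.norm(g))   # ~0: the gradient has vanished
      # Scaling W up (spectral radius > 1) makes the same loop explode instead.

      def clip_by_norm(grad, max_norm=5.0):
          # Gradient clipping: rescale when the norm exceeds a threshold
          n = np.linalg.norm(grad)
          return grad * (max_norm / n) if n > max_norm else grad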

  2. Jan 2015
    1. score function that maps the raw data to class scores, and a loss function that quantifies the agreement between the predicted scores and the ground truth labels. We will then cast this as an optimization problem in which we will minimize the loss function with respect to the parameters of the score function.

      score function and loss function
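      A minimal sketch of the two pieces (a linear score function with a softmax cross-entropy loss, as one concrete choice; the shapes are illustrative assumptions):

      import numpy as np

      def score(W, b, x):
          # Score function: maps raw input x to a vector of class scores
          return W @ x + b

      def loss(s, y):
          # Softmax cross-entropy: low when the true class y scores highest
          s = s - s.max()                  # numerical stability
          p = np.exp(s) / np.exp(s).sum()
          return -np.log(p[y])

      # The optimization problem: minimize loss(score(W, b, x), y) with respect
      # to the parameters W, b, e.g. by gradient descent.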

    2. more powerful approach to image classification that we will eventually naturally extend to entire Neural Networks and Convolutional Neural Networks

      kNN is not powerful.