12 Matching Annotations
  1. May 2017
    1. From Section 2.1.2, and (1.45), we know that $E(x^\top x) = \sum_{i=1}^{d}$

    2. $(x^\top w_i)(w_i^\top x)$

      These are scalars, that's why they can be swapped
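
      A short worked step for the product quoted above, assuming x and w_i are column vectors of the same dimension (the surrounding derivation is not reproduced here, so this is only a sketch of the swap itself):

```latex
\begin{aligned}
(x^\top w_i)(w_i^\top x)
  &= (w_i^\top x)(x^\top w_i) && \text{both factors are scalars, so they commute} \\
  &= w_i^\top \, (x x^\top) \, w_i && \text{matrix multiplication is associative}
\end{aligned}
```

      Since $x^\top w_i = w_i^\top x$, the same quantity can also be written as $(w_i^\top x)^2$.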

    3. Figure 1.8

      Several sets of (x, y) points, with the correlation coefficient of x and y for each set. Note that the correlation reflects the strength (noisiness) and direction of a linear relationship (top row), but not the slope of that relationship (middle), nor many aspects of nonlinear relationships (bottom). N.B.: the figure in the center has a slope of 0, but in that case the correlation coefficient is undefined because the variance of Y is zero.
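
      The points in this caption can be checked numerically. Below is a minimal sketch with made-up data; the helper name `pearson` and the choice to return `None` for the undefined case are assumptions for illustration, not part of the annotated text.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; None when either variance is zero."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # undefined: the formula would divide by zero
    return sxy / math.sqrt(sxx * syy)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
steep = pearson(xs, [10 * x for x in xs])     # perfect line, slope 10
shallow = pearson(xs, [0.1 * x for x in xs])  # perfect line, slope 0.1
flat = pearson(xs, [2.0] * len(xs))           # slope 0, so Var(Y) = 0

# A symmetric parabola: a clear nonlinear relationship with zero correlation
parabola = pearson([-2.0, -1.0, 0.0, 1.0, 2.0], [4.0, 1.0, 0.0, 1.0, 4.0])

print(steep, shallow, flat, parabola)
```

      Both perfect lines give a coefficient of 1 regardless of slope, the flat line gives no value at all, and the parabola gives 0 even though y is fully determined by x.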

    1. the data

      Not the data themselves: the variance of the data.

    2. One problem in which the Metric MDS could be used would be to represent cities in a two-dimensional space, simulating a map. This is a very intuitive example that makes use of the basic Euclidean distance. If we had a matrix with distances between many cities, we could use a Metric MDS to represent them on a plane. If everything works as expected, the distribution of the cities would be the same as the one found in a map. Another example would be to represent the digits from the previous example. The values of the 784-dimensional vectors representing each digit carry information of the (gray) color intensity of the image; hence, a Metric MDS would be suitable to represent the digits in a two-dimensional space, hoping to get a different cluster for each of the digits.

      Are these correct?
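
      One way to probe the cities example is to try it. The sketch below uses classical (Torgerson) MDS, which is one specific form of metric MDS and not necessarily the variant the quoted text has in mind. It assumes NumPy is available; the "cities" are four made-up points, and `classical_mds` is a hypothetical helper name.

```python
import numpy as np

def classical_mds(D, k=2):
    """Embed n points in k dimensions from an n x n matrix of pairwise distances."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]       # indices of the k largest eigenvalues
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))  # guard against tiny negatives
    return eigvecs[:, top] * scale

# Hypothetical "cities": the corners of a 3 x 4 rectangle
cities = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0]])
D = np.linalg.norm(cities[:, None, :] - cities[None, :, :], axis=-1)

Y = classical_mds(D)                          # recovered 2-D coordinates
D_rec = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
print(np.allclose(D, D_rec))
```

      The recovered coordinates reproduce the pairwise distances, but only the distances: the layout can come out rotated, mirrored, or shifted relative to the original map, so "the same as the one found in a map" holds only up to those transformations.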

  2. Nov 2016
  3. Oct 2016
    1. 3.

      The mean and variance of the data turn out to be exactly the mean and variance of the Gaussian that maximizes the likelihood. You can check this by taking the derivative of the likelihood with respect to the mean and the variance and setting it to zero: the solution is exactly the sample mean and the sample variance.
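
      The derivative argument can be written out as follows. This is a sketch assuming n independent samples $x_1, \dots, x_n$ from a one-dimensional Gaussian; the symbols $\ell$, $\hat\mu$ and $\hat\sigma^2$ are notation introduced here, not taken from the annotated page.

```latex
\begin{aligned}
\ell(\mu, \sigma^2)
  &= \sum_{i=1}^{n} \log \mathcal{N}(x_i \mid \mu, \sigma^2)
   = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2 \\
\frac{\partial \ell}{\partial \mu}
  &= \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0
  \quad\Longrightarrow\quad
  \hat\mu = \frac{1}{n}\sum_{i=1}^{n} x_i \\
\frac{\partial \ell}{\partial \sigma^2}
  &= -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \hat\mu)^2 = 0
  \quad\Longrightarrow\quad
  \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \hat\mu)^2
\end{aligned}
```

      Note that the maximizing variance divides by n, not n - 1.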

    1. Sigmoid neurons simulating perceptrons, part I

      $\text{output} = \begin{cases} 0 & \text{if } w \cdot x + b \le 0 \\ 1 & \text{if } w \cdot x + b > 0 \end{cases}$

      So there are three cases: w⋅x + b is zero, negative, or positive. If it is zero, then c · 0 = 0, which is classified as 0, so nothing changes.

      If it is negative, then c · (negative) = (positive) · (negative) = negative, which is classified as 0, so nothing changes.

      If it is positive, then c · (positive) = (positive) · (positive) = positive, which is classified as 1, so again nothing changes.
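
      The three cases can also be checked numerically. A minimal sketch, with made-up weights, bias, inputs, and constant c > 0 (none of these values come from the annotated text):

```python
def perceptron(w, b, x):
    """Step-function perceptron: 1 if w.x + b > 0, else 0."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z > 0 else 0

w, b = [2.0, -3.0], 1.0   # hypothetical weights and bias
c = 5.0                   # any positive constant

# Inputs chosen so that w.x + b is positive, zero, and negative in turn
inputs = [[0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [2.0, 1.0], [0.5, 2.0]]

original = [perceptron(w, b, x) for x in inputs]
scaled = [perceptron([c * wi for wi in w], c * b, x) for x in inputs]

print(original, scaled)
```

      The input [1.0, 1.0] lands exactly on w⋅x + b = 0, which covers the zero case; scaling by c leaves every output unchanged.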

    2. If you don't find this obvious, you should stop and prove to yourself that this is equivalent.

      Weights -2 and -2, bias 3. Columns: x1, x2, weighted sum (-2·x1 - 2·x2 + 3) -> out

      • 0 0 3 -> True
      • 0 1 1 -> True
      • 1 0 1 -> True
      • 1 1 -1 -> False

      Single weight -4 on x1*x2, bias 3. Columns: x1, x2, x1*x2, weighted sum (-4·x1*x2 + 3) -> out

      • 0 0 0 3 -> True
      • 0 1 0 3 -> True
      • 1 0 0 3 -> True
      • 1 1 1 -1 -> False

      Proven :)
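
      The two truth tables above can be reproduced in a few lines. The sketch also checks a second reading of the quoted sentence, namely that two connections of weight -2 coming from the same source s can be merged into one connection of weight -4; that reading is an assumption about the surrounding text, and all function names are made up for illustration.

```python
def step(z):
    """Perceptron activation: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0

def two_weights(x1, x2):
    return step(-2 * x1 - 2 * x2 + 3)      # first table: weights -2, -2, bias 3

def product_weight(x1, x2):
    return step(-4 * (x1 * x2) + 3)        # second table: weight -4 on x1*x2, bias 3

def doubled_input(s):
    return step(-2 * s - 2 * s + 3)        # same source s wired in twice

def merged_input(s):
    return step(-4 * s + 3)                # the two wires merged into one

pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
table = [(x1, x2, two_weights(x1, x2), product_weight(x1, x2)) for x1, x2 in pairs]
same_source = [(s, doubled_input(s), merged_input(s)) for s in (0, 1)]

print(table)
print(same_source)
```

      In both comparisons the outputs agree on every input, with 1 standing for True and 0 for False.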

    3. In this way a perceptron in the second layer can make a decision at a more complex and more abstract level than perceptrons in the first layer

      A university professor claims that there are published academic papers showing this is not always true. The later layers are not always more abstract; sometimes it is the other way round. Sorry, no link is provided.

    1. The first assumption we need is that the cost function can be written as an average

      How could you have a cost / error function that cannot be written as an average?

  4. www.learn.ed.ac.uk
    1. Sub-tree replacement pruning (WF 6.1

      Say you have a tree with 5 nodes, n1, n2, etc. First you prune n1, which leaves a tree with n2, n3, n4, and n5, and check on held-out validation data whether that increased the accuracy. Then you try it with only n2 cut, and check again. Do that for all the trees with just one node cut. You prune whichever node gives the largest increase in accuracy, then start the process again. So if pruning n5 gave the biggest increase in accuracy, you continue from the tree with n1, n2, n3, and n4; n5 is gone for good. Then you try all the possible trees with just one of the remaining nodes pruned. You keep doing this until cutting nodes no longer increases accuracy.
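
      The greedy loop described above can be sketched as follows. This is a toy illustration, not the procedure from the referenced lecture or textbook: it assumes binary features, a hand-built tree, and that pruning a node means replacing its subtree with a leaf predicting that node's majority training label. The names `Node`, `prune`, and the validation set are all made up.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    majority: int                      # majority training label at this node
    feature: Optional[int] = None      # index of the binary feature tested; None for a leaf
    left: Optional["Node"] = None      # branch taken when the feature is 0
    right: Optional["Node"] = None     # branch taken when the feature is 1

    def is_leaf(self):
        return self.feature is None

def predict(node, x):
    while not node.is_leaf():
        node = node.right if x[node.feature] else node.left
    return node.majority

def accuracy(root, data):
    return sum(predict(root, x) == y for x, y in data) / len(data)

def internal_nodes(node):
    if node.is_leaf():
        return []
    return [node] + internal_nodes(node.left) + internal_nodes(node.right)

def prune(root, validation):
    """Repeatedly collapse the single node whose removal most improves accuracy."""
    best = accuracy(root, validation)
    while True:
        best_node = None
        for node in internal_nodes(root):
            saved = (node.feature, node.left, node.right)
            node.feature, node.left, node.right = None, None, None   # try the cut
            acc = accuracy(root, validation)
            node.feature, node.left, node.right = saved              # undo it
            if acc > best:
                best, best_node = acc, node
        if best_node is None:          # no single cut improves accuracy: stop
            return root
        best_node.feature, best_node.left, best_node.right = None, None, None

# Hand-built tree whose right subtree splits on feature 1 and gets one case wrong
root = Node(majority=0, feature=0,
            left=Node(majority=0),
            right=Node(majority=1, feature=1,
                       left=Node(majority=1), right=Node(majority=0)))
validation = [([0, 0], 0), ([0, 1], 0), ([1, 0], 1), ([1, 1], 1)]

before = accuracy(root, validation)
pruned = prune(root, validation)
after = accuracy(pruned, validation)
print(before, after)
```

      In this example the right subtree is collapsed into a leaf in the first round, and the second round finds no further cut that helps, so the loop stops with the root split intact.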