12 Matching Annotations
  1. Dec 2016
    1. Either each layer is a homeomorphism, or the layer’s weight matrix has determinant 0. If it is a homeomorphism, A is still surrounded by B, and a line can’t separate them. But suppose it has a determinant of 0: then the dataset gets collapsed on some axis. Since we’re dealing with something homeomorphic to the original dataset, A is surrounded by B, and collapsing on any axis means some points of A and B will mix and become impossible to distinguish.


    2. A linear transformation by the “weight” matrix W, a translation by the vector b, and point-wise application of tanh.
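      A minimal sketch of both points, using a hypothetical 2-D layer tanh(Wx + b) (the matrices and points below are made up): an invertible W keeps distinct points distinct, while a singular W (determinant 0) collapses an axis, so a point of A and a point of B can land on top of each other.

```python
import numpy as np

def layer(W, b, x):
    # One network layer: linear map by W, translation by b, point-wise tanh.
    return np.tanh(W @ x + b)

b = np.array([0.0, 0.0])

# Two points that differ only along the second axis.
p_a = np.array([1.0, 2.0])   # imagine this belongs to class A
p_b = np.array([1.0, -2.0])  # and this to class B

# Invertible W (det != 0): a homeomorphism onto its image,
# so the two points remain distinguishable.
W_ok = np.array([[2.0, 0.0],
                 [0.0, 0.5]])
assert not np.allclose(layer(W_ok, b, p_a), layer(W_ok, b, p_b))

# Singular W (det == 0): the second axis is collapsed,
# and the two points become indistinguishable.
W_sing = np.array([[1.0, 0.0],
                   [0.0, 0.0]])
assert np.allclose(layer(W_sing, b, p_a), layer(W_sing, b, p_b))
```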


  2. Nov 2016
    1. 1. As with other methods, removing blackheads with essential oil starts with cleansing; then open the pores, usually by pressing a hot towel on the face (a salon facial steamer works even better). 2. Now you can begin: apply the essential oil to the nose and massage gently for at least ten minutes so the skin fully absorbs it. 3. Finally, rinse with clean water and pat on toner to keep the pores from enlarging.


    1. What are the size and power of a test statistic? Size means the size of the test: the “alpha” in the confidence level (1 − alpha), also called the significance level. Power means the power of the test statistic. How should we read a claim like “in finite samples, even when N and T are smaller than 50 and 2 respectively, the test statistic still has reasonable size, and when T ≥ 10 the test has good power”? A reasonable size means the test achieves the stated confidence level, i.e. the probability of a Type I error is low. Good power means the probability of a Type II error is low, i.e. the probability of rejecting H0 when H0 is false is high.
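      A small simulation of these two quantities (my own toy setup, not from the quoted paper): a two-sided z-test of H0: mean = 0 with known variance. The rejection rate under H0 estimates the size; the rejection rate under a true mean of 0.5 estimates the power.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 20000
z_crit = 1.96  # two-sided critical value for alpha = 0.05

def reject(sample):
    # z-test of H0: mean = 0, with known sigma = 1.
    z = np.sqrt(len(sample)) * sample.mean()
    return abs(z) > z_crit

# Size: rejection rate when H0 is true; should sit near alpha = 0.05,
# i.e. a low Type I error rate.
size = np.mean([reject(rng.normal(0.0, 1.0, n)) for _ in range(trials)])

# Power: rejection rate when H0 is false (true mean = 0.5);
# a high value means a low Type II error rate.
power = np.mean([reject(rng.normal(0.5, 1.0, n)) for _ in range(trials)])

print(f"size  ≈ {size:.3f}")   # close to 0.05
print(f"power ≈ {power:.3f}")  # close to 1
```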


    1. What the Softmax classifier does is minimize the cross-entropy between the estimated class probabilities (i.e. \(L_i = e^{f_{y_i}}/\sum_j e^{f_j}\)) and the “true” distribution.

      The benefit is that a badly misclassified sample produces a very large gradient. With logistic regression under squared error, by contrast, the worse the misclassification, the slower the algorithm converges. For example, with \(t_i=1\) and \(y_i=0.0000001\), the cost function \(E=\frac{1}{2}(t-y)^2\) gives \(\frac{dE}{dw_i}=-(t-y)y(1-y)x_i\), which is tiny because of the \(y(1-y)\) factor.
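      A quick numerical check of this point (a sketch with a single weight and \(x_i = 1\)): the squared-error gradient carries the \(y(1-y)\) factor that vanishes for confident wrong predictions, while the cross-entropy gradient through the same sigmoid output works out to \((y - t)x_i\) and stays large.

```python
# One output unit, target t = 1, badly wrong prediction y ~ 0, input x_i = 1.
t, y, x = 1.0, 1e-7, 1.0

# Squared error E = (t - y)^2 / 2 through a sigmoid output:
# dE/dw_i = -(t - y) * y * (1 - y) * x_i
grad_sq = -(t - y) * y * (1 - y) * x

# Cross-entropy through the same sigmoid output:
# dE/dw_i = (y - t) * x_i
grad_ce = (y - t) * x

print(grad_sq)  # ~ -1e-7: almost no learning signal
print(grad_ce)  # ~ -1.0: large gradient when badly wrong
```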

  3. Jul 2016
    1. following equation

      $$ y={argmax} _{c_{j}\in C}\sum _{h_{i}\in H}{P(c_{j}|h_{i})P(T|h_{i})P(h_{i})} $$

      $$ ={argmax} _{c_{j}\in C}\sum _{h_{i}\in H}{P(c_{j}|h_{i})P(T,h_{i})} $$

      $$= {argmax}_{c_{j}\in C}\sum _{h_{i}\in H}{P(c_{j}|h_{i})P(h_{i}|T)}$$

      (Strictly, the last step is a proportionality, since \(P(h_i|T) = P(T,h_i)/P(T)\), but the constant \(P(T)\) does not change the argmax. \propto doesn't render well here.)
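      A toy numeric sketch of this averaging rule (all numbers hypothetical): each hypothesis \(h_i\) votes for classes with weight \(P(c_j|h_i)\), and the votes are weighted by the hypothesis posterior \(P(h_i|T)\).

```python
# Hypothetical posteriors over two hypotheses given training data T.
p_h_given_T = {"h1": 0.7, "h2": 0.3}

# Hypothetical class predictions of each hypothesis, P(c_j | h_i).
p_c_given_h = {
    "h1": {"pos": 0.4, "neg": 0.6},
    "h2": {"pos": 0.9, "neg": 0.1},
}

classes = ["pos", "neg"]

# y = argmax_c sum_i P(c | h_i) P(h_i | T)
scores = {
    c: sum(p_c_given_h[h][c] * p_h_given_T[h] for h in p_h_given_T)
    for c in classes
}
y = max(scores, key=scores.get)
print(scores, "->", y)  # pos: 0.55, neg: 0.45 -> pos
```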

    1. For example, a large gradient flowing through a ReLU neuron could cause the weights to update in such a way that the neuron will never activate on any datapoint again.

      A ReLU unit activates when its input (wx) is greater than 0. If, during backpropagation, the unit receives a large gradient that pushes some weights w to large negative values, its output may be 0 for every subsequent input, and the gradient it propagates backward is then also 0.
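      A minimal sketch of this “dying ReLU” effect (made-up numbers): once a large update pushes the weights strongly negative, wx is negative for typical non-negative inputs, so both the output and the local gradient are 0 from then on.

```python
import numpy as np

relu = lambda z: np.maximum(0.0, z)

w = np.array([0.5, 0.5])
x = np.array([1.0, 2.0])
print(relu(w @ x))  # 1.5: the unit is active

# A huge (hypothetical) gradient step knocks the weights far negative.
w = w - 10.0 * np.array([1.0, 2.0])

for x in [np.array([1.0, 2.0]), np.array([3.0, 0.5]), np.array([0.1, 0.1])]:
    z = w @ x
    out = relu(z)
    local_grad = 1.0 if z > 0 else 0.0  # derivative of relu at z
    # Dead unit: zero output AND zero gradient, so it can never recover.
    assert out == 0.0 and local_grad == 0.0
```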

  4. Jun 2016
    1. Backpropagation can thus be thought of as gates communicating to each other (through the gradient signal) whether they want their outputs to increase or decrease (and how strongly), so as to make the final output value higher.


    2. The derivative on each variable tells you the sensitivity of the whole expression on its value.
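      A small illustration of both remarks on a toy circuit (my own example, not from the source): for \(f(x, y) = xy\), the gradient \(\partial f/\partial x = y\) measures how sensitive f is to x, and its sign tells x which way to move to increase the output.

```python
# Toy "circuit": f(x, y) = x * y.
def f(x, y):
    return x * y

x, y = 3.0, -2.0

# Analytic gradients: df/dx = y, df/dy = x.
dfdx, dfdy = y, x

# Numerical check of the sensitivity interpretation.
h = 1e-6
num_dfdx = (f(x + h, y) - f(x, y)) / h
assert abs(num_dfdx - dfdx) < 1e-4

# The gradient signal tells each input how to change to raise the output:
# df/dx = -2 < 0, so *decreasing* x increases f.
assert f(x - 0.1, y) > f(x, y)
```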


    1. fine solution is .

      So, if there are enough samples, the concept class is PAC-learnable and the machine can learn!

    2. concept class

      A concept class (perhaps like a version space) is a set of concepts, each satisfying the mapping between the given samples and their corresponding given labels (i.e. the target concept).

      This means a concept is just a mapping function; every concept in the concept class is a candidate for the target concept.
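      The standard sample-complexity bound behind “if samples are enough, the machine can learn”: for a finite concept class \(H\) and a learner that outputs any consistent concept, \(m \ge \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)\) samples suffice for PAC-learning with error at most \(\epsilon\) and confidence \(1 - \delta\). A sketch of the computation (the class size below is an arbitrary example):

```python
import math

def pac_sample_bound(h_size, epsilon, delta):
    # m >= (1/epsilon) * (ln|H| + ln(1/delta)) samples suffice for a
    # consistent learner over a finite concept class of size |H|.
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / epsilon)

# e.g. |H| = 2**10 boolean concepts, error <= 0.1, confidence 90%.
m = pac_sample_bound(2**10, epsilon=0.1, delta=0.1)
print(m)  # 93
```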