12 Matching Annotations
1. Dec 2016
2. colah.github.io
1. Either each layer is a homeomorphism, or the layer’s weight matrix has determinant 0. If it is a homeomorphism, A is still surrounded by B, and a line can’t separate them. But suppose it has a determinant of 0: then the dataset gets collapsed on some axis. Since we’re dealing with something homeomorphic to the original dataset, A is surrounded by B, and collapsing on any axis means some points of A and B will mix and become impossible to distinguish.

So does this mean the layer widths of a neural network should be different for each layer: the first hidden layer with the most neurons, then decreasing layer by layer? Is that right?

2. A linear transformation by the “weight” matrix W, a translation by the vector b, and point-wise application of tanh.

First apply a linear transformation (stretch, rotate, translate), then a nonlinear activation function: this turns a linearly inseparable problem into a linearly separable one!
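The per-layer recipe above can be sketched in a few lines of numpy; the values of W, b, and x here are made up purely for illustration:

```python
import numpy as np

# One network layer as described: a linear map by W, a translation
# by b, then point-wise tanh. All values are illustrative.
def layer(x, W, b):
    return np.tanh(W @ x + b)

W = np.array([[1.0, -0.5],
              [0.3,  0.8]])   # "weight" matrix (stretch/rotate)
b = np.array([0.1, -0.2])     # translation vector
x = np.array([0.5, 1.0])      # input point

h = layer(x, W, b)            # each entry squashed into (-1, 1)
```

Stacking such layers composes these maps, which is what lets the network untangle the classes.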


3. Nov 2016
4. www.kuanshi.me
1. 1. As with other methods, removing blackheads with essential oil starts with cleansing. Then open the pores; pressing a hot towel on the face is usually enough, and a salon facial steamer is even better. 2. Now begin: apply the essential oil to the nose and massage gently for at least ten minutes, so the skin fully absorbs it. 3. Finally, rinse with clean water and pat on toner to keep the pores from enlarging.

Removing blackheads with essential oil.


5. bbs.pinggu.org
1. What are the size and power of a test statistic? “Size” means the size of the test: the α in the confidence level (1 − α), also called the significance level. “Power” means the power of the test statistic. “In finite samples, even when N and T are less than 50 and 2 respectively, the test statistic still has reasonable size, and in particular when T ≥ 10 the test has good power.” How should this be understood? “Reasonable size means the test attains a reasonable confidence level, i.e. the probability of a type I error is low. Good power means the probability of a type II error is low, i.e. the probability of rejecting H0 when H0 is false is high.”

With the sample fixed, the smaller $$\alpha$$ is, the larger $$\beta$$ becomes.
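This trade-off can be checked numerically for a simple one-sided z-test; the effect size delta and sample size n below are arbitrary illustrative choices:

```python
from statistics import NormalDist

# Hypothetical one-sided z-test: H0: mu = 0 vs H1: mu = delta,
# known sigma = 1, fixed sample size n. Shrinking alpha (type I
# error) necessarily raises beta (type II error).
def type_II_error(alpha, delta=0.5, n=30):
    z_crit = NormalDist().inv_cdf(1 - alpha)          # rejection threshold
    # beta = P(fail to reject | H1) = Phi(z_crit - delta * sqrt(n))
    return NormalDist().cdf(z_crit - delta * n ** 0.5)

beta_05 = type_II_error(alpha=0.05)
beta_01 = type_II_error(alpha=0.01)   # smaller alpha -> larger beta
```

With the sample size held constant, the only way to lower both errors at once is to collect more data.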


6. roachsinai.github.io
1. What the Softmax classifier does is minimize the cross-entropy between the estimated class probabilities (i.e., $$L_i = e^{f_{y_i}}/\sum_j e^{f_j}$$) and the “true” distribution.

The benefit of this is that a misclassified sample yields a very large gradient. With logistic regression under a squared-error cost, by contrast, the more severe the misclassification, the slower the algorithm converges. For example, with $$t_i=1$$ and $$y_i=0.0000001$$, the cost function $$E=\frac{1}{2}(t-y)^2$$ gives $$\frac{dE}{dw_i}=-(t-y)y(1-y)x_i$$, which is nearly zero because of the factor $$y(1-y)$$.
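Plugging the numbers from the example into both gradients shows the difference; a single weight with input x = 1 is assumed, purely for illustration:

```python
# Gradient for a badly misclassified example (t = 1, y ~ 0),
# one weight, input x = 1. Values from the example above.
t, y, x = 1.0, 1e-7, 1.0

# Squared error E = (t - y)^2 / 2 with a sigmoid output:
grad_sq = -(t - y) * y * (1 - y) * x      # vanishes as y -> 0

# Cross-entropy with the same sigmoid output:
grad_ce = -(t - y) * x                    # stays large

ratio = abs(grad_ce) / abs(grad_sq)       # many orders of magnitude
```

The squared-error gradient is killed by the $$y(1-y)$$ factor exactly when the error is largest, while the cross-entropy gradient stays proportional to the error.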


7. Jul 2016
8. en.wikipedia.org
1. following equation

$$y=\operatorname{argmax}_{c_j\in C}\sum_{h_i\in H}P(c_j|h_i)\,P(T|h_i)\,P(h_i)$$

$$=\operatorname{argmax}_{c_j\in C}\sum_{h_i\in H}P(c_j|h_i)\,P(T,h_i)$$

$$=\operatorname{argmax}_{c_j\in C}\sum_{h_i\in H}P(c_j|h_i)\,P(h_i|T)$$

\propto doesn't render well here, so the last line is written with =; strictly it is a proportionality, since $$P(T,h_i)=P(h_i|T)\,P(T)$$ and the constant factor $$P(T)$$ is dropped, which leaves the argmax unchanged.
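A toy sketch of the decision rule above, with made-up hypotheses, posteriors, and class conditionals (all probabilities are illustrative):

```python
# Weight each hypothesis h_i's class prediction P(c_j | h_i)
# by its posterior P(h_i | T), then take the argmax over classes.
posterior = {"h1": 0.7, "h2": 0.3}                  # P(h_i | T)
p_class_given_h = {                                 # P(c_j | h_i)
    "h1": {"cat": 0.9, "dog": 0.1},
    "h2": {"cat": 0.2, "dog": 0.8},
}

classes = ["cat", "dog"]
score = {c: sum(p_class_given_h[h][c] * posterior[h] for h in posterior)
         for c in classes}
y = max(score, key=score.get)   # argmax_{c_j in C}
```

Note the scores need not be normalized: any constant factor (such as the dropped $$P(T)$$) cancels in the argmax.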


9. cs231n.github.io
1. For example, a large gradient flowing through a ReLU neuron could cause the weights to update in such a way that the neuron will never activate on any datapoint again.

The ReLU activates when its input (wx) is greater than 0. If during backpropagation a ReLU unit receives a large gradient that turns some weights into large negative numbers, the unit may output 0 for every subsequent input, and the gradient it propagates backward is 0 as well.
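A minimal numpy sketch of such a “dead” ReLU, assuming non-negative inputs and weights already pushed negative by a bad update (all values illustrative):

```python
import numpy as np

# If w @ x < 0 for every input, the ReLU outputs 0 and its local
# gradient is 0, so no gradient ever reaches w again.
relu = lambda z: np.maximum(0.0, z)

w = np.array([-5.0, -3.0])                    # weights after a bad update
X = np.random.default_rng(0).random((100, 2)) # non-negative inputs

z = X @ w                                     # pre-activations, all <= 0
out = relu(z)                                 # all zeros
grad_mask = (z > 0).astype(float)             # ReLU's local gradient
```

Because `grad_mask` is all zeros, every upstream gradient is multiplied by 0 at this unit: the neuron is dead and cannot recover.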


10. zhuanlan.zhihu.com
1. a zigzag descent

Jagged, sawtooth-shaped jumps rather than gradual change.
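The zigzag comes from gradient signs: with all-positive inputs (e.g. sigmoid outputs), every weight of a neuron receives a gradient $$\partial L/\partial w_i = (\partial L/\partial z)\,x_i$$ sharing the sign of the single upstream scalar. A small illustrative check:

```python
import numpy as np

# With all-positive inputs x, the per-weight gradients dz * x all
# share the sign of the upstream scalar dz: in any single step the
# weights can only all increase or all decrease, hence the zigzag.
x = np.array([0.2, 0.7, 0.9])   # e.g. sigmoid outputs, all positive
dw_pos = +1.3 * x               # upstream gradient positive
dw_neg = -0.4 * x               # upstream gradient negative
```

Zero-centered inputs (tanh, normalized data) let the per-weight gradients take mixed signs, which is one reason they are preferred.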


11. Jun 2016
12. cs231n.github.io
1. Backpropagation can thus be thought of as gates communicating to each other (through the gradient signal) whether they want their outputs to increase or decrease (and how strongly), so as to make the final output value higher.

Backpropagation can be viewed as communication between the gates: as long as each unit’s value changes in the direction of the gradient signal (when a unit’s local gradient is negative, its input value decreases; otherwise it increases), the network’s final output value increases.

2. The derivative on each variable tells you the sensitivity of the whole expression on its value.

The partial derivative with respect to a variable tells you how sensitive the whole expression is to changes in that variable.
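CS231n’s own circuit f = (x + y) * z makes both points concrete; in the backward pass each gate multiplies the incoming gradient by its local derivative:

```python
# Forward and backward pass through the circuit f = (x + y) * z,
# using CS231n's example values x = -2, y = 5, z = -4.
x, y, z = -2.0, 5.0, -4.0

# forward pass
q = x + y          # add gate
f = q * z          # multiply gate

# backward pass (starting from df/df = 1)
dq = z * 1.0       # multiply gate: df/dq = z
dz = q * 1.0       # multiply gate: df/dz = q
dx = 1.0 * dq      # add gate routes the gradient unchanged
dy = 1.0 * dq
```

The signs of dx, dy, dz say which direction each input should move to raise f, and the magnitudes say how sensitive f is to each.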


13. jeremykun.com
1. fine solution is .

So, if there are enough samples, the class is PAC-learnable and the machine can learn!

2. concept class

A concept class (roughly, a version space) is a set of concepts consistent with the mapping between the given samples and their given labels (i.e., with the target concept).

This means a concept is just a mapping function, and every concept in the concept class is a candidate for the target concept.
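One standard consequence for a finite hypothesis class and a consistent learner is the sample bound $$m \ge \frac{1}{\varepsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)$$; a quick illustrative calculation (the numbers are made up):

```python
from math import ceil, log

# Samples sufficient to PAC-learn a finite hypothesis class H with
# error at most eps and confidence 1 - delta (consistent learner).
def pac_sample_bound(h_size, eps, delta):
    return ceil((log(h_size) + log(1 / delta)) / eps)

m = pac_sample_bound(h_size=1000, eps=0.1, delta=0.05)
```

The bound grows only logarithmically in |H| and 1/delta but linearly in 1/eps, which is what makes “enough samples” achievable.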