6 Matching Annotations
1. Dec 2016
2. colah.github.io
1. Either each layer is a homeomorphism, or the layer’s weight matrix has determinant 0. If it is a homeomorphism, A is still surrounded by B, and a line can’t separate them. But suppose it has a determinant of 0: then the dataset gets collapsed on some axis. Since we’re dealing with something homeomorphic to the original dataset, A is surrounded by B, and collapsing on any axis means we will have some points of A and B mix and become impossible to distinguish between.

So does this mean the number of neurons per layer of a neural network follows a pattern: the first hidden layer has the most neurons, and each subsequent layer has fewer? Is that right?
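The "determinant 0" case in the quote above can be sketched numerically (a hypothetical illustration, not from the article): a singular weight matrix collapses 2-D space onto a line, so a point from class A and a point from class B can land on the same image.

```python
import numpy as np

# A rank-deficient (determinant-0) weight matrix collapses the second
# coordinate, so distinct points can become indistinguishable.
W = np.array([[1.0, 0.0],
              [0.0, 0.0]])   # det(W) == 0

a = np.array([0.0, 1.0])     # a point from class A
b = np.array([0.0, -1.0])    # a point from class B

print(np.linalg.det(W))      # 0.0
print(W @ a, W @ b)          # both map to [0. 0.]
```

Once both points map to the same output, no later layer can separate them.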

2. A linear transformation by the “weight” matrix W; a translation by the vector b; point-wise application of tanh.

First apply a linear transformation (stretching, rotation, translation), then a nonlinear activation function: turning linearly inseparable data into linearly separable data!
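The three steps quoted above can be sketched as a single layer function (a minimal sketch; the names `layer`, `W`, and `b` are illustrative, not from the article):

```python
import numpy as np

# One network layer: linear map (stretch/rotate), translation by b,
# then the point-wise tanh nonlinearity.
def layer(x, W, b):
    return np.tanh(W @ x + b)

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 2))
b = rng.standard_normal(2)
x = np.array([0.5, -0.3])
print(layer(x, W, b))  # the point after the layer's transformation
```

Each tanh output lies in (-1, 1), which is the point-wise "squishing" the articles describe.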

#### URL

3. Oct 2016
4. www.jiqizhixin.com
1. Backpropagation is just gradient descent on individual errors. It compares the network’s prediction with the expected output, then computes the gradient of the error with respect to the network’s weights. This yields the direction in weight space that reduces the error.

The backpropagation algorithm
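The gradient-descent step described above can be sketched in the simplest possible setting (a hypothetical one-weight example, not from the article): a single weight, a single example, and the squared error.

```python
# Gradient descent on one example's squared error for one weight.
w = 0.0
x, y_target = 2.0, 1.0   # input and expected output
lr = 0.1                 # learning rate

for _ in range(100):
    y_pred = w * x            # forward pass
    error = y_pred - y_target
    grad = error * x          # d/dw of 0.5 * error**2
    w -= lr * grad            # step against the gradient in weight space

print(w)  # converges to 0.5, since 0.5 * 2.0 == y_target
```

Real backpropagation applies this same idea layer by layer via the chain rule, but the update rule per weight is exactly this step.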

2. To use backpropagation (see below), the neural network must use differentiable activation functions.
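Differentiability matters because backprop multiplies activation derivatives along the chain rule. For tanh the derivative has a closed form, which can be checked against a finite-difference approximation (an illustrative sketch):

```python
import numpy as np

# Backprop needs d(activation)/dx; for tanh this is 1 - tanh(x)**2.
def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2

x = 0.7
h = 1e-6
numeric = (np.tanh(x + h) - np.tanh(x - h)) / (2 * h)
print(tanh_grad(x), numeric)  # the two values agree closely
```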

#### URL

5. yjango.gitbooks.io
1. A mathematical interpretation of each layer: a linear transformation followed by a nonlinear one, projecting the input space into another space.

An interpretation of neural network layers

#### URL

6. colah.github.io
1. Topology of tanh Layers. Each layer stretches and squishes space, but it never cuts, breaks, or folds it. Intuitively, we can see that it preserves topological properties. For example, a set will be connected afterwards if it was before (and vice versa). Transformations like this, which don’t affect topology, are called homeomorphisms. Formally, they are bijections that are continuous functions both ways. Theorem: Layers with N inputs and N outputs are homeomorphisms if the weight matrix, W, is non-singular. (Though one needs to be careful about domain and range.) Proof: Let’s consider this step by step. Assume W has a non-zero determinant. Then it is a bijective linear function with a linear inverse. Linear functions are continuous, so multiplying by W is a homeomorphism. Translations are homeomorphisms. tanh (and sigmoid and softplus, but not ReLU) are continuous functions with continuous inverses; they are bijections if we are careful about the domain and range we consider, so applying them pointwise is a homeomorphism. Thus, if W has a non-zero determinant, our layer is a homeomorphism. ∎ This result continues to hold if we compose arbitrarily many of these layers together.

Topology: continuity and the sigmoid function
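The theorem quoted above can be checked numerically (an illustrative sketch; `layer` and `layer_inverse` are hypothetical names): with a non-singular W, the map x → tanh(Wx + b) can be inverted by undoing each step in reverse order, so it is a bijection onto its range.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((3, 3))   # a random matrix is almost surely non-singular
b = rng.standard_normal(3)

def layer(x):
    return np.tanh(W @ x + b)

def layer_inverse(y):
    # Undo each homeomorphism in reverse: arctanh, subtract b, solve W.
    return np.linalg.solve(W, np.arctanh(y) - b)

x = np.array([0.2, -0.5, 0.1])
print(np.allclose(layer_inverse(layer(x)), x))  # True: the input is recovered
```

This is exactly the three-part structure of the proof: each step (multiply by W, translate, tanh) is invertible, so the composition is too.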