15 Matching Annotations
  1. Feb 2019
    1. Are All Layers Created Equal?

      This Google paper has two simple ideas: one is to swap each layer's parameters in a trained network back to their pre-training initial values and observe how robust each layer is to this; the other builds on the first by re-sampling that set of initial parameters from the initialization distribution and checking the effect. The paper's rigorous experimental validation is what is most worth learning from~ [not that simple]
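
      The first probe is easy to reproduce in miniature. A toy sketch (mine, not Google's code; the tiny MLP, data, and hyperparameters are all made up for illustration): train a small network, then rewind one layer at a time to its initial weights and compare accuracy.

```python
# Toy reconstruction of the paper's re-initialization probe (not the authors' code):
# train a small MLP, then swap each layer's weights back to their initial values
# and measure how much accuracy drops ("re-initialization robustness").
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-class data: label = sign of x0 + x1 (linearly separable).
X = rng.normal(size=(512, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Two-layer MLP with tanh hidden units; keep copies of the initial weights.
W1_init = rng.normal(scale=0.5, size=(2, 16))
W2_init = rng.normal(scale=0.5, size=(16, 1))
W1, W2 = W1_init.copy(), W2_init.copy()

def forward(W1, W2):
    h = np.tanh(X @ W1)
    return 1 / (1 + np.exp(-(h @ W2).ravel())), h

for _ in range(500):                      # plain full-batch gradient descent
    p, h = forward(W1, W2)
    g = (p - y)[:, None] / len(y)         # d(logistic loss)/d(logits)
    W2 -= 1.0 * h.T @ g
    W1 -= 1.0 * X.T @ ((g @ W2.T) * (1 - h**2))

def accuracy(W1, W2):
    p, _ = forward(W1, W2)
    return ((p > 0.5) == (y > 0.5)).mean()

acc_trained = accuracy(W1, W2)
acc_layer1_reset = accuracy(W1_init, W2)  # layer 1 rewound to initialization
acc_layer2_reset = accuracy(W1, W2_init)  # layer 2 rewound to initialization
print(acc_trained, acc_layer1_reset, acc_layer2_reset)
```

      Layers whose rewinding barely hurts accuracy are the "robust" (ambient) layers in the paper's terminology; layers whose rewinding destroys accuracy are the critical ones.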

  2. Jan 2019
    1. Generalization in Deep Networks: The Role of Distance from Initialization

      Goodfellow retweeted this paper.

      The authors stress how important a model's initialization parameters are for explaining generalization!
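
      The measure itself is just the norm of the weight displacement. A minimal sketch (mine; the stand-in vectors below are fabricated, not from the paper): compare the distance from initialization with the plain weight norm, which is the contrast the paper cares about.

```python
# Distance from initialization, ||theta_T - theta_0||, versus the plain weight
# norm ||theta_T||. The vectors are fabricated stand-ins for flattened weights.
import numpy as np

rng = np.random.default_rng(1)
theta0 = rng.normal(size=1000)                       # "initial" weights
theta_T = theta0 + 0.01 * rng.normal(size=1000)      # "trained" weights: a small move

dist_from_init = np.linalg.norm(theta_T - theta0)
norm_of_weights = np.linalg.norm(theta_T)
print(dist_from_init, norm_of_weights)  # the former is far smaller than the latter
```

      The point of the paper is that norm-based capacity measures can be loose precisely because trained networks stay close to their (large-norm) random initialization.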

  3. Dec 2018
    1. Generalization and Equilibrium in Generative Adversarial Nets (GANs)

      Reposted from author Yi Zhang's answer on Zhihu: https://www.zhihu.com/question/60374968/answer/189371146

      His advisor's talk at the Simons Institute: https://www.youtube.com/watch?v=V7TliSCqOwI

      This should be the first work to seriously study theoretical guarantees of GANs.

      The techniques used are fairly simple, but they lead to some insightful conclusions:

      1. Generalization of GANs

      Given only training samples, without access to the true data distribution, will the generator's distribution converge to the true data distribution?

      The answer is no. Suppose the discriminator has p parameters; then the generator can fool the discriminator using only O(p log p) samples, even with infinitely many training data.

      This is quite counterintuitive, because for ordinary learning machines, getting more data always helps.

      2. Existence of equilibrium

      Almost every GAN paper mentions that the GAN training procedure is a two-player game computing a Nash equilibrium, but nobody asks whether this equilibrium actually exists.

      It is well known that with pure strategies an equilibrium does not always exist. Unfortunately, the GAN setup uses exactly pure strategies.

      It is therefore natural to extend GANs to mixed strategies, so that an equilibrium always exists.

      In practice we can only use finitely supported mixed strategies, i.e. train several generators and several discriminators simultaneously. With this method we obtained a much higher inception score than DCGAN on CIFAR-10.
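
      Sampling from a finitely supported mixed strategy over generators can be sketched as drawing uniformly from K networks (a toy numpy stand-in of mine; the paper's MIX+GAN approach also learns the mixture weights and trains multiple discriminators):

```python
# A finitely supported mixed strategy over generators: each draw first picks one
# of K generators at random, then samples from it. Each "generator" here is just
# a fixed affine map of Gaussian noise, a stand-in for a trained network.
import numpy as np

rng = np.random.default_rng(2)
K = 3
gen_params = [(rng.normal(size=2), rng.normal(size=2)) for _ in range(K)]

def sample_mixture(n):
    idx = rng.integers(K, size=n)        # mixed strategy: uniform over generators
    z = rng.normal(size=(n, 2))          # latent noise
    scales = np.array([gen_params[i][0] for i in idx])
    shifts = np.array([gen_params[i][1] for i in idx])
    return z * scales + shifts

samples = sample_mixture(1000)
print(samples.shape)                     # draws from the K-component mixture
```

      Even when each component is simple, the mixture can represent distributions no single pure-strategy generator in the family can, which is what restores the existence of an (approximate) equilibrium.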

      3. Diversity of GANs

      By analyzing GANs' generalization, we found that the GAN training objective doesn't encourage diversity, which is why mode collapse is so often observed. Yet so far no paper has rigorously defined diversity or analyzed how severe mode collapse is across models.

      This paper does not discuss this point much. We have a follow-up paper that estimates the diversity of various GAN models experimentally; it will be posted to arXiv within a day or two.

  4. Nov 2018
    1. Interpreting Adversarial Robustness: A View from Decision Surface in Input Space

      People usually believe that the flatter a local minimum of the loss is in parameter space, the better the generalization. But by visualizing a kind of decision boundary, this paper argues that hints of adversarial robustness can already be detected in the raw input space~ (This conclusion still needs broad replication; I wouldn't trust it without trying it myself, since it lacks a theoretical foundation~ [pitiful])

    2. An analytic theory of generalization dynamics and transfer learning in deep linear networks

      This is a theoretical paper on generalization error and transfer learning. I haven't fully understood the experimental details, but the conclusions are meaningful: it proposes a new analytic theory and finds that what a network learns first and relies on most is the task structure (via early stopping), not the network size! This also explains why real data is learned more easily than random data, and suggests that better non-GD optimization strategies may exist.

      There are also transfer experiments involving SNR, which show that one can transfer from a high-SNR task to a low-SNR one...

    3. Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

      This paper's thrust is basically the same as several others from this month (1811.03804 / 1811.03962 / 1811.04918) [frustrated]; I imagine the author's heart sank while writing it~ [laugh-cry] so they hurry to stress that their assumptions differ~

    4. An Information-Theoretic View for Deep Learning

      This is a paper on DL theory with a real highlight: it gives a theoretical upper bound on the number of CNN layers for a given expected generalization error! The myth of "the deeper the better" will eventually have to face its limits...

    5. Generalization Error in Deep Learning

      A review paper on the generalization ability of deep learning models~ Not bad... though very theoretical... and rather broad-brush...

    6. Identifying Generalization Properties in Neural Networks

      According to the authors, they prove that a model's generalization ability is indeed related to the Hessian matrix, and they propose a new metric to quantify generalization. What's interesting is the figure (below), which clearly shows that flatter local minima correspond to better generalization~~

      Summary comment from NATURE.AI:

      This paper discusses the mathematical factors that are attributed to a deep network's generalisation ability. We have written a 2-minute summary that breaks down the key mathematical framework that underlies this paper.
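
      The flatness claim rests on some curvature measure of the loss at the minimum, e.g. the top Hessian eigenvalue. A minimal sketch (mine, not the paper's metric): estimate it by power iteration on finite-difference Hessian-vector products, on two quadratic toy losses whose curvatures we know in advance.

```python
# Estimate the largest Hessian eigenvalue at a minimum via power iteration on
# finite-difference Hessian-vector products: a common proxy for "sharpness".
import numpy as np

def top_hessian_eigenvalue(loss, theta, iters=50, eps=1e-4):
    # Central-difference gradient of the scalar loss.
    grad = lambda t: np.array([(loss(t + eps * e) - loss(t - eps * e)) / (2 * eps)
                               for e in np.eye(len(t))])
    v = np.ones_like(theta) / np.sqrt(len(theta))
    for _ in range(iters):
        hv = (grad(theta + eps * v) - grad(theta - eps * v)) / (2 * eps)  # H @ v
        v = hv / np.linalg.norm(hv)
    return v @ hv

sharp = lambda t: 50.0 * t[0]**2 + 50.0 * t[1]**2   # sharp minimum, Hessian = 100 I
flat  = lambda t: 0.5 * t[0]**2 + 0.5 * t[1]**2     # flat minimum,  Hessian = 1 I

theta_star = np.zeros(2)
lam_sharp = top_hessian_eigenvalue(sharp, theta_star)
lam_flat = top_hessian_eigenvalue(flat, theta_star)
print(lam_sharp, lam_flat)   # ≈ 100 and ≈ 1
```

      On a real network one would use automatic differentiation for the Hessian-vector products, but the power-iteration idea is the same.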

    7. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers

      The paper is mathematical derivation throughout, aiming to show that over-parameterized two- or three-layer networks can learn sufficiently well and generalize well, even under simple optimization strategies (SGD-like) and similar assumptions. (FYI: the writing flows beautifully, is direct and well organized; reading it is a genuine pleasure~)

  5. Oct 2018
    1. Approximate Fisher Information Matrix to Characterise the Training of Deep Neural Networks

      Characterizing deep neural network training (convergence / generalization) with an approximate Fisher information matrix, which can automatically tune the mini-batch size and learning rate.

      An interesting paper: it derives new quantities from the Fisher matrix to measure the model's behavior during training and uses them to tune mini-batch sizes and learning rates | The figures in the paper are also very nicely drawn | The authors argue that the conventional wisdom of gradually increasing batch sizes is only partially true, and that gradually decreasing the batch size may instead improve the model's convergence and generalization.
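
      The raw ingredient for such Fisher-based quantities is easy to compute: the empirical Fisher built from per-sample gradients. A toy logistic-regression sketch (mine; the statistic the authors actually derive from the Fisher matrix is more elaborate than this): track the trace of the empirical Fisher diagonal as training proceeds.

```python
# Empirical Fisher diagonal for logistic regression: average of squared
# per-sample gradients. Its trace shrinks as the model fits the data, which is
# the kind of training signal Fisher-based schedules build on.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(256, 5))
w_true = rng.normal(size=5)
y = (X @ w_true > 0).astype(float)     # separable toy labels
w = np.zeros(5)

def fisher_trace(w):
    p = 1 / (1 + np.exp(-X @ w))
    per_sample_grads = (p - y)[:, None] * X           # per-sample dloss/dw
    fisher_diag = (per_sample_grads**2).mean(axis=0)  # diagonal of empirical Fisher
    return fisher_diag.sum()

trace_before = fisher_trace(w)
for _ in range(200):                   # a few full-batch gradient steps
    p = 1 / (1 + np.exp(-X @ w))
    w -= 0.5 * X.T @ (p - y) / len(y)
trace_after = fisher_trace(w)
print(trace_before, trace_after)       # the trace shrinks as training converges
```

      A batch-size or learning-rate schedule can then be driven by how such a Fisher-derived scalar evolves, rather than by a fixed hand-tuned recipe.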

  6. Oct 2017
    1. The most valuable insights are both general and surprising. F = ma for example.

      Why is this surprising? It is a definition. A force is what causes an acceleration.

  7. Aug 2017
    1. The takeaway is that you should not be using smaller networks because you are afraid of overfitting. Instead, you should use as big of a neural network as your computational budget allows, and use other regularization techniques to control overfitting

      What about the rule of thumb stating that you should have roughly 5-10 times as many data points as weights in order to not overfit?
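
      For concreteness, the rule of thumb in the question can be made numeric (my toy arithmetic, with a hypothetical 784-256-10 MLP, not a network from the thread):

```python
# Hypothetical MLP for illustration: 784 -> 256 -> 10, counting weights + biases.
layer_sizes = [784, 256, 10]
n_params = sum(a * b + b for a, b in zip(layer_sizes, layer_sizes[1:]))
print(n_params)                        # 203530 parameters

# The 5-10x rule of thumb would then call for roughly this many data points:
print(5 * n_params, 10 * n_params)     # ~1M to ~2M samples
```

      The quoted takeaway argues that with modern regularization this data-to-weights ratio is no longer the binding constraint.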

  8. Sep 2016
    1. I must speak honestly about the things that I believe -- the things that we, as Americans, believe

      I think this would be a Hasty Generalization, because he asserts that what he believes is what all Americans believe. He is using this to bring America together as one.

    2. And in examining his life and his words, I'm sure we both realize we have more work to do to promote equality in our own countries -- to reduce discrimination based on race in our own countries.  And in Cuba, we want our engagement to help lift up the Cubans who are of African descent -- (applause) -- who’ve proven that there’s nothing they cannot achieve when given the chance.

      It is ironic that he speaks of eliminating discrimination based on race, but then insinuates that Cubans specifically of African descent have proven superior in overcoming adversity. I could be wrong.