23 Matching Annotations
  1. Jan 2021
    1. Just saying “snaps are slow” is not helpful to anyone. Because frankly, they’re not. Some might be, but others aren’t. Using blanket statements which are wildly inaccurate will not help your argument. Bring data to the discussion, not hearsay or hyperbole.
    1. Blanket statements are never useful. They are nebulous and often send the wrong message. They seed doubt and mistrust and are usually intended to make a grand point about how right the person making the statement might be. They tend to be self-serving even when outwardly it doesn’t appear that way. In other words, we make blanket statements because we want to make a point that makes us look right and therefore look good in the position we are taking.
    1. A blanket statement is a sentence that asserts as truth that something applies to absolutely everything it discusses. For example: all people get angry. The difficulty with such a statement is that in the vast majority of cases it simply isn't accurate. As we know, people who work to control their emotions, such as monks, simply DON'T get angry. So it is a comment made to convince one of the validity of an argument when the statement itself has no validity.
    2. A blanket statement is a vague and noncommittal statement asserting a premise without providing evidence (such as specific numbers).
  2. Nov 2020
    1. Professionally our methods of transmitting and reviewing the results of research are generations old and by now are totally inadequate for their purpose. If the aggregate time spent in writing scholarly works and in reading them could be evaluated, the ratio between these amounts of time might well be startling. Those who conscientiously attempt to keep abreast of current thought, even in restricted fields, by close and continuous reading might well shy away from an examination calculated to show how much of the previous month's efforts could be produced on call. Mendel's concept of the laws of genetics was lost to the world for a generation because his publication did not reach the few who were capable of grasping and extending it; and this sort of catastrophe is undoubtedly being repeated all about us, as truly significant attainments become lost in the mass of the inconsequential.

      Specialization, although necessary, has rendered it impossible to stay up to date with the advances of a field.

  3. Aug 2020
    1. Guo, L., Boocock, J., Tome, J. M., Chandrasekaran, S., Hilt, E. E., Zhang, Y., Sathe, L., Li, X., Luo, C., Kosuri, S., Shendure, J. A., Arboleda, V. A., Flint, J., Eskin, E., Garner, O. B., Yang, S., Bloom, J. S., Kruglyak, L., & Yin, Y. (2020). Rapid cost-effective viral genome sequencing by V-seq. BioRxiv, 2020.08.15.252510. https://doi.org/10.1101/2020.08.15.252510

  4. Apr 2020
  5. Feb 2019
    1. Are All Layers Created Equal?

      The two ideas in this Google paper are simple. One: take a trained network and, layer by layer, swap each layer's parameters back to their pre-training initial values, observing how robust each layer is to the reset. Two: building on the first, re-sample that set of initial parameters afresh from some distribution and check the effect. The paper's rigorous validation experiments are its most instructive part. [Not as simple as it sounds]
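
The first idea can be sketched on a hypothetical toy two-layer network (finite-difference gradient descent stands in for real training; none of the names or settings below are the paper's actual setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy regression task: the target is the sum of the two inputs.
X = rng.normal(size=(20, 2))
y = X.sum(axis=1)

def forward(params, X):
    W1, W2 = params
    return np.tanh(X @ W1) @ W2

def loss(params, X, y):
    return float(np.mean((forward(params, X) - y) ** 2))

# Keep a copy of the initialization so layers can be reset later.
init = [rng.normal(scale=0.5, size=(2, 8)), rng.normal(scale=0.5, size=(8,))]
params = [p.copy() for p in init]
init_loss = loss(params, X, y)

# Plain gradient descent with finite-difference gradients
# (exact enough for a model this small).
eps, lr = 1e-5, 0.05
for _ in range(400):
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(*p.shape):
            old = p[idx]
            p[idx] = old + eps
            lp = loss(params, X, y)
            p[idx] = old - eps
            lm = loss(params, X, y)
            p[idx] = old
            g[idx] = (lp - lm) / (2 * eps)
        grads.append(g)
    for p, g in zip(params, grads):
        p -= lr * g

trained_loss = loss(params, X, y)

# The paper's first experiment: reset one layer at a time back to its
# initial parameters and see how much the loss degrades.
for l in range(len(params)):
    saved = params[l]
    params[l] = init[l]
    print(f"layer {l} reset: loss {trained_loss:.4f} -> {loss(params, X, y):.4f}")
    params[l] = saved
```

A layer whose reset barely moves the loss is "robust" in the paper's sense; the second idea simply replaces `init[l]` with a fresh sample from the initialization distribution.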

  6. Jan 2019
    1. Generalization in Deep Networks: The Role of Distance from Initialization

      Goodfellow retweeted this paper.

      The authors emphasize the importance of a model's initial parameters in explaining its generalization ability!
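
The quantity in the title is just the norm of the gap between trained and initial weights. A minimal sketch (the parameter shapes and the "pretend training" step are my own illustration, not the paper's experiments):

```python
import numpy as np

def distance_from_init(init_params, trained_params):
    """L2 distance between a network's trained parameters and its
    initialization, flattened across all parameter arrays."""
    diffs = [np.ravel(t - i) for i, t in zip(init_params, trained_params)]
    return float(np.linalg.norm(np.concatenate(diffs)))

# Toy example with hypothetical 2-layer parameters.
rng = np.random.default_rng(0)
init = [rng.normal(size=(4, 3)), rng.normal(size=(3,))]
trained = [w + 0.1 for w in init]  # pretend training moved every weight by 0.1
print(distance_from_init(init, trained))  # sqrt(15 * 0.01) ≈ 0.387
```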

  7. Dec 2018
    1. Generalization and Equilibrium in Generative Adversarial Nets (GANs)

      Reposted from author Yi Zhang's answer on Zhihu: https://www.zhihu.com/question/60374968/answer/189371146


      This should be the first work to seriously study the theoretical guarantees of GANs.


      1. Generalization of GANs

      Given only training samples, and without knowing the true data distribution, will the generator's distribution converge to the true data distribution?

      The answer is no. Suppose the discriminator has p parameters; then the generator can fool the discriminator using only O(p log p) samples, even when infinitely many training data are available.

      This is very counterintuitive, because for ordinary learning machines, getting more data always helps.

      2. Existence of equilibrium

      Almost every GAN paper mentions that the GAN training procedure is a two-player game and that it computes a Nash equilibrium, but none of them ask whether this equilibrium actually exists.

      Everyone knows that with pure strategies an equilibrium does not always exist. Unfortunately, the GAN formulation uses exactly pure strategies.

      A natural fix is to extend GANs to mixed strategies, so that an equilibrium always exists.

      In practice we can only use finitely supported mixed strategies, i.e., train several generators and several discriminators simultaneously. With this method we obtained a much higher Inception Score than DCGAN on CIFAR-10.
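
The mixed-strategy view can be stated in plain game-theoretic terms: with k generators weighted by w_G and k discriminators weighted by w_D, the expected payoff of the mixture is w_G^T M w_D for a payoff matrix M. A minimal sketch (the payoff values below are hypothetical, and this is not the paper's training code):

```python
import numpy as np

# Hypothetical payoff matrix: M[i, j] is the payoff when generator i
# plays against discriminator j (e.g. discriminator j's loss on i's fakes).
M = np.array([[0.9, 0.2, 0.4],
              [0.1, 0.8, 0.3],
              [0.5, 0.4, 0.7]])

def mixed_payoff(w_gen, w_disc, M):
    """Expected payoff when generators/discriminators are drawn from
    finitely supported mixed strategies w_gen and w_disc."""
    return float(w_gen @ M @ w_disc)

# Uniform mixtures over 3 generators and 3 discriminators.
w = np.full(3, 1 / 3)
print(mixed_payoff(w, w, M))
```

With mixed strategies the classic minimax/Nash existence results apply, which is exactly the point of extending GANs this way.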

      3. Diversity of GANs

      By analyzing the generalization of GANs, we found that the GAN training objective does not encourage diversity, which is why mode collapse is observed so often. Yet to date no paper has rigorously defined diversity or analyzed how severe mode collapse is across different models.

      This paper does not discuss this point much. We have a follow-up paper that experimentally estimates the diversity of various GAN models; it will be posted to arXiv within a day or two.

  8. Nov 2018
    1. Interpreting Adversarial Robustness: A View from Decision Surface in Input Space


    2. An analytic theory of generalization dynamics and transfer learning in deep linear networks

      This is a theoretical paper on generalization error and transfer learning. Although I have not yet fully understood the experimental details, the conclusions are meaningful: it proposes a new analytic theoretical framework and finds that what a network learns first and relies on most is the task structure (via early stopping), not the network size! This also explains why random data is more easily learned than real data, and hints that better non-GD optimization strategies may exist.

      There are also transfer experiments involving SNR; they report that transfer from high-SNR tasks to low-SNR tasks is possible...

    3. Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

      This paper's premise is essentially the same as several others posted this month (1811.03804 / 1811.03962 / 1811.04918). One imagines the author's despair while writing it, and the rush to stress that the assumptions here are different.

    4. An Information-Theoretic View for Deep Learning

      A paper on deep learning theory with genuine highlights: it gives a theoretical upper bound on the number of CNN layers within a given expected generalization error. The myth of "the deeper the better" will eventually have its limits.

    5. Generalization Error in Deep Learning

      A review paper on the generalization ability of deep learning models. Not bad, but very theoretical and rather general.

    6. Identifying Generalization Properties in Neural Networks

      According to the authors, the paper proves that a model's generalization ability is indeed related to the Hessian matrix, and it also proposes a new metric to quantify generalization. Interestingly, the figure (below) clearly shows that flatter local minima correspond to better generalization.
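
The flat-vs-sharp intuition can be checked numerically: estimate the curvature (second derivative, the 1-D analogue of the Hessian) at two minima of a toy loss. This is my own 1-D illustration, not the paper's metric:

```python
def loss(x):
    # Toy 1-D loss with a flat minimum at x=0 and a sharp one at x=3.
    return min(x ** 2, 50 * (x - 3) ** 2)

def curvature(f, x, h=1e-3):
    """Central finite-difference estimate of the second derivative."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

flat, sharp = curvature(loss, 0.0), curvature(loss, 3.0)
print(flat, sharp)  # the minimum at x=3 has far larger curvature
```

Larger curvature means a sharper minimum; the paper's figure relates the flatter kind to better generalization.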

      A summary comment from NATURE.AI:

      This paper discusses the mathematical factors that are attributed to a deep network's generalisation ability. We have written a 2-minute summary that breaks down the key mathematical framework underlying this paper.

    7. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers

      Mathematical derivation throughout, aiming to show that overparameterized two- and three-layer networks can learn sufficiently well and generalize well, even under assumptions of simple optimization strategies (SGD-like). (FYI: the writing is fluent, direct, and rigorous; reading it is a genuine pleasure.)

  9. Oct 2018
    1. Approximate Fisher Information Matrix to Characterise the Training of Deep Neural Networks

      An approximate Fisher information matrix characterization of deep neural network training (convergence/generalization performance) that can automatically optimize the mini-batch size and learning rate.

      An interesting paper: it derives new quantities from the Fisher matrix to measure how the model behaves during training, and uses them to optimize mini-batch sizes and learning rates. The figures in the paper are also very well made. The authors argue that the conventional wisdom of gradually increasing the batch size is only partially true: gradually decreasing it may improve the model's convergence and generalization.
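
One common way to approximate the Fisher matrix (the "empirical Fisher") averages outer products of per-sample log-likelihood gradients; scalar summaries such as its trace can then be tracked during training. A minimal sketch for a hypothetical logistic-regression model (my own illustration, not the paper's estimator):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def empirical_fisher(w, X, y):
    """Empirical Fisher: average outer product of per-sample gradients of
    the log-likelihood of the logistic model p(y=1|x) = sigmoid(w . x)."""
    G = (y - sigmoid(X @ w))[:, None] * X   # per-sample gradients, shape (n, d)
    return G.T @ G / len(X)

# Hypothetical data drawn from the model itself.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = (rng.random(200) < sigmoid(X @ w_true)).astype(float)

F = empirical_fisher(w_true, X, y)
print(np.trace(F))  # a scalar summary of local gradient noise/curvature
```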

  10. Oct 2017
    1. The most valuable insights are both general and surprising. F = ma for example.

      Why is this surprising? It is a definition. A force is what causes an acceleration.

  11. Sep 2016
    1. I must speak honestly about the things that I believe -- the things that we, as Americans, believe

      I think this would be a hasty generalization, because he is saying that what he believes is what all Americans believe. He is using this to bring America together as one.

    2. And in examining his life and his words, I'm sure we both realize we have more work to do to promote equality in our own countries -- to reduce discrimination based on race in our own countries.  And in Cuba, we want our engagement to help lift up the Cubans who are of African descent -- (applause) -- who’ve proven that there’s nothing they cannot achieve when given the chance.

      It is ironic that he speaks of eliminating discrimination based on race, but then insinuates that Cubans of African descent, specifically, have proven to be superior in overcoming adversity. I could be wrong.