121 Matching Annotations
  1. Oct 2019
    1. gram matrix must be normalized by dividing each element by the total number of elements in the matrix.

      True, after downsampling your gradients will get smaller in later layers
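For reference, the normalization described above can be sketched in a few lines of NumPy (the feature-map shape here is made up for illustration):

```python
import numpy as np

def normalized_gram(features):
    """Gram matrix of a (channels, height*width) feature map,
    divided by the total number of elements, as the annotation suggests."""
    gram = features @ features.T      # (channels, channels) correlations
    return gram / features.size       # divide by channels * height * width

feats = np.random.rand(8, 64)         # 8 channels, an 8x8 spatial map flattened
g = normalized_gram(feats)
print(g.shape)                        # (8, 8)
```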

  2. Sep 2019
    1. Deep Learning for Search - teaches you how to leverage neural networks, NLP, and deep learning techniques to improve search performance. (2019) Relevant Search: with applications for Solr and Elasticsearch - demystifies relevance work. Using Elasticsearch, it teaches you how to return engaging search results to your users, helping you understand and leverage the internals of Lucene-based search engines. (2016)
    2. Elasticsearch with Machine Learning (English translation) by Kunihiko Kido Recommender System with Mahout and Elasticsearch
  3. Jun 2019
  4. May 2019
  5. Apr 2019
  6. Mar 2019

      "Deep Compression" can reduce the model size by 18× to 49× without hurting the prediction accuracy. We also discovered that pruning and the sparsity constraint not only applies to model compression but also applies to regularization, and we proposed dense-sparse-dense training (DSD), which can improve the prediction accuracy for a wide range of deep learning models. To efficiently implement "Deep Compression" in hardware, we developed EIE, the "Efficient Inference Engine", a domain-specific hardware accelerator that performs inference directly on the compressed model which significantly saves memory bandwidth. Taking advantage of the compressed model, and being able to deal with the irregular computation pattern efficiently, EIE improves the speed by 13× and energy efficiency by 3,400× over GPU
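The pruning step at the heart of the Deep Compression pipeline can be sketched as simple magnitude pruning (a rough illustration only; the quantization and Huffman-coding stages are omitted, and the 90% fraction is just an example):

```python
import numpy as np

def magnitude_prune(weights, fraction=0.9):
    """Zero out the smallest-magnitude `fraction` of the weights."""
    threshold = np.quantile(np.abs(weights), fraction)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

w = np.random.randn(1000)
pruned = magnitude_prune(w, fraction=0.9)
sparsity = (pruned == 0).mean()       # roughly 0.9 of weights removed
print(round(sparsity, 2))
```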

    1. A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification

    1. A Gentle Tutorial of Recurrent Neural Network with Error Backpropagation

  7. arxiv.org arxiv.org
    1. To the best of our knowledge, there has not been any other work exploring the use of attention-based architectures for NMT


    1. One of the challenges of deep learning is that the gradients with respect to the weights in one layer are highly dependent on the outputs of the neurons in the previous layer, especially if these outputs change in a highly correlated way. Batch normalization [Ioffe and Szegedy, 2015] was proposed to reduce such undesirable "covariate shift". The method normalizes the summed inputs to each hidden unit over the training cases. Specifically, for the i-th summed input in the l-th layer, the batch normalization method rescales the summed inputs according to their variances under the distribution of the data

      Batch normalization was introduced to address the strong dependence between a neuron's inputs and the values computed upstream. Computing the required expectations would need all the samples, which is clearly impractical, so the statistics are instead computed over the training mini-batch. But that shifts the limitation onto the mini-batch size, which makes the method hard to apply to RNNs. Hence the need for Layer Normalization.
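A minimal NumPy sketch of the difference: batch normalization averages over the mini-batch dimension, while layer normalization averages over each example's own features, so it has no mini-batch dependence:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # normalize each feature over the batch dimension (axis 0)
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    # normalize each example over its own features (axis 1):
    # no dependence on the other examples in the mini-batch
    mu = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

x = np.random.randn(4, 16)            # batch of 4 examples, 16 hidden units
# layer norm of one example is unchanged by the rest of the batch
single = layer_norm(x[:1])
print(np.allclose(layer_norm(x)[:1], single))  # True
```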

    2. Layer Normalization

  8. Feb 2019
    1. Recent advances of deep learning have inspired many applications of neural models to dialogue systems. Wen et al. (2017) and Bordes et al. (2017) introduced a network-based end-to-end trainable task-oriented dialogue system, which treated dialogue system learning as the problem of learning a mapping from dialogue histories to system responses, and applied an encoder-decoder model to train the whole system



    1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

  9. Jan 2019
    1. By utilizing the Deeplearning4j library for model representation, learning and prediction, KNIME builds upon a well performing open source solution with a thriving community.
    2. It is especially thanks to the work of Yann LeCun and Yoshua Bengio (LeCun et al., 2015) that the application of deep neural networks has boomed in recent years. The technique, which utilizes neural networks with many layers and enhanced backpropagation algorithms for learning, was made possible through both new research and the ever increasing performance of computer chips.
  10. Dec 2018
  11. Oct 2018
    1. Many detection methods such as Faster-RCNN and YOLO perform badly in small object detection. With some considerable improvements in the original framework of YOLOv2, our proposed SO-YOLO can solve this problem perfectly.



    1. As a convolutional neural network, SO-YOLO outperforms state-of-the-art detection methods both in accuracy and speed.
    2. SO-YOLO performs well in detecting small objects compared with other methods.
  12. May 2018
  13. Mar 2018
    1. artificial neural network

      Deep learning includes neural networks

    2. Artificial intelligence (AI), machine learning and deep learning

      A graphical explanation of artificial intelligence, machine learning, and deep learning

  14. Sep 2017
  15. Aug 2017
    1. This is a very easy paper to follow, but it looks like their methodology is a simple way to improve performance on limited data. I'm curious how well this is reproduced elsewhere.

  16. Apr 2017
    1. areas where deep learning is currently being poorly utilized

      who is curating a list of deep learning success stories, case studies and applications?

    2. highly automated tools for training deep learning models

      such as?

    3. The best way we can help these people is by giving them the tools and knowledge to solve their own problems, using their own expertise and experience.

      Agree or disagree?

    1. Appendix A:Table of various deep learning applications

      This is a good list. Has anyone come across a comprehensive list of deep learning applications?

    1. Almost all exciting results based on recurrent neural networks are achieved with them.


    1. If we write that out as equations, we get:

      It would be easier to understand what x, y, and W are here if the actual numbers were used, like 784, 10, and 55,000. In this simple example there are 3 x's and 3 y's, which is misleading. In reality each x has 784 elements (one per pixel), and there are 55,000 such x arrays, while each y has only 10 elements (one per digit), with 55,000 of them as well.
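Concretely, with those numbers (taken from the MNIST tutorial being annotated), the shapes work out as follows; the zero-filled arrays are placeholders for illustration:

```python
import numpy as np

n_examples, n_pixels, n_classes = 55000, 784, 10

X = np.zeros((n_examples, n_pixels), dtype=np.float32)  # 55,000 images, 784 pixels each
W = np.zeros((n_pixels, n_classes), dtype=np.float32)   # one weight per (pixel, digit) pair
b = np.zeros(n_classes, dtype=np.float32)               # one bias per digit

logits = X @ W + b                    # evidence for each digit, per image
print(logits.shape)                   # (55000, 10)
```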

  17. Mar 2017
    1. Consequently, our advice is simple: continue to train your networks on a single machine, until the training time becomes prohibitive.

      Make a clear assessment of data-loading time, parameter-communication time, and compute time; don't parallelize just for the sake of parallelism. If a problem can be solved on a single machine, there is no rush to move to multiple machines.

    2. While model parallelism can work well in practice, data parallelism is arguably the preferred approach for distributed systems and has been the focus of more research

      why ?

  18. Dec 2016
    1. Key points:

      1. Scale of data is especially good for large NNs.
      2. Having a combination of HPC and AI skills is important for optimal impact (handling scale challenges and bigger/more complex NNs).
      3. Most of the value right now comes from CNNs, FCs, and RNNs. Unsupervised learning, GANs, and others might be the future, but they are research topics right now.
      4. E2E DL might be relevant for some cases in the future, like speech -> transcript, image -> captioning, text -> image.
      5. Self-driving cars might also move to E2E, but no one has enough data for image -> steer yet.


      Bias and variance:

      1. Bias = training error - human error. Try a bigger model, running longer, or a new model architecture.
      2. Variance = dev error - training error. Try more data, regularization, or a new model architecture.
      3. The conflict between bias and variance is weaker in DL: we can have a bigger model with more data.
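The first two bias/variance points above can be turned into a tiny decision rule (the error values below are made up for illustration):

```python
def diagnose(human_err, train_err, dev_err):
    """Suggest a next step from the bias/variance gaps in the notes above."""
    bias = train_err - human_err       # avoidable bias
    variance = dev_err - train_err
    if bias >= variance:
        return "high bias: bigger model, train longer, new architecture"
    return "high variance: more data, regularization, new architecture"

# bias gap (0.07) dominates the variance gap (0.02) here
print(diagnose(human_err=0.01, train_err=0.08, dev_err=0.10))
```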

      More data:

      1. Data synthesis/augmentation is becoming useful and popular: OCR (superimpose alphabets on various images), speech (superimpose various background noises), NLP (?). But it has drawbacks if the synthesized data is not representative.
      2. Unified data warehouse helps leverage data usage across company

      Data set breakdown:

      1. Dev and test should come from the same distribution, since we spend a lot of time optimizing for dev accuracy.

      Progress plateaus above Human level performance:

      • But there is theoretical optimal error rate (Bayes rate)

      What to do when bias is high:

      • Look at examples the machine got wrong
      • Get labels from humans?
      • Error analysis: segment the training set - identify segments where training error is higher than human error.
      • Estimate the bias/variance effect?

      How do you define human-level performance? Example: the error of a panel of experts.

      Size of data:

      1. How do you define an NN as small vs. medium vs. large?
      2. Is the reason large NNs can leverage bigger data that they don't overfit the way smaller NNs would?
  19. Nov 2016
    1. Deep neural networks use multiple layers, with each layer requiring its own weights and bias.

      Every layer needs its own weights and biases. In TensorFlow, it is good practice to keep all the weights in a dictionary, which makes them easier to manage.
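A minimal sketch of that practice, with plain NumPy standing in for TensorFlow variables (the layer sizes are made up):

```python
import numpy as np

layer_sizes = [784, 256, 128, 10]     # input, two hidden layers, output

# one weight matrix and bias vector per layer, all kept in dictionaries
weights = {f"h{i}": np.random.randn(n_in, n_out) * 0.01
           for i, (n_in, n_out) in enumerate(zip(layer_sizes, layer_sizes[1:]))}
biases = {f"h{i}": np.zeros(n_out) for i, n_out in enumerate(layer_sizes[1:])}

x = np.random.randn(32, 784)          # a mini-batch of 32 examples
for i in range(len(layer_sizes) - 1):
    x = np.maximum(0.0, x @ weights[f"h{i}"] + biases[f"h{i}"])  # ReLU layers
print(x.shape)                        # (32, 10)
```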

  20. Oct 2016
    1. This requires that the input data come in pairs, with each pair sharing a common label: whether or not the two belong to the same class.

      Verification signal

  21. Jul 2016
    1. half-spaces separated by a hyperplane [19].


    2. Deep learning


    3. The key aspect of deep learning is that these layers of features are not designed by human engineers: they are learned from data using a general-purpose learning procedure.


    4. most practitioners use a procedure called stochastic gradient descent (SGD).


    5. The chain rule of derivatives tells us how two small effects (that of a small change of x on y, and that of y on z) are composed.


    6. The backpropagation procedure to compute the gradient of an objective function with respect to the weights of a multilayer stack of modules is nothing more than a practical application of the chain rule for derivatives.
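A worked toy instance of that chain-rule view, for a two-module stack z = g(f(x)) (my own example, not the paper's notation):

```python
# z = y**2 with y = 3*x, so dz/dx = (dz/dy) * (dy/dx) = 2*y * 3 = 18*x
def f(x):                 # first module
    return 3.0 * x

def g(y):                 # second module
    return y ** 2

x = 2.0
y = f(x)                  # forward pass: y = 6.0
# backward pass: multiply the local derivatives, exactly the chain rule
dz_dy = 2.0 * y           # derivative of g at y
dy_dx = 3.0               # derivative of f at x
dz_dx = dz_dy * dy_dx
print(dz_dx)              # 36.0, matching 18*x at x = 2
```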


    1. Following reminders from @山丹丹 and @啸王 in the comments, I have corrected some errors (shown in italics); thank you all. Based on my recent understanding, I have also added some material (shown in italics). If there are still errors, corrections are welcome.

       First question: why introduce nonlinear activation functions? If you use no activation function (which amounts to f(x) = x), then each layer's output is a linear function of the previous layer's input. It is easy to verify that no matter how many layers the network has, the output is then a linear combination of the input, equivalent to having no hidden layers at all: this is the original Perceptron. For exactly this reason we introduce nonlinear activation functions, which makes deep networks meaningful (the output is no longer a linear combination of the input, and the network can approximate arbitrary functions). The earliest choices were the sigmoid and tanh functions, whose bounded outputs make them easy to feed into the next layer (plus various biological justifications).

       Second question: why introduce ReLU? First, with sigmoid-like functions, computing the activation (an exponential) is expensive, and backpropagating the error gradient involves division, so the overall computation is relatively heavy; with ReLU, the whole process is much cheaper. Second, in deep networks, backpropagation through sigmoid easily runs into vanishing gradients (near sigmoid's saturation regions the function changes too slowly and its derivative approaches 0, which loses information; see the third point of @Haofeng Li's answer), making deep networks impossible to train. Third, ReLU sets some neurons' outputs to 0, which makes the network sparse, reduces the interdependence of parameters, and mitigates overfitting (plus various biological justifications). There are now improvements on ReLU, such as PReLU and randomized ReLU, which give some gains in training speed or accuracy on different datasets; see the relevant papers for details.

       One more note: the current mainstream practice is to add a batch normalization step after ReLU, to keep each layer's inputs as close as possible to the same distribution [1]. The latest paper [2] found that, after adding bypass connections, moving the position of batch normalization gives even better results. Take a look if you are interested.
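The vanishing-gradient point can be checked numerically: deep in sigmoid's saturation region the derivative is nearly zero, while ReLU's derivative is exactly 1 for any positive input (a quick illustration of the answer's second point):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)              # at most 0.25, tiny when |x| is large

def relu_grad(x):
    return (x > 0).astype(float)      # exactly 1 for positive inputs

print(sigmoid_grad(10.0))             # ~4.5e-05: saturated, gradient vanishes
print(float(relu_grad(np.array(10.0))))  # 1.0
```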


    1. Unsupervised Learning of 3D Structure from Images Authors: Danilo Jimenez Rezende, S. M. Ali Eslami, Shakir Mohamed, Peter Battaglia, Max Jaderberg, Nicolas Heess (Submitted on 3 Jul 2016) Abstract: A key goal of computer vision is to recover the underlying 3D structure from 2D observations of the world. In this paper we learn strong deep generative models of 3D structures, and recover these structures from 3D and 2D images via probabilistic inference. We demonstrate high-quality samples and report log-likelihoods on several datasets, including ShapeNet [2], and establish the first benchmarks in the literature. We also show how these models and their inference networks can be trained end-to-end from 2D images. This demonstrates for the first time the feasibility of learning to infer 3D representations of the world in a purely unsupervised manner.

      The 3D representation of a 2D image is ambiguous and multi-modal. We achieve such reasoning by learning a generative model of 3D structures, and recover this structure from 2D images via probabilistic inference.

    1. When building a unified vision system or gradually adding new capabilities to a system, the usual assumption is that training data for all tasks is always available. However, as the number of tasks grows, storing and retraining on such data becomes infeasible. A new problem arises where we add new capabilities to a Convolutional Neural Network (CNN), but the training data for its existing capabilities are unavailable. We propose our Learning without Forgetting method, which uses only new task data to train the network while preserving the original capabilities. Our method performs favorably compared to commonly used feature extraction and fine-tuning adaption techniques and performs similarly to multitask learning that uses original task data we assume unavailable. A more surprising observation is that Learning without Forgetting may be able to replace fine-tuning as standard practice for improved new task performance.

      Learning w/o Forgetting: distilled transfer learning

  22. Jun 2016
  23. Apr 2016
    1. We should have control of the algorithms and data that guide our experiences online, and increasingly offline. Under our guidance, they can be powerful personal assistants.

      Big business has been very militant about protecting their "intellectual property". Yet they regard every detail of our personal lives as theirs to collect and sell at whim. What a bunch of little darlings they are.

  24. Dec 2015
    1. OpenAI is a non-profit artificial intelligence research company. Our goal is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return.