139 Matching Annotations
  1. Mar 2019
    1. A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification

  2. arxiv.org arxiv.org
    1. To the best of our knowledge, there has not been any other work exploring the use of attention-based architectures for NMT


  3. Feb 2019
    1. The road to a champion QA system: QANet, using CNNs for question answering


      QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension

    1. To overcome this issue, we explore data generation using templates and terminologies and data augmentation approaches. Namely, we report our experiments using paraphrasing and word representations learned on a large EHR corpus with Fasttext and ELMo, to learn a NLU model without any available dataset. We evaluate on a NLU task of natural language queries in EHRs divided in slot-filling and intent classification sub-tasks. On the slot-filling task, we obtain a F-score of 0.76 with the ELMo representation; and on the classification task, a mean F-score of 0.71. Our results show that this method could be used to develop a baseline system



  4. www.iro.umontreal.ca www.iro.umontreal.ca
    1. bi-directional Jordan-type network that takes into account both past and future dependencies among slots works best

      (A bi-directional Jordan-type network works best for slot filling)

    2. Using Recurrent Neural Networks for Slot Filling in Spoken Language Understanding

    1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

    1. 10 Exciting Ideas of 2018 in NLP

      (Ten exciting ideas in NLP from 2018)

      • 1 Unsupervised machine translation
      • 2 Pretrained language models
      • 3 Common-sense reasoning datasets
      • 4 Meta-learning
      • 5 Robust unsupervised methods
      • 6 Truly understanding the representations in representation learning: pretrained language models really do exhibit capabilities similar to image models
      • 7 Successful applications of multi-task learning
      • 8 Combining semi-supervised learning and transfer learning
      • 9 QA and reasoning datasets built over large bodies of text
      • 10 Inductive bias
    2. Phrase-Based & Neural Unsupervised Machine Translation (EMNLP 2018):  T
  5. Jan 2019
  6. Dec 2018
    1. A semantic treebank is a collection of natural language sentences annotated with a meaning representation. These resources use a formal representation of each sentence's semantic structure.
  7. Nov 2018
    1. Language GANs Falling Short

      One Paper Summary of this paper is especially well written!

      This paper’s high-level goal is to evaluate how well GAN-type structures for generating text are performing, compared to more traditional maximum likelihood methods. In the process, it zooms into the ways that the current set of metrics for comparing text generation fail to give a well-rounded picture of how models are performing.

      In the old paradigm of maximum likelihood estimation, models were both trained and evaluated on maximizing the likelihood of each word, given the prior words in a sequence. That is, models were good when they assigned high probability to true tokens, conditioned on past tokens. However, GANs work in a fundamentally new framework, in that they aren’t trained to increase the likelihood of the next (ground-truth) word in a sequence, but to generate a word that will make a discriminator more likely to see the sentence as realistic. Since GANs don’t directly model the probability of token t given prior tokens, you can’t evaluate them using this maximum likelihood framework.
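
      The MLE-style evaluation described above can be made concrete with a toy sketch (illustrative only: the bigram model and its probabilities are made up, not from the paper). A sentence is scored by the probability the model assigns to each true token given its predecessor.

```python
import math

# Toy bigram "language model" -- hypothetical probabilities for illustration.
bigram_probs = {
    ("the", "cat"): 0.4,
    ("cat", "sat"): 0.5,
    ("sat", "down"): 0.6,
}

def sentence_log_likelihood(tokens, probs, floor=1e-6):
    """Sum of log P(token | previous token); `floor` handles unseen bigrams."""
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log(probs.get((prev, cur), floor))
    return total

ll = sentence_log_likelihood(["the", "cat", "sat", "down"], bigram_probs)
# Perplexity is the exponentiated average negative log-likelihood
# (3 predictions were made for this 4-token sentence).
perplexity = math.exp(-ll / 3)
```

      A higher (less negative) log-likelihood means the model found the true sequence more probable; perplexity is the standard MLE-era summary of it. This is exactly the quantity that cannot be computed for a GAN generator, which never exposes P(token | prefix).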

      This paper surveys a range of prior work that has evaluated GANs and MLE models on two broad categories of metrics, occasionally showing GANs to perform better on one or the other, but not really giving a way to trade off between the two.

      • The first type of metric, shorthanded as “quality”, measures how aligned the generated text is with some reference corpus: to what extent your generated text seems to “come from the same distribution” as the original. BLEU, a heuristic frequently used in translation and also leveraged here, measures how frequently the n-grams of the generated text also occur in the reference text. N typically goes up to 4, so in addition to comparing the distributions of single tokens in the reference and generated text, BLEU also compares shared bigrams, trigrams, and 4-grams to measure more precise similarity.
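
      The core quantity behind this quality metric can be sketched as a clipped n-gram precision (a simplification: real BLEU also applies a brevity penalty and a geometric mean over the different n):

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_precision(generated, reference, n):
    """Fraction of generated n-grams that also appear in the reference,
    with counts clipped as in BLEU (an n-gram can't match more times
    than it occurs in the reference)."""
    gen_counts = Counter(ngrams(generated, n))
    ref_counts = Counter(ngrams(reference, n))
    if not gen_counts:
        return 0.0
    overlap = sum(min(c, ref_counts[g]) for g, c in gen_counts.items())
    return overlap / sum(gen_counts.values())

gen = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
p1 = ngram_precision(gen, ref, 1)  # unigram precision: 5/6
p2 = ngram_precision(gen, ref, 2)  # bigram precision: 3/5
```

      Higher-order n-grams penalize text whose individual words look right but whose local word order does not match the reference.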

      • The second metric, shorthanded as “diversity”, measures how different generated sentences are from one another. If you want to design a model to generate text, you presumably want it to be able to generate a diverse range of text; in probability terms, you want to fully sample from the distribution, rather than just taking the expected or mean value. Linguistically, a failure to do so would show up as a generator that just produces the same sentence over and over again. That sentence might be highly representative of the original text, but it lacks diversity. One metric used for this is the same kind of BLEU score, but computed for each generated sentence against a corpus of previously generated sentences; here, the goal is for the overlap to be as low as possible.
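
      The diversity variant can be sketched the same way (assumed details, not the exact Self-BLEU procedure): score each generated sentence's n-gram overlap against the remaining generated sentences, where a high average overlap signals a degenerate, repetitive generator.

```python
from collections import Counter

def bigrams(tokens):
    """All contiguous bigrams of a token list."""
    return list(zip(tokens, tokens[1:]))

def overlap_vs_pool(sentence, others):
    """Clipped bigram precision of one sentence against all the others."""
    grams = Counter(bigrams(sentence))
    pool = Counter()
    for other in others:
        pool.update(bigrams(other))
    if not grams:
        return 0.0
    hits = sum(min(count, pool[g]) for g, count in grams.items())
    return hits / sum(grams.values())

sents = [s.split() for s in [
    "the cat sat down",
    "the cat sat down",   # a degenerate generator repeats itself
    "a storm rolled in",
]]
scores = [overlap_vs_pool(s, sents[:i] + sents[i + 1:])
          for i, s in enumerate(sents)]
avg_self_overlap = sum(scores) / len(scores)  # high value = low diversity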

      The trouble with these two metrics is that, in their raw state, they’re pretty incommensurable, and hard to trade off against one another.

      For more detail, read the full Paper Summary....