36 Matching Annotations
  1. Dec 2023
  2. Jun 2021
  3. Sep 2020
    1. Please focus on explaining the motivation so that if this RFC is not accepted, the motivation could be used to develop alternative solutions. In other words, enumerate the constraints you are trying to solve without coupling them too closely to the solution you have in mind.
    2. A huge part of the value on an RFC is defining the problem clearly, collecting use cases, showing how others have solved a problem, etc.
    3. An RFC can provide tremendous value without the design described in it being accepted.
  4. Jun 2019
  5. Mar 2019
    1. The goal here is explicitly not to improve the state of the art in the narrow domain of restaurant booking, but to take a narrow domain where traditional handcrafted dialog systems are known to perform well, and use that to gauge the strengths and weaknesses of current end-to-end systems with no domain knowledge.

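      The goal of this paper is not to improve results in the narrow domain of restaurant booking. Instead, it uses a domain where traditional handcrafted systems already perform well as a benchmark for comparing the strengths and weaknesses of end-to-end systems that have no domain knowledge.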

      MEMORY NETWORKS

    2. LEARNING END-TO-END GOAL-ORIENTED DIALOG

    1. A Network-based End-to-End Trainable Task-oriented Dialogue System

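      This end-to-end system uses a CNN + LSTM for intent recognition and an LSTM for belief state tracking. For the policy, it defines its own scheme: the output vectors of the preceding modules are combined by a linear model to produce the policy output.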
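A minimal sketch of that policy step in plain Python. It assumes the combination takes the form o = tanh(W_z·z + W_p·p + W_x·x): one linear map per upstream vector (assumed here to be the intent vector z, a belief-state summary p, and a database pointer x), summed and squashed with tanh. The vector sizes, random weights, and the names `policy_vector`, `linear`, and `random_matrix` are hypothetical placeholders, not the paper's configuration.

```python
import math
import random

def linear(W, v):
    """Matrix-vector product; W is a list of rows, v a list of floats."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def policy_vector(z, p, x, W_z, W_p, W_x):
    """Assumed policy form: o = tanh(W_z z + W_p p + W_x x)."""
    parts = zip(linear(W_z, z), linear(W_p, p), linear(W_x, x))
    return [math.tanh(a + b + c) for a, b, c in parts]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
hidden = 4
z = [rng.random() for _ in range(6)]  # hypothetical intent-network output
p = [rng.random() for _ in range(5)]  # hypothetical belief-state summary
x = [1.0, 0.0, 0.0]                   # hypothetical one-hot DB pointer
o = policy_vector(z, p, x,
                  random_matrix(hidden, 6, rng),
                  random_matrix(hidden, 5, rng),
                  random_matrix(hidden, 3, rng))
print(len(o))  # 4
```

Because each input has its own weight matrix, the three vectors may have different sizes; only the output size `hidden` must agree across the three maps.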

    1. Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems

      A hybrid learning process that augments reinforcement learning with human teaching and feedback.

    1. Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling

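      A single model handles two different kinds of problems: intent detection is classification, and slot filling is sequence labeling. Both are solved with an attention-based RNN.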
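A minimal sketch of the joint idea in plain Python: both tasks read the same encoder hidden states, an attention-weighted sum of those states feeds the intent classifier, and each token's own state feeds a slot classifier. The RNN encoder itself is omitted (the states `H` are given), and the dot-product attention score, all weights, and the function names are assumptions for illustration. For brevity the slot head here reads only each token's hidden state rather than a per-position attention context.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def joint_heads(H, v_att, W_intent, W_slot):
    """H: one shared encoder hidden state per token.
    Returns (intent probabilities, slot label id per token, attention weights)."""
    # Attention: score each token by a dot product with v_att, then normalize.
    alpha = softmax([sum(a * b for a, b in zip(v_att, h)) for h in H])
    # Context vector: attention-weighted sum of the hidden states.
    context = [sum(alpha[t] * H[t][d] for t in range(len(H)))
               for d in range(len(H[0]))]
    # Intent detection = utterance-level classification on the context.
    intent_probs = softmax(matvec(W_intent, context))
    # Slot filling = sequence labeling: one argmax label per token.
    slot_ids = []
    for h in H:
        logits = matvec(W_slot, h)
        slot_ids.append(max(range(len(logits)), key=logits.__getitem__))
    return intent_probs, slot_ids, alpha

H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
intent_probs, slot_ids, alpha = joint_heads(H, [0.0, 0.0], identity, identity)
print(slot_ids)  # [0, 1, 0]
```

Sharing `H` between the two heads is what makes the model joint: gradients from both the intent loss and the slot loss would update the same encoder in training.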

    1. The Sogou Spoken Language Understanding System for the NLPCC 2018 Evaluation

    2. The first step is lexical analysis, i.e. word segmentation and part-of-speech (POS) tagging. The words and POS labels are used as features in the subsequent models. For the shared task we used HanLP [1] as our Chinese lexical analyzer.

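      SLU model pipeline:

      • 1 Step 1 is lexical analysis: word segmentation followed by POS tagging. The paper uses HanLP as the lexical analyzer.

      • 2 Step 2 is slot boundary detection, treated as sequence labeling with BILOU tags. Both character-based and word-based labeling are used. The character-based CRF uses a window of 7 with lexical and dictionary features; the word-based CRF uses a window of 5 with lexical, POS, and dictionary features. A dictionary feature encodes whether the current character or word is a prefix, infix, or suffix of some entry in an entity dictionary. Each CRF produces an n-best list (n = 3), and all 2n outputs are passed to the next step. The character-based labeling compensates for possible word-segmentation errors.

      • 3 Step 3 is slot type identification, using an L-regularized logistic regression (LR) classifier. The features are the predicted slot, the surrounding characters and words, and the surrounding POS tags.

      • 4 Step 4 is slot correction, which addresses recognition errors caused by ASR. It uses a search-based method: since a dictionary exists for each slot type, if a predicted slot s of type T is not in the dictionary for T, s is used as a query to find dictionary entries by minimum edit distance. The lookup runs twice, once with s as Chinese characters and once with s as pinyin, and the final result is obtained by re-ranking the results of the two lookups.

      • 5 The last step is intent classification, using XGBoost with its default parameters. The features are the word tokens, the query length, and the slots predicted in the previous steps.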
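The BILOU tagging and dictionary features of step 2 can be sketched in plain Python as below. The exact feature definitions are not given in the notes, so this assumes one reading: a token is a prefix, infix, or suffix match if it occurs at the start, strictly inside, or at the end of some longer dictionary entry. The lexicon, the `(start, end_exclusive, type)` span format, and the function names are hypothetical; in the described system such flags would be CRF features alongside the lexical ones.

```python
def dict_features(token, lexicon):
    """Prefix/infix/suffix flags of `token` against entity-dictionary entries
    (assumed definition: token is a proper part of a longer entry)."""
    n = len(token)
    feats = {"prefix": False, "infix": False, "suffix": False}
    for entry in lexicon:
        if len(entry) <= n:
            continue  # only proper sub-parts of longer entries count
        if entry.startswith(token):
            feats["prefix"] = True
        if entry.endswith(token):
            feats["suffix"] = True
        # Infix: occurs at a position that is neither the start nor the end.
        if any(entry[i:i + n] == token for i in range(1, len(entry) - n)):
            feats["infix"] = True
    return feats

def bilou_tags(length, spans):
    """Encode (start, end_exclusive, type) spans as BILOU tags."""
    tags = ["O"] * length
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = f"U-{label}"  # Unit: single-token slot
        else:
            tags[start] = f"B-{label}"  # Begin
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"  # Inside
            tags[end - 1] = f"L-{label}"  # Last
    return tags

chars = list("明天去北京")
print(bilou_tags(len(chars), [(0, 2, "date"), (3, 5, "city")]))
# ['B-date', 'L-date', 'O', 'B-city', 'L-city']
print(dict_features("北", ["北京", "南京"]))
# {'prefix': True, 'infix': False, 'suffix': False}
```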

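Step 4 can be sketched as follows. The per-type dictionary lookup by minimum edit distance follows the description above; everything else is an assumption. `to_pinyin` stands in for a real character-to-pinyin converter (here a tiny hypothetical table), and the re-ranking of the two candidate lists, which the notes do not specify, is approximated by summing each candidate's character-level and pinyin-level distances.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

def nearest(query, entries, key, k):
    """Up to k entries with the smallest edit distance after mapping by key."""
    q = key(query)
    return sorted(entries, key=lambda e: (edit_distance(q, key(e)), e))[:k]

def correct_slot(slot, lexicon, to_pinyin, k=3):
    """Return slot unchanged if it is a known value, else the best dictionary match."""
    if slot in lexicon:
        return slot
    by_chars = nearest(slot, lexicon, key=lambda s: s, k=k)  # query as characters
    by_pinyin = nearest(slot, lexicon, key=to_pinyin, k=k)   # query as pinyin
    candidates = set(by_chars) | set(by_pinyin)
    # Hypothetical re-ranking: total of character and pinyin distances.
    return min(candidates, key=lambda e: (
        edit_distance(slot, e) + edit_distance(to_pinyin(slot), to_pinyin(e)), e))

# Hypothetical pinyin table; a real system would use a full converter.
PINYIN = {"北": "bei", "南": "nan", "京": "jing", "经": "jing",
          "背": "bei", "景": "jing", "上": "shang", "海": "hai"}

def to_pinyin(text):
    return "".join(PINYIN.get(ch, ch) for ch in text)

cities = ["北京", "南京", "上海"]
print(correct_slot("北经", cities, to_pinyin))  # 北京 (one character misrecognized)
print(correct_slot("背景", cities, to_pinyin))  # 北京 (same pinyin, different characters)
```

The second example shows why the pinyin lookup helps with ASR output: "背景" shares no characters with "北京", but both are "beijing" in pinyin.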
    1. TASK-ORIENTED DIALOGUE SYSTEMS: Task-oriented dialogue systems have been an important branch of spoken dialogue systems. In this section, we will review pipeline and end-to-end methods for task-oriented dialogue systems.

      Broadly, task-oriented dialogue systems fall into two categories:

      • 1 Pipeline, consisting of SLU + DST + PL (policy learning) + NLG
      • 2 End-to-end
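A toy pipeline makes the four-stage structure concrete: each stage is a separate function, and its output is the next stage's input. The keyword rules, slot names, and the `book_restaurant` intent are hypothetical stand-ins for trained components; an end-to-end system would instead learn a single mapping from the dialogue history to the reply.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

@dataclass
class State:
    intent: Optional[str] = None
    slots: Dict[str, str] = field(default_factory=dict)

def slu(utterance: str) -> State:
    """SLU: keyword rules standing in for intent detection and slot filling."""
    frame = State("book_restaurant" if "book" in utterance else None)
    for city in ("Beijing", "Shanghai"):
        if city in utterance:
            frame.slots["city"] = city
    match = re.search(r"\b\d{1,2}(?:am|pm)\b", utterance)
    if match:
        frame.slots["time"] = match.group()
    return frame

def dst(state: State, frame: State) -> State:
    """DST: merge this turn's SLU output into the accumulated dialogue state."""
    return State(frame.intent or state.intent, {**state.slots, **frame.slots})

REQUIRED = {"book_restaurant": ["city", "time"]}

def policy(state: State) -> Tuple[str, Optional[str]]:
    """PL: request the first missing required slot, else confirm."""
    if state.intent is None:
        return ("clarify", None)
    for slot in REQUIRED[state.intent]:
        if slot not in state.slots:
            return ("request", slot)
    return ("confirm", None)

def nlg(action: Tuple[str, Optional[str]], state: State) -> str:
    """NLG: template-based surface realization."""
    act, slot = action
    if act == "request":
        return f"Which {slot} would you like?"
    if act == "confirm":
        return f"Booking a table in {state.slots['city']} at {state.slots['time']}."
    return "Sorry, what would you like to do?"

def turn(state: State, utterance: str) -> Tuple[State, str]:
    state = dst(state, slu(utterance))
    return state, nlg(policy(state), state)

state, reply = turn(State(), "I want to book a table in Beijing")
print(reply)  # Which time would you like?
state, reply = turn(state, "at 7pm please")
print(reply)  # Booking a table in Beijing at 7pm.
```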
  6. Feb 2019
    1. Spoken language understanding (SLU) comprises two tasks, intent identification and slot filling. That is, given the current query along with the previous queries in the same session, an SLU system predicts the intent of the current query and also all slots (entities or labels) associated with the predicted intent. The significance of SLU lies in that each type of intent corresponds to a particular service API and the slots correspond to the parameters required by this API. SLU helps the dialog system to decide how to satisfy the user’s need by calling the right service with the right information.

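      SLU covers two tasks: intent identification and slot filling.

      Difficulties in practice:

      • 1 The complexity of intent classification
      • 2 World knowledge
      • 3 User state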
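The quoted mapping (one service API per intent type, one API parameter per slot) can be sketched as a dispatch table. The service functions, intent names, and slot names are hypothetical. Python's `inspect.signature` is used to find required parameters that no slot has filled, which a dialog manager could then ask the user for.

```python
import inspect
from typing import Callable, Dict, List

def query_weather(city: str, date: str = "today") -> str:
    return f"weather in {city} for {date}"  # hypothetical service

def book_flight(origin: str, destination: str, date: str) -> str:
    return f"flight {origin} -> {destination} on {date}"  # hypothetical service

# Intent name -> service API (hypothetical registry).
SERVICES: Dict[str, Callable[..., str]] = {
    "query_weather": query_weather,
    "book_flight": book_flight,
}

def missing_slots(intent: str, slots: Dict[str, str]) -> List[str]:
    """Required API parameters (no default value) not yet filled by a slot."""
    params = inspect.signature(SERVICES[intent]).parameters.values()
    return [p.name for p in params
            if p.default is inspect.Parameter.empty and p.name not in slots]

def fulfill(intent: str, slots: Dict[str, str]) -> str:
    """Call the API for the predicted intent, passing slots as keyword arguments."""
    missing = missing_slots(intent, slots)
    if missing:
        return "ask user for: " + ", ".join(missing)
    return SERVICES[intent](**slots)

print(fulfill("query_weather", {"city": "Beijing"}))
# weather in Beijing for today
print(fulfill("book_flight", {"origin": "Beijing", "date": "May 1"}))
# ask user for: destination
```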
    1. PyDial: A Multi-domain Statistical Dialogue System Toolkit

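      An open-source, end-to-end toolkit for statistical dialogue systems.

      Its overall architecture consists of a Semantic Decoder, a Belief Tracker, a Policy (reply system), and a Language Generator. Throughout the system, rule-based decision making is supported alongside statistical models. The source code is worth reading.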

    1. DocChat: An Information Retrieval Approach for Chatbot Engines Using Unstructured Documents

      Uses BM25 to retrieve candidates.

      Builds word-level, phrase-level, sentence-level, document-level, relation-level, type-level, and topic-level features to train a ranking model.

      The sentence-level features are the most useful.
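The candidate-retrieval stage can be sketched with a from-scratch Okapi BM25 scorer. The parameters k1 = 1.5 and b = 0.75 are common defaults rather than values from the paper, the IDF uses the non-negative log((N - n + 0.5)/(n + 0.5) + 1) variant, and the regex tokenization is an assumption (Chinese text would need word segmentation first). This covers only candidate retrieval; the multi-level feature ranking model is not reproduced.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())

def bm25_top_k(query: str, docs: List[str], k1: float = 1.5,
               b: float = 0.75, top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank docs against query with Okapi BM25; return the top_k (doc, score)."""
    toks = [tokenize(d) for d in docs]
    n_docs = len(toks)
    avgdl = sum(len(t) for t in toks) / n_docs
    df = Counter(term for t in toks for term in set(t))  # document frequency
    scored = []
    for i, t in enumerate(toks):
        tf = Counter(t)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            f = tf[term]
            # Term-frequency saturation (k1) and document-length normalization (b).
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(t) / avgdl))
        scored.append((score, i))
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ties by index
    return [(docs[i], s) for s, i in scored[:top_k]]

sentences = [
    "You can reset your password from the account settings page.",
    "Our office opens at nine in the morning.",
    "Password rules require at least eight characters.",
]
for doc, score in bm25_top_k("how do I reset my password", sentences, top_k=2):
    print(f"{score:.3f}  {doc}")
```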

  7. May 2018