18 Matching Annotations
  1. Aug 2019
  2. Jun 2019
    1. MelNet: A Generative Model for Audio in the Frequency Domain

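      The paper's main contributions are:

      • It proposes MelNet, a generative model for spectrograms that couples a fine-grained autoregressive model with a multiscale generation procedure, allowing it to capture both local and global structure.

      • It demonstrates MelNet's superior performance on long-range dependencies.

      • It demonstrates MelNet's strong performance across several audio generation tasks: unconditional speech generation, music generation, and text-to-speech synthesis. In all of these tasks, MelNet is implemented end to end.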

  3. May 2019
    1. This is spot on. An idea on its own does nothing. Execution and actually doing the hard work are the most important thing in any creative endeavour.

      This blog is very good: high signal, low noise. The dense version of this idea that has stuck with me is that what we're aiming for (productivity, make-world-better stuff, doing good) is a multiplicative product of hustle (physical work, pressing buttons, saying words that other people hear) and thinking. That is, long-term goal completion = hustle (doing stuff) × thought (knowing what to do).

      I may technically disagree with the "most important thing" part, but the point does need strong emphasis. Hustle scales ideas multiplicatively, so if you've got zero hustle, you don't really have anything.

      One way to do world-bettering is to have just enough hustle to outsource the hustle (get other people to act on your ideas). Alternatively, if you have tons of hustle, you can pick up good ideas that aren't going anywhere and carry them forward.

      Knowing the difference between bad and good ideas is one of the core problems with the super-connected society/net we're in. The solution to the problem is too large for this margin.

  4. Mar 2019
  5. Feb 2019
    1. Unsupervised speech representation learning using WaveNet autoencoders

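      We consider the task of unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms. The goal is to learn a representation that captures high-level semantic content from the signal, e.g. phoneme identities, while being invariant to confounding low-level details in the signal, such as the underlying pitch contour or background noise. The behavior of autoencoder models depends on the kind of constraint applied to the latent representation. We compare three variants: a simple dimensionality-reduction bottleneck, a Gaussian variational autoencoder (VAE), and a discrete vector-quantized VAE (VQ-VAE). We analyze the quality of the learned representations in terms of speaker independence, the ability to predict phonetic content, and the ability to accurately reconstruct individual spectrogram frames. Moreover, for discrete encodings extracted using the VQ-VAE, we measure the ease of mapping them to phonemes. We introduce a regularization scheme that forces the representations to focus on the phonetic content of the utterance, and we report performance comparable with the top entries in the ZeroSpeech 2017 unsupervised acoustic unit discovery task. [translated by Google Translate]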


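      [Summary from 机器之心 (Synced)]:

      The paper "Unsupervised speech representation learning using WaveNet autoencoders" addresses the unsupervised task of extracting meaningful latent representations from speech by applying autoencoding neural networks to speech waveforms. The goal is to learn a representation that captures high-level semantic content in the signal while remaining robust to confounding information such as background noise or the underlying pitch contour. The behavior of an autoencoder model is determined by the constraint applied to the latent representation. The authors compare three variants: a simple dimensionality-reduction bottleneck, a Gaussian variational autoencoder, and a discrete vector-quantized VAE. They then analyze, among other things, the ability to predict phonetic content.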
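
To make the "discrete vector-quantized" bottleneck mentioned above more concrete, here is a minimal sketch of the nearest-codebook lookup that such a bottleneck is generally built around. It is an illustration only, not the paper's implementation: the function names, the tiny 2-D codebook, and the sample latent vectors are all assumptions made up for this example, and training of the codebook is left out entirely.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    latents: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> Tuple[List[int], List[List[float]]]:
    """Map each latent vector to its nearest codebook entry.

    Returns the chosen indices (the discrete code) and the corresponding
    codebook vectors (what a decoder would receive).
    """
    indices: List[int] = []
    quantized: List[List[float]] = []
    for z in latents:
        # Pick the codebook row with the smallest distance to z.
        idx = min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        indices.append(idx)
        quantized.append(list(codebook[idx]))
    return indices, quantized


# Hypothetical codebook with three entries and three encoder outputs.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
latents = [[0.9, 1.2], [0.1, -0.2], [-0.8, 0.7]]

indices, quantized = quantize(latents, codebook)
print(indices)    # discrete codes, one per latent vector
print(quantized)  # codebook vectors passed on in place of the latents
```

The point of the sketch is only that the latent is replaced by an index into a finite table, which is what makes the encoding discrete and lets one ask how easily the indices map to phonemes.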

  6. Dec 2018
  7. Nov 2018
    1. Stochastic Adaptive Neural Architecture Search for Keyword Spotting

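      A paper on identifying keywords in a real-time audio stream, which is quite close to the data processing used in gravitational-wave detection! It proposes an end-to-end "Stochastic Adaptive Neural Architecture Search" (SANAS) approach that achieves efficient and accurate training. This is clearly suggestive for applications involving real-time data. FYI: the source code has been released as well.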

    2. WaveGlow: A Flow-based Generative Network for Speech Synthesis

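      A short paper from NVIDIA. The proposed real-time generative network, WaveGlow, combines features of Glow and WaveNet to achieve faster, more efficient, and more accurate speech synthesis.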

    3. Model Selection Techniques -- An Overview

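      A survey article on model selection. It touches on data processing in many areas, including signal processing and image processing, and was published in a signal-processing journal.

      The article's overall conceptual framing of model selection, and its mathematical formulation, are well worth a careful read.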
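
As a small illustration of what a mathematical model selection criterion looks like, the sketch below computes two widely used ones, AIC (2k - 2 ln L) and BIC (k ln n - 2 ln L), and picks the candidate with the lowest score. The annotation above does not say which criteria the survey emphasizes, so choosing these two is an assumption, and the candidate names, log-likelihoods, and sample size are invented for the example.

```python
import math
from typing import Dict, Tuple


def aic(log_likelihood: float, num_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * num_params - 2 * log_likelihood


def bic(log_likelihood: float, num_params: int, num_samples: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return num_params * math.log(num_samples) - 2 * log_likelihood


# Hypothetical candidates: name -> (maximized log-likelihood, parameter count).
candidates: Dict[str, Tuple[float, int]] = {
    "small": (-120.0, 3),
    "large": (-115.0, 10),
}
num_samples = 100  # assumed number of observations

best_by_aic = min(candidates, key=lambda name: aic(*candidates[name]))
best_by_bic = min(candidates, key=lambda name: bic(*candidates[name], num_samples))

print("AIC picks:", best_by_aic)
print("BIC picks:", best_by_bic)
```

Both criteria trade goodness of fit against the number of parameters; BIC's penalty grows with the sample size, so it tends to favor smaller models as the data set grows.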

  8. Aug 2018
    1. Our results demonstrate the promise of spatio-temporal filtering techniques for “tuning” measurement of hazard-related rumoring to enable observation of rumoring at scales that have long been infeasible.

      Result: Claims that spatio-temporal filtering techniques better separate signal from noise in large crisis data sets.

    2. While the application in the online environment is novel, the general problem is not. Interest in automated signal processing in noisy environments (Fawcett and Provost, 1999; Hamid et al., 2005; Macleod and Congalton, 1998; Ribeiro et al., 2012; Singh, 1989; Stauffer and Grimson, 2000) predates the proliferation of user-generated online activity, and we can apply the lessons learned in those contexts to the online context.

      Signal processing in noisy (low signal-to-noise) environments is a long-standing problem.
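
One simple reading of "spatio-temporal filtering" from the first Aug 2018 annotation is to keep only the records that fall inside a geographic bounding box and a time window. The sketch below shows that reading only; it is not the method used in the annotated paper, and the `Post` record, the `spatio_temporal_filter` function, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Post:
    """Hypothetical geotagged, timestamped message."""
    text: str
    lat: float
    lon: float
    time: datetime


def spatio_temporal_filter(
    posts: Iterable[Post],
    lat_range: Tuple[float, float],
    lon_range: Tuple[float, float],
    start: datetime,
    end: datetime,
) -> List[Post]:
    """Keep posts inside the bounding box and the [start, end] time window."""
    return [
        p for p in posts
        if lat_range[0] <= p.lat <= lat_range[1]
        and lon_range[0] <= p.lon <= lon_range[1]
        and start <= p.time <= end
    ]


# Made-up sample data: only the first post is both nearby and in the window.
posts = [
    Post("road closed near river", 34.05, -118.25, datetime(2018, 8, 1, 12, 0)),
    Post("is the bridge out?", 34.10, -118.30, datetime(2018, 8, 3, 9, 0)),
    Post("concert tonight", 40.71, -74.00, datetime(2018, 8, 1, 13, 0)),
]

selected = spatio_temporal_filter(
    posts,
    lat_range=(33.5, 34.5),
    lon_range=(-119.0, -117.5),
    start=datetime(2018, 8, 1, 0, 0),
    end=datetime(2018, 8, 2, 0, 0),
)
print([p.text for p in selected])
```

Narrowing or widening the box and the window is the "tuning" knob in this toy version: a tighter filter discards more unrelated chatter at the cost of missing relevant posts outside it.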

  9. Mar 2017
  10. Jun 2016
  11. Mar 2016