73 Matching Annotations
  1. Oct 2020
  2. Sep 2020
    1. Bavadekar, Shailesh, Andrew Dai, John Davis, Damien Desfontaines, Ilya Eckstein, Katie Everett, Alex Fabrikant, et al. ‘Google COVID-19 Search Trends Symptoms Dataset: Anonymization Process Description (Version 1.0)’. ArXiv:2009.01265 [Cs], 2 September 2020. http://arxiv.org/abs/2009.01265.

  3. Jul 2020
  4. Jun 2020
  5. May 2020
  6. Apr 2020
    1. Salganik, M. J., Lundberg, I., Kindel, A. T., Ahearn, C. E., Al-Ghoneim, K., Almaatouq, A., Altschul, D. M., Brand, J. E., Carnegie, N. B., Compton, R. J., Datta, D., Davidson, T., Filippova, A., Gilroy, C., Goode, B. J., Jahani, E., Kashyap, R., Kirchner, A., McKay, S., … McLanahan, S. (2020). Measuring the predictability of life outcomes with a scientific mass collaboration. Proceedings of the National Academy of Sciences. https://doi.org/10.1073/pnas.1915006117

  7. Mar 2020
    1. All datasets were supplied by Sutherland in the Supporting Information as 3D geometries aligned according to the original literature, namely by flexible alignment on one or more templates obtained by crystallographic enzyme-inhibitor complexes
    2. eight comprehensive datasets

      What do the datasets look like? Knowing this may help in understanding the application domain of this tool.

  8. Feb 2019
    1. Impact of Fully Connected Layers on Performance of Convolutional Neural Networks for Image Classification

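      The authors conclude: 1) the fewer CNN layers there are, the more nodes the FC layers need; conversely, a deeper CNN gets by with fewer FC nodes; 2) besides needing more FC nodes, a shallow CNN benefits from more FC layers as the number of classes in the dataset grows, and vice versa; 3) for datasets with more samples per class, deeper networks do better, but when the number of classes is large, shallower networks perform better.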

    2. Do we train on test data? Purging CIFAR of near-duplicates

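      The authors took a close look at the CIFAR test sets and argue that some test samples are so similar to training samples that reported results may reflect overfitting. They replaced the suspect samples to build a new test set, then re-ran a number of well-known models on it and concluded, with some relief, that the models do not appear to have overfit or been misjudged (which feels a bit like undercutting their own premise).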

    3. Semantic Redundancies in Image-Classification Datasets: The 10% You Don't Need

      A deep-neural-network take on "feature engineering" [doge]

    4. Deep Learning on Small Datasets without Pre-Training using Cosine Loss

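      In contemporary deep learning, two things seem beyond dispute:

      1. categorical cross-entropy loss after softmax activation is the method of choice for classification;
      2. training a CNN classifier from scratch on small datasets does not work well.

      In this paper the authors show that, when only a few samples per class are available, the cosine loss gives better performance than cross-entropy.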
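
To make the comparison in this note concrete, here is a minimal sketch of a cosine loss next to softmax cross-entropy for a single sample. It assumes the common formulation in which the network output is L2-normalized and compared with a one-hot target; the function names are illustrative and are not taken from the paper's code.

```python
import math


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit length (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, eps) for x in vector]


def cosine_loss(outputs, label):
    """1 - cos(outputs, one_hot(label)) for one sample.

    Assumption: the target is a one-hot vector, which already has unit
    length, so the cosine similarity reduces to the normalized output
    at the label index. The loss is bounded in [0, 2].
    """
    unit = l2_normalize(outputs)
    return 1.0 - unit[label]


def cross_entropy_loss(logits, label):
    """Softmax cross-entropy for one sample, via a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]
```

Unlike cross-entropy, this loss depends only on the direction of the output vector, not on its magnitude, so scaling the outputs up cannot drive it any lower.
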
  9. Jan 2019
    1. Fitting A Mixture Distribution to Data: Tutorial

      At first glance, this looks like a really lovely tutorial!

    2. Optimization Models for Machine Learning: A Survey

      For me, the only part of this paper with real value is probably the collection of dataset tables in the appendix at the end.

  10. Dec 2018
    1. Are All Training Examples Created Equal? An Empirical Study

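      This paper introduced me to an interesting concept called active learning, which seems quite close to the continuous-parameter sampling pool for training data that I designed myself.

      The paper's main contribution is a study of how important individual training samples are in image classification, with sample importance measured by a gradient-based method. Its conclusions may suggest that active learning is not always effective in deep learning.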
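
As a rough illustration of gradient-based sample importance, the sketch below scores each sample by the L2 norm of the softmax cross-entropy gradient with respect to the logits, which equals `softmax(logits) - one_hot(label)`. This is one simple proxy, chosen here as an assumption; it is not necessarily the exact measure used in the paper, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_norm_score(logits, label):
    """L2 norm of d(cross-entropy)/d(logits) = softmax(logits) - one_hot(label)."""
    probs = softmax(logits)
    grad = [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]
    return math.sqrt(sum(g * g for g in grad))


def rank_by_importance(batch):
    """Return sample indices ordered from largest to smallest gradient norm.

    `batch` is a list of (logits, label) pairs.
    """
    scores = [grad_norm_score(logits, label) for logits, label in batch]
    return sorted(range(len(batch)), key=lambda i: scores[i], reverse=True)
```

Under this proxy, confidently correct samples score near zero and confidently wrong ones score highest, so a ranking like this says which samples currently move the model most, not which are most informative in an absolute sense.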

    2. Image Score: How to Select Useful Samples

      The semi-supervised learning idea it proposes is fairly interesting. Assigning a score to every sample in a dataset might help a little with interpretability.

  11. Nov 2018
    1. Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift

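      The experiments in this paper explore how models behave after shifts (controlled perturbations) are applied to a dataset, and the paper proposes a classifier-based method/pipeline for observing and evaluating them.

      For my gravitational-wave data research, I could borrow both its ways of shifting the data and its evaluation mechanism (two-sample tests).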
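
For reference, a two-sample test of the kind mentioned in this note can be sketched on a single 1-D feature as below. This is a generic Kolmogorov-Smirnov statistic with a permutation p-value, written as an assumption-laden illustration and not as the paper's pipeline; the function names and defaults are hypothetical.

```python
import random


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    for x in sorted(set(a_sorted + b_sorted)):
        # Advance each pointer past all values <= x to get the CDF counts.
        while i < len(a_sorted) and a_sorted[i] <= x:
            i += 1
        while j < len(b_sorted) and b_sorted[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a_sorted) - j / len(b_sorted)))
    return gap


def permutation_p_value(a, b, n_permutations=200, seed=0):
    """Estimate how often a random relabeling gives a gap >= the observed one."""
    rng = random.Random(seed)
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # The +1 terms keep the estimate away from an impossible p-value of zero.
    return (hits + 1) / (n_permutations + 1)
```

A small p-value suggests the two samples were not drawn from the same distribution, that is, the feature has shifted. With many features, the per-feature p-values would need a multiple-testing correction before any decision is made.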

    2. Training neural audio classifiers with few data

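      This is a fairly preliminary, simple experiment.

      The conclusions from the figures are not really surprising: more data gives better performance, as expected; transfer learning performs well with very small amounts of data; prototypical models may show some advantage because of their particular structure; and the smaller the dataset, the more severe the overfitting.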

  12. Sep 2016
    1. UK Biobank

      Large UK dataset containing extensive phenotypic, genotypic, and neuroimaging data.

      • License: Unclear, but restrictive
      • Access: Human, ?
      • Needs data use agreement: Yes
      • Needs institutional signature for access: No (?)

    1. View Data Sets

      Public fMRI dataset repository.

      • License: PDDL v.1.0
      • Access: Human, s3
      • Needs data use agreement: No
      • Needs institutional signature for access: No
    1. Brain Genomics Superstruct Project (GSP)

      • License: Data use agreement
      • Access: Human, API
      • Needs data use agreement: Yes
      • Needs institutional signature for access: No

    1. What is studyforrest?

      Rich multimodal dataset on naturalistic stimuli

      • License: PDDL v.1.0
      • Access: Human, rsync, git annex
      • Needs data use agreement: No
      • Needs institutional signature for access: No

      • License: PDDL v.1.0
      • Access: Human, s3, openfmri
      • Needs data use agreement: No
      • Needs institutional signature for access: No
  13. May 2016
  14. Aug 2015
    1. the definition of a “dataset,”

      This is interesting, and it will be interesting to track within and across disciplines.