  1. Jun 2021
  2. May 2021
    1. To investigate these hypotheses, I created an election-year-country dataset covering the period from the early 1990s to the present for all post-communist democracies.7 The dataset is structured as a quasi-time series of 93 parliamentary elections in 17 countries from 1991 to 2012, and the dependent variable is the natural log of the radical right party’s combined vote share in elections held at time t.

      This is the data: her explanation of the dataset she created.
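The dependent variable described in the highlight above can be sketched roughly as follows. This is a minimal illustration, not the author's code: the record layout, the country and party names, and the vote shares are all invented, and skipping elections with a zero combined share is an assumption, since the highlight does not say how such cases are handled.

```python
import math
from collections import defaultdict

# Hypothetical party-level records: (country, election_year, party, vote_share_percent).
# All values are invented for illustration; they are not from the source dataset.
records = [
    ("Country A", 1994, "Party X", 5.0),
    ("Country A", 1994, "Party Y", 3.0),
    ("Country A", 1998, "Party X", 10.0),
    ("Country B", 1996, "Party Z", 4.0),
]

def log_combined_vote_share(rows):
    """Sum radical right vote shares per (country, year), then take the natural log."""
    combined = defaultdict(float)
    for country, year, _party, share in rows:
        combined[(country, year)] += share
    # Assumption: the log is undefined at zero, so such elections are skipped here.
    return {key: math.log(total) for key, total in combined.items() if total > 0}

dv = log_combined_vote_share(records)
```

Each key of `dv` is one election-year-country observation, which matches the unit of analysis named in the highlight.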



  3. Apr 2021
    1. approaches to understanding and action, and to challenge unquestioned patterns of response to the crises of the times. It helps clarify the conceptual challenge of interrelating and disparate, or even contradictory information, within our complex societies
      • [[knowledge]]
    1. associated with the early development of the international futures research movement


    1. tool for studies and research in all subjects of civil society activities, including Political Science, Law, International Studies, International Relations, Sociology, Demography and Peace Studies.


      • where to get?
    1. record and map the relationships between any strategies and solutions that humanity actually or potentially uses, in the hopes that a better overall understanding of which would greatly enhance our ability to formulate effective strategies to global problems.


    1. interdependence demonstrated among world problems in every sector, emphasis is placed on the need for approaches which are sufficiently complex to encompass the factions, conflicts and rival worldviews that undermine collective initiative towards a promising future.


  4. Mar 2021
    1. 14 of which were sampled at multiple timepoints
    2. RNA sequencing on samples from 46 individuals with PCR-positive, symptomatic SARS-CoV-2 infection
    3. 77 peripheral blood samples across 46 subjects with COVID-19 and compared them to subjects with seasonal coronavirus, influenza, bacterial pneumonia, and healthy controls.
    4. seasonal coronavirus (n=59)
    5. divided based on disease severity and time from symptom onset
    6. elucidate novel aspects of the host response to SARS-CoV-2
    7. influenza (n=17)
    8. bacterial pneumonia (n=20)
    9. healthy controls (n=19)
    1. elucidate key pathways in the host transcriptome of patients infected with SARS-CoV-2, we used RNA sequencing (RNA Seq) to analyze nasopharyngeal (NP) swab and whole blood (WB) samples from 333 COVID-19 patients and controls, including patients with other viral and bacterial infections.
    2. host response biosignature for COVID-19 from RNA profiling of nasal swabs and blood
  5. Dec 2020
    1. Databases: If database data is stored on a ZFS filesystem, it’s better to create a separate dataset with several tweaks:

      zfs create -o recordsize=8K -o primarycache=metadata -o logbias=throughput -o mountpoint=/path/to/db_data rpool/db_data

      • recordsize: match the typical RDBMS page size (8 KiB)
      • primarycache: disable ZFS data caching, as RDBMSs have their own
      • logbias: essentially disables log-based writes, relying on the RDBMSs’ integrity measures (see detailed Oracle post)
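If the same tweaks need to be applied to several datasets, the command from the highlight above could be assembled from a property table. The sketch below only builds the command string and does not run it; the helper name `zfs_create_command` is hypothetical, while the property names, values, mountpoint, and dataset name are the ones quoted in the highlight.

```python
import shlex

def zfs_create_command(dataset, mountpoint, properties):
    """Build a `zfs create` command line from a dict of -o properties (hypothetical helper)."""
    args = ["zfs", "create"]
    for name, value in properties.items():
        args += ["-o", f"{name}={value}"]
    args += ["-o", f"mountpoint={mountpoint}", dataset]
    # shlex.join quotes arguments where needed so the string is safe to paste into a shell.
    return shlex.join(args)

# Property values as quoted in the highlight.
db_properties = {
    "recordsize": "8K",          # match the typical RDBMS page size
    "primarycache": "metadata",  # leave data caching to the RDBMS
    "logbias": "throughput",
}

command = zfs_create_command("rpool/db_data", "/path/to/db_data", db_properties)
print(command)
```

Keeping the properties in a dict makes it easier to review them in one place before anything is created.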
  6. Oct 2020
  7. Sep 2020
    1. Bavadekar, Shailesh, Andrew Dai, John Davis, Damien Desfontaines, Ilya Eckstein, Katie Everett, Alex Fabrikant, et al. ‘Google COVID-19 Search Trends Symptoms Dataset: Anonymization Process Description (Version 1.0)’. ArXiv:2009.01265 [Cs], 2 September 2020. http://arxiv.org/abs/2009.01265.

  8. Jul 2020
  9. Jun 2020
  10. May 2020
  11. Apr 2020
    1. Salganik, M. J., Lundberg, I., Kindel, A. T., Ahearn, C. E., Al-Ghoneim, K., Almaatouq, A., Altschul, D. M., Brand, J. E., Carnegie, N. B., Compton, R. J., Datta, D., Davidson, T., Filippova, A., Gilroy, C., Goode, B. J., Jahani, E., Kashyap, R., Kirchner, A., McKay, S., … McLanahan, S. (2020). Measuring the predictability of life outcomes with a scientific mass collaboration. Proceedings of the National Academy of Sciences. https://doi.org/10.1073/pnas.1915006117

  12. Mar 2020
    1. All datasets were supplied by Sutherland in the Supporting Information as 3D geometries aligned according to the original literature, namely by flexible alignment on one or more templates obtained by crystallographic enzyme-inhibitor complexes
    2. eight comprehensive datasets

      What do the datasets look like? This may help in understanding the application domain of this tool.



  13. Feb 2019
    1. Impact of Fully Connected Layers on Performance of Convolutional Neural Networks for Image Classification

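      The authors conclude: 1) the fewer CNN layers there are, the more nodes the FC layers need; conversely, the deeper the CNN, the fewer FC nodes suffice; 2) besides needing more FC nodes, a shallow CNN should have more FC layers the more classes the dataset has, and vice versa; 3) for datasets with more samples per class, a deeper network is better, but if the number of classes is large, a shallower network performs better.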

    2. Do we train on test data? Purging CIFAR of near-duplicates

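      The authors played with the CIFAR test set. They argue that some test samples are so close to training samples that they raise an overfitting concern, so they replaced the suspect samples and proposed a new test set. After running the well-known models on it, they were relieved to report that those models do not seem to have overfit in a way that would have led to mistaken judgments of model quality~ (feels a bit like a slap in the face~)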

    3. Semantic Redundancies in Image-Classification Datasets: The 10% You Don't Need

      A deep-neural-network version of "feature engineering"~ [doge]

    4. Deep Learning on Small Datasets without Pre-Training using Cosine Loss


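      1. Categorical cross-entropy loss after softmax activation is the method of choice for classification;
      2. Training a CNN classifier from scratch on a small dataset does not work well. In this paper the authors show that the cosine loss function gives better performance than cross-entropy when there are few samples per class.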
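To make the contrast in the note above concrete, here is a minimal sketch of the two losses for a single sample. It assumes one-hot targets, in which case the cosine similarity between the network output and the target reduces to the target component divided by the output's L2 norm. The function names are illustrative and this is not the paper's code.

```python
import math

def softmax_cross_entropy(logits, target):
    """Cross-entropy of softmax(logits) against the class index `target`."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def cosine_loss(outputs, target):
    """1 - cos(outputs, one_hot(target)); assumes one-hot targets."""
    norm = math.sqrt(sum(z * z for z in outputs))
    if norm == 0.0:
        raise ValueError("cosine loss is undefined for an all-zero output")
    return 1.0 - outputs[target] / norm

example_outputs = [3.0, 4.0]
ce = softmax_cross_entropy(example_outputs, 0)
cos = cosine_loss(example_outputs, 0)
```

Unlike cross-entropy, the cosine loss is bounded between 0 and 2 and depends only on the direction of the output vector, not its magnitude.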
  14. Jan 2019
    1. Fitting A Mixture Distribution to Data: Tutorial


    2. Optimization Models for Machine Learning: A Survey

      I feel that the only thing in this paper of real value to me is probably the summary of dataset tables in the appendix at the end...

  15. Dec 2018
    1. Are All Training Examples Created Equal? An Empirical Study

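      From this paper I learned about an interesting concept called active learning, which seems quite close to the continuous-parameter training-data sampling pool I designed myself...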


    2. Image Score: How to Select Useful Samples

      The semi-supervised learning concept it puts forward is rather interesting. Scoring every sample in the dataset might help a little with interpretability...

  16. Nov 2018
    1. Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift

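      The experiments in this paper explore how models behave after shifts (a kind of controllable perturbation) are applied to a dataset, and the paper proposes a classifier-based method/pipeline for observing and evaluating them.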

      For my gravitational-wave data research, I can borrow its approach to shifting the data as well as its evaluation mechanism (two-sample tests).
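As a reminder of what a two-sample test looks like in practice, here is a minimal sketch of one common choice: the Kolmogorov-Smirnov statistic on a single feature, with a permutation p-value. This is an illustration under stated assumptions, not the paper's pipeline; the function names and the synthetic source and target samples are invented.

```python
import random

def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past x so that ties are handled together.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def permutation_p_value(a, b, n_permutations=1000, seed=0):
    """Fraction of random relabelings whose KS statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)

# Synthetic example: the target sample is the source sample shifted by a constant.
source = [i / 50 for i in range(50)]
shifted = [x + 2.0 for x in source]
p_shifted = permutation_p_value(source, shifted)
p_same = permutation_p_value(source, source)
```

A small p-value suggests that the two samples come from different distributions, which is the signal a shift detector would raise.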

    2. Training neural audio classifiers with few data


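      The conclusions shown in the figures are not really surprising: more data naturally means better performance; transfer learning performs well on extremely small amounts of data; prototypical models may show some advantage because of the specifics of their structure; and the smaller the dataset, the more severe the overfitting...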

  17. Sep 2016
    1. UK Biobank

      Large UK dataset containing extensive phenotypic, genotypic, and neuroimaging data.

      • License: Unclear, but restrictive
      • Access: Human, ?
      • Needs data use agreement: Yes
      • Needs institutional signature for access: No (?)

    1. View Data Sets

      Public fMRI dataset repository.

      • License: PDDL v.1.0
      • Access: Human, s3
      • Needs data use agreement: No
      • Needs institutional signature for access: No
    1. Brain Genomics Superstruct Project (GSP)

      • License: Data use agreement
      • Access: Human, API
      • Needs data use agreement: Yes
      • Needs institutional signature for access: No

    1. What is studyforrest?

      Rich multimodal dataset on naturalistic stimuli

      • License: PDDL v.1.0
      • Access: Human, rsync, git annex
      • Needs data use agreement: No
      • Needs institutional signature for access: No
      • License: PDDL v.1.0
      • Access: Human, s3, openfmri
      • Needs data use agreement: No
      • Needs institutional signature for access: No
  18. May 2016
  19. Aug 2015
    1. the definition of a “dataset,”

      this is interesting, and will be interesting to track within and across disciplines