- Feb 2021
-
twitter.com twitter.com
-
ReconfigBehSci on Twitter: ‘the SciBeh initiative is about bringing knowledge to policy makers and the general public, but I have to say this advert I just came across worries me: Where are the preceding data integrity and data analysis classes? https://t.co/5LwkC1SVyF’ / Twitter. (n.d.). Retrieved 18 February 2021, from https://twitter.com/SciBeh/status/1362344945697308674
-
- Dec 2020
-
saveriomiroddi.github.io saveriomiroddi.github.io
-
Databases
If database data is stored on a ZFS filesystem, it’s better to create a separate dataset with several tweaks:
zfs create -o recordsize=8K -o primarycache=metadata -o logbias=throughput -o mountpoint=/path/to/db_data rpool/db_data
- recordsize: match the typical RDBMS page size (8 KiB)
- primarycache: disable ZFS data caching, as RDBMSs have their own
- logbias: essentially, disable log-based writes, relying on the RDBMSs’ integrity measures (see detailed Oracle post)
-
- Oct 2020
-
ourworldindata.org ourworldindata.org
-
docs.google.com docs.google.com
-
publications clinical trials datasets
-
-
www.kaggle.com www.kaggle.com
-
github.com github.com
-
storymaps.arcgis.com storymaps.arcgis.com
-
nextstrain.org nextstrain.org
-
www.arcgis.com www.arcgis.com
-
441187 total confirmed cases, 111933 recovered, 19784 deaths
-
- Sep 2020
-
github.com github.com
-
I forgot to mention in the original issue way back that I have a lot of data. Like 1 to 3 MB that is being passed around via export let foo.
-
-
arxiv.org arxiv.org
-
Bavadekar, Shailesh, Andrew Dai, John Davis, Damien Desfontaines, Ilya Eckstein, Katie Everett, Alex Fabrikant, et al. ‘Google COVID-19 Search Trends Symptoms Dataset: Anonymization Process Description (Version 1.0)’. ArXiv:2009.01265 [Cs], 2 September 2020. http://arxiv.org/abs/2009.01265.
-
- Jul 2020
-
osf.io osf.io
-
Morgan, L., Protopopova, A., Birkler, R. I. D., Itin-Shwartz, B., Sutton, G. A., Gamliel, A., Yakobson, B., & Raz, T. (2020). Human-dog relationships during COVID-19 pandemic; booming dog adoption during social isolation [Preprint]. SocArXiv. https://doi.org/10.31235/osf.io/s9k4y
-
-
psyarxiv.com psyarxiv.com
-
Schelhorn, I., Ecker, A., Bereznai, J., Tran, T., Rehm, S., Lugo, R., Sütterlin, S., Kinateder, M., & Shiban, Y. (2020). Depression symptoms during the COVID-19 pandemic in different regions in Germany. [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/p9wz8
-
- Jun 2020
-
www.youtube.com www.youtube.com
-
EU Datathon 2020—Webinar on COVID-19 and media and data monitoring. (2020, April 22). https://www.youtube.com/watch?v=wyNgmEfi_vk&feature=youtu.be
-
-
www.youtube.com www.youtube.com
-
EU Datathon 2020—Webinar dedicated to COVID-19 data. (2020, April 9). https://www.youtube.com/watch?v=JIy6NO7QRQM&list=PLT5rARDev_rlAZ21iedz0ynnN4Na3UIoW&index=14&t=270s
-
-
-
Cheng, C., Barceló, J., Hartnett, A. S., Kubinec, R., & Messerschmidt, L. (2020). COVID-19 Government Response Event Dataset (CoronaNet v.1.0). Nature Human Behaviour, 1–13. https://doi.org/10.1038/s41562-020-0909-7
-
-
eml.berkeley.edu eml.berkeley.edu
-
DellaVigna, S., & Linos, E. (2020). RCTs to scale: Comprehensive evidence from two nudge units. UC Berkeley. https://eml.berkeley.edu/~sdellavi/wp/NudgeToScale2020-03-20.pdf
-
-
psyarxiv.com psyarxiv.com
-
Yamada, Y., Ćepulić, D.-B., Coll-Martín, T., Debove, S., Gautreau, G., Han, H., Rasmussen, J., Tran, T. P., Travaglino, G. A., & Lieberoth, A. (2020). COVIDiSTRESS Global Survey dataset on psychological and behavioural consequences of the COVID-19 outbreak [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/v7cep
-
- May 2020
-
docs.google.com docs.google.com
-
www.ukcdr.org.uk www.ukcdr.org.uk
-
UKCDR - COVID-19 Research Project Tracker
-
-
ai.googleblog.com ai.googleblog.com
-
Tsitsulin, A., & Perozzi, B. (2020, May 5). Understanding the Shape of Large-Scale Data. Google AI Blog. http://ai.googleblog.com/2020/05/understanding-shape-of-large-scale-data.html
-
-
www.kaggle.com www.kaggle.com
-
COVID-19 Open Research Dataset Challenge (CORD-19). (n.d.). Retrieved May 6, 2020, from https://kaggle.com/allen-institute-for-ai/CORD-19-research-challenge
-
-
leoferres.info leoferres.info
-
Ferres, L. (2020, April 10). COVID19 mobility reports. Leo's Blog. https://leoferres.info/blog/2020/04/10/covid19-mobility-reports/
-
-
coviz.apps.allenai.org coviz.apps.allenai.org
-
About. (n.d.). Retrieved May 6, 2020, from https://coviz.apps.allenai.org/
-
-
epjdatascience.springeropen.com epjdatascience.springeropen.com
-
Vilella, S., Paolotti, D., Ruffo, G. et al. News and the city: understanding online press consumption patterns through mobile data. EPJ Data Sci. 9, 10 (2020). https://doi.org/10.1140/epjds/s13688-020-00228-9
-
- Apr 2020
-
rajpurkar.github.io rajpurkar.github.io
-
-
-
Killeen, B. D., et al. (2020, April 1). A county-level dataset for informing the United States' response to COVID-19. Cornell University. arXiv:2004.00756.
-
-
www.ofcom.org.uk www.ofcom.org.uk
-
Ofcom. (2020, April 9). Covid-19 news and information: consumption and attitudes. https://www.ofcom.org.uk/research-and-data/tv-radio-and-on-demand/news-media/coronavirus-news-consumption-attitudes-behaviour
Tags
- lang:en
- COVID-19
- survey
- consumption
- BARB
- news
- access
- interactive
- comScore
- response
- attitude
- misinformation
- is:webpage
- dataset
- information
-
-
www.pnas.org www.pnas.org
-
Salganik, M. J., Lundberg, I., Kindel, A. T., Ahearn, C. E., Al-Ghoneim, K., Almaatouq, A., Altschul, D. M., Brand, J. E., Carnegie, N. B., Compton, R. J., Datta, D., Davidson, T., Filippova, A., Gilroy, C., Goode, B. J., Jahani, E., Kashyap, R., Kirchner, A., McKay, S., … McLanahan, S. (2020). Measuring the predictability of life outcomes with a scientific mass collaboration. Proceedings of the National Academy of Sciences. https://doi.org/10.1073/pnas.1915006117
-
-
trello.com trello.com
-
Collective Intelligence and COVID-19 | Trello. (n.d.). Retrieved April 20, 2020, from https://trello.com/b/STdgEhvX/collective-intelligence-and-covid-19
-
-
arxiv.org arxiv.org
-
Alam, F., Sajjad, H., Imran, M., & Ofli, F. (2020). Standardizing and Benchmarking Crisis-related Social Media Datasets for Humanitarian Information Processing. ArXiv:2004.06774 [Cs]. http://arxiv.org/abs/2004.06774
-
-
github.com github.com
-
experience.arcgis.com experience.arcgis.com
- Mar 2020
-
Local file Local file
-
All datasets were supplied by Sutherland in the Supporting Information as 3D geometries aligned according to the original literature, namely by flexible alignment on one or more templates obtained by crystallographic enzyme-inhibitor complexes
-
eight comprehensive datasets
What do the datasets look like? This may help in understanding the application domain of this tool.
-
-
ourworldindata.org ourworldindata.org
-
favorito,data_science
-
-
multimedia.scmp.com multimedia.scmp.com
-
unidad_COVID2019,favorita
-
-
www.visualcapitalist.com www.visualcapitalist.com
-
favorito,hermoso
-
-
coronavirus.thebaselab.com coronavirus.thebaselab.com
-
-
www.apprise.org.au www.apprise.org.au
-
www.gov.uk www.gov.uk
-
github.com github.com
-
unidad_COVID2019
-
-
coronavirus.jhu.edu coronavirus.jhu.edu
-
unidad_COVID2019
-
-
www.worldometers.info www.worldometers.info
-
bnonews.com bnonews.com
-
linea_tiempo
-
-
covid2019.app covid2019.app
-
acceso_abierto
-
-
www.consulta.mx www.consulta.mx
-
unidad_COVID2019,encuesta
-
-
coronavirus-disasterresponse.hub.arcgis.com coronavirus-disasterresponse.hub.arcgis.com
-
unidad_COVID2019,imprescindible
-
-
www.kff.org www.kff.org
- Feb 2019
-
iphysresearch.github.io iphysresearch.github.io
-
Impact of Fully Connected Layers on Performance of Convolutional Neural Networks for Image Classification
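The authors conclude: 1) the fewer layers the CNN has, the more nodes the FC layers need; conversely, the deeper the CNN, the fewer FC nodes suffice; 2) besides needing more FC nodes, a shallow CNN benefits from more FC layers as the number of classes in the dataset grows, and vice versa; 3) for datasets with more samples per class, deeper networks do better, but when the number of classes is large, shallower networks perform better.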
-
Do we train on test data? Purging CIFAR of near-duplicates
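The authors examined the CIFAR test set and argued that some test samples are so close to training samples that they raise an overfitting concern. They replaced the suspect samples to propose a new test set, ran well-known models on it, and were relieved to report that the models do not appear to have overfitted in a way that would have distorted their ranking~ (feels a bit like a slap in the face~)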
-
Semantic Redundancies in Image-Classification Datasets: The 10% You Don't Need
A deep-neural-network version of "feature engineering"~ [doge]
-
Deep Learning on Small Datasets without Pre-Training using Cosine Loss
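In contemporary deep learning, two things seem beyond dispute:
- categorical cross-entropy loss after softmax activation is the method of choice for classification;
- training a CNN classifier from scratch on a small dataset does not work well.
In this paper the authors show that, when there are only a few samples per class, the cosine loss gives better performance than cross-entropy.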
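The contrast between the two losses in the note above can be made concrete with a minimal sketch. It assumes the cosine loss is one minus the cosine similarity between the network output and the one-hot class vector; the function names and the toy output vector are illustrative and are not taken from the paper.

```python
import math


def cosine_loss(outputs, target_index):
    """1 - cosine similarity between the output vector and a one-hot target.

    Assumption: the target is one-hot, so the dot product reduces to the
    output component at the true class divided by the output's L2 norm.
    """
    norm = math.sqrt(sum(v * v for v in outputs))
    if norm == 0.0:
        return 1.0  # zero vector: treat the similarity as 0
    return 1.0 - outputs[target_index] / norm


def cross_entropy_loss(logits, target_index):
    """Categorical cross-entropy after softmax, shown for comparison."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target_index]


# Toy 3-class output where class 0 is the true class (hypothetical values).
outputs = [2.0, 1.0, 0.0]
print("cosine loss:       ", cosine_loss(outputs, 0))
print("cross-entropy loss:", cross_entropy_loss(outputs, 0))
```

One visible difference: the cosine loss depends only on the direction of the output vector and stays within [0, 2], while cross-entropy is unbounded and keeps rewarding larger logit magnitudes.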
-
-
towardsdatascience.com towardsdatascience.com
-
Top Sources For Machine Learning Datasets
-
- Jan 2019
-
iphysresearch.github.io iphysresearch.github.io
-
Fitting A Mixture Distribution to Data: Tutorial
At first glance, this looks like a lovingly written tutorial!
-
Optimization Models for Machine Learning: A Survey
For me, probably the only truly valuable part of this paper is the compiled dataset tables in the appendix at the end...
-
- Dec 2018
-
iphysresearch.github.io iphysresearch.github.io
-
Are All Training Examples Created Equal? An Empirical Study
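From this paper I learned about an interesting concept called active learning, which seems very close to the continuous-parameter training-data sampling pool I designed myself...
The paper's main contribution is a study of training-sample importance in image classification, measuring the importance of each sample with a gradient-based method. Its conclusions may suggest that active learning is not always effective in deep learning.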
-
Image Score: How to Select Useful Samples
The semi-supervised learning concept it puts forward is fairly interesting. Scoring every sample in the dataset may help a little with interpretability...
-
- Nov 2018
-
iphysresearch.github.io iphysresearch.github.io
-
Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift
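The experiments in this paper explore model performance after applying shifts (a kind of controlled perturbation) to a dataset, and propose a classifier-based method/pipeline for observing and evaluating them.
For my gravitational-wave data research, I can borrow both its methods for shifting data and its evaluation mechanism (two-sample tests).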
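As a rough illustration of the two-sample testing idea mentioned in the note above, here is a minimal sketch that compares a reference sample with a possibly shifted one using the Kolmogorov-Smirnov statistic and a permutation p-value. It is a univariate toy under stated assumptions, not the paper's pipeline: the function names, the Gaussian toy data, and the shift size are all hypothetical.

```python
import random


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x, then compare CDFs.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def permutation_p_value(a, b, n_permutations=200, seed=0):
    """Share of random relabelings whose KS statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # The +1 terms keep the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Toy data: a reference sample and a copy shifted by a hypothetical offset.
rng = random.Random(42)
reference = [rng.gauss(0.0, 1.0) for _ in range(200)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(200)]
print("KS statistic:", ks_statistic(reference, shifted))
print("p-value:     ", permutation_p_value(reference, shifted))
```

A small p-value is evidence that the two samples come from different distributions. For high-dimensional data such as images or time series, one would typically first reduce each sample to a low-dimensional representation and then test each dimension, correcting for multiple comparisons.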
-
Training neural audio classifiers with few data
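This is a fairly preliminary, simple experiment.
The conclusions drawn from the figures are not really surprising: more data naturally gives better performance; transfer learning performs well on very small amounts of data; prototypical models may show some advantage owing to the specificity of their structure; and the smaller the dataset, the more severe the overfitting...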
-
- Sep 2016
-
www.ukbiobank.ac.uk www.ukbiobank.ac.uk
-
UK Biobank
Large UK dataset containing extensive phenotypic, genotypic, and neuroimaging data.
- License: Unclear, but restrictive.
- Access: Human, ?
- Needs data use agreement: Yes
- Needs institutional signature for access: No (?)
-
-
openfmri.org openfmri.org OpenfMRI
-
View Data Sets
Public fMRI dataset repository.
- License: PDDL v.1.0
- Access: Human, s3
- Needs data use agreement: No
- Needs institutional signature for access: No
-
-
dataverse.harvard.edu dataverse.harvard.edu
-
Brain Genomics Superstruct Project (GSP)
- License: Data use agreement
- Access: Human, API
- Needs data use agreement: Yes
- Needs institutional signature for access: No
-
-
studyforrest.org studyforrest.org
-
What is studyforrest?
Rich multimodal dataset on naturalistic stimuli
- License: PDDL v.1.0
- Access: Human, rsync, git annex
- Needs data use agreement: No
- Needs institutional signature for access: No
-
-
myconnectome.org myconnectome.org
-
- License: PDDL v.1.0
- Access: Human, s3, openfmri
- Needs data use agreement: No
- Needs institutional signature for access: No
-
- May 2016
-
www.jstage.jst.go.jp www.jstage.jst.go.jp
-
Bird song data set
-
- Aug 2015
-
europepmc.org europepmc.org
-
the definition of a “dataset,”
this is interesting, and will be interesting to track within and across disciplines
-