SWE-chat is a living dataset; our collection pipeline automatically and continually discovers and processes sessions from public repositories
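Most people assume AI research datasets are static, one-time collections, but the authors propose the concept of a 'living dataset', stressing that data must be updated continually to reflect real-world usage. This challenges the reliance on static benchmarks in traditional AI evaluation and argues for dynamic, ongoing data collection.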
Some privacy related extensions may cause issues on x.com.
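This sentence hints at a potential conflict between privacy-protection tools and mainstream social platforms. It reflects the tension between digital privacy and platforms' commercial interests. Users typically install privacy extensions to keep their data from being collected, but platforms may regard these tools as obstacles to their own data collection and analysis. This conflict foreshadows a continuing contest between privacy protection and platform functionality on the future web.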
Some privacy related extensions may cause issues on x.com.
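This is a surprising statement, suggesting that social media platforms may actively prevent users from using privacy-protection tools. It may indicate a fundamental conflict between X's data collection strategy and user privacy, and the balance between its business model and user rights deserves closer study.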
We need, like, a Manhattan Project to collect this... Fields that are not exposed now will become exposed in the future, so you just want to track these statistics across the entire economy.
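Most people think the response to AI's impact on employment should focus on the industries most threatened today, but the author argues that we need a Manhattan Project-scale effort to collect price elasticity data across all industries, including fields not yet affected by AI. This forward-looking view challenges the conventional thinking of crisis response.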
(The more modification a library demands of each MARC record, the more it costs.) In Harvard’s case she typically accepts the record as is, even when the original card bears additional subject headings or enriching notes of various kinds.
Information loss in digitizing catalog cards...
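The claim that individual learning may depend on the behavior of others highlights the importance of viewing the learning environment as a system involving multiple interacting participants.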
only by examining a constellation of metrics in tension can we understand and influence developer productivity
I love this framing! In my experience companies don't generally acknowledge that metrics can be in tension, which usually means they're only tracking a subset of the metrics they ought to be if they want to have a more complete/realistic understanding of the state of things.
data collection to inform decision making is a much more prudent approach.
data collection
Inside America’s Covid-reporting breakdown—POLITICO. (n.d.). Retrieved August 23, 2021, from https://www.politico.com/news/2021/08/15/inside-americas-covid-data-gap-502565
Moss, A. J., Rosenzweig, C., Jaffe, S. N., Gautam, R., Robinson, J., & Litman, L. (2021). Bots or inattentive humans? Identifying sources of low-quality data in online platforms [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/wr8ds
Stokel-Walker, C. (n.d.). Concerns raised about pubs collecting data for coronavirus tracing. New Scientist. Retrieved June 25, 2020, from https://www.newscientist.com/article/2246965-concerns-raised-about-pubs-collecting-data-for-coronavirus-tracing/
ReconfigBehSci on Twitter: ‘RT @d_spiegel: Excellent new Covid RED dashboard from UCL https://t.co/wHMG8LzTUb Would be good to also know (a) how many contacts isolate…’ / Twitter. (n.d.). Retrieved 6 March 2021, from https://twitter.com/SciBeh/status/1323316018484305920
Lakens, D. (2021). Sample Size Justification. PsyArXiv. https://doi.org/10.31234/osf.io/9d3yf
Perez Santangelo, A., & Solovey, G. (2020, November 9). Time to Shine: Reliable Response-Timing Using R-Shiny for Online Experiments. https://doi.org/10.31234/osf.io/nuxdg
American Psychological Association. (2020). Adapting your research methods in response to COVID-19.
Online Research: From Funding to Data Collection. (n.d.). Association for Psychological Science - APS. Retrieved September 25, 2020, from https://www.psychologicalscience.org/news/online-research.html
Romanini, Daniele, Sune Lehmann, and Mikko Kivelä. ‘Privacy and Uniqueness of Neighborhoods in Social Networks’. ArXiv:2009.09973 [Physics], 21 September 2020. http://arxiv.org/abs/2009.09973.
Susan Athey, July 22, 2020. (2020, August 2). https://www.youtube.com/watch?v=hqTOPrUxDzM
University_covid_dashboards. (n.d.). Google Docs. Retrieved August 29, 2020, from https://docs.google.com/spreadsheets/d/1orYcRrRTQ6SiCJ7GXObZg1el70YeIHmjJEkrYFj40DA/edit?usp=sharing&usp=embed_facebook
COVID-19 Social Science Tracker - Google Sheets
Betsch, C. How behavioural science data helps mitigate the COVID-19 crisis. Nat Hum Behav (2020). https://doi.org/10.1038/s41562-020-0866-1
Maarten van Smeden on Twitter: “This is a kind reminder that most issues with data (e.g. measurement error, incomplete data, confounding, selection) do not disappear just because you have N = ginormous” / Twitter. (n.d.). Twitter. Retrieved July 19, 2020, from https://twitter.com/MaartenvSmeden/status/1283313496382373890
Coronavirus News and Coverage. (n.d.). Science. Retrieved July 18, 2020, from https://www.nationalgeographic.com/science/coronavirus-coverage/
Coping with the Crisis | Introduction. (n.d.). The Innovation in Politics Institute. Retrieved July 18, 2020, from https://innovationinpolitics.eu/en/coping-with-the-coronavirus-crisis/introduction/
Yamada, Y. (2020). Micropublishing during and after the COVID-19 era [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/8fum4
Using Smartphone, Social Media, and Sensor Data for Psychological Research (May 13, 2020). (n.d.). Retrieved June 25, 2020, from https://www.youtube.com/watch?time_continue=3&v=vSvnJzCfstU&feature=emb_logo
Register here: (n.d.). Google Docs. Retrieved May 5, 2020, from https://docs.google.com/forms/d/e/1FAIpQLSdqXWlf0sbRR9wSH_42shm4vU4tHcCe0bQZuC-6ngHaI4I32w/viewform??embedded=true&usp=embed_facebook
Carrying out qualitative research under lockdown – Practical and ethical considerations. (2020, April 20). Impact of Social Sciences. https://blogs.lse.ac.uk/impactofsocialsciences/2020/04/20/carrying-out-qualitative-research-under-lockdown-practical-and-ethical-considerations/
Raihani, N., & de-Wit, L. (2020, April 21). Factors Associated With Concern, Behaviour & Policy Support in Response to SARS-CoV-2. https://doi.org/10.31234/osf.io/8jpzc
Zelner, J., Riou, J., Etzioni, R., & Gelman, A. (2020). Accounting for Uncertainty During a Pandemic. ArXiv:2006.08745 [Physics, q-Bio, Stat]. http://arxiv.org/abs/2006.08745
OSF Coronavirus Outbreak Research Collection
Okan, Y. (2020, May 22). From a tweet to Reddit and beyond: The road to a global behavioral science SWAT team. Psychonomic Society Featured Content. https://featuredcontent.psychonomic.org/from-a-tweet-to-reddit-and-beyond-the-road-to-a-global-behavioral-science-swat-team/
Koerth, M. (2020, March 31). Why It’s So Freaking Hard To Make A Good COVID-19 Model. FiveThirtyEight. https://fivethirtyeight.com/features/why-its-so-freaking-hard-to-make-a-good-covid-19-model/
Cheshire, J. (2020, May 18). "John Snow's map of cholera looked as dull as (cholera filled) dishwater compared to his competitors...His brilliance was a solid data collection & then a simple map presenting what he knew. Each death marked in black and white. Here's a lesson for COVID-19 dataviz... 1/11" Twitter. https://twitter.com/spatialanalysis/status/1262338373253042178
Crchartier, ~. (2020, March 13). The PSA Calls for Rapid and Impactful Study Proposals on COVID-19. Psychological Science Accelerator. https://psysciacc.org/2020/03/13/the-psa-calls-for-rapid-and-impactful-study-proposals-on-covid-19/
Lourenco, S. F., & Tasimi, A. (2020). No Participant Left Behind: Conducting Science During COVID-19. Trends in Cognitive Sciences, S1364661320301157. https://doi.org/10.1016/j.tics.2020.05.003
Lean Data Practices Staying lean and being smart about how you collect data can build trust with your users and ultimately help grow your business.
Mullard, A. (2020). Flooded by the torrent: The COVID-19 drug pipeline. The Lancet, 395(10232), 1245–1246. https://doi.org/10.1016/S0140-6736(20)30894-1
Olapegba, P. O., Ayandele, O., Kolawole, S. O., Oguntayo, R., Gandi, J. C., Dangiwa, A. L., … Iorfa, S. K. (2020, April 12). COVID-19 Knowledge and Perceptions in Nigeria. https://doi.org/10.31234/osf.io/j356x
Psychological Science Accelerator. (2020 March 21). Join the PSA's rapid-response COVID-19 project. Psysciacc.org. https://psysciacc.org/2020/03/21/join-the-psas-rapid-response-covid-19-project/.
James, E. Tutorial Home. Github.io. https://emljames.github.io/GorillaR/
Nanni, M., Andrienko, G., Boldrini, C., Bonchi, F., Cattuto, C., Chiaromonte, F., Comandé, G., Conti, M., Coté, M., Dignum, F., Dignum, V., Domingo-Ferrer, J., Giannotti, F., Guidotti, R., Helbing, D., Kertesz, J., Lehmann, S., Lepri, B., Lukowicz, P., … Vespignani, A. (2020). Give more data, awareness and control to individual citizens, and they will help COVID-19 containment. ArXiv:2004.05222 [Cs]. http://arxiv.org/abs/2004.05222
Callaghan, S. (2020). COVID-19 Is a Data Science Issue. Patterns, 100022. https://doi.org/10.1016/j.patter.2020.100022
Nielsen, R.K., Fletcher, R., Newman, N., Brennen, S., Howard, P.N. (2020 April 15). Navigating the ‘infodemic’: how people in six countries access and rate news and information about coronavirus. Reuters Institute. https://reutersinstitute.politics.ox.ac.uk/infodemic-how-people-six-countries-access-and-rate-news-and-information-about-coronavirus
Bird, S., Nielsen, B. (2020 April 20). Now-casting of Covid-19 deaths in English Hospitals. http://users.ox.ac.uk/~nuff0078/Covid/index.htm
Holmes, E. A., O’Connor, R. C., Perry, V. H., Tracey, I., Wessely, S., Arseneault, L., Ballard, C., Christensen, H., Cohen Silver, R., Everall, I., Ford, T., John, A., Kabir, T., King, K., Madan, I., Michie, S., Przybylski, A. K., Shafran, R., Sweeney, A., … Bullmore, E. (2020). Multidisciplinary research priorities for the COVID-19 pandemic: A call for action for mental health science. The Lancet Psychiatry, S2215036620301681. https://doi.org/10.1016/S2215-0366(20)30168-1
Atchison, C. J., Bowman, L., Vrinten, C., Redd, R., Pristera, P., Eaton, J. W., & Ward, H. (2020). Perceptions and behavioural responses of the general public during the COVID-19 pandemic: A cross-sectional survey of UK Adults [Preprint]. Public and Global Health. https://doi.org/10.1101/2020.04.01.20050039
COVID-19. (n.d.). Retrieved April 17, 2020, from https://www.wwtf.at/covid/index.php?lang=EN
GIS and Spatial Analytics Market: Global Size
the absence of a social contract
actual level of consent of individuals being documented (and by whom? by private corporations, mostly)
A Python script was used to download tweets through Twitter API.
Data collection method to collect data/tweets from Twitter. They say that the data set has 125,907 unique tweets. I think they didn't include any re-tweets as well.
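The note above guesses that the 125,907 unique tweets exclude retweets. As a rough illustration of how such a count could be produced from tweets that have already been downloaded, here is a minimal sketch. It is not the authors' script: the function name `unique_original_tweets` and the sample records are hypothetical, and it assumes each tweet is a dict with `id` and `text` keys, where a retweet either carries a `retweeted_status` key (as in Twitter API v1.1 payloads) or has text starting with `RT @`.

```python
from typing import Dict, Iterable, List


def unique_original_tweets(tweets: Iterable[Dict]) -> List[Dict]:
    """Drop retweets and repeated IDs, keeping the first occurrence of each tweet."""
    seen_ids = set()
    kept = []
    for tweet in tweets:
        # Assumption: a retweet has a "retweeted_status" key or text beginning "RT @".
        is_retweet = (
            "retweeted_status" in tweet
            or tweet.get("text", "").startswith("RT @")
        )
        if is_retweet or tweet["id"] in seen_ids:
            continue
        seen_ids.add(tweet["id"])
        kept.append(tweet)
    return kept


# Hypothetical sample data, for illustration only.
sample = [
    {"id": 1, "text": "Original post"},
    {"id": 2, "text": "RT @someone: Original post"},
    {"id": 1, "text": "Original post"},
    {"id": 3, "text": "Another post"},
    {"id": 4, "text": "Shared post", "retweeted_status": {"id": 1}},
]

print(len(unique_original_tweets(sample)))
```

Whether the reported figure was computed this way cannot be confirmed from the quoted sentence; the sketch only shows one plausible reading of "unique tweets".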