Andrej Karpathy built a simple automation pipeline for AI agents to optimize training in 5-minute increments.
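This case shows AI systems being applied to automated research. Five-minute optimization increments are a fine-grained time scale, indicating that AI systems can already run rapidly iterating experiments. The 61K+ GitHub stars indicate that the approach has drawn wide attention in the AI research community.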
Scientific publication compresses a branching, iterative research process into a linear narrative, discarding the majority of what was discovered along the way.
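Most people assume that scientific papers fully document the research process, but the author argues that traditional papers actually discard most of what was discovered and present only a linear narrative, which constitutes the so-called 'story tax'. This view challenges academia's common assumptions about the completeness of publications.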
An AI researcher subsequently gifted them each a ChatGPT Pro subscription to encourage their 'vibe mathing.'
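Most people assume that serious mathematical research requires rigorous methods and deep expertise, but the author uses the informal term 'vibe mathing' to describe this way of doing research, challenging the traditional norms of academic research methodology.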
two participants gave it 9/10 and one "11/10"
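A 2-hour tabletop-style exercise that three top AI safety researchers rated between 9 and 11 out of 10: that in itself is a signal that serious AI research organizations are using "role-play" to prepare for the future. This methodology (rehearsing workflows under future capabilities) has precedents in other fields, such as military wargames, disaster drills, and scenario planning, but applying it to the evolution of AI capabilities reflects METR's distinctive research taste.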
Large language models (LLMs) sometimes appear to exhibit emotional reactions. We investigate why this is the case in Claude Sonnet 4.5 and explore implications for alignment-relevant behavior.
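[Insight] This sentence points to a new paradigm for AI research: rather than asking "what can the model do," ask "why does the model behave this way." Using emotion as an entry point for understanding model behavior essentially brings the methodology of psychology into AI interpretability research. The takeaway for practitioners: the most valuable AI research in the future may lie not in algorithmic innovation but in "finding mechanistic explanations for known phenomena," as this paper does.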
Suggests quantitative methods for predicting future tech's impact on behaviour and social aspects, in contrast with the usual qualitative narrative methods (futurism, narrative inquiry, scenarios presumably). The Science Fiction Science Method is saved as a PDF in Zotero; the PDF is available under CC BY at https://www.researchgate.net/publication/394323287_The_Science_Fiction_Science_Method
via Bruce Sterling (Mastodon)
Educational design research methodology.
Replicating scientific results is tough—But essential. (2021). Nature, 600(7889), 359–360. https://doi.org/10.1038/d41586-021-03736-4
(the VTA is also part of this system, but is too small to image with standard fMRI methods, but see [35] for successful imaging methods).
All imaging studies face questions of validity and should (and many do) link to comprehensive details on instrumentation, methodology, and interpretation. Apparently, the professional consensus remains that, properly executed and interpreted, fMRI and other functional imaging techniques based on detection of oxygenation can lead to highly valid conclusions. (See Nautil.us article.)
How to Conduct Agile Market Research for Your Digital Product
Logg, Jennifer M., and Charles A. Dorison. “Pre-Registration: Weighing Costs and Benefits for Researchers.” Organizational Behavior and Human Decision Processes 167 (November 1, 2021): 18–27. https://doi.org/10.1016/j.obhdp.2021.05.006.
Metascience 2021. (n.d.). Retrieved June 27, 2021, from https://metascience2021.org/
Calster, B. V., Wynants, L., Riley, R. D., Smeden, M. van, & Collins, G. S. (2021). Methodology over metrics: Current scientific standards are a disservice to patients and society. Journal of Clinical Epidemiology, 0(0). https://doi.org/10.1016/j.jclinepi.2021.05.018
Baghal, T. A., Wenz, A., Sloan, L., & Jessop, C. (2021). Linking Twitter and survey data: Asymmetry in quantity and its impact. EPJ Data Science, 10(1), 1–20. https://doi.org/10.1140/epjds/s13688-021-00286-7
Online Research Tools and Techniques. (2020, September 16). https://www.youtube.com/watch?v=wGWqBtDkOFs
Health Nerd on Twitter. (n.d.). Twitter. Retrieved October 17, 2020, from https://twitter.com/GidMK/status/1316511734115385344
Online Research: From Funding to Data Collection. (n.d.). Association for Psychological Science - APS. Retrieved September 25, 2020, from https://www.psychologicalscience.org/news/online-research.html
Puthillam, Arathy. ‘Too WEIRD, Too Fast: Preprints about COVID-19 in the Psychological Sciences’. Preprint. PsyArXiv, 10 June 2020. https://doi.org/10.31234/osf.io/5w7du.