the robustness of these reasoning behaviors remains underexplored
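The quoted sentence, that the robustness of these reasoning behaviors remains underexplored, is in effect a declaration of a collective blind spot across the whole field of reasoning-model research. Over the past two years, test-time compute, long chain-of-thought (CoT), and o1/R1-style reasoning models have attracted enormous attention, yet almost all evaluations have been run in an "isolated problem" setting. In real agent deployment scenarios, the most basic reliability question, whether a model can maintain its reasoning depth, has only begun to be studied systematically with this paper.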
Racine, N., Madigan, S., Cardinal, S., Hartwick, C., Leslie, M., Motz, M., & Pepler, D. (2021). Community-Based Research: Perspectives of Psychology Researchers and Community Partners. PsyArXiv. https://doi.org/10.31234/osf.io/cxrmt
At the time of the beginning of the research, very little had been written on middle-aged women; collectively as social scientists we knew next to nothing about the middle years of adult life. We were critical of what little literature existed and were skeptical of widely held assumptions about women of this age.
Social science literature lacked the experience of middle-aged women. Interrogate empty nest syndrome.
Hopes UK trial will allay pregnant women’s Covid vaccine concerns. (2021, August 3). The Guardian. http://www.theguardian.com/world/2021/aug/03/hopes-uk-trial-will-allay-pregnant-womens-covid-vaccine-concerns
Hammerstein, S., König, C., Dreisoerner, T., & Frey, A. (2021). Effects of COVID-19-Related School Closures on Student Achievement—A Systematic Review [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/mcnvk
Maxmen, A. (2021). Will COVID force public health to confront America’s epic inequality? Nature, 592(7856), 674-680.