the robustness of these reasoning behaviors remains underexplored
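"The robustness of these reasoning behaviors remains underexplored": this sentence amounts to a declaration of a collective blind spot across the whole field of reasoning-model research. Over the past two years, test-time compute, long chain-of-thought (CoT), and o1/R1-style reasoning models have attracted enormous attention, yet almost all evaluations have been run in "isolated problem" settings. In real Agent deployments, the most basic reliability question, whether a model can sustain its reasoning depth, did not begin to be studied systematically until this paper.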
Michael McCarthy. (2021, August 27). #COVD19 vax rates are still increasing in Australia. The over 70s are approaching 90% with first doses. The red line is the rate for all people >=16. That should reach 60% soon. Https://t.co/NH3Q7jumgT [Tweet]. @mickresearch. https://twitter.com/mickresearch/status/1431185640884826112
In particular, concurrency control problems arise when the software, data, and interface are distributed over several computers. Time delays when exchanging potentially conflicting actions are especially worrisome. ... If concurrency control is not established, people may invoke conflicting actions. As a result, the group may become confused because displays are inconsistent, and the groupware document corrupted due to events being handled out of order. (p. 207)
This passage helps to explain the emphasis in CSCW papers on time/duration as a system design concern for workflow coordination (milliseconds between MTurk HITs) versus time/representation considerations for system design.
People prefer to know who else is present in a shared space, and they use this awareness to guide their work
Awareness, disclosure, and privacy concerns are key cognitive/perception needs to integrate into technologies. Social media and CMCs struggle with this knife edge a lot.
It also seems to be a big factor in SBTF social coordination that leads to over-compensating and pluritemporal loading of interactions between volunteers.