18 Matching Annotations
  1. Apr 2026
    1. the robustness of these reasoning behaviors remains underexplored

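      "The robustness of reasoning behaviors has not yet been sufficiently explored": this sentence amounts to a declaration of a collective blind spot across the whole field of reasoning-model research. Over the past two years, test-time compute, long chain-of-thought (CoT), and o1/R1-style reasoning models have attracted enormous attention, yet almost all evaluations have been run in "isolated problem" settings. In real-world agent deployments, the most basic reliability question, whether a model can maintain its reasoning depth, only began to be studied systematically with this paper.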

  2. Oct 2022
  3. Dec 2021
  4. Nov 2021
  5. Sep 2021
    1. Haber, N. A., Wieten, S. E., Rohrer, J. M., Arah, O. A., Tennant, P. W. G., Stuart, E. A., Murray, E. J., Pilleron, S., Lam, S. T., Riederer, E., Howcutt, S. J., Simmons, A. E., Leyrat, C., Schoenegger, P., Booman, A., Dufour, M.-S. K., O’Donoghue, A. L., Baglini, R., Do, S., … Fox, M. P. (2021). Causal and Associational Linking Language From Observational Research and Health Evaluation Literature in Practice: A systematic language evaluation [Preprint]. Epidemiology. https://doi.org/10.1101/2021.08.25.21262631

  6. Aug 2021
  7. Jul 2021
  8. Apr 2021
  9. Sep 2020
  10. Jul 2020
    1. Mulligan, M. J., Lyke, K. E., Kitchin, N., Absalon, J., Gurtman, A., Lockhart, S. P., Neuzil, K., Raabe, V., Bailey, R., Swanson, K. A., Li, P., Koury, K., Kalina, W., Cooper, D., Fontes-Garfias, C., Shi, P.-Y., Tuereci, O., Tompkins, K. R., Walsh, E. E., … Jansen, K. U. (2020). Phase 1/2 Study to Describe the Safety and Immunogenicity of a COVID-19 RNA Vaccine Candidate (BNT162b1) in Adults 18 to 55 Years of Age: Interim Report. MedRxiv, 2020.06.30.20142570. https://doi.org/10.1101/2020.06.30.20142570

  11. Jun 2020
  12. Apr 2019