Hypothesis

3 Matching Annotations

May 2026
epoch.ai

RIP Classic Reasoning Benchmarks. What's Next? - Epoch AI

1
1. fxp007 07 May 2026
  
  in Public
  
  The next generation of benchmarks needs to be harder, more realistic, and less gameable
  
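  [Insight] "Harder, more realistic, less gameable": these three criteria essentially ask benchmarks to move toward real work rather than converge on exam questions. But this raises a paradox: the more realistic a benchmark is, the harder it is to score automatically, the more it costs (METR: 8,000 USD per task), and the slower it is to release. AI evaluation now faces a fundamental trade-off between evaluation speed and evaluation quality.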
  
  benchmark-design next-generation evaluation-paradox insight

Tags

next-generation

evaluation-paradox

insight

benchmark-design

Annotators

fxp007

URL

epoch.ai/gradient-updates/rip-classic-benchmarks
Aug 2020
osf.io

Validity of energy social research during and after COVID-19: challenges, considerations, and responses

1
1. ErikStuchly 28 Aug 2020
  
  in BehSci
  
  Fell, M. J., Pagel, L., Chen, C., Goldberg, M. H., Herberz, M., Huebner, G., Sareen, S., & Hahnel, U. J. J. (2020). Validity of energy social research during and after COVID-19: Challenges, considerations, and responses [Preprint]. SocArXiv. https://doi.org/10.31235/osf.io/pe6cd
  
  is:preprint, lang:en, COVID-19, transmission reduction, social research, energy, internal validity, external validity, scientific practice, recommendation, guidance, contextual data, vulnerability, longitudinal element, insight generation

Tags

is:preprint

longitudinal element

contextual data

vulnerability

internal validity

transmission reduction

recommendation

scientific practice

COVID-19

lang:en

external validity

energy

social research

insight generation

guidance

Annotators

ErikStuchly

URL

osf.io/preprints/socarxiv/pe6cd/
Jul 2020
osf.io

The COVID-19 pandemic is changing the way people recreate outdoors: Preliminary report on a national survey of outdoor enthusiasts amid the COVID-19 pandemic

1
1. ErikStuchly 23 Jul 2020
  
  in BehSci
  
  Rice, W. L., Meyer, C., Lawhon, B., Taff, B. D., Mateer, T., Reigner, N., & Newman, P. (2020). The COVID-19 pandemic is changing the way people recreate outdoors: Preliminary report on a national survey of outdoor enthusiasts amid the COVID-19 pandemic [Preprint]. SocArXiv. https://doi.org/10.31235/osf.io/prnz9
  
  is:preprint, lang:en, COVID-19, outdoors, recreation, longitudinal change, daily life, survey, report, public lands, nature, insight generation, preliminary findings

Tags

is:preprint

recreation

outdoors

survey

preliminary findings

public lands

longitudinal change

nature

COVID-19

daily life

lang:en

insight generation

report

Annotators

ErikStuchly

URL

osf.io/preprints/socarxiv/prnz9/