When evaluated directly in the Codex app, best-of-ten model submissions ranked above the 95th percentile of human experts on the prediction task and around the 84th percentile of human experts on the sequence generation task.
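This result is striking: on the prediction task, the model's best-of-ten submissions outperformed 95% of human experts. It is a marker of technical progress, and it also prompts serious reflection on the role of professional scientists and the future job market.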