2 Matching Annotations
  1. Last 7 days
    1. METR's confidence interval for Claude Opus 4.6 ranges from 5 hours to 66 hours.

      置信区间从 5 小时到 66 小时——这个跨度本身就令人震惊。5 小时和 66 小时是 13 倍的差距,却是对「同一个模型」的同一项测量。当一个数字被广泛引用为「Claude Opus 4.6 的时间地平线是 12 小时」时,真相是这个数字的不确定性区间宽达一个数量级。这是整个 AI 能力评测领域目前面临的核心危机:我们在用极度不精确的测量数字来驱动极其重要的决策。

  2. Jan 2022
    1. Goodhart's law is an adage often stated as "When a measure becomes a target, it ceases to be a good measure".[1] It is named after British economist Charles Goodhart, who advanced the idea in a 1975 article on monetary policy in the United Kingdom:[2][3] .mw-parser-output .templatequote{overflow:hidden;margin:1em 0;padding:0 40px}.mw-parser-output .templatequote .templatequotecite{line-height:1.5em;text-align:left;padding-left:1.6em;margin-top:0}Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.

      We measure what we find important.

      Measures can and often become self-fulfilling targets. (read: Rankings and Reactivity by W. Espeland and M. Sauder https://www.stmarys-ca.edu/sites/default/files/attachments/files/rankings-and-reactivity-2007.pdf)

      When a measure becomes a target it ceases to be a good measure.

      So why measure?


      Is observation and measurement part of a larger complex process which isn't finished until the process itself is finished?


      This seems related to the measurement problem in quantum mechanics, Schrödinger's cat, the Heisenberg uncertainty principle, and the observer effect).