The three metrics where we find acceleration are concentrated in programming and mathematics. These are areas that labs have explicitly targeted for improvement, and they share an important property: correctness is easy to verify automatically.
大多数人可能认为AI能力的加速是跨领域普遍发生的,但作者指出加速主要集中在编程和数学领域,因为这些领域正确性容易自动验证。这一发现挑战了人们对AI能力普遍提升的假设,暗示加速可能是有选择性的。