The three metrics where we find acceleration are concentrated in programming and mathematics. These are areas that labs have explicitly targeted for improvement, and they share an important property: correctness is easy to verify automatically.
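A mainstream view might hold that AI capabilities are improving evenly across all domains, but the author points out that the acceleration is concentrated mainly in programming and mathematics, because correctness in these domains is easy to verify automatically. This suggests that AI progress may not be universal, but instead concentrated in specific domains where results can be quantified.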