While models have improved dramatically on use cases like codegen and mathematical reasoning, they still lag behind on the data side, as evidenced by SQL benchmarks like Spider 2.0 and Bird Bench.
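This observation presents a surprising fact: although models have made significant progress in code generation and mathematical reasoning, they still lag behind in data processing. This challenges the assumption that model capabilities improve across the board, and suggests that data reasoning may require specialized approaches.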