Using these ability scores, the method predicts performance on new tasks with ~88% accuracy, including for models such as GPT-4o and Llama-3.1.
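A prediction accuracy of roughly 88% is an impressive result: it indicates that ADeLe not only explains existing performance but also reliably predicts how models will perform on new tasks. This is well above what traditional methods achieve, giving practitioners a strong predictive tool for deploying AI systems reliably, and it may mark an important advance in AI evaluation.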
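
To make "predicting performance from ability scores" concrete, here is a minimal sketch in Python. It assumes each task is annotated with a demand level per dimension and each model has an ability score per dimension. The threshold rule used here (predict success when every ability meets or exceeds the corresponding demand) is an illustrative simplification and should not be read as ADeLe's actual predictor. The function names, dimension names, and all numbers are hypothetical.

```python
from typing import Dict, List, Tuple

# Maps a dimension name (hypothetical, e.g. "reasoning") to a level.
Profile = Dict[str, float]


def predict_success(abilities: Profile, demands: Profile) -> bool:
    """Predict success when every demanded dimension is within the model's ability.

    Assumption: a dimension missing from the ability profile counts as 0.0.
    """
    return all(abilities.get(dim, 0.0) >= level for dim, level in demands.items())


def prediction_accuracy(abilities: Profile, tasks: List[Tuple[Profile, bool]]) -> float:
    """Fraction of tasks whose predicted outcome matches the observed outcome."""
    if not tasks:
        raise ValueError("need at least one task")
    hits = sum(
        predict_success(abilities, demands) == observed
        for demands, observed in tasks
    )
    return hits / len(tasks)


# Made-up data for illustration only; these are not results from the source.
abilities = {"reasoning": 3.0, "knowledge": 4.0}
tasks = [
    ({"reasoning": 2.0, "knowledge": 3.0}, True),   # within ability, solved
    ({"reasoning": 4.0, "knowledge": 1.0}, False),  # reasoning demand too high
    ({"reasoning": 3.0, "knowledge": 5.0}, False),  # knowledge demand too high
    ({"reasoning": 1.0, "knowledge": 1.0}, False),  # within ability, yet failed
]

accuracy = prediction_accuracy(abilities, tasks)
print(accuracy)  # 0.75: three of the four predictions match the observed outcome
```

The accuracy here is computed the same way a headline figure like ~88% would be: the share of held-out tasks on which the predicted outcome agrees with what the model actually did. The last toy task shows why such a predictor is never perfect, since a model can fail a task whose annotated demands it nominally meets.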