The overall accuracy of natural-disaster risk prediction, aggregated across 20 categories such as wildfires, floods, and tornadoes, increased by 5%.
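A 5% improvement in disaster-prediction accuracy may look small, but it is an aggregate gain across 20 different disaster categories, which makes it significant for disaster early-warning systems. An improvement of this kind could save lives and reduce economic losses, especially in high-risk areas.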
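The excerpt does not say how accuracy was aggregated across the 20 categories, and the choice matters when reading a single overall figure. The sketch below contrasts two common options, pooled (micro) and per-category averaged (macro) accuracy. The category counts and the function names `micro_accuracy` and `macro_accuracy` are illustrative assumptions, not data or code from the source.

```python
# Hypothetical per-category counts: (correct predictions, total predictions).
# These numbers are illustrative only and do not come from the source.
counts = {
    "wildfire": (80, 100),
    "flood": (45, 50),
    "tornado": (30, 50),
}


def micro_accuracy(counts):
    """Pooled accuracy: every individual prediction counts equally."""
    correct = sum(c for c, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return correct / total


def macro_accuracy(counts):
    """Mean of per-category accuracies: every category counts equally."""
    return sum(c / t for c, t in counts.values()) / len(counts)


micro = micro_accuracy(counts)
macro = macro_accuracy(counts)
print(f"micro={micro:.3f} macro={macro:.3f}")
```

With uneven category sizes the two figures differ, so a reported overall gain could reflect either kind of average.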
Using these ability scores, the method predicts performance on new tasks with ~88% accuracy, including for models such as GPT-4o and Llama-3.1.
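Surprisingly, the ADeLe method can predict how an AI model will perform on new tasks with about 88% accuracy, including for advanced large models such as GPT-4o and Llama-3.1. This predictive power far exceeds that of traditional evaluation methods and marks a major breakthrough for AI performance evaluation, letting researchers anticipate more reliably how a model will behave on tasks it has not seen.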
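To make "predicting performance from ability scores" concrete, here is a minimal sketch under a strong simplifying assumption: a model is predicted to succeed on a task when its ability score meets or exceeds the task's demand level on every dimension. This threshold rule, the dimension names, and all numbers are hypothetical and are not the ADeLe procedure itself. The sketch only shows how such predictions could be scored against observed outcomes.

```python
def predict_success(ability, demands):
    """Predict success if ability meets or exceeds demand on every dimension."""
    return all(ability[dim] >= level for dim, level in demands.items())


def prediction_accuracy(ability, tasks):
    """Fraction of tasks where the predicted outcome matches the observed one."""
    hits = sum(
        predict_success(ability, task["demands"]) == task["succeeded"]
        for task in tasks
    )
    return hits / len(tasks)


# Hypothetical ability profile and task annotations (illustrative only).
ability = {"reasoning": 3.5, "knowledge": 4.0}
tasks = [
    {"demands": {"reasoning": 2, "knowledge": 3}, "succeeded": True},
    {"demands": {"reasoning": 4, "knowledge": 2}, "succeeded": False},
    {"demands": {"reasoning": 3, "knowledge": 5}, "succeeded": False},
    {"demands": {"reasoning": 3, "knowledge": 4}, "succeeded": False},
]

accuracy = prediction_accuracy(ability, tasks)
print(f"prediction accuracy: {accuracy:.2f}")
```

In this toy data the rule gets three of four tasks right. The last task is a miss because the model meets every demand yet was observed to fail, which is the kind of case a probabilistic predictor would handle better than a hard threshold.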