DeepSeek-V4-Pro-Max beats GPT-5.2 and Gemini 3.0-Pro on standard reasoning benchmarks and lands just behind GPT-5.4 and Gemini 3.1-Pro
DeepSeek V4-Pro-Max outperforms GPT-5.2 and Gemini 3.0-Pro on standard reasoning benchmarks, which points to a substantial performance gain for open-source models.
DeepSeek V4 exceeds them all on coding, math, and STEM problems, making it one of the strongest open-source models ever released.
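Most people assume that open-source AI models cannot match closed-source commercial models on performance, but the author argues that DeepSeek V4 surpasses other open-source models in several key areas and is even on par with the top closed-source models. This challenges the industry consensus that "open source necessarily means a performance compromise" and suggests that open-source models are rapidly closing the gap with commercial ones.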
Gemma 4 E4B matches or exceeds GPT-4o across multiple benchmarks including MATH, GSM8K, GPQA Diamond & HumanEval
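This comparison is surprising: it indicates that open-source models can now match the performance of closed-source models. That could break open AI's closed ecosystem, encourage broader research collaboration and innovation, and lower the barrier for enterprises adopting AI.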
Byte for byte, the most capable open models
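Most people assume that open-source models cannot compare with closed-source or proprietary models on performance, but the author claims that Gemma 4 is "byte for byte, the most capable open model", challenging that industry consensus. The claim implies that open-source models have already surpassed commercial closed-source models on certain metrics, which is an unconventional view.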
Above: the time to do a production bundle
Nice way to demonstrate and let people feel how slow the competition is!
McCabe, Stefan, Leo Torres, Timothy LaRock, Syed Arefinul Haque, Chia-Hung Yang, Harrison Hartle, and Brennan Klein. ‘Netrd: A Library for Network Reconstruction and Graph Distances’. ArXiv:2010.16019 [Physics], 29 October 2020. http://arxiv.org/abs/2010.16019.