These papers suggest that strategic data engineering and inference-time optimization can substitute for raw parameter count.
这一观点提出了通过数据工程和推理时间优化来提高模型性能的新方法,为模型优化提供了新的思路。
These papers suggest that strategic data engineering and inference-time optimization can substitute for raw parameter count.
这一观点提出了通过数据工程和推理时间优化来提高模型性能的新方法,为模型优化提供了新的思路。
In 23 months, the same capability that needed 1.8 trillion parameters now fits in 4 billion parameters. A 450x compression.
令人惊讶的是:AI模型参数量在短短23个月内实现了450倍的压缩,这意味着原本需要超级计算机才能运行的强大AI模型现在可以完全在手机上运行。这种技术进步的速度远超摩尔定律,展示了算法优化和模型压缩技术的惊人突破。
After compressing, the model again extends its solutions to achieve stronger performance.
令人惊讶的是:Muse Spark在测试时展现出一种独特的'思想压缩'能力,模型在最初通过延长思考时间提高性能后,会在时间惩罚机制下自发压缩推理过程,然后再扩展解决方案以获得更强的性能。这种动态的自我优化机制在AI模型中前所未见。
NVFP4 enables 4-bit precision while maintaining nearly identical accuracy to 8-bit precision, increasing performance per watt and lowering cost per token.
大多数人认为降低模型精度会显著牺牲性能,但作者声称Gemma 4通过NVFP4量化技术实现了4位精度与8位精度几乎相同的准确率。这一反直觉的结论挑战了传统量化会大幅降低模型性能的认知,暗示NVIDIA可能在量化技术方面取得了突破性进展。
The similarity is because they are all saying roughly the same thing: Total (result) = Kinetic (cost) + Potential (benefit) Cost is either imaginary squared or negative (space-like), benefit is real (time-like), result is mass-like. Just like physics, the economic unfavourable models are the negative results. In economics, diversity of products is a strength as it allows better recovery from failure of any one, comically DEI of people fails miserably at this, because all people are not equal. Here are some other examples you will know if you do physics: E² + (ipc)² = (mc²)² (relativistic Einstein equation), mass being the result, energy time-like (potential), momentum the space-like (kinetic). ∇² - 1/c² ∂²/∂t² = (mc/ℏ)² (Klein-Gordon equation), mass is the result, ∂²/∂t² potential, ∇² is kinetic. Finally we have Dirac equation, which unlike the previous two as "sum of squares" is more like vector addition (first order differentials, not second). iℏγ⁰∂₀ψ + iℏγⁱ∂ᵢψ = mcψ First part is still the time-like potential, second part is the space-like kinetic, and the mass is still the result though all the same. This is because energy is all forms, when on a flat (free from outside influence) worksheet, acts just like a triangle between potential, kinetic and resultant energies. E.g. it is always of the form k² + p² = r², quite often kinetic is imaginary to potential (+,-,-,-) spacetime metric, quaternion mathematics. So the r² can be negative, or imaginary result if costs out way benefits, or work in is greater than work out. Useless but still mathematical solution. Just like physics, you always want the mass or result to be positive and real, or your going to lose energy to the surrounding field, with negative returns. Economic net loss do not last long, just like imaginary particles in physics.
in reply to Cesar A. Hidalgo at https://x.com/realAnthonyDean/status/1844409919161684366
via Anthony Dean @realAnthonyDean
Ben-David, S. (2018). Clustering—What Both Theoreticians and Practitioners are Doing Wrong. ArXiv:1805.08838 [Cs, Stat]. http://arxiv.org/abs/1805.08838