Hypothesis

5 Matching Annotations

Last 7 days
quesma.com

Qwen 3.6 27B is the sweet spot for local development - Quesma Blog

1
1. fxp007 03 Jul 2026
  
  in Public
  
  A common 8-bit quantization saves half the space at almost no cost to quality. Going further down the road, models are smaller (and potentially - faster), but at the cost of quality
  
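  This gives key data and a best practice for model quantization. 8-bit (BF16 to Q8) is a highly cost-effective "sweet spot": it saves half the memory with almost no loss in quality. More aggressive quantization (such as 4-bit) comes with a trade-off in quality. Beginners should use this as a baseline when choosing a model version that fits their hardware.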
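
  The memory arithmetic behind this note can be sketched as follows. This is a rough estimate only: it assumes nominal bit widths (16, 8, 4), takes the 27B parameter count from the article title, and ignores the per-block scale data that real quantized formats store, as well as the KV cache and runtime overhead. The function name `model_size_gb` is hypothetical.

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    Assumption: every weight uses exactly `bits_per_weight` bits; real
    quantized files are somewhat larger because they also store scales.
    """
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    # Nominal widths for the formats mentioned in the annotation.
    for label, bits in [("BF16", 16), ("Q8", 8), ("Q4", 4)]:
        print(f"{label}: {model_size_gb(27, bits):.1f} GB")
```

  Under these assumptions a 27B model drops from about 54 GB at BF16 to about 27 GB at 8-bit, which is the "half the space" in the quoted passage.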
  
  quantization-data best-practice trade-offs

Tags

trade-offs

best-practice

quantization-data

Annotators

fxp007

URL

quesma.com/blog/qwen-36-is-awesome/
Apr 2026
huggingface.co

https://huggingface.co/papers/2604.04514

1
1. fxp007 24 Apr 2026
  
  in Public
  
  Fisher-Rao Quantization-Aware Distance (FRQAD) -- a new metric on the Gaussian statistical manifold achieving 100% precision at preferring high-fidelity embeddings over quantized ones (vs 85.6% for cosine), with zero prior art.
  
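  The claimed 100% precision for the FRQAD metric is surprising, because it far exceeds the 85.6% of conventional cosine similarity. If true, it would fundamentally change how we handle embedding compression and similarity computation, and would challenge the dominance of cosine similarity in information retrieval.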
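
  For orientation, here is a minimal sketch contrasting cosine similarity with a distance on the Gaussian statistical manifold. It is not FRQAD: the annotation does not describe how the paper defines its metric. The second function is the classic closed-form Fisher-Rao distance between two univariate Gaussians, shown only to illustrate what a metric that accounts for variance looks like. Both function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; ignores magnitude and any noise model."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def fisher_rao_gaussian(mu1, sigma1, mu2, sigma2):
    """Fisher-Rao distance between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    Assumption: univariate Gaussians with sigma > 0. This is the textbook
    closed form, not the metric defined in the annotated paper.
    """
    delta = ((mu1 - mu2) ** 2 + 2 * (sigma1 - sigma2) ** 2) / (4 * sigma1 * sigma2)
    return math.sqrt(2) * math.acosh(1 + delta)


if __name__ == "__main__":
    # Same direction, different scale: cosine cannot tell these apart.
    print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
    # Same mean, different spread: the Fisher-Rao distance is non-zero.
    print(fisher_rao_gaussian(0.0, 1.0, 0.0, 2.0))
```

  The point of the contrast is that a distribution-aware distance can separate two representations with the same mean but different uncertainty, which a purely angular measure cannot.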
  
  breakthrough quantization precision

Tags

breakthrough

quantization

precision

Annotators

fxp007

URL

huggingface.co/papers/2604.04514
x.com

https://x.com/i/web/status/2044553427448172741

1
1. fxp007 16 Apr 2026
  
  in Public
  
  a quantized 1.7B model (just 290MB in size) can run at ~100 tokens per second entirely in your browser
  
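  Surprisingly, a language model this large (1.7 billion parameters) can be compressed to just 290MB and run in the browser at roughly 100 tokens per second. This shows how far model quantization has come, making it possible to run complex AI models efficiently on ordinary devices.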
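
  A quick back-of-the-envelope check shows how aggressive this compression is. The sketch assumes that 290MB means 290 × 10^6 bytes and that the file holds all 1.7 billion parameters; neither is stated in the quoted post. The function name `bits_per_weight` is hypothetical.

```python
def bits_per_weight(file_bytes: float, n_params: float) -> float:
    """Average number of bits available per parameter in a model file."""
    return file_bytes * 8 / n_params


if __name__ == "__main__":
    # Assumed figures: 290 MB taken as 290e6 bytes, 1.7B parameters.
    bpw = bits_per_weight(290e6, 1.7e9)
    print(f"{bpw:.2f} bits per weight")  # prints "1.36 bits per weight"
```

  Under these assumptions the average is below 2 bits per weight, well beyond the 4-bit and 8-bit schemes discussed in the other annotations on this page.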
  
  surprising quantization browser-ai

Tags

surprising

quantization

browser-ai

Annotators

fxp007

URL

x.com/i/web/status/2044553427448172741
developer.nvidia.com

https://developer.nvidia.com/blog/bringing-ai-closer-to-the-edge-and-on-device-with-gemma-4/

1
1. fxp007 08 Apr 2026
  
  in Public
  
  NVFP4 enables 4-bit precision while maintaining nearly identical accuracy to 8-bit precision, increasing performance per watt and lowering cost per token.
  
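  Most people assume that lowering model precision significantly sacrifices performance, but the author claims that with NVFP4 quantization Gemma 4 reaches nearly the same accuracy at 4-bit precision as at 8-bit precision. This counterintuitive conclusion challenges the conventional view that quantization sharply degrades model performance, and suggests NVIDIA may have made a breakthrough in quantization.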
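
  To make the idea of a block-scaled 4-bit float concrete, here is a simplified sketch. It is an illustration under stated assumptions, not NVIDIA's implementation: it rounds each value to the eight magnitudes of an E2M1 4-bit float, shares one scale per block of 16 values, and keeps that scale as an ordinary Python float rather than a low-precision hardware format. The names `quantize_block` and `quantize` are hypothetical.

```python
import math

# Magnitudes representable by a 4-bit E2M1 float (the sign is a separate bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_block(block):
    """Round one block to the E2M1 grid with a shared scale; return dequantized values."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block)
    scale = amax / E2M1_GRID[-1]  # map the largest magnitude in the block to 6.0
    out = []
    for x in block:
        # Pick the nearest representable magnitude, then restore the sign.
        mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        out.append(math.copysign(mag * scale, x))
    return out


def quantize(values, block_size=16):
    """Quantize a flat list block by block (block size of 16 is an assumption here)."""
    out = []
    for i in range(0, len(values), block_size):
        out.extend(quantize_block(values[i:i + block_size]))
    return out


if __name__ == "__main__":
    weights = [math.sin(i) for i in range(64)]
    restored = quantize(weights)
    print(max(abs(w - r) for w, r in zip(weights, restored)))
```

  Because each small block gets its own scale, a single outlier only coarsens the values in its own block, which is one reason block-scaled 4-bit formats can stay closer to 8-bit accuracy than a single tensor-wide scale would.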
  
  non-consensus quantization model-optimization

Tags

non-consensus

model-optimization

quantization

Annotators

fxp007

URL

developer.nvidia.com/blog/bringing-ai-closer-to-the-edge-and-on-device-with-gemma-4/
Nov 2023
zayunsna.github.io

Studying KoAlpaca - 1

1
1. polarislee 11 Nov 2023
  
  in Public
  
  KoAlpaca Study NLP tokenizing fine-tuning 4bit nf4-quantization

Tags

Study

KoAlpaca

nf4-quantization

NLP

fine-tuning

4bit

tokenizing

Annotators

polarislee

URL

zayunsna.github.io/blog/2023-08-01-koalpaka/
