Hypothesis

5 Matching Annotations

Apr 2026
blog.google

https://blog.google/innovation-and-ai/infrastructure-and-cloud/google-cloud/eighth-generation-tpu-agentic-era/

1
1. fxp007 23 Apr 2026
  
  in Public
  
  TPU 8i is designed with more memory bandwidth to serve the most latency-sensitive inference workloads, which is critical because interactions between agents at scale magnify even small inefficiencies.
  
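  Memory bandwidth is usually regarded as a general-purpose hardware requirement, but the author presents TPU 8i as optimized specifically for low-latency inference, which departs from the usual general-purpose design goal of balance.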
  
  non-consensus memory-bandwidth inference-optimization

Tags

non-consensus

memory-bandwidth

inference-optimization

Annotators

fxp007

URL

blog.google/innovation-and-ai/infrastructure-and-cloud/google-cloud/eighth-generation-tpu-agentic-era/
blog.skypilot.co

https://blog.skypilot.co/research-driven-agents/

2
1. fxp007 17 Apr 2026
  
  in Public
  
  A 606 MiB model at ~49 tokens/s consumes ~30 GB/s of memory bandwidth, close to the c6i.2xlarge's DRAM limit. No amount of SIMD tricks will help when the CPU is stalled waiting for model weights to arrive from DRAM.
  
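  This figure reveals the key bottleneck in modern CPU inference: memory bandwidth. The SIMD micro-optimizations the agent tried first could not get past this fundamental limit, which shows that understanding hardware characteristics and system bottlenecks is essential to effective optimization. The finding challenges the traditional view that compute is the main bottleneck and underscores the central role of memory efficiency in AI inference.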
  
  hardware-bottleneck memory-bandwidth system-optimization
2. fxp007 16 Apr 2026
  
  in Public
  
  The variance is also worth noting: baseline+FA TG has ±19 t/s of noise, while optimized+FA has ±0.59 t/s on x86.
  
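  Surprisingly, the optimized code not only improved performance but also sharply reduced the variance of the results (from ±19 t/s to ±0.59 t/s). This suggests the AI agent's optimizations addressed not only speed but also the predictability of memory access patterns; that kind of holistic thinking is impressive.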
  
  surprising performance memory-optimization
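
The bandwidth figure quoted in the first annotation above can be checked with a short calculation. This is a rough sketch, assuming every byte of the 606 MiB model is read from DRAM once per generated token (dense weights, batch size 1, no reuse from cache); the function name and constants are illustrative and not taken from the source post.

```python
MIB = 1024 ** 2  # bytes per mebibyte
GB = 1000 ** 3   # bytes per decimal gigabyte


def decode_bandwidth_gb_per_s(model_mib: float, tokens_per_s: float) -> float:
    """Approximate DRAM traffic, assuming all weights are read once per token."""
    bytes_per_token = model_mib * MIB  # assumption: one full pass over the weights
    return bytes_per_token * tokens_per_s / GB


bandwidth = decode_bandwidth_gb_per_s(model_mib=606, tokens_per_s=49)
print(f"{bandwidth:.1f} GB/s")  # prints "31.1 GB/s"
```

Under that assumption the result lands near the ~30 GB/s quoted in the post, which is why faster arithmetic alone would not raise the token rate: the weights cannot arrive any sooner.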

Tags

memory-optimization

system-optimization

surprising

memory-bandwidth

performance

hardware-bottleneck

Annotators

fxp007

URL

blog.skypilot.co/research-driven-agents/
x.com

https://x.com/berryxia/status/2042017501253661059

1
1. fxp007 16 Apr 2026
  
  in Public
  
  KV cache memory footprint reduced by 10.7x
  
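  Surprisingly, KV cache memory usage dropped by a striking 10.7x, far beyond the scale of an ordinary technical optimization. The KV cache is a major consumer of memory in large-model inference, so a reduction this large means the same hardware can handle longer contexts or run more model instances at once.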
  
  surprising memory-efficiency kv-cache-optimization
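
To make the scale of a 10.7x reduction concrete, here is a minimal sketch using the standard size estimate for a transformer KV cache (two tensors, K and V, per layer). The model shape below (32 layers, 8 KV heads, head dimension 128, 16-bit elements) is a hypothetical configuration chosen for illustration; the quoted post does not specify a model or how the reduction is achieved.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """KV cache size for one sequence: K and V tensors for every layer."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem


REDUCTION = 10.7  # factor quoted in the annotated post

# Hypothetical model shape, not taken from the source.
baseline = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=8192)
compressed = baseline / REDUCTION

# With the same memory budget, the context could grow by the same factor.
max_context = int(8192 * REDUCTION)

print(f"baseline:   {baseline / 1024 ** 2:.0f} MiB")
print(f"compressed: {compressed / 1024 ** 2:.1f} MiB")
print(f"context at the same budget: {max_context} tokens")
```

For this assumed shape the baseline cache is exactly 1 GiB at 8192 tokens, so the same budget would cover roughly 87,000 tokens after the reduction.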

Tags

surprising

kv-cache-optimization

memory-efficiency

Annotators

fxp007

URL

x.com/berryxia/status/2042017501253661059
Apr 2022
www.cs.sfu.ca

Randal E. Bryant, David R. O’Hallaron - Computer Systems. A Programmer’s Perspective [3rd ed.] (2016, Pearson)Global Edition.pdf

1
1. sherlockliao 29 Apr 2022
  
  in Public
  
  If a compiler cannot determine whether or not two pointers may be aliased, it must assume that either case is possible, limiting the set of possible optimizations.
  
  How should the pointer-aliasing optimization blocker be understood?
  
  memory aliasing pointer optimization block
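
One way to approach the question above is to look at two functions that seem interchangeable until their arguments alias. The sketch below imitates C pointers with one-element Python lists, so it models the semantics only and is not compiled code; the function names and the `run` helper are illustrative. A compiler would like to rewrite the first function into the second (one memory update instead of two), but the two agree only when `xp` and `yp` refer to different locations.

```python
def twiddle1(xp: list, yp: list) -> None:
    """Add the value at yp to the value at xp, twice (two reads of yp)."""
    xp[0] += yp[0]
    xp[0] += yp[0]


def twiddle2(xp: list, yp: list) -> None:
    """Add twice the value at yp to the value at xp (one read of yp)."""
    xp[0] += 2 * yp[0]


def run(fn, aliased: bool) -> int:
    """Call fn on fresh cells; when aliased, both arguments are the same cell."""
    x = [1]
    y = x if aliased else [1]
    fn(x, y)
    return x[0]


print("distinct:", run(twiddle1, aliased=False), run(twiddle2, aliased=False))
print("aliased: ", run(twiddle1, aliased=True), run(twiddle2, aliased=True))
```

With distinct cells both functions produce 3. With aliased cells the first produces 4 (the value doubles twice) and the second produces 3, so replacing one with the other would change behavior. Because the compiler usually cannot prove the arguments never alias, it has to keep the slower form, and that is the sense in which aliasing blocks the optimization.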

Tags

optimization block

pointer

memory aliasing

Annotators

sherlockliao

URL

cs.sfu.ca/~ashriram/Courses/CS295/assets/books/CSAPP_2016.pdf
