2 Matching Annotations
  1. Jun 2026
    1. the lack of KV sharing across requests leads to redundant prefill computation and wasted memory.

      KV sharing across concurrent requests is a non-obvious efficiency lever: when two users send prompts that share a prefix (for example, the same system prompt), each request still computes and stores its own prefill KV states for that prefix. CXL's shared memory pool makes cross-request KV reuse architecturally possible for the first time without expensive GPU-to-GPU transfers.
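
To make the redundancy concrete, here is a minimal sketch of prefix-keyed KV reuse. It is not taken from the annotated paper and does not use any CXL API: `SharedKVPool`, `BLOCK`, and the placeholder strings are hypothetical names, and a plain dictionary stands in for a memory pool that every serving worker can read. Each block is keyed by the whole token prefix up to its end, because the KV states of a token depend on all the tokens before it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

BLOCK = 4  # tokens per cached block (hypothetical granularity)


@dataclass
class SharedKVPool:
    """Stand-in for a memory pool visible to all requests (assumption)."""
    blocks: Dict[Tuple[int, ...], str] = field(default_factory=dict)
    computed_tokens: int = 0  # tokens whose KV states had to be computed
    reused_tokens: int = 0    # tokens whose KV states were found in the pool

    def prefill(self, tokens: List[int]) -> None:
        # Only whole blocks are cached; the trailing partial block is not.
        full = len(tokens) - len(tokens) % BLOCK
        for end in range(BLOCK, full + 1, BLOCK):
            key = tuple(tokens[:end])  # full prefix, not just this block
            if key in self.blocks:
                self.reused_tokens += BLOCK
            else:
                # Placeholder for the real KV tensors of tokens[end-BLOCK:end].
                self.blocks[key] = f"kv[{end - BLOCK}:{end}]"
                self.computed_tokens += BLOCK
        # The tail shorter than one block is always computed.
        self.computed_tokens += len(tokens) - full


pool = SharedKVPool()
system_prompt = list(range(8))                  # shared 8-token prefix
pool.prefill(system_prompt + [100, 101])        # first request: nothing cached
pool.prefill(system_prompt + [200, 201, 202])   # second request: prefix reused
print(pool.computed_tokens, pool.reused_tokens)
```

Without the shared pool, the two requests would compute 10 + 11 = 21 tokens of prefill. With it, the second request skips the 8 shared prefix tokens. Note that reuse depends on an exact token-level prefix match, so prompts that are merely similar in meaning would not hit the cache in this scheme.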

  2. Jun 2017