10 Matching Annotations
  1. Jun 2026
    1. the lack of KV sharing across requests leads to redundant prefill computation and wasted memory.

      KV sharing across concurrent requests is a non-obvious efficiency lever: if two users send prompts that share a prefix (for example, the same system prompt), each request still computes its prefill KV state independently and stores its own copy. CXL's shared memory pool makes cross-request KV reuse architecturally possible for the first time without expensive GPU-to-GPU transfers.
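
The reuse idea in the note above can be sketched as a prefix-keyed block cache. This is a minimal illustration, not the annotated paper's design: `SharedKVPool`, `BLOCK`, and the hashing scheme are hypothetical names and choices, the pool is an ordinary in-process dictionary standing in for a shared memory region, and token ids stand in for real KV tensors.

```python
import hashlib
from typing import Dict, List, Tuple

BLOCK = 4  # tokens per cached block (hypothetical value chosen for the example)


class SharedKVPool:
    """Stand-in for a shared pool of KV blocks, keyed by the hash of the token prefix."""

    def __init__(self) -> None:
        # Maps prefix hash -> placeholder for the KV data of the last block of that prefix.
        self.blocks: Dict[str, List[int]] = {}

    @staticmethod
    def _key(prefix: List[int]) -> str:
        # The key covers the whole prefix, so a block is only reused when
        # every token before it is identical as well.
        return hashlib.sha256(repr(prefix).encode()).hexdigest()

    def prefill(self, tokens: List[int]) -> Tuple[int, int]:
        """Return (reused_tokens, computed_tokens) for one request."""
        reused = 0
        computed = 0
        full = len(tokens) - len(tokens) % BLOCK  # length covered by whole blocks
        for end in range(BLOCK, full + 1, BLOCK):
            key = self._key(tokens[:end])
            if key in self.blocks:
                reused += BLOCK  # another request already produced this block
            else:
                self.blocks[key] = tokens[end - BLOCK:end]  # placeholder for KV tensors
                computed += BLOCK
        computed += len(tokens) - full  # the partial tail block is never cached here
        return reused, computed


pool = SharedKVPool()
first = pool.prefill(list(range(10)))                    # nothing cached yet
second = pool.prefill(list(range(8)) + [99, 100, 101])   # shares the first 8 tokens
print(first, second)
```

Under these assumptions the second request reuses the two blocks it shares with the first and only computes its three differing tokens. Without a pool visible to both requests, it would recompute all eleven.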
