As embedding models grow larger (producing higher-dimensional vectors) and corpora grow to billions of embeddings, vector databases become more memory-intensive and harder to scale efficiently with DRAM or GPU memory alone. Offloading vectors to slower storage tiers such as SSDs increases retrieval latency and limits query throughput.
Vector database memory pressure is a sleeper problem in RAG deployments: a billion-scale embedding index can require terabytes of memory, which neither GPU VRAM nor DRAM can provide economically. CXL-attached memory, offering terabyte-scale capacity at latencies much closer to DRAM than to SSDs, could be the missing tier that makes in-memory vector search viable at enterprise scale.
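To make the "terabytes" figure concrete, here is a back-of-the-envelope sketch of raw vector storage. It assumes dense vectors stored as float32 (or float16) and counts only the vectors themselves. Real indices such as HNSW graphs add link, ID, and metadata overhead on top of this, while compressed indices (e.g. product quantization) can be much smaller. The function name is illustrative, not from any particular vector database.

```python
def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_element: int = 4) -> int:
    """Raw storage for dense vectors: num_vectors * dim * bytes_per_element.

    Assumption: uncompressed vectors only. Index overhead (graph links,
    IDs, metadata) is excluded, so a built index needs more than this.
    """
    return num_vectors * dim * bytes_per_element


if __name__ == "__main__":
    n = 1_000_000_000  # one billion embeddings
    for dim in (768, 1024, 1536):
        fp32 = raw_vector_bytes(n, dim)     # float32: 4 bytes per element
        fp16 = raw_vector_bytes(n, dim, 2)  # float16: 2 bytes per element
        # Report in decimal terabytes (10**12 bytes)
        print(f"dim={dim}: {fp32 / 1e12:.2f} TB fp32, {fp16 / 1e12:.2f} TB fp16")
```

For one billion 768-dimensional float32 vectors this comes to about 3.07 TB before any index overhead, well beyond the memory of any single GPU.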