Owning a $5M data center
- comma.ai operates its own $5M data center in-office to handle model training, metrics, and data storage, avoiding the "cloud tax."
- The facility draws approximately 450 kW at peak; power in San Diego costs over $0.40/kWh, and the 2025 power bill totaled over $540,000.
- Cooling uses only outside air, moved by dual 48-inch intake and exhaust fans, with a PID loop regulating temperature and humidity.
- The compute cluster consists primarily of 600 GPUs across 75 "TinyBox Pro" machines built in-house for cost efficiency and easier repairability.
- Storage is handled by several racks of Dell R630/R730 servers with ~4PB of total SSD storage, favoring speed and random access over redundancy.
- The software stack is kept simple to ensure 99% uptime: Ubuntu booted over the network (pxeboot), Salt for configuration management, and "minikeyvalue" for distributed storage.
- comma.ai estimates that owning its hardware has saved $20M+ compared with equivalent compute in a public cloud.
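
As a rough sanity check on the figures above, the annual power bill and the quoted rate imply an average draw well below the 450 kW peak. The sketch below treats $540,000 and $0.40/kWh as exact values, which is an assumption, since the summary gives both as lower bounds ("over"). The function and variable names are illustrative and do not come from comma.ai.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours in a non-leap year such as 2025


def implied_average_kw(annual_cost_usd: float, rate_usd_per_kwh: float) -> float:
    """Average power draw implied by a yearly bill at a flat rate."""
    annual_kwh = annual_cost_usd / rate_usd_per_kwh
    return annual_kwh / HOURS_PER_YEAR


# Figures from the summary, treated as exact.
# Assumption: the source states "over" for both the bill and the rate.
ANNUAL_COST_USD = 540_000
RATE_USD_PER_KWH = 0.40
PEAK_KW = 450
GPUS = 600
MACHINES = 75

avg_kw = implied_average_kw(ANNUAL_COST_USD, RATE_USD_PER_KWH)
load_factor = avg_kw / PEAK_KW       # average draw as a fraction of peak
gpus_per_machine = GPUS // MACHINES  # GPUs in each "TinyBox Pro"

print(f"implied average draw: {avg_kw:.0f} kW")
print(f"average as share of peak: {load_factor:.0%}")
print(f"GPUs per machine: {gpus_per_machine}")
```

Under these assumptions the implied average is about 154 kW, roughly a third of peak, with 8 GPUs per machine. A higher actual rate would lower the estimate and a higher actual bill would raise it, so the result is an approximation only.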
Hacker News Discussion
- Users discussed the spectrum of infrastructure, ranging from pure Cloud (low cap-ex, high op-ex) to colocation and on-prem (high cap-ex, high skill requirement).
- A primary concern raised was "brain drain": on-prem setups can become "legacy debt" if the senior engineers who built the custom systems leave without documenting what they know.
- Commenters noted that AWS and other cloud providers are incentivized to keep architectures complex (microservices, serverless) to increase billing, whereas on-prem encourages efficiency.
- Commenters debated "software freedom" and the "WhatsApp effect," in which small, highly motivated teams can outperform massive corporations by running lean, self-hosted stacks.
- Some users noted that although AWS pricing is expected to rise with hardware costs, the "quality of life" and managed services still justify the cost for many startups that lack comma's scale.