Feb 16, 2026 · 150 min read · foundation
LLM Inference: From Black Box to Production
A ground-up explanation of LLM inference, from black box to production optimizations. Covers tokenization, attention, the KV cache, memory bottlenecks, batching, PagedAttention, quantization, and more. No code, just diagrams, with TinyLlama as our running example.