Vinit Vyas

Topics

memory

1 post

Feb 26, 2026 · 208 min read · foundation

LLM Inference: From Black Box to Production

A ground-up explanation of LLM inference, from black-box behavior to production optimizations. Covers tokenization, embeddings, attention, the KV cache, memory bottlenecks, batching, PagedAttention, and quantization, using TinyLlama 1.1B as the running example.
