LLM Inference: From Black Box to Production
A ground-up explanation of how LLM inference works, from a black box to production optimizations: tokenization, attention, the KV cache, memory bottlenecks, batching, PagedAttention, quantization, and more. No code, just diagrams, with TinyLlama as our running example.