Vinit Vyas

Topics

memory

1 post

Feb 26, 2026 · 208 min read · foundation

LLM Inference: From Black Box to Production

A ground-up explanation of LLM inference, from black-box behavior to production optimizations. Covers tokenization, embeddings, attention, the KV cache, memory bottlenecks, batching, PagedAttention, and quantization, using TinyLlama 1.1B as the running example.
