Vinit Vyas

Topics

attention

2 posts

Feb 26, 2026·208 min read·foundation

LLM Inference: From Black Box to Production

A ground-up explanation of LLM inference, from black-box behavior to production optimizations. Covers tokenization, embeddings, attention, the KV cache, memory bottlenecks, batching, PagedAttention, and quantization, using TinyLlama 1.1B as the running example.

Oct 27, 2025·112 min read·intermediate

Backpropagation Part 2: Patterns, Architectures, and Training

Every gradient rule, from convolutions to attention, follows one pattern: the vector-Jacobian product. See past the memorized formulas to the unifying abstraction, understand how residual connections and normalization tame deep networks, and learn why modern architectures are really careful gradient engineering.
