Vinit Vyas

Topics

production

2 posts

Feb 26, 2026·208 min read·foundation

LLM Inference: From Black Box to Production

A ground-up explanation of LLM inference, from black box to production optimizations. Covers tokenization, embeddings, attention, KV cache, memory bottlenecks, batching, PagedAttention, and quantization, using TinyLlama 1.1B as the running example.

Oct 30, 2025·93 min read·advanced

Backpropagation Part 3: Systems, Stability, Interpretability, Frontiers

Theory assumes infinite precision; hardware delivers float16. This post bridges the gap between mathematical backprop and production systems. It covers practical ground: from PyTorch's tape to mixed-precision training, from numerical disasters to systematic testing, and from gradient monitoring to interpretability. What breaks, why, and how to fix it.
