Floating Point: Designing a Number System from 32 Bits
Derive IEEE 754 float32 from a blank 32-bit register, then connect representation to arithmetic: alignment, rounding, ULPs, epsilon, and when real-number identities break.
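A quick taste of the ideas above, sketched in plain Python (whose `float` is IEEE 754 float64 rather than float32, but the same phenomena apply):

```python
import math

# Real-number identities break: 0.1 + 0.2 is not exactly 0.3.
print(0.1 + 0.2 == 0.3)          # False
print(f"{0.1 + 0.2:.17f}")       # 0.30000000000000004

# Machine epsilon: the gap between 1.0 and the next representable float.
eps = math.ulp(1.0)
print(eps)                       # 2.220446049250313e-16

# Adding half an epsilon to 1.0 rounds back to 1.0 (ties-to-even).
print(1.0 + eps / 2 == 1.0)      # True

# ULP spacing grows with magnitude: floats are denser near zero.
print(math.ulp(1.0) < math.ulp(1e16))   # True
```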
Exploring the mathematical foundations and practical implementations of deep learning and the AI solutions space.
A ground-up explanation of LLM inference, from black box to production optimizations. Covers tokenization, embeddings, attention, KV cache, memory bottlenecks, batching, PagedAttention, and quantization, using TinyLlama 1.1B as the running example.
Theory assumes infinite precision; hardware delivers float16. Bridge the gap between mathematical backprop and production systems. In this post, we cover a lot of "practical" ground: from PyTorch's tape to mixed precision training, from numerical disasters to systematic testing, from gradient monitoring to interpretability. What breaks, why, and how to fix it.
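To see why float16 bites in practice, here is a tiny illustration using Python's standard-library half-precision round-trip (the `'e'` struct format) rather than real training code:

```python
import struct

def to_f16(x: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack('e', struct.pack('e', x))[0]

# Near 1.0, float16 spacing is 2**-10 (about 9.8e-4), so a gradient
# update smaller than that simply vanishes when added to the weight:
w, g = 1.0, 1e-4
print(to_f16(w + g) == to_f16(w))   # True: the update is lost in float16
```

This is one motivation for tricks like loss scaling and keeping a float32 master copy of the weights in mixed precision training.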
Every gradient rule, from convolutions to attention, follows one pattern: the vector-Jacobian product. See past the memorized formulas to the unifying abstraction, understand how residuals and normalization tame deep networks, and learn why modern architectures are really just careful gradient engineering.
Backprop computes a million gradients for the price of two forward passes. From computational graphs to adjoints, from the chain rule to a working neural network, this is the algorithm that made deep learning possible, demystified here step by step.
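The "all gradients in one backward sweep" claim can be sketched in a few dozen lines. This is a minimal reverse-mode autodiff toy (scalars only, just `+` and `*`), not the post's actual code:

```python
class Value:
    """A node in a computational graph that records how it was made."""
    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # upstream nodes
        self._local_grads = local_grads  # d(self)/d(parent) per parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self):
        # Topologically order the graph, then sweep once from the output
        # back to the inputs, accumulating adjoints via the chain rule.
        order, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    build(p)
                order.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(order):
            for p, g in zip(v._parents, v._local_grads):
                p.grad += v.grad * g

# One backward pass yields every input gradient simultaneously.
x, y = Value(3.0), Value(4.0)
out = x * y + x          # out = x*y + x
out.backward()
print(out.data, x.grad, y.grad)   # 15.0, d/dx = y+1 = 5.0, d/dy = x = 3.0
```

The cost of `backward` is proportional to one traversal of the graph, regardless of how many inputs need gradients, which is exactly why training million-parameter networks is feasible.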
Why does a 3-layer network solve problems a 1000-neuron single layer cannot? Understanding forward propagation, the exponential efficiency of depth, and how simple operations compose into hierarchical reasoning.
Why does walking downhill in parameter space solve everything from linear regression to GPT? A rigorous treatment of gradient descent: convergence theory, variants, and the challenges of real-world optimization.
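The core loop is almost embarrassingly simple; here is a hand-rolled sketch (a made-up one-dimensional objective, not from the post) showing convergence on f(w) = (w - 3)²:

```python
# Gradient descent on f(w) = (w - 3)^2, whose gradient is 2*(w - 3).
w, lr = 0.0, 0.1
for _ in range(100):
    grad = 2 * (w - 3)
    w -= lr * grad      # step downhill

print(w)   # converges toward the minimizer w = 3
```

Each step shrinks the error by a constant factor (here 1 - 2*lr = 0.8), the simplest instance of the linear convergence rates the theory makes precise.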
Understanding the fundamental building block of neural networks through intuition, mathematics, and implementation.
A narrative of the conditions, failed alternatives, and timing that made MCP the standard way to connect LLMs to real tools and data.