110 min read · foundation
Gradient Descent: Theory, Mathematics, and Implementation
Why does walking downhill in parameter space work for everything from linear regression to GPT? A rigorous treatment of gradient descent: convergence theory, variants, and the challenges of real-world optimization.