Vinit Vyas
WritingToolsTopicsAbout

Topics

batch-normalization

1 post

Oct 27, 2025·112 min read·intermediate

Backpropagation Part 2: Patterns, Architectures, and Training

Every gradient rule, from convolutions to attention, follows one pattern: the vector-Jacobian product. See past the memorized formulas to the unifying abstraction, understand how residuals and normalization tame deep networks, and learn why modern architectures are really just careful gradient engineering.

2026
GitHubXLinkedIn