Vinit Vyas

Topics

attention

2 posts

Feb 26, 2026·208 min read·foundation

LLM Inference: From Black Box to Production

A ground-up explanation of LLM inference, from black-box behavior to production optimizations. Covers tokenization, embeddings, attention, the KV cache, memory bottlenecks, batching, PagedAttention, and quantization, using TinyLlama 1.1B as the running example.

Oct 27, 2025·112 min read·intermediate

Backpropagation Part 2: Patterns, Architectures, and Training

Every gradient rule, from convolutions to attention, follows one pattern: the vector-Jacobian product. See past the memorized formulas to the unifying abstraction, understand how residual connections and normalization tame deep networks, and learn why modern architectures are really careful gradient engineering.
