Step through the motivation for pipeline parallelism: from a model that overflows a single GPU, to the limits of tensor parallelism, to the depth-axis split that makes large models trainable.
Navigate through the 4 steps to see how a 32-layer model that cannot fit on one GPU is partitioned along its depth axis.
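The depth-axis split can be sketched as assigning each GPU a contiguous block of layers. This is a minimal sketch that assumes 4 pipeline stages (the GPU count is an illustrative assumption, not given above) and layers of roughly equal cost. `partition_layers` is a hypothetical helper, not any framework's API.

```python
def partition_layers(num_layers: int, num_stages: int) -> list[range]:
    """Split layer indices 0..num_layers-1 into contiguous blocks, one per stage.

    Assumes every layer costs about the same memory and compute, so blocks are
    balanced by layer count. Any remainder goes to the earliest stages.
    """
    if num_stages <= 0 or num_stages > num_layers:
        raise ValueError("need 1 <= num_stages <= num_layers")
    base, extra = divmod(num_layers, num_stages)
    stages = []
    start = 0
    for s in range(num_stages):
        size = base + (1 if s < extra else 0)
        stages.append(range(start, start + size))  # layers owned by stage s
        start += size
    return stages


# Hypothetical setup: a 32-layer model split across 4 GPUs (pipeline stages).
for gpu, layers in enumerate(partition_layers(32, 4)):
    print(f"GPU {gpu}: layers {layers.start}-{layers.stop - 1}")
```

With these assumptions each GPU holds 8 consecutive layers, and activations flow from GPU 0 to GPU 3 during the forward pass. Real pipeline frameworks may instead balance stages by parameter count or measured cost rather than by layer count.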