Model Distillation is a technique where a smaller “student” model is trained to reproduce the behavior of a larger “teacher” model by learning from its softened outputs or intermediate representations, enabling efficient inference while retaining much of the original model’s performance.
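Below is a minimal sketch of the most common form, response-based distillation, assuming PyTorch. The teacher/student architectures, temperature, and loss weighting are illustrative assumptions, not a prescribed recipe: the student is trained against the teacher's temperature-softened output distribution blended with the ordinary hard-label loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend soft-target KL loss (teacher guidance) with hard-label cross-entropy."""
    # Softened distributions: a higher temperature T spreads probability
    # mass across classes, exposing the teacher's "dark knowledge".
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients stay comparable across temperatures
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Illustrative models: a larger "teacher" and a smaller "student".
teacher = nn.Sequential(nn.Linear(784, 1024), nn.ReLU(), nn.Linear(1024, 10)).eval()
student = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

x = torch.randn(32, 784)              # dummy input batch
labels = torch.randint(0, 10, (32,))  # dummy hard labels
with torch.no_grad():                 # teacher is frozen; it only provides targets
    teacher_logits = teacher(x)

loss = distillation_loss(student(x), teacher_logits, labels)
loss.backward()
optimizer.step()
```

Variants that match on intermediate representations follow the same pattern, but replace or augment the soft-target term with a loss between selected teacher and student hidden states.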
