Knowledge Distillation
Training a smaller 'student' model to reproduce the outputs of a larger 'teacher' model, typically by matching the teacher's temperature-softened class probabilities alongside the ground-truth labels. The resulting student is cheaper to store and run while retaining much of the teacher's accuracy.
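A minimal sketch of the classic distillation objective (Hinton et al., 2015) in PyTorch: the student's loss blends a KL-divergence term against the teacher's softened distribution with ordinary cross-entropy on hard labels. The temperature `T`, mixing weight `alpha`, and the `Linear` placeholder models and random data are illustrative assumptions, not a prescribed setup.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend soft-target and hard-target losses for distillation."""
    # Soft targets: KL divergence between temperature-softened distributions.
    # The T**2 factor keeps the soft-target gradient magnitude comparable to
    # the hard-target term as T varies.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T ** 2)
    # Hard targets: standard cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Illustrative training step with placeholder models and dummy data.
teacher = torch.nn.Linear(16, 10)   # stands in for a large pretrained model
student = torch.nn.Linear(16, 10)   # smaller model being trained
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

inputs = torch.randn(8, 16)
labels = torch.randint(0, 10, (8,))

with torch.no_grad():               # the teacher stays frozen
    teacher_logits = teacher(inputs)
loss = distillation_loss(student(inputs), teacher_logits, labels)
loss.backward()
optimizer.step()
```

In practice `alpha` and `T` are tuned per task; higher temperatures expose more of the teacher's relative ranking over incorrect classes, which is often where the useful "dark knowledge" lives.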