Mystery 1: Ensemble [1]

Mystery 2: Knowledge distillation [2]

Mystery 3: Self-distillation [3]

Figure 1: F(1) through F(10), WideResNet-28-10 architecture, trained on the CIFAR-100 dataset.
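To make the ensemble idea concrete, here is a minimal sketch of combining several independently seeded models at prediction time. It assumes the ensemble averages the softmax probabilities of its members, which is one common choice and not necessarily the exact procedure in the figure. The logits are made-up numbers for three hypothetical seeds and three classes; `softmax` and `ensemble_predict` are illustrative names, not part of any library.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_predict(per_model_logits):
    """Average the class probabilities of several models for one input."""
    probs = [softmax(logits) for logits in per_model_logits]
    n_models = len(probs)
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / n_models for c in range(n_classes)]

# Hypothetical logits from three independently seeded models on one input.
logits_per_seed = [
    [2.0, 1.0, 0.1],
    [1.5, 1.8, 0.2],  # this member alone would pick class 1
    [2.2, 0.9, 0.3],
]

avg = ensemble_predict(logits_per_seed)
predicted_class = max(range(len(avg)), key=avg.__getitem__)
print(avg, predicted_class)
```

In this toy input the second member disagrees with the other two, and the averaged distribution still favours class 0.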

Figure (three parts): the top third shows Figure 1, seeds F1 through F10. The middle third shows what happens if this process is adjusted to train a single model to …
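For reference, knowledge distillation in the sense of Hinton et al. (2015) trains a single student model against the softened outputs of a teacher, which may itself be an ensemble. The sketch below is a simplified, single-example version of that loss. The temperature `4.0`, the mixing weight `alpha`, and the function names are assumptions chosen for illustration, not values taken from the source.

```python
import math

def softmax_T(logits, temperature=1.0):
    """Softmax of logits / temperature; higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term for one example."""
    # Soft term: cross-entropy between teacher and student at the same temperature.
    p_teacher = softmax_T(teacher_logits, temperature)
    q_student = softmax_T(student_logits, temperature)
    soft = -sum(p * math.log(q) for p, q in zip(p_teacher, q_student))
    # Hard term: ordinary cross-entropy with the true label at temperature 1.
    hard = -math.log(softmax_T(student_logits, 1.0)[true_label])
    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    return alpha * temperature ** 2 * soft + (1 - alpha) * hard

# Hypothetical logits: the teacher could be an averaged ensemble output.
teacher = [5.0, 1.0, 0.0]
loss_match = distillation_loss([5.0, 1.0, 0.0], teacher, true_label=0)
loss_mismatch = distillation_loss([0.0, 1.0, 5.0], teacher, true_label=0)
print(loss_match, loss_mismatch)
```

A student whose logits match the teacher's receives a lower loss than one that disagrees. Self-distillation would reuse the same loss with a teacher that shares the student's architecture.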


  1. Hagen, A. (2021, January 19). 3 deep learning mysteries: Ensemble, knowledge- and self-distillation. Microsoft Research. ↩︎

  2. Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the Knowledge in a Neural Network (arXiv:1503.02531). arXiv. ↩︎

  3. Zhang, L., Song, J., Gao, A., Chen, J., Bao, C., & Ma, K. (2019). Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation (arXiv:1905.08094). arXiv. ↩︎