Mystery 1: Ensemble¹
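
In the setup from the Microsoft Research blog post cited above, the ensemble is built by training the same architecture (WideResNet-28-10 on CIFAR-100 in Figure 1) several times from different random seeds and averaging the models' outputs at test time. The following is a minimal PyTorch sketch of that idea only; the tiny classifier, the random data, and all hyperparameters are hypothetical stand-ins for WideResNet-28-10 and CIFAR-100, not the blog's actual training code.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for WideResNet-28-10: a tiny 100-class classifier.
def make_model(num_classes: int = 100) -> nn.Module:
    return nn.Sequential(nn.Flatten(),
                         nn.Linear(3 * 32 * 32, 256), nn.ReLU(),
                         nn.Linear(256, num_classes))

def train_one_model(seed: int, data, labels, epochs: int = 5) -> nn.Module:
    torch.manual_seed(seed)          # only the random seed differs between ensemble members
    model = make_model()
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(data), labels)
        loss.backward()
        opt.step()
    return model

@torch.no_grad()
def ensemble_predict(models, x):
    # Average the individual models' softmax outputs
    # (averaging raw logits is a common alternative).
    probs = torch.stack([m(x).softmax(dim=-1) for m in models])
    return probs.mean(dim=0)

# Toy data standing in for CIFAR-100 (32x32 RGB images, 100 classes).
x = torch.randn(64, 3, 32, 32)
y = torch.randint(0, 100, (64,))
ensemble = [train_one_model(seed, x, y) for seed in range(10)]   # F(1)..F(10)
avg_probs = ensemble_predict(ensemble, x)
```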

Mystery 2: Knowledge distillation²
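
Knowledge distillation (Hinton et al., 2015) trains a single student network to reproduce the softened output distribution of a teacher, for example the ensemble above, by mixing a temperature-scaled KL term with the usual cross-entropy on hard labels. A sketch of that loss is below; the temperature `T` and mixing weight `alpha` are illustrative defaults, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      T: float = 4.0,       # temperature: softens both distributions
                      alpha: float = 0.9) -> torch.Tensor:
    """Hinton-style distillation: KL between teacher and student at temperature T,
    mixed with ordinary cross-entropy on the hard labels."""
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    # The T*T factor keeps gradient magnitudes comparable across temperatures,
    # as recommended in Hinton et al. (2015).
    kd = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Usage idea: teacher_logits could come from the ensemble's averaged outputs,
# while the student is a single fresh model trained with this loss.
```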

Mystery 3: Self-distillation³
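
Self-distillation is used in more than one sense: Zhang et al. (2019) distill between the deeper and shallower sections of a single network ("be your own teacher"), while the blog post's setup distills a freshly initialized copy of the same architecture from an already-trained model. The sketch below illustrates only the latter, simpler variant, reusing the same hypothetical toy model and illustrative hyperparameters as the sketches above; it is not the method of Zhang et al.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_model(num_classes: int = 100) -> nn.Module:
    # Hypothetical stand-in for the real architecture (e.g. WideResNet-28-10).
    return nn.Sequential(nn.Flatten(),
                         nn.Linear(3 * 32 * 32, 256), nn.ReLU(),
                         nn.Linear(256, num_classes))

def self_distill(teacher: nn.Module, data, labels,
                 T: float = 4.0, alpha: float = 0.9, steps: int = 100) -> nn.Module:
    """Train a fresh copy of the same architecture against the frozen,
    already-trained model's softened outputs."""
    teacher.eval()
    student = make_model()           # same architecture, new random initialization
    opt = torch.optim.SGD(student.parameters(), lr=0.05, momentum=0.9)
    for _ in range(steps):
        with torch.no_grad():
            teacher_logits = teacher(data)
        student_logits = student(data)
        kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                      F.softmax(teacher_logits / T, dim=-1),
                      reduction="batchmean") * (T * T)
        ce = F.cross_entropy(student_logits, labels)
        loss = alpha * kd + (1.0 - alpha) * ce
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student
```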

[Figure: ten models F(1) through F(10) with the WideResNet-28-10 architecture, trained on CIFAR-100 from different random seeds (Figure 1). The original image has three parts: the top shows the individual models F(1)–F(10); the middle shows the same process adjusted so that a single model is trained to match the ensemble's output.]

References:


  1. Hagen, A. (2021, January 19). 3 deep learning mysteries: Ensemble, knowledge- and self-distillation. Microsoft Research. https://www.microsoft.com/en-us/research/blog/three-mysteries-in-deep-learning-ensemble-knowledge-distillation-and-self-distillation/ ↩︎

  2. Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the Knowledge in a Neural Network (arXiv:1503.02531). arXiv. https://doi.org/10.48550/arXiv.1503.02531 ↩︎

  3. Zhang, L., Song, J., Gao, A., Chen, J., Bao, C., & Ma, K. (2019). Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation (arXiv:1905.08094). arXiv. https://doi.org/10.48550/arXiv.1905.08094 ↩︎